* crash in csched_load_balance after xl vcpu-pin
@ 2018-04-10  8:57 Olaf Hering
  2018-04-10  9:34 ` George Dunlap
                   ` (2 more replies)
  0 siblings, 3 replies; 58+ messages in thread
From: Olaf Hering @ 2018-04-10  8:57 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Dario Faggioli


While hunting some other bug we ran into the single BUG in
sched_credit.c:csched_load_balance(). This happens with all versions
since 4.7; staging is also affected. The test system is a Haswell model 63
system with 4 NUMA nodes and 144 threads.

(XEN) Xen BUG at sched_credit.c:1694
(XEN) ----[ Xen-4.11.20180407T144959.e62e140daa-2.bug1087289_411  x86_64  debug=n   Not tainted ]----
(XEN) CPU:    30
(XEN) RIP:    e008:[<ffff82d08022879d>] sched_credit.c#csched_schedule+0xaad/0xba0
(XEN) RFLAGS: 0000000000010087   CONTEXT: hypervisor
(XEN) rax: ffff83077ffe76d0   rbx: ffff83077fe571d0   rcx: 000000000000001e
(XEN) rdx: ffff83005d082000   rsi: 0000000000000000   rdi: ffff83077fe575b0
(XEN) rbp: ffff82d08094a480   rsp: ffff83077fe4fd00   r8:  ffff83077fe581a0
(XEN) r9:  ffff82d080227cf0   r10: 0000000000000000   r11: ffff830060b62060
(XEN) r12: 000014f4e864c2d4   r13: ffff83077fe575b0   r14: ffff83077fe58180
(XEN) r15: ffff82d08094a480   cr0: 000000008005003b   cr4: 00000000001526e0
(XEN) cr3: 0000000049416000   cr2: 00007fb24e1b7277
(XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen code around <ffff82d08022879d> (sched_credit.c#csched_schedule+0xaad/0xba0):
(XEN)  18 01 00 e9 73 f7 ff ff <0f> 0b 48 8b 43 28 be 01 00 00 00 bf 0a 20 02 00
(XEN) Xen stack trace from rsp=ffff83077fe4fd00:
(XEN)    ffff82d0803577ef 0000001e00000000 80000000803577ef ffff830f9d5b2aa0
(XEN)    ffff82d0803577ef ffff83077a6c59e0 ffff83077fe4fe38 ffff82d0803577fb
(XEN)    0000000000000000 0000000000000000 0000000001c9c380 0000000000000000
(XEN)    ffff83077fe4ffff 000000000000001e 000014f4e86c885e ffff83077fe4ffff
(XEN)    ffff82d08094a480 000014f4e86c73be 0000000080230c80 ffff830060b38000
(XEN)    ffff83077fe58300 0000000000000046 ffff830f9d4f6018 0000000000000082
(XEN)    000000000000001e ffff83077fe581c8 0000000000000001 000000000000001e
(XEN)    ffff83005d1f0000 ffff83077fe58188 000014f4e86c885e ffff83077fe58180
(XEN)    ffff82d08094a480 ffff82d08023153d ffff830700000000 ffff83077fe581a0
(XEN)    0000000000000206 ffff82d080268705 ffff83077fe58300 ffff830060b38060
(XEN)    ffff830845d83010 ffff82d080238578 ffff83077fe4ffff 00000000ffffffff
(XEN)    ffffffffffffffff ffff83077fe4ffff ffff82d080933c00 ffff82d08094a480
(XEN)    ffff83077fe4ffff ffff82d080234cb2 ffff82d08095f1f0 ffff82d080934b00
(XEN)    ffff82d08095f1f0 000000000000001e 000000000000001e ffff82d08026daf5
(XEN)    ffff83005d1f0000 ffff83005d1f0000 ffff83005d1f0000 ffff83077fe58188
(XEN)    000014f4e86a43ab ffff83077fe58180 ffff82d08094a480 ffff88011dd88000
(XEN)    ffff88011dd88000 ffff88011dd88000 0000000000000000 000000000000002b
(XEN)    ffffffff81d4c180 0000000000000000 00000013fe969894 0000000000000001
(XEN)    0000000000000000 ffffffff81020e50 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 000000fc00000000 ffffffff81060182
(XEN) Xen call trace:
(XEN)    [<ffff82d08022879d>] sched_credit.c#csched_schedule+0xaad/0xba0
(XEN)    [<ffff82d0803577ef>] common_interrupt+0x8f/0x110
(XEN)    [<ffff82d0803577ef>] common_interrupt+0x8f/0x110
(XEN)    [<ffff82d0803577fb>] common_interrupt+0x9b/0x110
(XEN)    [<ffff82d08023153d>] schedule.c#schedule+0xdd/0x5d0
(XEN)    [<ffff82d080268705>] reprogram_timer+0x75/0xe0
(XEN)    [<ffff82d080238578>] timer.c#timer_softirq_action+0x138/0x210
(XEN)    [<ffff82d080234cb2>] softirq.c#__do_softirq+0x62/0x90
(XEN)    [<ffff82d08026daf5>] domain.c#idle_loop+0x45/0xb0
(XEN) ****************************************
(XEN) Panic on CPU 30:
(XEN) Xen BUG at sched_credit.c:1694
(XEN) ****************************************
(XEN) Reboot in five seconds...

But after that the system hangs hard; one has to pull the plug.
Running the debug version of xen.efi did not trigger any ASSERT.


This happens if there are many busy backend/frontend pairs in a number
of domUs. I think more domUs will trigger it sooner; overcommitting helps
as well. It was not seen with a single domU.

The test case looks like this:
- boot dom0 with "dom0_max_vcpus=30 dom0_mem=32G dom0_vcpus_pin"
- create a tmpfs in dom0
- create files in that tmpfs to be exported to domUs via file://path,xvdtN,w
- assign these files to HVM domUs
- inside the domUs, create a filesystem on the xvdtN devices
- mount the filesystem
- run fio(1) on the filesystem
- in dom0, run 'xl vcpu-pin domU $node1-3 $nodeN' in a loop to move the domU across nodes 1 to 3.

After a small number of iterations, Xen crashes in csched_load_balance.

In my setup I had 16 HVM domUs with 64 vcpus each, and each one had 3 vbd devices.
It was also reported with fewer and smaller domUs.
Scripts exist to recreate the setup easily.


In one case I have seen this:

(XEN) d32v60 VMRESUME error: 0x5
(XEN) domain_crash_sync called from vmcs.c:1673
(XEN) Domain 32 (vcpu#60) crashed on cpu#139:
(XEN) ----[ Xen-4.11.20180407T144959.e62e140daa-2.bug1087289_411  x86_64  debug=n   Not tainted ]----


Any idea what might be causing this crash?

Olaf


* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-10  8:57 crash in csched_load_balance after xl vcpu-pin Olaf Hering
@ 2018-04-10  9:34 ` George Dunlap
  2018-04-10 10:33   ` Dario Faggioli
  2018-04-10 15:18 ` Olaf Hering
  2018-04-10 15:59 ` Olaf Hering
  2 siblings, 1 reply; 58+ messages in thread
From: George Dunlap @ 2018-04-10  9:34 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Dario Faggioli, xen-devel



> On Apr 10, 2018, at 9:57 AM, Olaf Hering <olaf@aepfle.de> wrote:
> 
> While hunting some other bug we run into the single BUG in
> sched_credit.c:csched_load_balance(). This happens with all versions
> since 4.7, staging is also affected. Testsystem is a Haswell model 63
> system with 4 NUMA nodes and 144 threads.
> 
> (XEN) Xen BUG at sched_credit.c:1694
> (XEN) ----[ Xen-4.11.20180407T144959.e62e140daa-2.bug1087289_411  x86_64  debug=n   Not tainted ]----
> (XEN) CPU:    30
> (XEN) RIP:    e008:[<ffff82d08022879d>] sched_credit.c#csched_schedule+0xaad/0xba0
> (XEN) RFLAGS: 0000000000010087   CONTEXT: hypervisor
> (XEN) rax: ffff83077ffe76d0   rbx: ffff83077fe571d0   rcx: 000000000000001e
> (XEN) rdx: ffff83005d082000   rsi: 0000000000000000   rdi: ffff83077fe575b0
> (XEN) rbp: ffff82d08094a480   rsp: ffff83077fe4fd00   r8:  ffff83077fe581a0
> (XEN) r9:  ffff82d080227cf0   r10: 0000000000000000   r11: ffff830060b62060
> (XEN) r12: 000014f4e864c2d4   r13: ffff83077fe575b0   r14: ffff83077fe58180
> (XEN) r15: ffff82d08094a480   cr0: 000000008005003b   cr4: 00000000001526e0
> (XEN) cr3: 0000000049416000   cr2: 00007fb24e1b7277
> (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) Xen code around <ffff82d08022879d> (sched_credit.c#csched_schedule+0xaad/0xba0):
> (XEN)  18 01 00 e9 73 f7 ff ff <0f> 0b 48 8b 43 28 be 01 00 00 00 bf 0a 20 02 00
> (XEN) Xen stack trace from rsp=ffff83077fe4fd00:
> (XEN)    ffff82d0803577ef 0000001e00000000 80000000803577ef ffff830f9d5b2aa0
> (XEN)    ffff82d0803577ef ffff83077a6c59e0 ffff83077fe4fe38 ffff82d0803577fb
> (XEN)    0000000000000000 0000000000000000 0000000001c9c380 0000000000000000
> (XEN)    ffff83077fe4ffff 000000000000001e 000014f4e86c885e ffff83077fe4ffff
> (XEN)    ffff82d08094a480 000014f4e86c73be 0000000080230c80 ffff830060b38000
> (XEN)    ffff83077fe58300 0000000000000046 ffff830f9d4f6018 0000000000000082
> (XEN)    000000000000001e ffff83077fe581c8 0000000000000001 000000000000001e
> (XEN)    ffff83005d1f0000 ffff83077fe58188 000014f4e86c885e ffff83077fe58180
> (XEN)    ffff82d08094a480 ffff82d08023153d ffff830700000000 ffff83077fe581a0
> (XEN)    0000000000000206 ffff82d080268705 ffff83077fe58300 ffff830060b38060
> (XEN)    ffff830845d83010 ffff82d080238578 ffff83077fe4ffff 00000000ffffffff
> (XEN)    ffffffffffffffff ffff83077fe4ffff ffff82d080933c00 ffff82d08094a480
> (XEN)    ffff83077fe4ffff ffff82d080234cb2 ffff82d08095f1f0 ffff82d080934b00
> (XEN)    ffff82d08095f1f0 000000000000001e 000000000000001e ffff82d08026daf5
> (XEN)    ffff83005d1f0000 ffff83005d1f0000 ffff83005d1f0000 ffff83077fe58188
> (XEN)    000014f4e86a43ab ffff83077fe58180 ffff82d08094a480 ffff88011dd88000
> (XEN)    ffff88011dd88000 ffff88011dd88000 0000000000000000 000000000000002b
> (XEN)    ffffffff81d4c180 0000000000000000 00000013fe969894 0000000000000001
> (XEN)    0000000000000000 ffffffff81020e50 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 000000fc00000000 ffffffff81060182
> (XEN) Xen call trace:
> (XEN)    [<ffff82d08022879d>] sched_credit.c#csched_schedule+0xaad/0xba0
> (XEN)    [<ffff82d0803577ef>] common_interrupt+0x8f/0x110
> (XEN)    [<ffff82d0803577ef>] common_interrupt+0x8f/0x110
> (XEN)    [<ffff82d0803577fb>] common_interrupt+0x9b/0x110
> (XEN)    [<ffff82d08023153d>] schedule.c#schedule+0xdd/0x5d0
> (XEN)    [<ffff82d080268705>] reprogram_timer+0x75/0xe0
> (XEN)    [<ffff82d080238578>] timer.c#timer_softirq_action+0x138/0x210
> (XEN)    [<ffff82d080234cb2>] softirq.c#__do_softirq+0x62/0x90
> (XEN)    [<ffff82d08026daf5>] domain.c#idle_loop+0x45/0xb0
> (XEN) ****************************************
> (XEN) Panic on CPU 30:
> (XEN) Xen BUG at sched_credit.c:1694
> (XEN) ****************************************
> (XEN) Reboot in five seconds...
> 
> But after that the system hangs hard, one has to pull the plug.
> Running the debug version of xen.efi did not trigger any ASSERT.
> 
> 
> This happens if there are many busy backend/frontend pairs in a number
> of domUs. I think more domUs will trigger it sooner, overcommit helps as
> well. It was not seen with a single domU.
> 
> The testcase is like that:
> - boot dom0 with "dom0_max_vcpus=30 dom0_mem=32G dom0_vcpus_pin"
> - create a tmpfs in dom0
> - create files in that tmpfs to be exported to domUs via file://path,xvdtN,w
> - assign these files to HVM domUs
> - inside the domUs, create a filesystem on the xvdtN devices
> - mount the filesystem
> - run fio(1) on the filesystem
> - in dom0, run 'xl vcpu-pin domU $node1-3 $nodeN' in a loop to move domU between node 1 to 3.
> 
> After a low number of iterations Xen crashes in csched_load_balance.
> 
> In my setup I had 16 HVM domUs with 64 vcpus, each one had 3 vbd devices.
> It was reported also with fewer and smaller domUs.
> Scripts exist to recreate the setup easily.
> 
> 
[snip]
> 
> Any idea what might causing this crash?

Assuming the bug is this one:

BUG_ON( cpu != snext->vcpu->processor );

a nasty race condition… a vcpu has just been taken off the runqueue of the current pcpu, but it’s apparently been assigned to a different cpu.

Let me take a look.

 -George


* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-10  9:34 ` George Dunlap
@ 2018-04-10 10:33   ` Dario Faggioli
  2018-04-10 10:59     ` George Dunlap
  0 siblings, 1 reply; 58+ messages in thread
From: Dario Faggioli @ 2018-04-10 10:33 UTC (permalink / raw)
  To: George Dunlap, Olaf Hering; +Cc: xen-devel


On Tue, 2018-04-10 at 09:34 +0000, George Dunlap wrote:
> Assuming the bug is this one:
> 
> BUG_ON( cpu != snext->vcpu->processor );
> 
Yes, it is that one.

Another stack trace, this time from a debug=y hypervisor, of what we
think is the same bug (although reproduced in a slightly different way),
is this:

(XEN) ----[ Xen-4.7.2_02-36.1.12847.11.PTF  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    45
(XEN) RIP:    e008:[<ffff82d08012508f>] sched_credit.c#csched_schedule+0x361/0xaa9
...
(XEN) Xen call trace:
(XEN)    [<ffff82d08012508f>] sched_credit.c#csched_schedule+0x361/0xaa9
(XEN)    [<ffff82d08012c233>] schedule.c#schedule+0x109/0x5d6
(XEN)    [<ffff82d08012fb5f>] softirq.c#__do_softirq+0x7f/0x8a
(XEN)    [<ffff82d08012fbb4>] do_softirq+0x13/0x15
(XEN)    [<ffff82d0801fd5c5>] vmx_asm_do_vmentry+0x25/0x2a

(I can provide it all, if necessary.)

I've done some analysis, although at that point we were not yet entirely
sure that changing the affinities was the actual cause (or, at least, the
trigger of the whole thing).

In the specific case of this stack trace, the current vcpu running on
CPU 45 is d3v11. It is not in the runqueue: it has been removed and not
added back, because it is not runnable (it has VPF_migrating set in
pause_flags).

The runqueue of pcpu 45 looks fine (i.e., it is not corrupt or anything
like that); it has d3v10, d9v1, d32767v45 in it (in this order).
d3v11->processor is 45, so that is also fine.

Basically, d3v11 wants to move away from pcpu 45, and this might (but
that's not certain) be the reason why we're rescheduling. The fact that
there are vcpus wanting to migrate is very likely connected to the
affinity being changed.

Now, the problem is that, looking into the runqueue, I found out that
d3v10->processor=32. I.e., d3v10 is queued in pcpu 45's runqueue, with
processor=32, which really shouldn't happen.

This leads to the bug triggering, as, in csched_schedule(), we read the
head of the runqueue with:

snext = __runq_elem(runq->next);

and then we pass snext to csched_load_balance(), where the BUG_ON is.
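
In code terms, the path is roughly this (my paraphrase of that era's
sched_credit.c, not the verbatim source):

/* In csched_schedule(), running on pcpu 'cpu': */
snext = __runq_elem(runq->next);            /* head of this pcpu's runqueue */
/* ... */
snext = csched_load_balance(prv, cpu, snext, &migrated);

/* At the top of csched_load_balance(): */
BUG_ON( cpu != snext->vcpu->processor );    /* a vcpu queued on pcpu N is
                                               expected to have
                                               ->processor == N; here d3v10
                                               sits in pcpu 45's runqueue
                                               with processor == 32 */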

Another thing I've found out is that all "misplaced" vcpus (i.e., in
this and also in other manifestations of this bug) have
csched_vcpu.flags=4, which is CSCHED_FLAGS_VCPU_MIGRATING.

This, basically, is again a sign of vcpu_migrate() having been called,
on d3v10 as well, which in turn has called csched_vcpu_pick().

> a nasty race condition… a vcpu has just been taken off the runqueue
> of the current pcpu, but it’s apparently been assigned to a different
> cpu.
> 
Nasty indeed. I've been looking into this on and off, but so far I
haven't found the root cause.

Now that we know for sure that it is changing affinity that triggers it,
the field of investigation can be narrowed a little bit... But I am still
finding it hard to spot where the race happens.

I'll look more into this later in the afternoon. I'll let you know if
something comes to mind.

> Let me take a look.
> 
Thanks! :-)
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/


* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-10 10:33   ` Dario Faggioli
@ 2018-04-10 10:59     ` George Dunlap
  2018-04-10 11:29       ` Dario Faggioli
                         ` (4 more replies)
  0 siblings, 5 replies; 58+ messages in thread
From: George Dunlap @ 2018-04-10 10:59 UTC (permalink / raw)
  To: Dario Faggioli, Olaf Hering; +Cc: xen-devel

On 04/10/2018 11:33 AM, Dario Faggioli wrote:
> On Tue, 2018-04-10 at 09:34 +0000, George Dunlap wrote:
>> Assuming the bug is this one:
>>
>> BUG_ON( cpu != snext->vcpu->processor );
>>
> Yes, it is that one.
> 
> Another stack trace, this time from a debug=y built hypervisor, of what
> we are thinking it is the same bug (although reproduced in a slightly
> different way) is this:
> 
> (XEN) ----[ Xen-4.7.2_02-36.1.12847.11.PTF  x86_64  debug=y  Not tainted ]----
> (XEN) CPU:    45
> (XEN) RIP:    e008:[<ffff82d08012508f>] sched_credit.c#csched_schedule+0x361/0xaa9
> ...
> (XEN) Xen call trace:
> (XEN)    [<ffff82d08012508f>] sched_credit.c#csched_schedule+0x361/0xaa9
> (XEN)    [<ffff82d08012c233>] schedule.c#schedule+0x109/0x5d6
> (XEN)    [<ffff82d08012fb5f>] softirq.c#__do_softirq+0x7f/0x8a
> (XEN)    [<ffff82d08012fbb4>] do_softirq+0x13/0x15
> (XEN)    [<ffff82d0801fd5c5>] vmx_asm_do_vmentry+0x25/0x2a
> 
> (I can provide it all, if necessary.)
> 
> I've done some analysis, although when we still were not entirely sure
> that changing the affinities was the actual cause (or, at least, what
> is triggering the whole thing).
> 
> In the specific case of this stack trace, the current vcpu running on
> CPU 45 is d3v11. It is not in the runqueue, because it has been
> removed, and not added back to it, and the reason is it is not runnable
> (it has VPF_migrating on in pause_flags).
> 
> The runqueue of pcpu 45 looks fine (i.e., it is not corrupt or anything
> like that), it has d3v10,d9v1,d32767v45 in it (in this order)
> 
> d3v11->processor is 45, so that is also fine.
> 
> Basically, d3v11 wants to move away from pcpu 45, and this might (but
> that's not certain) be the reson because we're rescheduling. The fact
> that there are vcpus wanting to migrate can very well be the cause of
> affinity being changed.
> 
> Now, the problem is that, looking into the runqueue, I found out that
> d3v10->processor=32. I.e., d3v10 is queued in pcpu 45's runqueue, with
> processor=32, which really shouldn't happen.
> 
> This leads to the bug triggering, as, in csched_schedule(), we read the
> head of the runqueue with:
> 
> snext = __runq_elem(runq->next);
> 
> and then we pass snext to csched_load_balance(), where the BUG_ON is.
> 
> Another thing that I've found out, is that all "misplaced" vcpus (i.e.,
> in this and also in other manifestations of this bug) have their
> csched_vcpu.flags=4, which is CSCHED_FLAGS_VCPU_MIGRATING.
> 
> This, basically, is again a sign of vcpu_migrate() having been called,
> on d3v10 as well, which in turn has called csched_vcpu_pick().
> 
>> a nasty race condition… a vcpu has just been taken off the runqueue
>> of the current pcpu, but it’s apparently been assigned to a different
>> cpu.
>>
> Nasty indeed. I've been looking into this on and off, but so far I
> haven't found the root cause.
> 
> Now that we know for sure that it is changing affinity that trigger it,
> the field of the investigation can be narrowed a little bit... But I
> still am finding hard to spot where the race happens.
> 
> I'll look more into this later in the afternoon. I'll let know if
> something comes to mind.

Actually, it looks quite simple:  schedule.c:vcpu_move_locked() is
supposed to actually do the moving; if vcpu_scheduler()->migrate is
defined, it calls that; otherwise, it just sets v->processor.  Credit1
doesn't define migrate.  So when changing the vcpu affinity on credit1,
v->processor is simply modified without the vcpu changing runqueues.
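
Roughly, paraphrasing rather than quoting the source:

static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu)
{
    /* If the scheduler implements a migrate hook, let it do the move,
     * so it can also fix up its own runqueues / bookkeeping... */
    if ( vcpu_scheduler(v)->migrate )
        SCHED_OP(vcpu_scheduler(v), migrate, v, new_cpu);
    else
        /* ...otherwise just flip the field; Credit1 takes this path. */
        v->processor = new_cpu;
}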

The real question is why it's so hard to actually trigger any problems!

All in all it looks like the migration / cpu_pick could be made a bit
more rational... we do this weird thing where we call cpu_pick, and if
it's different we call migrate; but of course if the vcpu is running, we
just set the VPF_migrating bit and raise a schedule_softirq, which will
cause cpu_pick() to be called yet another time.
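
In other words, for a running vcpu whose affinity no longer contains its
current pcpu, the sequence is something like this (a simplified sketch of
the schedule.c flow, not the actual code; the helper name is only for
illustration):

/* hypothetical helper, just to show the flow */
static void affinity_changed(struct vcpu *v)
{
    set_bit(_VPF_migrating, &v->pause_flags);  /* v has to move              */
    vcpu_sleep_nosync(v);                      /* dequeue v / force a resched */
    vcpu_migrate(v);                           /* calls pick_cpu(); v is still
                                                * running, so nothing is
                                                * moved yet                   */

    /*
     * Later, when v is descheduled, context_saved() sees VPF_migrating and
     * calls vcpu_migrate() again: pick_cpu() runs a second time, and only
     * then do vcpu_move_locked(v, new_cpu) and vcpu_wake(v) happen.
     */
}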

But as a quick fix, implementing csched_vcpu_migrate() is probably the
best solution.  Do you want to pick that up, or should I?

 -George


* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-10 10:59     ` George Dunlap
@ 2018-04-10 11:29       ` Dario Faggioli
  2018-04-10 15:25         ` George Dunlap
  2018-04-10 11:30       ` Dario Faggioli
                         ` (3 subsequent siblings)
  4 siblings, 1 reply; 58+ messages in thread
From: Dario Faggioli @ 2018-04-10 11:29 UTC (permalink / raw)
  To: George Dunlap, Olaf Hering; +Cc: xen-devel


On Tue, 2018-04-10 at 11:59 +0100, George Dunlap wrote:
> On 04/10/2018 11:33 AM, Dario Faggioli wrote:
> > On Tue, 2018-04-10 at 09:34 +0000, George Dunlap wrote:
> > > Assuming the bug is this one:
> > > 
> > > BUG_ON( cpu != snext->vcpu->processor );
> > > 
> > 
> > Yes, it is that one.
> > 
> > Another stack trace, this time from a debug=y built hypervisor, of
> > what
> > we are thinking it is the same bug (although reproduced in a
> > slightly
> > different way) is this:
> > 
> > (XEN) ----[ Xen-4.7.2_02-36.1.12847.11.PTF  x86_64  debug=y  Not
> > tainted ]----
> > (XEN) CPU:    45
> > (XEN) RIP:    e008:[<ffff82d08012508f>]
> > sched_credit.c#csched_schedule+0x361/0xaa9
> > ...
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82d08012508f>]
> > sched_credit.c#csched_schedule+0x361/0xaa9
> > (XEN)    [<ffff82d08012c233>] schedule.c#schedule+0x109/0x5d6
> > (XEN)    [<ffff82d08012fb5f>] softirq.c#__do_softirq+0x7f/0x8a
> > (XEN)    [<ffff82d08012fbb4>] do_softirq+0x13/0x15
> > (XEN)    [<ffff82d0801fd5c5>] vmx_asm_do_vmentry+0x25/0x2a
> > 
> > (I can provide it all, if necessary.)
> > 
> > I've done some analysis, although when we still were not entirely
> > sure
> > that changing the affinities was the actual cause (or, at least,
> > what
> > is triggering the whole thing).
> > 
> > In the specific case of this stack trace, the current vcpu running
> > on
> > CPU 45 is d3v11. It is not in the runqueue, because it has been
> > removed, and not added back to it, and the reason is it is not
> > runnable
> > (it has VPF_migrating on in pause_flags).
> > 
> > The runqueue of pcpu 45 looks fine (i.e., it is not corrupt or
> > anything
> > like that), it has d3v10,d9v1,d32767v45 in it (in this order)
> > 
> > d3v11->processor is 45, so that is also fine.
> > 
> > Basically, d3v11 wants to move away from pcpu 45, and this might
> > (but
> > that's not certain) be the reson because we're rescheduling. The
> > fact
> > that there are vcpus wanting to migrate can very well be the cause
> > of
> > affinity being changed.
> > 
> > Now, the problem is that, looking into the runqueue, I found out
> > that
> > d3v10->processor=32. I.e., d3v10 is queued in pcpu 45's runqueue,
> > with
> > processor=32, which really shouldn't happen.
> > 
> > This leads to the bug triggering, as, in csched_schedule(), we read
> > the
> > head of the runqueue with:
> > 
> > snext = __runq_elem(runq->next);
> > 
> > and then we pass snext to csched_load_balance(), where the BUG_ON
> > is.
> > 
> > Another thing that I've found out, is that all "misplaced" vcpus
> > (i.e.,
> > in this and also in other manifestations of this bug) have their
> > csched_vcpu.flags=4, which is CSCHED_FLAGS_VCPU_MIGRATING.
> > 
> > This, basically, is again a sign of vcpu_migrate() having been
> > called,
> > on d3v10 as well, which in turn has called csched_vcpu_pick().
> > 
> > > a nasty race condition… a vcpu has just been taken off the
> > > runqueue
> > > of the current pcpu, but it’s apparently been assigned to a
> > > different
> > > cpu.
> > > 
> > 
> > Nasty indeed. I've been looking into this on and off, but so far I
> > haven't found the root cause.
> > 
> > Now that we know for sure that it is changing affinity that trigger
> > it,
> > the field of the investigation can be narrowed a little bit... But
> > I
> > still am finding hard to spot where the race happens.
> > 
> > I'll look more into this later in the afternoon. I'll let know if
> > something comes to mind.
> 
> Actually, it looks quite simple:  schedule.c:vcpu_move_locked() is
> supposed to actually do the moving; if vcpu_scheduler()->migrate is
> defined, it calls that; otherwise, it just sets v-
> >processor.  Credit1
> doesn't define migrate.  So when changing the vcpu affinity on
> credit1,
> v->processor is simply modified without it changing runqueues.
> 
> The real question is why it's so hard to actually trigger any
> problems!
> 
Wait, but when vcpu_move_locked() is called, the vcpu being moved
should not be in any runqueue.

In fact, it is called from vcpu_migrate(), which in its turn is always
preceded by a call to vcpu_sleep_nosync(), which removes the vcpu from
the runqueue.

The only exception is when it is called from context_saved(). But then
again, the vcpu on which it is called is not on the runqueue, because
it was found not runnable.

That is why things work... well, apart from this bug. :-)

I mean, the root cause of this bug may very well be that there is a
code path that leads to calling vcpu_move_locked() on a vcpu that is
still in a runqueue... but have you actually identified it?

> But as a quick fix, implementing csched_vcpu_migrate() is probably
> the
> best solution.  Do you want to pick that up, or should I?
> 
And what should csched_vcpu_migrate() do, apart from changing
vc->processor?

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/



* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-10  8:57 crash in csched_load_balance after xl vcpu-pin Olaf Hering
  2018-04-10  9:34 ` George Dunlap
@ 2018-04-10 15:18 ` Olaf Hering
  2018-04-10 15:29   ` George Dunlap
  2018-04-10 15:59 ` Olaf Hering
  2 siblings, 1 reply; 58+ messages in thread
From: Olaf Hering @ 2018-04-10 15:18 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Dario Faggioli


On Tue, Apr 10, Olaf Hering wrote:

> (XEN) Xen BUG at sched_credit.c:1694

Another variant:

This time the domUs had just vcpus=36 and cpus=nodes:N,node:^0/cpus_soft=nodes:N,node:^0

(XEN) Xen BUG at sched_credit.c:280
(XEN) ----[ Xen-4.11.20180407T144959.e62e140daa-2.bug1087289_411  x86_64  debug=n   Not tainted ]----
(XEN) CPU:    54
(XEN) RIP:    e008:[<ffff82d0803591b1>] sched_credit.c#__runq_insert.part.13+0/0x2
(XEN) RFLAGS: 0000000000010087   CONTEXT: hypervisor (d96v20)
(XEN) rax: ffff82d08095f100   rbx: ffff830670506ea0   rcx: ffff830779f4ae80
(XEN) rdx: 00000036f95d7080   rsi: 0000000000000000   rdi: ffff830670506ea0
(XEN) rbp: ffff82d08094a480   rsp: ffff830e7ab2fd30   r8:  ffff830779f361a0
(XEN) r9:  ffff82d080227cf0   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: 0000033c2684bb20   r13: ffff830779f4ae80   r14: ffff830779f36180
(XEN) r15: 0000033c269c6f66   cr0: 000000008005003b   cr4: 00000000001526e0
(XEN) cr3: 000000067058e000   cr2: 00007f1299b17000
(XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen code around <ffff82d0803591b1> (sched_credit.c#__runq_insert.part.13):
(XEN)  f1 ff 5a 5b 31 c0 5d c3 <0f> 0b 0f 0b 0f 0b 48 89 e2 48 8d 05 eb 5d 60 00
(XEN) Xen stack trace from rsp=ffff830e7ab2fd30:
(XEN)    ffff82d080228845 ffff82e030ac7f80 00000036563fc000 00000000ffffffff
(XEN)    00000000000000a3 00000000000000c0 ffff83077a6c59e0 ffff830e7ab2fe70
(XEN)    ffff82d0802354b5 ffff82d0802fff50 0000000000000000 0000000001c9c380
(XEN)    000000008027bcd8 ffff82d0802255d0 0000000000000036 0000033c269c6f66
(XEN)    ffff8307798d4f30 0000000000000000 ffff830779f361a0 0000000000000036
(XEN)    ffff82d0802386cc ffff830779f361a0 0000000000000046 ffff82d08023827b
(XEN)    0000000000000096 0000000000000036 ffff830779f361c8 ffff82d08030f9ab
(XEN)    0000000000000036 ffff83007ba30000 ffff830779f36188 0000033c269c6f66
(XEN)    ffff830779f36180 ffff82d08094a480 ffff82d08023153d ffff82d000000000
(XEN)    ffff830779f361a0 0000000000000000 ffff82d0802e13d5 ffff83007ba30000
(XEN)    ffff83007ba30000 0000000000000000 ffff82d08030bef6 ffff82d08030f9ab
(XEN)    00000000ffffffff ffffffffffffffff ffff830e7ab2ffff ffff82d080933c00
(XEN)    0000000000000000 0000000000000000 ffff82d080234cb2 0000000000000000
(XEN)    ffff83007ba30000 0000000000000000 0000000000000000 0000000000000000
(XEN)    ffff82d08030fb6b 0000000000000000 0000000000000100 0000000000540000
(XEN)    0000000000000001 ffff88011ff16c80 ffff8800e1e20000 0000000000000000
(XEN)    ffff88011f000858 ffff88011f0006c8 0000000000000000 0000000000000000
(XEN)    0000000000000001 0000000000000001 00000000000000ad 00000000000000a5
(XEN)    000000fb00000000 ffffffff810c8da3 0000000000000000 0000000000000046
(XEN)    ffff8800ea3af910 0000000000000000 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82d0803591b1>] sched_credit.c#__runq_insert.part.13+0/0x2
(XEN)    [<ffff82d080228845>] sched_credit.c#csched_schedule+0xb55/0xba0
(XEN)    [<ffff82d0802354b5>] smp_call_function_interrupt+0x85/0xa0
(XEN)    [<ffff82d0802fff50>] vmcs.c#__vmx_clear_vmcs+0/0xe0
(XEN)    [<ffff82d0802255d0>] sched_credit.c#csched_vcpu_yield+0/0x10
(XEN)    [<ffff82d0802386cc>] timer.c#remove_entry+0x7c/0x90
(XEN)    [<ffff82d08023827b>] timer.c#add_entry+0x4b/0xb0
(XEN)    [<ffff82d08030f9ab>] vmx_asm_vmexit_handler+0xab/0x240
(XEN)    [<ffff82d08023153d>] schedule.c#schedule+0xdd/0x5d0
(XEN)    [<ffff82d0802e13d5>] hvm_interrupt_blocked+0x15/0xd0
(XEN)    [<ffff82d08030bef6>] nvmx_switch_guest+0x86/0x1a00
(XEN)    [<ffff82d08030f9ab>] vmx_asm_vmexit_handler+0xab/0x240
(XEN)    [<ffff82d080234cb2>] softirq.c#__do_softirq+0x62/0x90
(XEN)    [<ffff82d08030fb6b>] vmx_asm_do_vmentry+0x2b/0x30
(XEN) ****************************************
(XEN) Panic on CPU 54:
(XEN) Xen BUG at sched_credit.c:280
(XEN) ****************************************
(XEN) Reboot in five seconds...


Olaf


* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-10 11:29       ` Dario Faggioli
@ 2018-04-10 15:25         ` George Dunlap
  2018-04-10 15:36           ` Dario Faggioli
       [not found]           ` <960702b6d9dfb67bfae72ae02ae502210695416b.camel@suse.com>
  0 siblings, 2 replies; 58+ messages in thread
From: George Dunlap @ 2018-04-10 15:25 UTC (permalink / raw)
  To: Dario Faggioli, Olaf Hering; +Cc: xen-devel

On 04/10/2018 12:29 PM, Dario Faggioli wrote:
> On Tue, 2018-04-10 at 11:59 +0100, George Dunlap wrote:
>> On 04/10/2018 11:33 AM, Dario Faggioli wrote:
>>> On Tue, 2018-04-10 at 09:34 +0000, George Dunlap wrote:
>>>> Assuming the bug is this one:
>>>>
>>>> BUG_ON( cpu != snext->vcpu->processor );
>>>>
>>>
>>> Yes, it is that one.
>>>
>>> Another stack trace, this time from a debug=y built hypervisor, of
>>> what
>>> we are thinking it is the same bug (although reproduced in a
>>> slightly
>>> different way) is this:
>>>
>>> (XEN) ----[ Xen-4.7.2_02-36.1.12847.11.PTF  x86_64  debug=y  Not
>>> tainted ]----
>>> (XEN) CPU:    45
>>> (XEN) RIP:    e008:[<ffff82d08012508f>]
>>> sched_credit.c#csched_schedule+0x361/0xaa9
>>> ...
>>> (XEN) Xen call trace:
>>> (XEN)    [<ffff82d08012508f>]
>>> sched_credit.c#csched_schedule+0x361/0xaa9
>>> (XEN)    [<ffff82d08012c233>] schedule.c#schedule+0x109/0x5d6
>>> (XEN)    [<ffff82d08012fb5f>] softirq.c#__do_softirq+0x7f/0x8a
>>> (XEN)    [<ffff82d08012fbb4>] do_softirq+0x13/0x15
>>> (XEN)    [<ffff82d0801fd5c5>] vmx_asm_do_vmentry+0x25/0x2a
>>>
>>> (I can provide it all, if necessary.)
>>>
>>> I've done some analysis, although when we still were not entirely
>>> sure
>>> that changing the affinities was the actual cause (or, at least,
>>> what
>>> is triggering the whole thing).
>>>
>>> In the specific case of this stack trace, the current vcpu running
>>> on
>>> CPU 45 is d3v11. It is not in the runqueue, because it has been
>>> removed, and not added back to it, and the reason is it is not
>>> runnable
>>> (it has VPF_migrating on in pause_flags).
>>>
>>> The runqueue of pcpu 45 looks fine (i.e., it is not corrupt or
>>> anything
>>> like that), it has d3v10,d9v1,d32767v45 in it (in this order)
>>>
>>> d3v11->processor is 45, so that is also fine.
>>>
>>> Basically, d3v11 wants to move away from pcpu 45, and this might
>>> (but
>>> that's not certain) be the reson because we're rescheduling. The
>>> fact
>>> that there are vcpus wanting to migrate can very well be the cause
>>> of
>>> affinity being changed.
>>>
>>> Now, the problem is that, looking into the runqueue, I found out
>>> that
>>> d3v10->processor=32. I.e., d3v10 is queued in pcpu 45's runqueue,
>>> with
>>> processor=32, which really shouldn't happen.
>>>
>>> This leads to the bug triggering, as, in csched_schedule(), we read
>>> the
>>> head of the runqueue with:
>>>
>>> snext = __runq_elem(runq->next);
>>>
>>> and then we pass snext to csched_load_balance(), where the BUG_ON
>>> is.
>>>
>>> Another thing that I've found out, is that all "misplaced" vcpus
>>> (i.e.,
>>> in this and also in other manifestations of this bug) have their
>>> csched_vcpu.flags=4, which is CSCHED_FLAGS_VCPU_MIGRATING.
>>>
>>> This, basically, is again a sign of vcpu_migrate() having been
>>> called,
>>> on d3v10 as well, which in turn has called csched_vcpu_pick().

Right; csched_cpu_pick() is only called from csched_vcpu_insert(), and
from vcpu_migrate() and restore_vcpu_affinity().

Assuming we haven't been messing around with suspend / resume or
cpupools, that means it must have happened as a result of vcpu_migrate().

If it happened as a result of vcpu_migrate(), then it can only be set
between the very first call to pick_cpu(), and the next vcpu_wake() --
whenever that is.  (Possibly at the end of the current call to
vcpu_migrate(), possibly at the end of a vcpu_migrate() triggered in
context_saved() due to VPF_migrating.)

vcpu_migrate() is called from:
 - vcpu_force_reschedule(), which is called from
VCPUOP_{set,stop}_periodic_timer
 - cpu_disable_scheduler(), when doing hotplug or cpupool operations on a cpu
 - vcpu_set_affinity()
 - vcpu_pin_override()

But in any case, v->processor is only set from vcpu_move_locked(), which
is only called if v->is_running is false; if v->is_running is false,
then one way or another v can't be on any runqueue.  And if v isn't on
any runqueue, and we hold v's current processor lock, then it's safe to
modify v->processor.

But obviously there's a flaw in that logic somewhere. :-)

One thing we might consider doing is implementing the migrate() callback
for the Credit scheduler, and just have it make a bunch of sanity checks
(v->processor lock held, new_cpu lock held, vcpu not on any runqueue, &c).
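
Something like this, I mean (just a sketch to illustrate the idea, not
actual code; the exact callback signature and the lock-checking helpers
would need to be double-checked against what schedule.c really provides):

    /* Hypothetical sanity-checking migrate() callback for Credit
     * (illustration only: assertions, no real work). */
    static void
    csched_vcpu_migrate(const struct scheduler *ops, struct vcpu *vc,
                        unsigned int new_cpu)
    {
        struct csched_vcpu *svc = CSCHED_VCPU(vc);

        /* The vcpu must not be sitting in any runqueue while it is moved. */
        BUG_ON(__vcpu_on_runq(svc));

        /* Both the current and the destination pcpu locks must be held. */
        ASSERT(spin_is_locked(per_cpu(schedule_data, vc->processor).schedule_lock));
        ASSERT(spin_is_locked(per_cpu(schedule_data, new_cpu).schedule_lock));
    }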

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-10 15:18 ` Olaf Hering
@ 2018-04-10 15:29   ` George Dunlap
  0 siblings, 0 replies; 58+ messages in thread
From: George Dunlap @ 2018-04-10 15:29 UTC (permalink / raw)
  To: Olaf Hering, xen-devel; +Cc: Dario Faggioli

On 04/10/2018 04:18 PM, Olaf Hering wrote:
> On Tue, Apr 10, Olaf Hering wrote:
> 
>> (XEN) Xen BUG at sched_credit.c:1694
> 
> Another variant:
> 
> This time the domUs had just vcpus=36 and cpus=nodes:N,node:^0/cpus_soft=nodes:N,node:^0
> 
> (XEN) Xen BUG at sched_credit.c:280
> (XEN) ----[ Xen-4.11.20180407T144959.e62e140daa-2.bug1087289_411  x86_64  debug=n   Not tainted ]----
> (XEN) CPU:    54
> (XEN) RIP:    e008:[<ffff82d0803591b1>] sched_credit.c#__runq_insert.part.13+0/0x2
> (XEN) RFLAGS: 0000000000010087   CONTEXT: hypervisor (d96v20)
> (XEN) rax: ffff82d08095f100   rbx: ffff830670506ea0   rcx: ffff830779f4ae80
> (XEN) rdx: 00000036f95d7080   rsi: 0000000000000000   rdi: ffff830670506ea0
> (XEN) rbp: ffff82d08094a480   rsp: ffff830e7ab2fd30   r8:  ffff830779f361a0
> (XEN) r9:  ffff82d080227cf0   r10: 0000000000000000   r11: 0000000000000000
> (XEN) r12: 0000033c2684bb20   r13: ffff830779f4ae80   r14: ffff830779f36180
> (XEN) r15: 0000033c269c6f66   cr0: 000000008005003b   cr4: 00000000001526e0
> (XEN) cr3: 000000067058e000   cr2: 00007f1299b17000
> (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) Xen code around <ffff82d0803591b1> (sched_credit.c#__runq_insert.part.13):
> (XEN)  f1 ff 5a 5b 31 c0 5d c3 <0f> 0b 0f 0b 0f 0b 48 89 e2 48 8d 05 eb 5d 60 00
> (XEN) Xen stack trace from rsp=ffff830e7ab2fd30:
> (XEN)    ffff82d080228845 ffff82e030ac7f80 00000036563fc000 00000000ffffffff
> (XEN)    00000000000000a3 00000000000000c0 ffff83077a6c59e0 ffff830e7ab2fe70
> (XEN)    ffff82d0802354b5 ffff82d0802fff50 0000000000000000 0000000001c9c380
> (XEN)    000000008027bcd8 ffff82d0802255d0 0000000000000036 0000033c269c6f66
> (XEN)    ffff8307798d4f30 0000000000000000 ffff830779f361a0 0000000000000036
> (XEN)    ffff82d0802386cc ffff830779f361a0 0000000000000046 ffff82d08023827b
> (XEN)    0000000000000096 0000000000000036 ffff830779f361c8 ffff82d08030f9ab
> (XEN)    0000000000000036 ffff83007ba30000 ffff830779f36188 0000033c269c6f66
> (XEN)    ffff830779f36180 ffff82d08094a480 ffff82d08023153d ffff82d000000000
> (XEN)    ffff830779f361a0 0000000000000000 ffff82d0802e13d5 ffff83007ba30000
> (XEN)    ffff83007ba30000 0000000000000000 ffff82d08030bef6 ffff82d08030f9ab
> (XEN)    00000000ffffffff ffffffffffffffff ffff830e7ab2ffff ffff82d080933c00
> (XEN)    0000000000000000 0000000000000000 ffff82d080234cb2 0000000000000000
> (XEN)    ffff83007ba30000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    ffff82d08030fb6b 0000000000000000 0000000000000100 0000000000540000
> (XEN)    0000000000000001 ffff88011ff16c80 ffff8800e1e20000 0000000000000000
> (XEN)    ffff88011f000858 ffff88011f0006c8 0000000000000000 0000000000000000
> (XEN)    0000000000000001 0000000000000001 00000000000000ad 00000000000000a5
> (XEN)    000000fb00000000 ffffffff810c8da3 0000000000000000 0000000000000046
> (XEN)    ffff8800ea3af910 0000000000000000 0000000000000000 0000000000000000
> (XEN) Xen call trace:
> (XEN)    [<ffff82d0803591b1>] sched_credit.c#__runq_insert.part.13+0/0x2
> (XEN)    [<ffff82d080228845>] sched_credit.c#csched_schedule+0xb55/0xba0
> (XEN)    [<ffff82d0802354b5>] smp_call_function_interrupt+0x85/0xa0
> (XEN)    [<ffff82d0802fff50>] vmcs.c#__vmx_clear_vmcs+0/0xe0
> (XEN)    [<ffff82d0802255d0>] sched_credit.c#csched_vcpu_yield+0/0x10
> (XEN)    [<ffff82d0802386cc>] timer.c#remove_entry+0x7c/0x90
> (XEN)    [<ffff82d08023827b>] timer.c#add_entry+0x4b/0xb0
> (XEN)    [<ffff82d08030f9ab>] vmx_asm_vmexit_handler+0xab/0x240
> (XEN)    [<ffff82d08023153d>] schedule.c#schedule+0xdd/0x5d0
> (XEN)    [<ffff82d0802e13d5>] hvm_interrupt_blocked+0x15/0xd0
> (XEN)    [<ffff82d08030bef6>] nvmx_switch_guest+0x86/0x1a00
> (XEN)    [<ffff82d08030f9ab>] vmx_asm_vmexit_handler+0xab/0x240
> (XEN)    [<ffff82d080234cb2>] softirq.c#__do_softirq+0x62/0x90
> (XEN)    [<ffff82d08030fb6b>] vmx_asm_do_vmentry+0x2b/0x30
> (XEN) ****************************************
> (XEN) Panic on CPU 54:
> (XEN) Xen BUG at sched_credit.c:280
> (XEN) ****************************************
> (XEN) Reboot in five seconds...

Ooh:

    BUG_ON( __vcpu_on_runq(svc) );

So we're trying to insert a vcpu onto a runqueue, but someone's already
put it on a runqueue.  Which still doesn't quite make sense...

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-10 15:25         ` George Dunlap
@ 2018-04-10 15:36           ` Dario Faggioli
       [not found]           ` <960702b6d9dfb67bfae72ae02ae502210695416b.camel@suse.com>
  1 sibling, 0 replies; 58+ messages in thread
From: Dario Faggioli @ 2018-04-10 15:36 UTC (permalink / raw)
  To: George Dunlap, Olaf Hering; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1610 bytes --]

On Tue, 2018-04-10 at 16:25 +0100, George Dunlap wrote:
> On 04/10/2018 12:29 PM, Dario Faggioli wrote:
> > 
> whenever that is.  (Possibly at the end of the current call to
> vcpu_migrate(), possibly at the end of a vcpu_migrate() triggered in
> context_saved() due to VPF_migrating.)
> 
> vcpu_migrate() is called from:
>  - vcpu_force_reschedule(), which is called from
> VCPUOP_{set,stop}_periodic_timer
>  - cpu_disable_scheduler(), when doing hotplug or cpupool operations
> on a cpu
>  - vcpu_set_affinity()
>  - vcpu_pin_override()
> 
> But in any case, v->processor is only set from vcpu_move_locked(),
> which
> is only called if v->is_running is false; if v->is_running is false,
> then one way or another v can't be on any runqueue.  And if v isn't
> on
> any runqueue, and we hold v's current processor lock, then it's safe
> to
> modify v->processor.
> 
Indeed.

> But obviously there's a flaw in that logic somewhere. :-)
> 
frustratingly, yes. :-/

> One thing we might consider doing is implementing the migrate()
> callback
> for the Credit scheduler, and just have it make a bunch of sanity
> checks
> (v->processor lock held, new_cpu lock held, vcpu not on any runqueue,
> &c).
> 
Yep, and in fact, this is exactly what the debug patch that I will send
to Olaf (after I'll be out of a meeting) does. :-)

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-10  8:57 crash in csched_load_balance after xl vcpu-pin Olaf Hering
  2018-04-10  9:34 ` George Dunlap
  2018-04-10 15:18 ` Olaf Hering
@ 2018-04-10 15:59 ` Olaf Hering
  2018-04-10 16:28   ` Dario Faggioli
  2 siblings, 1 reply; 58+ messages in thread
From: Olaf Hering @ 2018-04-10 15:59 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Dario Faggioli


[-- Attachment #1.1: Type: text/plain, Size: 3715 bytes --]

On Tue, Apr 10, Olaf Hering wrote:

> (XEN) Xen BUG at sched_credit.c:1694

And another one with debug=y and this config:
memory=4444
vcpus=36
cpu="nodes:1,^node:0"
cpu_soft="nodes:1,^node:0"
(nodes=1 cycles between 1-3 for each following domU).

(XEN) Assertion 'CSCHED_PCPU(cpu)->nr_runnable >= 1' failed at sched_credit.c:269
(XEN) ----[ Xen-4.11.20180407T144959.e62e140daa-4.bug1087289_411  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    18
(XEN) RIP:    e008:[<ffff82d08022b2e8>] sched_credit.c#csched_schedule+0x8fe/0xd42
(XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor (d0v18)
(XEN) rax: ffff830779e9e970   rbx: ffff83007ba44000   rcx: 0000000000000046
(XEN) rdx: 00000036f953b080   rsi: ffff83077a738140   rdi: ffff830779e9a18e
(XEN) rbp: ffff83077a737e18   rsp: ffff83077a737d18   r8:  000000000000000b
(XEN) r9:  ffff83077a7383c0   r10: 0000000000000000   r11: 0000017e70349000
(XEN) r12: ffff8309d55879f0   r13: 0000000000000044   r14: ffff830779eae188
(XEN) r15: ffff8309d55879f0   cr0: 000000008005003b   cr4: 00000000001526e0
(XEN) cr3: 0000000dd1056000   cr2: 0000557e1f370028
(XEN) fsb: 0000000000000000   gsb: ffff880885080000   gss: 0000000000000000
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <ffff82d08022b2e8> (sched_credit.c#csched_schedule+0x8fe/0xd42):
(XEN)  10 18 83 78 18 00 75 02 <0f> 0b 48 8d 05 0f 3e 73 00 48 8b 44 10 18 83 68
(XEN) Xen stack trace from rsp=ffff83077a737d18:
(XEN)    0000000000000004 00000000ef047000 ffff82d08095f0e0 ffff830779eae188
(XEN)    0000017e6f25c71c ffff82d08095f0c0 0000000100000044 ffff82d08095f0c0
(XEN)    0000000001c9c380 ffff83077a737e60 ffff83077ffe7720 ffff82d08095f100
(XEN)    ffff83077a6c59e0 ffff82d08095f0c0 0000001200000000 ffff83077a73c570
(XEN)    ffff82d08095f100 0000000100000028 ffff830700000046 0000004400000012
(XEN)    ffff82d08023d5f0 ffff83077a7381a0 7ffb5fe000000000 00000000000000bd
(XEN)    0000000000000000 0000000000000000 0000000000000092 ffff830060ae3000
(XEN)    ffff82d08095f100 ffff83077a738188 0000017e6f25c71c 0000000000000012
(XEN)    ffff83077a737ea8 ffff82d080236406 ffff82d080372434 ffff83077a7381a0
(XEN)    0000001200737ef8 ffff83077a738180 ffff83077a737ee8 ffff82d08036a04a
(XEN)    02ff82d080372434 0000000000000001 0000000000000000 deadbeefdeadf00d
(XEN)    deadbeefdeadf00d ffff82d080934500 ffff82d080933c00 ffffffffffffffff
(XEN)    ffff83077a737fff 0000000000000000 ffff83077a737ed8 ffff82d080239ec5
(XEN)    ffff830060ae3000 0000000000000000 0000000000000000 0000000000000000
(XEN)    ffff83077a737ee8 ffff82d080239f1a 00007cf8858c80e7 ffff82d08036e566
(XEN)    ffff880181710000 ffff880181710000 ffff880181710000 0000000000000000
(XEN)    0000000000000012 ffffffff81d4c180 0000000000000246 0000000000007ff0
(XEN)    0000000000000001 0000000000000000 0000000000000000 ffffffff810013aa
(XEN)    0000000000000012 deadbeefdeadf00d deadbeefdeadf00d 0000010000000000
(XEN)    ffffffff810013aa 000000000000e033 0000000000000246 ffff880181713ee0
(XEN) Xen call trace:
(XEN)    [<ffff82d08022b2e8>] sched_credit.c#csched_schedule+0x8fe/0xd42
(XEN)    [<ffff82d080236406>] schedule.c#schedule+0x107/0x627
(XEN)    [<ffff82d080239ec5>] softirq.c#__do_softirq+0x85/0x90
(XEN)    [<ffff82d080239f1a>] do_softirq+0x13/0x15
(XEN)    [<ffff82d08036e566>] x86_64/entry.S#process_softirqs+0x6/0x10
(XEN) ****************************************
(XEN) Panic on CPU 18:
(XEN) Assertion 'CSCHED_PCPU(cpu)->nr_runnable >= 1' failed at sched_credit.c:269
(XEN) ****************************************
(XEN) Reboot in five seconds...



dom0 is still alive after that attempt to reboot and for some reason triple
ctrl-a appears to work. But it seems 'R' still fails.

Olaf

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-10 15:59 ` Olaf Hering
@ 2018-04-10 16:28   ` Dario Faggioli
  2018-04-10 19:03     ` Olaf Hering
  0 siblings, 1 reply; 58+ messages in thread
From: Dario Faggioli @ 2018-04-10 16:28 UTC (permalink / raw)
  To: Olaf Hering, xen-devel; +Cc: George Dunlap


[-- Attachment #1.1: Type: text/plain, Size: 2286 bytes --]

On Tue, 2018-04-10 at 17:59 +0200, Olaf Hering wrote:
> On Tue, Apr 10, Olaf Hering wrote:
> 
> > (XEN) Xen BUG at sched_credit.c:1694
> 
> And another one with debug=y and this config:
>
Wow...

> memory=4444
> vcpus=36
> cpu="nodes:1,^node:0"
> cpu_soft="nodes:1,^node:0"
>
As said, it's cpus= and cpus_soft=, and you probably just need

cpus="node:1"
cpus_soft="node:1"

Or, even just:

cpus="node:1"

as, if soft-affinity is set to be equal to hard, it is just ignored.

> (nodes=1 cycles between 1-3 for each following domU).
> 
> (XEN) Assertion 'CSCHED_PCPU(cpu)->nr_runnable >= 1' failed at
> sched_credit.c:269
> (XEN) ----[ Xen-4.11.20180407T144959.e62e140daa-
> 4.bug1087289_411  x86_64  debug=y   Not tainted ]----
> (XEN) CPU:    18
> (XEN) RIP:    e008:[<ffff82d08022b2e8>]
> sched_credit.c#csched_schedule+0x8fe/0xd42
> (XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor (d0v18)
> ...
> (XEN) Xen call trace:
> (XEN)    [<ffff82d08022b2e8>]
> sched_credit.c#csched_schedule+0x8fe/0xd42
> (XEN)    [<ffff82d080236406>] schedule.c#schedule+0x107/0x627
> (XEN)    [<ffff82d080239ec5>] softirq.c#__do_softirq+0x85/0x90
> (XEN)    [<ffff82d080239f1a>] do_softirq+0x13/0x15
> (XEN)    [<ffff82d08036e566>]
> x86_64/entry.S#process_softirqs+0x6/0x10
>
Yeah, thanks for trying with debugging on. Unfortunately, stack traces
in these cases are not very helpful, as they only tell us that
schedule() is being called by do_softirq()... :-P

Still...

> (XEN) ****************************************
> (XEN) Panic on CPU 18:
> (XEN) Assertion 'CSCHED_PCPU(cpu)->nr_runnable >= 1' failed at
> sched_credit.c:269
>
...it is another, different one, this time when removing (or not
re-inserting) the vcpu from the runqueue.

What would be helpful would be to catch the other side of the race,
i.e., the point when the vcpu is being re-inserted in the runqueue, or
when v->processor of a vcpu in the runqueue is changed... Let's see if
the debug patch will help with this.

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-10 16:28   ` Dario Faggioli
@ 2018-04-10 19:03     ` Olaf Hering
  2018-04-10 20:02       ` Dario Faggioli
  0 siblings, 1 reply; 58+ messages in thread
From: Olaf Hering @ 2018-04-10 19:03 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1259 bytes --]

On Tue, Apr 10, Dario Faggioli wrote:

> On Tue, 2018-04-10 at 17:59 +0200, Olaf Hering wrote:
> > memory=4444
> > vcpus=36
> > cpu="nodes:1,^node:0"
> > cpu_soft="nodes:1,^node:0"
> As said, it's cpus= and cpus_soft=, and you probably just need
> cpus="node:1"
> cpus_soft="node:1"
> Or, even just:
> cpus="node:1"
> as, if soft-affinity is set to be equal to hard, it is just ignored.

Well, that was a noop. But xl.cfg states "nodes:0-3,^node:2", so this
should work:
cpus="nodes:3,^node:0"
cpus_soft="nodes:3,^node:0"

xl create -f fv_sles12sp1.f.tst.cfg
libxl: error: libxl_sched.c:62:libxl__set_vcpuaffinity: Domain 16:Setting vcpu affinity: Invalid argument
libxl: error: libxl_dom.c:461:libxl__build_pre: setting affinity failed on vcpu `0'
libxl: error: libxl_create.c:1265:domcreate_rebuild_done: Domain 16:cannot (re-)build domain: -3
libxl: error: libxl_domain.c:1034:libxl__destroy_domid: Domain 16:Non-existant domain
libxl: error: libxl_domain.c:993:domain_destroy_callback: Domain 16:Unable to destroy guest
libxl: error: libxl_domain.c:920:domain_destroy_cb: Domain 16:Destruction of domain failed

Same for nodes:2..., just nodes:1... works.

And after some attempts, cpus="nodes:2/3" fails too.
There is no indication what is invalid.

Olaf

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-10 19:03     ` Olaf Hering
@ 2018-04-10 20:02       ` Dario Faggioli
  2018-04-10 20:09         ` Olaf Hering
  0 siblings, 1 reply; 58+ messages in thread
From: Dario Faggioli @ 2018-04-10 20:02 UTC (permalink / raw)
  To: Olaf Hering; +Cc: George Dunlap, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 2470 bytes --]

On Tue, 2018-04-10 at 21:03 +0200, Olaf Hering wrote:
> On Tue, Apr 10, Dario Faggioli wrote:
> 
> > As said, it's cpus= and cpus_soft=, and you probably just need
> > cpus="node:1"
> > cpus_soft="node:1"
> > Or, even just:
> > cpus="node:1"
> > as, if soft-affinity is set to be equal to hard, it is just
> > ignored.
> 
> Well, that was a noop. But xl.cfg states "nodes:0-3,^node:2", so this
> should work:
> cpus="nodes:3,^node:0"
> cpus_soft="nodes:3,^node:0"
> 
Well, but "nodes:0-3,^node:2" is a way to say that you want nodes 0, 1
and 3. I.e., you are defining a set made up of 0,1,2,3, and then you
remove 2.

With "nodes:3,^node:0", you're saying that you want node 3, but not
node 0. I.e., basically, you are creating a set with 3 in it, and then
trying to remove 0... I agree this is not technically wrong, but it
does not make much sense. Why aren't you using
"node:3,^node:0,^node:1,^node:2" then?

So, really, the way to achieve what you seem to want is:

cpus="node:3"

All that being said, yes, "nodes:3,^node:0" should work (and behave
exactly as "node:3" :-) ).

And in fact...

> xl create -f fv_sles12sp1.f.tst.cfg
>
... parsing, at the xl level, worked, or xl itself would have errored
out, with its own message.

> libxl: error: libxl_sched.c:62:libxl__set_vcpuaffinity: Domain
> 16:Setting vcpu affinity: Invalid argument
> libxl: error: libxl_dom.c:461:libxl__build_pre: setting affinity
> failed on vcpu `0'
>
This is xc_vcpu_setaffinity() failing with EINVAL, in
libxl__set_vcpuaffinity().

> Same for nodes:2..., just nodes:1... works.
> 
> And after some attempts, cpus="nodes:2/3" fails too.
>
"nodes:2/3" is not supported.

> There is no indication what is invalid.
> 
Mmm... I seem to recall having tested the parser against various corner
cases and/or ill-defined input. Still, my guess is that using ^ like
that (i.e., excluding something which was not there in the first
place) may result in a weird/corrupted cpumask.

If that is the case, it indeed would be a bug. I'll check the code
tomorrow.

In the meanwhile --let me repeat myself-- just go ahead with "node:2",
"node:3", etc. :-D

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-10 20:02       ` Dario Faggioli
@ 2018-04-10 20:09         ` Olaf Hering
  2018-04-10 20:13           ` Olaf Hering
  0 siblings, 1 reply; 58+ messages in thread
From: Olaf Hering @ 2018-04-10 20:09 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 163 bytes --]

On Tue, Apr 10, Dario Faggioli wrote:

> In the meanwhile --let me repeat myself-- just go ahead with "node:2",
> "node:3", etc. :-D

I did, and that fails.

Olaf

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-10 20:09         ` Olaf Hering
@ 2018-04-10 20:13           ` Olaf Hering
  2018-04-10 20:41             ` Dario Faggioli
  0 siblings, 1 reply; 58+ messages in thread
From: Olaf Hering @ 2018-04-10 20:13 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 445 bytes --]

On Tue, Apr 10, Olaf Hering wrote:

> On Tue, Apr 10, Dario Faggioli wrote:
> 
> > In the meanwhile --let me repeat myself-- just go ahead with "node:2",
> > "node:3", etc. :-D
> 
> I did, and that fails.

I think the man page is not that clear to me. If there is a difference
between 'node' and 'nodes' for a single digit, it may need a dedicated
sentence to state that fact. I will try that once it comes back from reboot.

Olaf

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
       [not found]           ` <960702b6d9dfb67bfae72ae02ae502210695416b.camel@suse.com>
@ 2018-04-10 20:37             ` Olaf Hering
  2018-04-10 22:59               ` Dario Faggioli
  0 siblings, 1 reply; 58+ messages in thread
From: Olaf Hering @ 2018-04-10 20:37 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 3634 bytes --]

On Tue, Apr 10, Dario Faggioli wrote:

> So, Olaf, if you fancy giving this a try anyway, well, go ahead.

    BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc)));

(XEN) Xen BUG at sched_credit.c:876
(XEN) ----[ Xen-4.11.20180410T125709.50f8ba84a5-3.bug1087289_411  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    118
(XEN) RIP:    e008:[<ffff82d080229ab4>] sched_credit.c#csched_vcpu_migrate+0x27/0x51
(XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
(XEN) rax: ffff83087b8f5010   rbx: ffff830779cc6188   rcx: ffff82d080803640
(XEN) rdx: 000000000000005f   rsi: ffff83007ba37000   rdi: ffff82d080803640
(XEN) rbp: ffff831c7d877d18   rsp: ffff831c7d877d18   r8:  0000000000000004
(XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000ffff0000ffff
(XEN) r12: ffff830779cc6188   r13: 000000000000005f   r14: 0000000000000076
(XEN) r15: ffff83007ba37000   cr0: 0000000080050033   cr4: 00000000001526e0
(XEN) cr3: 0000000bf4af5000   cr2: 00007f377e8fd594
(XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen code around <ffff82d080229ab4> (sched_credit.c#csched_vcpu_migrate+0x27/0x51):
(XEN)  00 00 00 48 3b 00 74 02 <0f> 0b 48 8d 15 43 56 73 00 48 63 76 04 48 8d 0d
(XEN) Xen stack trace from rsp=ffff831c7d877d18:
(XEN)    ffff831c7d877d28 ffff82d080236348 ffff831c7d877da8 ffff82d08023764c
(XEN)    0000000000000000 ffff82d08095f0e0 ffff82d08095f100 ffff830779da8188
(XEN)    ffff83007ba37000 0000005f01000000 0000000000000000 0000000000000296
(XEN)    ffff830779cc602c ffff83007ba37000 ffff83007ba37000 ffff83077a6c4000
(XEN)    0000000000000076 ffff83087bb8b000 ffff831c7d877dc8 ffff82d08023935f
(XEN)    ffff83077a6c4000 ffff83005d1d0000 ffff831c7d877e18 ffff82d08027797d
(XEN)    ffff831c7d877de8 ffff82d0802a4f50 ffff831c7d877e18 ffff83007ba37000
(XEN)    ffff83005d1d0000 ffff830779cc6188 00000baa8fa4f354 0000000000000001
(XEN)    ffff831c7d877ea8 ffff82d080236943 ffff82d08031f411 ffff830779cc61a0
(XEN)    0000007600b8b000 ffff830779cc6180 ffff831c7d877e68 ffff82d0802f8fd3
(XEN)    ffff83007ba37000 ffff83005d1d0000 ffffffffffffffff 0000000000000000
(XEN)    ffff831c7d877ee8 ffff82d080937700 ffff82d080933c00 ffffffffffffffff
(XEN)    ffff831c7d877fff 0000000000000000 ffff831c7d877ed8 ffff82d080239f15
(XEN)    ffff83007ba37000 0000000000000000 0000000000000000 0000000000000000
(XEN)    ffff831c7d877ee8 ffff82d080239f6a 00007ce3827880e7 ffff82d08031f5db
(XEN)    ffff88011e034000 ffff88011e034000 ffff88011e034000 0000000000000000
(XEN)    000000000000000d ffffffff81d4c180 0000000000000008 00000013bb9ba8f8
(XEN)    0000000000000001 0000000000000000 ffffffff81020e50 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000beef0000beef
(XEN)    ffffffff81060182 000000bf0000beef 0000000000000246 ffff88011e037ed8
(XEN) Xen call trace:
(XEN)    [<ffff82d080229ab4>] sched_credit.c#csched_vcpu_migrate+0x27/0x51
(XEN)    [<ffff82d080236348>] schedule.c#vcpu_move_locked+0xbb/0xc2
(XEN)    [<ffff82d08023764c>] schedule.c#vcpu_migrate+0x226/0x25b
(XEN)    [<ffff82d08023935f>] context_saved+0x8d/0x94
(XEN)    [<ffff82d08027797d>] context_switch+0xe66/0xeb0
(XEN)    [<ffff82d080236943>] schedule.c#schedule+0x5f4/0x627
(XEN)    [<ffff82d080239f15>] softirq.c#__do_softirq+0x85/0x90
(XEN)    [<ffff82d080239f6a>] do_softirq+0x13/0x15
(XEN)    [<ffff82d08031f5db>] vmx_asm_do_vmentry+0x2b/0x30
(XEN) ****************************************
(XEN) Panic on CPU 118:
(XEN) Xen BUG at sched_credit.c:876
(XEN) ****************************************
(XEN) Reboot in five seconds...


Olaf

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-10 20:13           ` Olaf Hering
@ 2018-04-10 20:41             ` Dario Faggioli
  2018-04-11  6:23               ` Olaf Hering
  0 siblings, 1 reply; 58+ messages in thread
From: Dario Faggioli @ 2018-04-10 20:41 UTC (permalink / raw)
  To: Olaf Hering; +Cc: George Dunlap, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1224 bytes --]

On Tue, 10 Apr 2018, 22:16, Olaf Hering <olaf@aepfle.de> wrote:

> On Tue, Apr 10, Olaf Hering wrote:
>
> > On Tue, Apr 10, Dario Faggioli wrote:
> >
> > > In the meanwhile --let me repeat myself-- just go ahead with "node:2",
> > > "node:3", etc. :-D
> >
> > I did, and that fails.
>
> I think the man page is not that clear to me. If there is a difference
> between 'node' and 'nodes' for a single digit, it may need a dedicated
> sentence to state that fact.


Mmm... I honestly don't recall, and I don't have the code in front of me
any longer.

I remember specifically wanting it to support not only "nodes:", but
also "node:", because I thought that, e.g., "nodes:3" would have sounded
weird to users.

I'd also say, however, that both "node:0-4" and "nodes:3" should work, but
I may be wrong.

Sorry for the manpage not being clear... I tried hard, back then, to come
up with a nice interface, and to describe it properly, but it is very much
possible that I failed. :-/

Regards,
Dario


I will try that once it comes back from reboot.
>
> Olaf
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel

[-- Attachment #1.2: Type: text/html, Size: 2310 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-10 20:37             ` Olaf Hering
@ 2018-04-10 22:59               ` Dario Faggioli
  2018-04-11  7:31                 ` Dario Faggioli
       [not found]                 ` <9c857d1a-d592-8db5-827c-30fbc97477e0@citrix.com>
  0 siblings, 2 replies; 58+ messages in thread
From: Dario Faggioli @ 2018-04-10 22:59 UTC (permalink / raw)
  To: Olaf Hering; +Cc: George Dunlap, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 4562 bytes --]

[Adding Andrew, not because I expect anything, but just because we've 
 chatted about this issue on IRC :-) ]

On Tue, 2018-04-10 at 22:37 +0200, Olaf Hering wrote:
> On Tue, Apr 10, Dario Faggioli wrote:
> 
>     BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc)));
>
> (XEN) Xen BUG at sched_credit.c:876
> (XEN) ----[ Xen-4.11.20180410T125709.50f8ba84a5-
> 3.bug1087289_411  x86_64  debug=y   Not tainted ]----
> (XEN) CPU:    118
> (XEN) RIP:    e008:[<ffff82d080229ab4>]
> sched_credit.c#csched_vcpu_migrate+0x27/0x51
> ...
> (XEN) Xen call trace:
> (XEN)    [<ffff82d080229ab4>]
> sched_credit.c#csched_vcpu_migrate+0x27/0x51
> (XEN)    [<ffff82d080236348>] schedule.c#vcpu_move_locked+0xbb/0xc2
> (XEN)    [<ffff82d08023764c>] schedule.c#vcpu_migrate+0x226/0x25b
> (XEN)    [<ffff82d08023935f>] context_saved+0x8d/0x94
> (XEN)    [<ffff82d08027797d>] context_switch+0xe66/0xeb0
> (XEN)    [<ffff82d080236943>] schedule.c#schedule+0x5f4/0x627
> (XEN)    [<ffff82d080239f15>] softirq.c#__do_softirq+0x85/0x90
> (XEN)    [<ffff82d080239f6a>] do_softirq+0x13/0x15
> (XEN)    [<ffff82d08031f5db>] vmx_asm_do_vmentry+0x2b/0x30
>
Hey... unless I've really put there a totally bogus BUG_ON(), this
looks interesting and potentially useful.

It says that the vcpu which is being context switched out, and on which
we are calling vcpu_migrate() because we found it to have
VPF_migrating set, is actually still in the runqueue when we get
to execute vcpu_migrate()->vcpu_move_locked().

Mmm... let's see.

 CPU A                                  CPU B
 .                                      .
 schedule(current == v)                 vcpu_set_affinity(v)
  prev = current     // == v             .
  schedule_lock(CPU A)                   .
   csched_schedule()                     schedule_lock(CPU A)
   if (runnable(v))  //YES               x
    runq_insert(v)                       x
   return next != v                      x
  schedule_unlock(CPU A)                 x // takes the lock
  context_switch(prev,next)              set_bit(v, VPF_migrating)  [*]
   context_saved(prev) // still == v     .
    v->is_running = 0                    schedule_unlock(CPU A)
    SMP_MB                               .
    if (test_bit(v, VPF_migrating)) // YES!!
     vcpu_migrate(v)                     .
      for {                              .
       schedule_lock(CPU A)              .
       SCHED_OP(v, pick_cpu)             .
        set_bit(v, CSCHED_MIGRATING)     .
        return CPU C                     .
       pick_called = 1                   .
       schedule_unlock(CPU A)            .
       schedule_lock(CPU A + CPU C)      .
       if (pick_called && ...) // YES    .
        break                            .
      }                                  .
      // v->is_running is 0              .
      //!test_and_clear(v, VPF_migrating)) is false!!
      clear_bit(v, VPF_migrating)        .
      vcpu_move_locked(v, CPU C)         .
      BUG_ON(__vcpu_on_runq(v))          .

[*] after this point, and until someone manages to call vcpu_sleep(),  
      v sits in CPU A's runqueue with the VPF_migrating pause flag set

So, basically, the race is between context_saved() and
vcpu_set_affinity(): the latter sets the
VPF_migrating pause flag on a vcpu in a runqueue, with the intent of
letting either a vcpu_sleep_nosync() or a reschedule remove it from
there, but context_saved() manages to see the flag before the removal
can happen.
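
For reference, the tail of context_saved() currently looks more or less
like this (paraphrased, so don't take the exact shape too literally):

    void context_saved(struct vcpu *prev)
    {
        /* ... */
        prev->is_running = 0;

        /* Check for migration request /after/ clearing the running flag. */
        smp_mb();

        SCHED_OP(vcpu_scheduler(prev), context_saved, prev);

        if ( unlikely(prev->pause_flags & VPF_migrating) )
            vcpu_migrate(prev);  /* prev may still be in a runqueue here! */
    }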

And I think this also explains the original BUG at sched_credit.c:1694
(it's just a bit more involved).

As can be seen above (and also in the code comment) there is a
barrier (which further testifies that this is indeed a tricky passage),
but I guess it is not that effective! :-/

TBH, I have actually never fully understood what that comment really
meant, what the barrier was protecting, and how... e.g., isn't it
missing its paired one? In fact, there's another comment, clearly
related, right in vcpu_set_affinity(). But again I'm a bit at a loss to
properly figure out what the big idea is.

George, what do you think? Does this make sense?

Well, I'll think more about this, and about a possible fix, tomorrow
morning.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-10 20:41             ` Dario Faggioli
@ 2018-04-11  6:23               ` Olaf Hering
  2018-04-11  8:42                 ` Dario Faggioli
  0 siblings, 1 reply; 58+ messages in thread
From: Olaf Hering @ 2018-04-11  6:23 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 865 bytes --]

On Tue, Apr 10, Dario Faggioli wrote:

> I remember specifically wanting for it to support not only "nodes:", but also
> "node:", because I thought that, e.g., "nodes:3" would have sound weird to
> users.

It turned out that I had a typo all the time in my template, it used
'cpu=' rather than 'cpus='. On this system none of this works:
#pus="node:${node}"
cpus="nodes:${node}"
#pus="nodes:${node},^node:0"
#pus_soft="nodes:${node},^node:0"

Only 'cpus=node:1' or 'cpus=nodes:1' works, cpus=node:2 or node:3 does
not. There is room for domUs:
numa_info              :
node:    memsize    memfree    distances
   0:     30720       1912      10,21,21,21
   1:     28672      22355      21,10,21,21
   2:     24576      24502      21,21,10,21
   3:     32768      31760      21,21,21,10

But, that is a separate issue. The BUG triggers without the cpus= knob.

Olaf

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-10 22:59               ` Dario Faggioli
@ 2018-04-11  7:31                 ` Dario Faggioli
  2018-04-11  7:39                   ` Juergen Gross
  2018-04-11 10:00                   ` Olaf Hering
       [not found]                 ` <9c857d1a-d592-8db5-827c-30fbc97477e0@citrix.com>
  1 sibling, 2 replies; 58+ messages in thread
From: Dario Faggioli @ 2018-04-11  7:31 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Andrew Cooper, George Dunlap, xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 5058 bytes --]

On Wed, 2018-04-11 at 00:59 +0200, Dario Faggioli wrote:
> [Adding Andrew, not because I expect anything, but just because
> we've chatted about this issue on IRC :-) ]
> 
Except, I did not actually add him. :-P

Anyway...

> On Tue, 2018-04-10 at 22:37 +0200, Olaf Hering wrote:
> > On Tue, Apr 10, Dario Faggioli wrote:
> > 
> >     BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc)));
> > 
... patch attached.

Olaf, can you give it a try? It should be fine to run it on top of the
last debug patch (the one that produced this crash).

Regards,
Dario

> > (XEN) Xen BUG at sched_credit.c:876
> > (XEN) ----[ Xen-4.11.20180410T125709.50f8ba84a5-
> > 3.bug1087289_411  x86_64  debug=y   Not tainted ]----
> > (XEN) CPU:    118
> > (XEN) RIP:    e008:[<ffff82d080229ab4>]
> > sched_credit.c#csched_vcpu_migrate+0x27/0x51
> > ...
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82d080229ab4>]
> > sched_credit.c#csched_vcpu_migrate+0x27/0x51
> > (XEN)    [<ffff82d080236348>] schedule.c#vcpu_move_locked+0xbb/0xc2
> > (XEN)    [<ffff82d08023764c>] schedule.c#vcpu_migrate+0x226/0x25b
> > (XEN)    [<ffff82d08023935f>] context_saved+0x8d/0x94
> > (XEN)    [<ffff82d08027797d>] context_switch+0xe66/0xeb0
> > (XEN)    [<ffff82d080236943>] schedule.c#schedule+0x5f4/0x627
> > (XEN)    [<ffff82d080239f15>] softirq.c#__do_softirq+0x85/0x90
> > (XEN)    [<ffff82d080239f6a>] do_softirq+0x13/0x15
> > (XEN)    [<ffff82d08031f5db>] vmx_asm_do_vmentry+0x2b/0x30
> > 
> 
> Hey... unless I've really put there a totally bogus BUG_ON(), this
> looks interesting and potentially useful.
> 
> It says that the vcpu which is being context switched out, and on
> which we are calling vcpu_migrate() because we found it to have
> VPF_migrating set, is actually still in the runqueue when we get
> to execute vcpu_migrate()->vcpu_move_locked().
> 
> Mmm... let's see.
> 
>  CPU A                                  CPU B
>  .                                      .
>  schedule(current == v)                 vcpu_set_affinity(v)
>   prev = current     // == v             .
>   schedule_lock(CPU A)                   .
>    csched_schedule()                     schedule_lock(CPU A)
>    if (runnable(v))  //YES               x
>     runq_insert(v)                       x
>    return next != v                      x
>   schedule_unlock(CPU A)                 x // takes the lock
>   context_switch(prev,next)              set_bit(v,
> VPF_migrating)  [*]
>    context_saved(prev) // still == v     .
>     v->is_running = 0                    schedule_unlock(CPU A)
>     SMP_MB                               .
>     if (test_bit(v, VPF_migrating)) // YES!!
>      vcpu_migrate(v)                     .
>       for {                              .
>        schedule_lock(CPU A)              .
>        SCHED_OP(v, pick_cpu)             .
>         set_bit(v, CSCHED_MIGRATING)     .
>         return CPU C                     .
>        pick_called = 1                   .
>        schedule_unlock(CPU A)            .
>        schedule_lock(CPU A + CPU C)      .
>        if (pick_called && ...) // YES    .
>         break                            .
>       }                                  .
>       // v->is_running is 0              .
>       //!test_and_clear(v, VPF_migrating)) is false!!
>       clear_bit(v, VPF_migrating)        .
>       vcpu_move_locked(v, CPU C)         .
>       BUG_ON(__vcpu_on_runq(v))          .
> 
> [*] after this point, and until someone manages to call
> vcpu_sleep(),  
>       v sits in CPU A's runqueue with the VPF_migrating pause flag
> set
> 
> So, basically, the race is between context_saved() and
> vcpu_set_affinity(): the latter sets the
> VPF_migrating pause flag on a vcpu in a runqueue, with the intent of
> letting either a vcpu_sleep_nosync() or a reschedule remove it from
> there, but context_saved() manages to see the flag before the removal
> can happen.
> 
> And I think this also explains the original BUG at
> sched_credit.c:1694
> (it's just a bit more involved).
> 
> As can be seen above (and also in the code comment) there is a
> barrier (which further testifies that this is indeed a tricky passage),
> but I guess it is not that effective! :-/
> 
> TBH, I have actually never fully understood what that comment really
> meant, what the barrier was protecting, and how... e.g., isn't it
> missing its paired one? In fact, there's another comment, clearly
> related, right in vcpu_set_affinity(). But again I'm a bit at a loss to
> properly figure out what the big idea is.
> 
> George, what do you think? Does this make sense?
> 
> Well, I'll think more about this, and about a possible fix, tomorrow
> morning.
> 
> Regards,
> Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.1.2: xen-sched-debug-vcpumigrate-race.patch --]
[-- Type: text/x-patch, Size: 1832 bytes --]

commit 4d052ed2cb95dc69f45da6772b805f8e5beb654b
Author: Dario Faggioli <dfaggioli@suse.com>
Date:   Wed Apr 11 09:03:19 2018 +0200

    xen: sched: fix race between context switch and setting affinity
    
    vcpu_set_affinity() may set the VPF_migrating flag on
    the vcpu that is being context switched out, without
    having the chance to also call vcpu_sleep_nosync() on
    it, before that context switching code (in context_saved())
    calls vcpu_migrate().
    
    This, eventually, results in vcpu_move_locked() being
    called on a runnable vcpu, which causes various issues
    in sched_credit.c, sched_credit2.c, etc.
    
    For instance, when using Credit, it leads to this crash:
    
    https://lists.xenproject.org/archives/html/xen-devel/2018-04/msg00664.html
    
    Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
    ---
    Cc: George Dunlap <george.dunlap@citrix.com>
    Cc: Olaf Hering <olaf@aepfle.de>
    Cc: Andrew Cooper <andrew.cooper3@citrix.com>

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 343ab6306e..2a60301849 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1554,7 +1554,17 @@ void context_saved(struct vcpu *prev)
     SCHED_OP(vcpu_scheduler(prev), context_saved, prev);
 
     if ( unlikely(prev->pause_flags & VPF_migrating) )
+    {
+        /*
+         * If someone (e.g., vcpu_set_affinity()) has set VPF_migrating
+         * on prev in between when schedule() releases the scheduler
+         * lock and here, we need to make sure we properly mark the
+         * vcpu as not runnable (and all it comes with that), with
+         * vcpu_sleep_nosync(), before calling vcpu_migrate().
+         */
+        vcpu_sleep_nosync(prev);
         vcpu_migrate(prev);
+    }
 }
 
 /* The scheduler timer: force a run through the scheduler */

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-11  7:31                 ` Dario Faggioli
@ 2018-04-11  7:39                   ` Juergen Gross
  2018-04-11  7:42                     ` Dario Faggioli
  2018-04-11 10:00                   ` Olaf Hering
  1 sibling, 1 reply; 58+ messages in thread
From: Juergen Gross @ 2018-04-11  7:39 UTC (permalink / raw)
  To: Dario Faggioli, Olaf Hering; +Cc: Andrew Cooper, George Dunlap, xen-devel

On 11/04/18 09:31, Dario Faggioli wrote:
> On Wed, 2018-04-11 at 00:59 +0200, Dario Faggioli wrote:
>> [Adding Andrew, not because I expect anything, but just because
>> we've chatted about this issue on IRC :-) ]
>>
> Except, I did not add it. :-P
> 
> Anyway...
> 
>> On Tue, 2018-04-10 at 22:37 +0200, Olaf Hering wrote:
>>> On Tue, Apr 10, Dario Faggioli wrote:
>>>
>>>     BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc)));
>>>
> ... patch attached.

Wouldn't it make more sense to add the call of vcpu_sleep_nosync()
to vcpu_migrate() and drop all the other calls of vcpu_sleep_nosync()
before calling vcpu_migrate()?
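
I.e., something like this (just a rough sketch of the idea, not a tested
patch; the locking in schedule.c would need a closer look):

    /* Sketch: have vcpu_migrate() itself make sure the vcpu is off the
     * runqueue, instead of relying on every caller doing that first. */
    static void vcpu_migrate(struct vcpu *v)
    {
        vcpu_sleep_nosync(v);   /* not runnable, hence not on any runqueue */

        /* ... existing pick_cpu() / locking logic stays as it is ... */
    }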


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-11  7:39                   ` Juergen Gross
@ 2018-04-11  7:42                     ` Dario Faggioli
  0 siblings, 0 replies; 58+ messages in thread
From: Dario Faggioli @ 2018-04-11  7:42 UTC (permalink / raw)
  To: Juergen Gross, Olaf Hering; +Cc: Andrew Cooper, George Dunlap, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 856 bytes --]

On Wed, 2018-04-11 at 09:39 +0200, Juergen Gross wrote:
> On 11/04/18 09:31, Dario Faggioli wrote:
> > > On Tue, 2018-04-10 at 22:37 +0200, Olaf Hering wrote:
> > > > On Tue, Apr 10, Dario Faggioli wrote:
> > > > 
> > > >     BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc)));
> > > > 
> > 
> > ... patch attached.
> 
> Wouldn't it make more sense to add the call of vcpu_sleep_nosync()
> to vcpu_migrate() and drop all the other calls of vcpu_sleep_nosync()
> before calling vcpu_migrate()?
>
Absolutely. But let's first see if this actually fixes the problem. :-)

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-11  6:23               ` Olaf Hering
@ 2018-04-11  8:42                 ` Dario Faggioli
  2018-04-11  8:48                   ` Olaf Hering
  0 siblings, 1 reply; 58+ messages in thread
From: Dario Faggioli @ 2018-04-11  8:42 UTC (permalink / raw)
  To: Olaf Hering; +Cc: George Dunlap, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 3203 bytes --]

On Wed, 2018-04-11 at 08:23 +0200, Olaf Hering wrote:
> It turned out that I had a typo all the time in my template, it used
> 'cpu=' rather than 'cpus='. On this system none of this works:
> #pus="node:${node}"
> cpus="nodes:${node}"
> #pus="nodes:${node},^node:0"
> #pus_soft="nodes:${node},^node:0"
> 
> Only 'cpus=node:1' or 'cpus=nodes:1' works, cpus=node:2 or node:3
> does
> not.
> There is room for domUs:
> numa_info              :
> node:    memsize    memfree    distances
>    0:     30720       1912      10,21,21,21
>    1:     28672      22355      21,10,21,21
>    2:     24576      24502      21,21,10,21
>    3:     32768      31760      21,21,21,10
> 
So, now, when you say 'does not work', do you mean 'domain creation is
aborted with errors' or 'domain is created, but memory is not where it
should be'.

IAC, here, when using `xl vcpu-pin':

root@Zhaman:/home/dario# xl vcpu-pin 1 all node:0-1,^nodes:0,4-7,^5
root@Zhaman:/home/dario# xl vcpu-list 1
Name                                ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
leap15                               1     0   14   -b-      15.5  4,6-15 / all
leap15                               1     1   11   -b-       4.7  4,6-15 / all
leap15                               1     2   12   -b-       4.6  4,6-15 / all
leap15                               1     3   14   -b-       4.3  4,6-15 / all
leap15                               1     4   14   -b-       5.8  4,6-15 / all
leap15                               1     5    8   -b-       4.5  4,6-15 / all
leap15                               1     6   10   -b-       4.3  4,6-15 / all
leap15                               1     7    9   -b-       3.7  4,6-15 / all

If I shut the domain down, and re-create it with cpus="..." and
cpus_soft="...":

root@Zhaman:/home/dario# cat vms/hvm/leap15.cfg |grep -e "cpus[=|_]"
cpus="node:0,^4-6,nodes:1,^12,^14"
cpus_soft="nodes:0,^node:1"
root@Zhaman:/home/dario# xl vcpu-list 2
Name                                ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
leap15                               2     0   13   -b-      17.3  0-3,7-11,13,15 / 0-7
leap15                               2     1    7   -b-       7.4  0-3,7-11,13,15 / 0-7
leap15                               2     2    0   -b-       6.0  0-3,7-11,13,15 / 0-7
leap15                               2     3    2   -b-       7.9  0-3,7-11,13,15 / 0-7

And from `xl debug-key u':
(XEN) [ 3841.835310] Domain 2 (total: 1044554):
(XEN) [ 3841.844555]     Node 0: 1044554
(XEN) [ 3841.844559]     Node 1: 0

which is fine, because soft-affinity is used, if it is explicitly specified.

So, I'd say that all seems to work fine, even when using "nodes:" with
only a single digit, or "node:" with a range, and also when doing all the
various set manipulations, like "nodes:0,^node:1".

I really am not sure what the issue could be there on your side...

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-11  8:42                 ` Dario Faggioli
@ 2018-04-11  8:48                   ` Olaf Hering
  2018-04-11 10:20                     ` Dario Faggioli
  2018-04-11 10:20                     ` Dario Faggioli
  0 siblings, 2 replies; 58+ messages in thread
From: Olaf Hering @ 2018-04-11  8:48 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 350 bytes --]

On Wed, Apr 11, Dario Faggioli wrote:

> So, now, when you say 'does not work', do you mean 'domain creation is
> aborted with errors' or 'domain is created, but memory is not where it
> should be'.

domU can not be created due to "libxl__set_vcpuaffinity: setting vcpu
affinity: Invalid argument". I guess something is special on this system.

Olaf

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-11  7:31                 ` Dario Faggioli
  2018-04-11  7:39                   ` Juergen Gross
@ 2018-04-11 10:00                   ` Olaf Hering
       [not found]                     ` <298ec681a9c38eb7618e6b3e226486691e9eab4d.camel@suse.com>
  2018-04-11 15:03                     ` Olaf Hering
  1 sibling, 2 replies; 58+ messages in thread
From: Olaf Hering @ 2018-04-11 10:00 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Andrew Cooper, George Dunlap, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 236 bytes --]

On Wed, Apr 11, Dario Faggioli wrote:

> Olaf, can you give it a try? It should be fine to run it on top of the
> last debug patch (the one that produced this crash).

Yes, with both changes it did >4k iterations already. Thanks.

Olaf

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-11  8:48                   ` Olaf Hering
@ 2018-04-11 10:20                     ` Dario Faggioli
  2018-04-11 12:45                       ` Olaf Hering
  2018-04-11 10:20                     ` Dario Faggioli
  1 sibling, 1 reply; 58+ messages in thread
From: Dario Faggioli @ 2018-04-11 10:20 UTC (permalink / raw)
  To: Olaf Hering; +Cc: George Dunlap, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 967 bytes --]

On Wed, 2018-04-11 at 10:48 +0200, Olaf Hering wrote:
> On Wed, Apr 11, Dario Faggioli wrote:
> > So, now, when you say 'does not work', do you mean 'domain creation
> > is
> > aborted with errors' or 'domain is created, but memory is not where
> > it
> > should be'.
> 
> The domU cannot be created due to "libxl__set_vcpuaffinity: setting vcpu
> affinity: Invalid argument". I guess something is special about this
> system.
>
Looks like it. :-O

If you're interested in figuring it out, I'd like to see:
- full output of `xl info -n'
- output of `xl debug-key u'
- xl vcpu-list
- xl list -n

right before trying to create the domain.

And I guess also having a look at `xl dmesg' won't hurt.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/



* Re: crash in csched_load_balance after xl vcpu-pin
       [not found]                 ` <9c857d1a-d592-8db5-827c-30fbc97477e0@citrix.com>
@ 2018-04-11 11:00                   ` Dario Faggioli
  0 siblings, 0 replies; 58+ messages in thread
From: Dario Faggioli @ 2018-04-11 11:00 UTC (permalink / raw)
  To: George Dunlap, Olaf Hering; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 2449 bytes --]

On Wed, 2018-04-11 at 11:37 +0100, George Dunlap wrote:
> On 04/10/2018 11:59 PM, Dario Faggioli wrote:
> > 
> > So, basically, the race is between context_saved() and
> > vcpu_set_affinity(). Basically, vcpu_set_affinity() sets the
> > VPF_migrating pause flags on a vcpu in a runqueue, with the intent
> > of
> > letting either a vcpu_sleep_nosync() or a reschedule remove it from
> > there, but context_saved() manages to see the flag before the
> > removal
> > can happen.
> > 
Yep, that looks correct.  I had considered some sort of race between
set_affinity() and context_switch(), but just never noticed that it
could fail to take the vcpu off the runqueue.
> 
Yeah, it's very subtle. In fact, when I considered that race, I was
assuming that, if we are in context_saved() with VPF_migrating set, the
vcpu can't be in any runqueue, as the scheduler would have seen it was
not runnable and would not have queued it.

I was missing the fact that someone could raise the flag on a vcpu
which is already in the runqueue, between when the scheduler lock is
dropped and this check in context_saved()!
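
To make this concrete, here is a rough sketch in C of the two paths
involved. It paraphrases the functions named in this thread rather than
quoting xen/common/schedule.c verbatim, so the locking and field details
below are only illustrative assumptions:

    /* Toolstack path, e.g. what `xl vcpu-pin' ends up calling: */
    int vcpu_set_affinity(struct vcpu *v, const cpumask_t *affinity)
    {
        /* ... under the scheduler lock: update v's hard affinity ... */
        set_bit(_VPF_migrating, &v->pause_flags);
        /* scheduler lock dropped here */

        /* Intent: dequeue v and then move it.  But the pCPU that has
         * just descheduled v can run context_saved() in the window
         * above, before these two calls happen: */
        vcpu_sleep_nosync(v);
        vcpu_migrate(v);
        return 0;
    }

    void context_saved(struct vcpu *prev)
    {
        prev->is_running = 0;
        /* prev is still sitting on its old runqueue here, and the flag
         * raised on the other pCPU is already visible: */
        if ( prev->pause_flags & VPF_migrating )
            vcpu_migrate(prev);   /* prev->processor changes while prev is
                                   * still queued on the old runqueue; that
                                   * stale entry is what the BUG in
                                   * csched_load_balance() later trips over. */
    }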

> > TBH, I have actually never fully understood what that comment
> > really
> > meant, what the barrier was protecting, and how... e.g., isn't it
> > missing its paired one? In fact, there's another comment, clearly
> > related, right in vcpu_set_affinity(). But again I'm a bit at loss
> > at
> > properly figuring out what the big idea is.
> 
> I think the idea is to make sure that the change to v->is_running
> happens before whatever happens to come next (i.e., that the compiler
> doesn't reorder the write as part of its normal optimization
> activities).  As it happens nothing that comes next looks like it
> really
> needs such ordering (particularly as you can't reorder things over a
> function call, AFAIUI), but it's good to have those in place in case
> anybody *does* add that sort of thing.
> 
Sure, I wasn't planning to remove them. I was curious, and in
particular, I was curious whether they were actually meant to try
to prevent this (or a similar) race... I'll do a bit of archeology, if
I find some time.

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/


* Re: crash in csched_load_balance after xl vcpu-pin
       [not found]                     ` <298ec681a9c38eb7618e6b3e226486691e9eab4d.camel@suse.com>
@ 2018-04-11 11:02                       ` George Dunlap
  2018-04-11 12:31                         ` Jan Beulich
  0 siblings, 1 reply; 58+ messages in thread
From: George Dunlap @ 2018-04-11 11:02 UTC (permalink / raw)
  To: Dario Faggioli, Olaf Hering; +Cc: Andrew Cooper, xen-devel

On 04/11/2018 11:17 AM, Dario Faggioli wrote:
> On Wed, 2018-04-11 at 12:00 +0200, Olaf Hering wrote:
>> On Wed, Apr 11, Dario Faggioli wrote:
>>
>>> Olaf, can you give it a try? It should be fine to run it on top of
>>> the
>>> last debug patch (the one that produced this crash).
>>
>> Yes, with both changes it did >4k iterations already. Thanks.
>>
> That's great to hear! :-D
> 
> Now, I think I'll submit it as a proper patch in the variant that
> Juergen suggested, and which I was also thinking of using.
> 
> George, any opinion? I'm going somewhere now. If I don't hear any
> pushback, I'll do that as soon as I'm back.

I think for simplicity / reliability of backporting, we should start
with a patch like the one you gave to Olaf (i.e., adding the "missing"
vcpu_sleep_nosync()).
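
For readers of the archive, roughly the shape of such a minimal change,
as a sketch only (this is not the actual patch that was sent to Olaf):

    void context_saved(struct vcpu *prev)
    {
        prev->is_running = 0;
        /* ... */
        if ( prev->pause_flags & VPF_migrating )
        {
            vcpu_sleep_nosync(prev);  /* the "missing" call: take prev off
                                       * its old runqueue first ...        */
            vcpu_migrate(prev);       /* ... and only then move it.        */
        }
    }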

Moving forward we should definitely move things around so that there's
no risk of accidentally forgetting to take the vcpu off the runqueue,
but there are some other changes it might be nice to make as well; for
instance, it looks like on a busy system there may be a fair amount of
duplicate cpu_pick() calculations; it would be nice to avoid that.

But those probably shouldn't be done during the feature freeze.

 -George


* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-11 11:02                       ` George Dunlap
@ 2018-04-11 12:31                         ` Jan Beulich
  0 siblings, 0 replies; 58+ messages in thread
From: Jan Beulich @ 2018-04-11 12:31 UTC (permalink / raw)
  To: George Dunlap, Dario Faggioli; +Cc: Andrew Cooper, Olaf Hering, xen-devel

>>> On 11.04.18 at 13:02, <george.dunlap@citrix.com> wrote:
> On 04/11/2018 11:17 AM, Dario Faggioli wrote:
>> On Wed, 2018-04-11 at 12:00 +0200, Olaf Hering wrote:
>>> On Wed, Apr 11, Dario Faggioli wrote:
>>>
>>>> Olaf, can you give it a try? It should be fine to run it on top of
>>>> the
>>>> last debug patch (the one that produced this crash).
>>>
>>> Yes, with both changes it did >4k iterations already. Thanks.
>>>
>> That's great to hear! :-D
>> 
>> Now, I think I'll submit it as a proper patch in the variant that
>> Juergen suggested, and which I was also thinking of using.
>> 
>> George, any opinion? I'm going somewhere now. If I don't hear any
>> pushback, I'll do that as soon as I'm back.
> 
> I think for simplicity / reliability of backporting, we should start
> with a patch like the one you gave to Olaf (i.e., adding the "missing"
> vcpu_sleep_nosync()).

Not sure - I've fallen into pitfalls like this a couple of times recently.
If backports didn't move the call into vcpu_migrate(), and we later
added a new call to that function somewhere else, a backport thereof
would have basically no chance of noticing that a call to
vcpu_sleep_nosync() would need to be added as well.
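
Sketched out, the variant being argued for here would push the call down
into vcpu_migrate() itself, so every present and future caller gets the
dequeue for free (again only a sketch, not the committed patch):

    static void vcpu_migrate(struct vcpu *v)
    {
        vcpu_sleep_nosync(v);   /* always take v off its old runqueue first */
        /* ... existing logic: pick the new pCPU, update v->processor and
         * wake v up on the runqueue it now belongs to ... */
    }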

Jan



* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-11 10:20                     ` Dario Faggioli
@ 2018-04-11 12:45                       ` Olaf Hering
  2018-04-17 12:39                         ` Dario Faggioli
  0 siblings, 1 reply; 58+ messages in thread
From: Olaf Hering @ 2018-04-11 12:45 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 1289 bytes --]

On Wed, Apr 11, Dario Faggioli wrote:

> If you're interested in figuring it out, I'd like to see:
> - full output of `xl info -n'
> - output of `xl debug-key u'
> - xl vcpu-list
> - xl list -n

Logs for this .cfg attached:

name='fv_sles12sp1.0'
vif=[ 'mac=00:18:3e:58:00:c1,bridge=br0' ]
memory=4444
vcpus=36
serial="pty"
builder="hvm"
kernel="/xen100.migration/olh/bug1088498/nfsroot_sles12sp2.bug1088498/boot/vmlinuz"
ramdisk="/xen100.migration/olh/bug1088498/nfsroot_sles12sp2.bug1088498/boot/initrd"
cmdline="quiet panic=9 root=nfs:xen100:/share/migration/olh/bug1088498/nfsroot_sles12sp2.bug1088498,vers=3,tcp,actimeo=1,nolock readonlyroot ro Xignore_loglevel Xdebug Xsystemd.log_target=kmsg    Xsystemd.log_level=debug Xrd.debug Xrd.shell Xrd.udev.debug Xudev.log-priority=debug Xrd.udev.log-priority=debug console=ttyS0"
cpus="node:2"
#pus="nodes:2"
#pus="nodes:2,^node:0"
#pus_soft="nodes:2,^node:0"
#isk=[ 'file:/xen100.migration/olh/bug1088498/vdisk.fv_sles12sp1.0.disk0.raw,xvda,w',
disk=[ 'file:/xen100.migration/olh/bug1088498/vdisk.fv_sles12sp1.0.disk0.raw,xvda,w',
'file:/fio_tmpfs_bug1081897/fv_sles12sp1.0.ramdisk01.raw,xvdta,w',
'file:/fio_tmpfs_bug1081897/fv_sles12sp1.0.ramdisk02.raw,xvdtb,w',
'file:/fio_tmpfs_bug1081897/fv_sles12sp1.0.ramdisk03.raw,xvdtc,w',
]


Olaf

[-- Attachment #1.1.2: xl-create.txt --]
[-- Type: text/plain, Size: 7012 bytes --]

Parsing config from /xen100.migration/olh/bug1088498/fv_sles12sp1.0.tst.cfg
{
    "c_info": {
        "type": "hvm",
        "name": "fv_sles12sp1.0",
        "uuid": "bf6db3fe-4553-456c-b185-c544cd232fb2",
        "run_hotplug_scripts": "True"
    },
    "b_info": {
        "max_vcpus": 36,
        "avail_vcpus": [
            0,
            1,
            2,
            3,
            4,
            5,
            6,
            7,
            8,
            9,
            10,
            11,
            12,
            13,
            14,
            15,
            16,
            17,
            18,
            19,
            20,
            21,
            22,
            23,
            24,
            25,
            26,
            27,
            28,
            29,
            30,
            31,
            32,
            33,
            34,
            35
        ],
        "vcpu_hard_affinity": [
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ]
        ],
        "numa_placement": "False",
        "max_memkb": 4550656,
        "target_memkb": 4550656,
        "shadow_memkb": 72416,
        "sched_params": {

        },
        "claim_mode": "True",
        "kernel": "/xen100.migration/olh/bug1088498/nfsroot_sles12sp2.bug1088498/boot/vmlinuz",
        "cmdline": "quiet panic=9 root=nfs:xen100:/share/migration/olh/bug1088498/nfsroot_sles12sp2.bug1088498,vers=3,tcp,actimeo=1,nolock readonlyroot ro Xignore_loglevel Xdebug Xsystemd.log_target=kmsg    Xsystemd.log_level=debug Xrd.debug Xrd.shell Xrd.udev.debug Xudev.log-priority=debug Xrd.udev.log-priority=debug console=ttyS0",
        "ramdisk": "/xen100.migration/olh/bug1088498/nfsroot_sles12sp2.bug1088498/boot/initrd",
        "type.hvm": {
            "vga": {

            },
            "vnc": {

            },
            "sdl": {

            },
            "spice": {

            },
            "serial": "pty",
            "rdm": {

            }
        },
        "arch_arm": {

        }
    },
    "disks": [
        {
            "pdev_path": "/xen100.migration/olh/bug1088498/vdisk.fv_sles12sp1.0.disk0.raw",
            "vdev": "xvda",
            "format": "raw",
            "readwrite": 1
        },
        {
            "pdev_path": "/fio_tmpfs_bug1081897/fv_sles12sp1.0.ramdisk01.raw",
            "vdev": "xvdta",
            "format": "raw",
            "readwrite": 1
        },
        {
            "pdev_path": "/fio_tmpfs_bug1081897/fv_sles12sp1.0.ramdisk02.raw",
            "vdev": "xvdtb",
            "format": "raw",
            "readwrite": 1
        },
        {
            "pdev_path": "/fio_tmpfs_bug1081897/fv_sles12sp1.0.ramdisk03.raw",
            "vdev": "xvdtc",
            "format": "raw",
            "readwrite": 1
        }
    ],
    "nics": [
        {
            "devid": 0,
            "mac": "00:18:3e:58:00:c1",
            "bridge": "br0"
        }
    ],
    "on_reboot": "restart",
    "on_soft_reset": "soft_reset"
}
libxl: debug: libxl_create.c:1670:do_domain_create: Domain 0:ao 0x1593560: create: how=(nil) callback=(nil) poller=0x1595dd0
libxl: debug: libxl_device.c:397:libxl__device_disk_set_backend: Disk vdev=xvda spec.backend=unknown
libxl: debug: libxl_device.c:432:libxl__device_disk_set_backend: Disk vdev=xvda, using backend phy
libxl: debug: libxl_device.c:397:libxl__device_disk_set_backend: Disk vdev=xvdta spec.backend=unknown
libxl: debug: libxl_device.c:432:libxl__device_disk_set_backend: Disk vdev=xvdta, using backend phy
libxl: debug: libxl_device.c:397:libxl__device_disk_set_backend: Disk vdev=xvdtb spec.backend=unknown
libxl: debug: libxl_device.c:432:libxl__device_disk_set_backend: Disk vdev=xvdtb, using backend phy
libxl: debug: libxl_device.c:397:libxl__device_disk_set_backend: Disk vdev=xvdtc spec.backend=unknown
libxl: debug: libxl_device.c:432:libxl__device_disk_set_backend: Disk vdev=xvdtc, using backend phy
libxl: debug: libxl_create.c:1007:initiate_domain_create: Domain 1:running bootloader
libxl: debug: libxl_bootloader.c:328:libxl__bootloader_run: Domain 1:not a PV/PVH domain, skipping bootloader
libxl: debug: libxl_event.c:686:libxl__ev_xswatch_deregister: watch w=0x1596ec0: deregister unregistered
libxl: error: libxl_sched.c:62:libxl__set_vcpuaffinity: Domain 1:Setting vcpu affinity: Invalid argument
libxl: error: libxl_dom.c:461:libxl__build_pre: setting affinity failed on vcpu `0'
libxl: error: libxl_create.c:1266:domcreate_rebuild_done: Domain 1:cannot (re-)build domain: -3
libxl: debug: libxl_domain.c:1172:devices_destroy_cb: Domain 1:Forked pid 3667 for destroy of domain
libxl: debug: libxl_create.c:1707:do_domain_create: Domain 0:ao 0x1593560: inprogress: poller=0x1595dd0, flags=i
libxl: debug: libxl_event.c:1869:libxl__ao_complete: ao 0x1593560: complete, rc=-3
libxl: debug: libxl_event.c:1838:libxl__ao__destroy: ao 0x1593560: destroy
libxl: debug: libxl_domain.c:902:libxl_domain_destroy: Domain 1:ao 0x1593560: create: how=(nil) callback=(nil) poller=0x1595dd0
libxl: error: libxl_domain.c:1034:libxl__destroy_domid: Domain 1:Non-existant domain
libxl: error: libxl_domain.c:993:domain_destroy_callback: Domain 1:Unable to destroy guest
libxl: error: libxl_domain.c:920:domain_destroy_cb: Domain 1:Destruction of domain failed
libxl: debug: libxl_event.c:1869:libxl__ao_complete: ao 0x1593560: complete, rc=-21
libxl: debug: libxl_domain.c:911:libxl_domain_destroy: Domain 1:ao 0x1593560: inprogress: poller=0x1595dd0, flags=ic
libxl: debug: libxl_event.c:1838:libxl__ao__destroy: ao 0x1593560: destroy
xencall:buffer: debug: total allocations:59 total releases:59
xencall:buffer: debug: current allocations:0 maximum allocations:2
xencall:buffer: debug: cache current size:2
xencall:buffer: debug: cache hits:46 misses:2 toobig:11
xencall:buffer: debug: total allocations:0 total releases:0
xencall:buffer: debug: current allocations:0 maximum allocations:0
xencall:buffer: debug: cache current size:0
xencall:buffer: debug: cache hits:0 misses:0 toobig:0

[-- Attachment #1.1.3: xl-dmesg.txt --]
[-- Type: text/plain, Size: 47072 bytes --]

09] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x0a] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x0b] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x0c] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x0d] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x0e] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x0f] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x10] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x11] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x12] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x13] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x14] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x15] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x16] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x17] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x18] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x19] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x1a] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x1b] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x1c] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x1d] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x1e] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x1f] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x20] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x21] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x22] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x23] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x24] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x25] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x26] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x27] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x28] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x29] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x2a] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x2b] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x2c] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x2d] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x2e] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x2f] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x30] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x31] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x32] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x33] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x34] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x35] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x36] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x37] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x38] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x39] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x3a] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x3b] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x3c] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x3d] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x3e] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x3f] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x40] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x41] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x42] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x43] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x44] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x45] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x46] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x47] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x48] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x49] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x4a] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x4b] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x4c] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x4d] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x4e] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x4f] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x50] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x51] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x52] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x53] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x54] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x55] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x56] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x57] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x58] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x59] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x5a] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x5b] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x5c] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x5d] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x5e] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x5f] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x60] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x61] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x62] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x63] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x64] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x65] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x66] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x67] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x68] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x69] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x6a] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x6b] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x6c] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x6d] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x6e] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x6f] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x70] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x71] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x72] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x73] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x74] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x75] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x76] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x77] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x78] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x79] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x7a] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x7b] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x7c] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x7c] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x7d] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x7e] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x7f] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x80] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x81] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x82] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x83] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x84] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x85] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x86] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x87] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x88] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x89] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x8a] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x8b] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x8c] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x8d] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x8f] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x90] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x91] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x92] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x93] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x94] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x95] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x96] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x97] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x98] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x99] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x9a] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x9b] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x9c] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x9d] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x9e] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x9f] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xa0] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xa1] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xa2] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xa3] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xa4] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xa5] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xa6] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xa7] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xa8] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xa9] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xaa] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xab] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xac] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xad] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xae] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xaf] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xb0] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xb1] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xb2] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xb3] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xb4] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xb5] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xb6] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xb7] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xb8] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xb9] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xba] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xbb] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xbc] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xbd] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xbe] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xbf] high level lint[0x1])
(XEN) Overriding APIC driver with bigsmp
(XEN) ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
(XEN) ACPI: IOAPIC (id[0x09] address[0xfec01000] gsi_base[24])
(XEN) IOAPIC[1]: apic_id 9, version 32, address 0xfec01000, GSI 24-47
(XEN) ACPI: IOAPIC (id[0x0a] address[0xfec40000] gsi_base[48])
(XEN) IOAPIC[2]: apic_id 10, version 32, address 0xfec40000, GSI 48-71
(XEN) ACPI: IOAPIC (id[0x0b] address[0xfec80000] gsi_base[72])
(XEN) IOAPIC[3]: apic_id 11, version 32, address 0xfec80000, GSI 72-95
(XEN) ACPI: IOAPIC (id[0x0c] address[0xfecc0000] gsi_base[96])
(XEN) IOAPIC[4]: apic_id 12, version 32, address 0xfecc0000, GSI 96-119
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) Enabling APIC mode:  Phys.  Using 5 I/O APICs
(XEN) ACPI: HPET id: 0x8086a301 base: 0xfed00000
(XEN) ERST table was not found
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) SMP: Allowing 192 CPUs (48 hotplug CPUs)
(XEN) IRQ limits: 120 GSI, 27544 MSI/MSI-X
(XEN) Not enabling x2APIC: depends on iommu_supports_eim.
(XEN) xstate: size: 0x340 and states: 0x7
(XEN) mce_intel.c:782: MCA Capability: firstbank 0, extended MCE MSR 0, BCAST, SER, CMCI
(XEN) CMCI: threshold 0x2 too large for CPU0 bank 17, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU0 bank 18, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU0 bank 19, using 0x1
(XEN) CPU0: Intel machine check reporting enabled
(XEN) Speculative mitigation facilities:
(XEN)   Hardware features:
(XEN)   Compiled-in support: INDIRECT_THUNK
(XEN) BTI mitigations: Thunk RETPOLINE, Others: RSB_NATIVE RSB_VMEXIT
(XEN) XPTI: enabled
(XEN) Using scheduler: SMP Credit Scheduler rev2 (credit2)
(XEN) Initializing Credit2 scheduler
(XEN)  load_precision_shift: 18
(XEN)  load_window_shift: 30
(XEN)  underload_balance_tolerance: 0
(XEN)  overload_balance_tolerance: -3
(XEN)  runqueues arrangement: socket
(XEN)  cap enforcement granularity: 10ms
(XEN) load tracking window length 1073741824 ns
(XEN) Adding cpu 0 to runqueue 0
(XEN)  First cpu on runqueue, activating
(XEN) Platform timer is 14.318MHz HPET
(XEN) Detected 2493.993 MHz processor.
(XEN) EFI memory map:
(XEN)  0000000000000-000000008dfff type=7 attr=000000000000000f
(XEN)  000000008e000-000000008ffff type=0 attr=000000000000000f
(XEN)  0000000090000-000000009dfff type=7 attr=000000000000000f
(XEN)  000000009e000-000000009ffff type=2 attr=000000000000000f
(XEN)  0000000100000-00000003fffff type=7 attr=000000000000000f
(XEN)  0000000400000-0000000505fff type=3 attr=000000000000000f
(XEN)  0000000506000-000002e54efff type=7 attr=000000000000000f
(XEN)  000002e54f000-0000048289fff type=2 attr=000000000000000f
(XEN)  000004828a000-0000048309fff type=4 attr=000000000000000f
(XEN)  000004830a000-00000484d1fff type=7 attr=000000000000000f
(XEN)  00000484d2000-0000048a7afff type=2 attr=000000000000000f
(XEN)  0000048a7b000-0000049c7afff type=1 attr=000000000000000f
(XEN)  0000049c7b000-000005ba84fff type=4 attr=000000000000000f
(XEN)  000005ba85000-000005ba85fff type=3 attr=000000000000000f
(XEN)  000005ba86000-000005bc9dfff type=7 attr=000000000000000f
(XEN)  000005bc9e000-000005be88fff type=2 attr=000000000000000f
(XEN)  000005be89000-000005beb8fff type=7 attr=000000000000000f
(XEN)  000005beb9000-000005c288fff type=1 attr=000000000000000f
(XEN)  000005c289000-000005cdbffff type=7 attr=000000000000000f
(XEN)  000005cdc0000-000005d288fff type=3 attr=000000000000000f
(XEN)  000005d289000-000005d688fff type=6 attr=800000000000000f
(XEN)  000005d689000-000005de88fff type=5 attr=800000000000000f
(XEN)  000005de89000-000005e785fff type=0 attr=000000000000000f
(XEN)  000005e786000-0000060952fff type=10 attr=000000000000000f
(XEN)  0000060953000-0000060adbfff type=9 attr=000000000000000f
(XEN)  0000060adc000-0000079c4efff type=7 attr=000000000000000f
(XEN)  0000079c4f000-000007a22dfff type=4 attr=000000000000000f
(XEN)  000007a22e000-000007a22ffff type=7 attr=000000000000000f
(XEN)  000007a230000-000007a28afff type=4 attr=000000000000000f
(XEN)  000007a28b000-000007a2d3fff type=7 attr=000000000000000f
(XEN)  000007a2d4000-000007bafffff type=4 attr=000000000000000f
(XEN)  0000100000000-0001c7fffffff type=7 attr=000000000000000f
(XEN)  00000000a0000-00000000bffff type=0 attr=0000000000000001
(XEN)  00000000c0000-00000000dffff type=0 attr=0000000000000000
(XEN)  00000000e0000-00000000fffff type=0 attr=0000000000000001
(XEN)  000007bb00000-000007bffffff type=0 attr=0000000000000008
(XEN)  000007c000000-000007fbfffff type=0 attr=0000000000000001
(XEN)  000007fc00000-000007fffffff type=0 attr=0000000000000008
(XEN)  0000080000000-000008fffffff type=11 attr=8000000000000001
(XEN)  00000fed1c000-00000fed1ffff type=11 attr=8000000000000001
(XEN) Initing memory sharing.
(XEN) alt table ffff82d0806717f0 -> ffff82d080673658
(XEN) PCI: MCFG configuration 0: base 80000000 segment 0000 buses 00 - ff
(XEN) PCI: MCFG area at 80000000 reserved in E820
(XEN) PCI: Using MCFG for segment 0000 bus 00-ff
(XEN) I/O virtualisation disabled
(XEN) nr_sockets: 5
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using old ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) TSC deadline timer enabled
(XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
(XEN) Defaulting to alternative key handling; send 'A' to switch to normal mode.
(XEN) Allocated console ring of 2048 KiB.
(XEN) mwait-idle: disabled
(XEN) VMX: Supported advanced features:
(XEN)  - APIC MMIO access virtualisation
(XEN)  - APIC TPR shadow
(XEN)  - Extended Page Tables (EPT)
(XEN)  - Virtual-Processor Identifiers (VPID)
(XEN)  - Virtual NMI
(XEN)  - MSR direct-access bitmap
(XEN)  - Unrestricted Guest
(XEN)  - APIC Register Virtualization
(XEN)  - Virtual Interrupt Delivery
(XEN)  - Posted Interrupt Processing
(XEN)  - VMCS shadowing
(XEN)  - VM Functions
(XEN) HVM: ASIDs enabled.
(XEN) HVM: VMX enabled
(XEN) HVM: Hardware Assisted Paging (HAP) detected
(XEN) HVM: HAP page sizes: 4kB, 2MB, 1GB
(XEN) Adding cpu 1 to runqueue 0
(XEN) Adding cpu 2 to runqueue 0
(XEN) Adding cpu 3 to runqueue 0
(XEN) Adding cpu 4 to runqueue 0
(XEN) Adding cpu 5 to runqueue 0
(XEN) Adding cpu 6 to runqueue 0
(XEN) Adding cpu 7 to runqueue 0
(XEN) Adding cpu 8 to runqueue 0
(XEN) Adding cpu 9 to runqueue 0
(XEN) Adding cpu 10 to runqueue 0
(XEN) Adding cpu 11 to runqueue 0
(XEN) Adding cpu 12 to runqueue 0
(XEN) Adding cpu 13 to runqueue 0
(XEN) Adding cpu 14 to runqueue 0
(XEN) Adding cpu 15 to runqueue 0
(XEN) Adding cpu 16 to runqueue 0
(XEN) Adding cpu 17 to runqueue 0
(XEN) Adding cpu 18 to runqueue 0
(XEN) Adding cpu 19 to runqueue 0
(XEN) Adding cpu 20 to runqueue 0
(XEN) Adding cpu 21 to runqueue 0
(XEN) Adding cpu 22 to runqueue 0
(XEN) Adding cpu 23 to runqueue 0
(XEN) Adding cpu 24 to runqueue 0
(XEN) Adding cpu 25 to runqueue 0
(XEN) Adding cpu 26 to runqueue 0
(XEN) Adding cpu 27 to runqueue 0
(XEN) Adding cpu 28 to runqueue 0
(XEN) Adding cpu 29 to runqueue 0
(XEN) Adding cpu 30 to runqueue 0
(XEN) Adding cpu 31 to runqueue 0
(XEN) Adding cpu 32 to runqueue 0
(XEN) Adding cpu 33 to runqueue 0
(XEN) Adding cpu 34 to runqueue 0
(XEN) Adding cpu 35 to runqueue 0
(XEN) CMCI: threshold 0x2 too large for CPU36 bank 17, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU36 bank 18, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU36 bank 19, using 0x1
(XEN) Adding cpu 36 to runqueue 1
(XEN)  First cpu on runqueue, activating
(XEN) Adding cpu 37 to runqueue 1
(XEN) Adding cpu 38 to runqueue 1
(XEN) Adding cpu 39 to runqueue 1
(XEN) Adding cpu 40 to runqueue 1
(XEN) Adding cpu 41 to runqueue 1
(XEN) Adding cpu 42 to runqueue 1
(XEN) Adding cpu 43 to runqueue 1
(XEN) Adding cpu 44 to runqueue 1
(XEN) Adding cpu 45 to runqueue 1
(XEN) Adding cpu 46 to runqueue 1
(XEN) Adding cpu 47 to runqueue 1
(XEN) Adding cpu 48 to runqueue 1
(XEN) Adding cpu 49 to runqueue 1
(XEN) Adding cpu 50 to runqueue 1
(XEN) Adding cpu 51 to runqueue 1
(XEN) Adding cpu 52 to runqueue 1
(XEN) Adding cpu 53 to runqueue 1
(XEN) Adding cpu 54 to runqueue 1
(XEN) Adding cpu 55 to runqueue 1
(XEN) Adding cpu 56 to runqueue 1
(XEN) Adding cpu 57 to runqueue 1
(XEN) Adding cpu 58 to runqueue 1
(XEN) Adding cpu 59 to runqueue 1
(XEN) Adding cpu 60 to runqueue 1
(XEN) Adding cpu 61 to runqueue 1
(XEN) Adding cpu 62 to runqueue 1
(XEN) Adding cpu 63 to runqueue 1
(XEN) Adding cpu 64 to runqueue 1
(XEN) Adding cpu 65 to runqueue 1
(XEN) Adding cpu 66 to runqueue 1
(XEN) Adding cpu 67 to runqueue 1
(XEN) Adding cpu 68 to runqueue 1
(XEN) Adding cpu 69 to runqueue 1
(XEN) Adding cpu 70 to runqueue 1
(XEN) Adding cpu 71 to runqueue 1
(XEN) CMCI: threshold 0x2 too large for CPU72 bank 17, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU72 bank 18, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU72 bank 19, using 0x1
(XEN) Adding cpu 72 to runqueue 2
(XEN)  First cpu on runqueue, activating
(XEN) Adding cpu 73 to runqueue 2
(XEN) Adding cpu 74 to runqueue 2
(XEN) Adding cpu 75 to runqueue 2
(XEN) Adding cpu 76 to runqueue 2
(XEN) Adding cpu 77 to runqueue 2
(XEN) Adding cpu 78 to runqueue 2
(XEN) Adding cpu 79 to runqueue 2
(XEN) Adding cpu 80 to runqueue 2
(XEN) Adding cpu 81 to runqueue 2
(XEN) Adding cpu 82 to runqueue 2
(XEN) Adding cpu 83 to runqueue 2
(XEN) Adding cpu 84 to runqueue 2
(XEN) Adding cpu 85 to runqueue 2
(XEN) Adding cpu 86 to runqueue 2
(XEN) Adding cpu 87 to runqueue 2
(XEN) Adding cpu 88 to runqueue 2
(XEN) Adding cpu 89 to runqueue 2
(XEN) Adding cpu 90 to runqueue 2
(XEN) Adding cpu 91 to runqueue 2
(XEN) Adding cpu 92 to runqueue 2
(XEN) Adding cpu 93 to runqueue 2
(XEN) Adding cpu 94 to runqueue 2
(XEN) Adding cpu 95 to runqueue 2
(XEN) Adding cpu 96 to runqueue 2
(XEN) Adding cpu 97 to runqueue 2
(XEN) Adding cpu 98 to runqueue 2
(XEN) Adding cpu 99 to runqueue 2
(XEN) Adding cpu 100 to runqueue 2
(XEN) Adding cpu 101 to runqueue 2
(XEN) Adding cpu 102 to runqueue 2
(XEN) Adding cpu 103 to runqueue 2
(XEN) Adding cpu 104 to runqueue 2
(XEN) Adding cpu 105 to runqueue 2
(XEN) Adding cpu 106 to runqueue 2
(XEN) Adding cpu 107 to runqueue 2
(XEN) CMCI: threshold 0x2 too large for CPU108 bank 17, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU108 bank 18, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU108 bank 19, using 0x1
(XEN) Adding cpu 108 to runqueue 3
(XEN)  First cpu on runqueue, activating
(XEN) Adding cpu 109 to runqueue 3
(XEN) Adding cpu 110 to runqueue 3
(XEN) Adding cpu 111 to runqueue 3
(XEN) Adding cpu 112 to runqueue 3
(XEN) Adding cpu 113 to runqueue 3
(XEN) Adding cpu 114 to runqueue 3
(XEN) Adding cpu 115 to runqueue 3
(XEN) Adding cpu 116 to runqueue 3
(XEN) Adding cpu 117 to runqueue 3
(XEN) Adding cpu 118 to runqueue 3
(XEN) Adding cpu 119 to runqueue 3
(XEN) Adding cpu 120 to runqueue 3
(XEN) Adding cpu 121 to runqueue 3
(XEN) Adding cpu 122 to runqueue 3
(XEN) Adding cpu 123 to runqueue 3
(XEN) Adding cpu 124 to runqueue 3
(XEN) Adding cpu 125 to runqueue 3
(XEN) Adding cpu 126 to runqueue 3
(XEN) Adding cpu 127 to runqueue 3
(XEN) Adding cpu 128 to runqueue 3
(XEN) Adding cpu 129 to runqueue 3
(XEN) Adding cpu 130 to runqueue 3
(XEN) Adding cpu 131 to runqueue 3
(XEN) Adding cpu 132 to runqueue 3
(XEN) Adding cpu 133 to runqueue 3
(XEN) Adding cpu 134 to runqueue 3
(XEN) Adding cpu 135 to runqueue 3
(XEN) Adding cpu 136 to runqueue 3
(XEN) Adding cpu 137 to runqueue 3
(XEN) Adding cpu 138 to runqueue 3
(XEN) Adding cpu 139 to runqueue 3
(XEN) Adding cpu 140 to runqueue 3
(XEN) Adding cpu 141 to runqueue 3
(XEN) Adding cpu 142 to runqueue 3
(XEN) Adding cpu 143 to runqueue 3
(XEN) Brought up 144 CPUs
(XEN) build-id: 2137921bc738bc97e99c82c05988f698
(XEN) Running stub recovery selftests...
(XEN) traps.c:1569: GPF (0000): ffff82d0bffff041 [ffff82d0bffff041] -> ffff82d0803753f2
(XEN) traps.c:754: Trap 12: ffff82d0bffff040 [ffff82d0bffff040] -> ffff82d0803753f2
(XEN) traps.c:1096: Trap 3: ffff82d0bffff041 [ffff82d0bffff041] -> ffff82d0803753f2
(XEN) ACPI sleep modes: S3
(XEN) VPMU: disabled
(XEN) mcheck_poll: Machine check polling timer started.
(XEN) Dom0 has maximum 1656 PIRQs
(XEN) grant_table.c:1769:IDLEv0 Expanding d0 grant table from 0 to 1 frames
(XEN) NX (Execute Disable) protection active
(XEN) *** Building a PV Dom0 ***
(XEN) ELF: phdr: paddr=0x1000000 memsz=0xabf000
(XEN) ELF: phdr: paddr=0x1c00000 memsz=0x15b000
(XEN) ELF: phdr: paddr=0x1d5b000 memsz=0x17518
(XEN) ELF: phdr: paddr=0x1d73000 memsz=0x497000
(XEN) ELF: memory: 0x1000000 -> 0x220a000
(XEN) ELF: note: GUEST_OS = "linux"
(XEN) ELF: note: GUEST_VERSION = "2.6"
(XEN) ELF: note: XEN_VERSION = "xen-3.0"
(XEN) ELF: note: VIRT_BASE = 0xffffffff80000000
(XEN) ELF: note: INIT_P2M = 0x8000000000
(XEN) ELF: note: ENTRY = 0xffffffff81d731f0
(XEN) ELF: note: HYPERCALL_PAGE = 0xffffffff81001000
(XEN) ELF: note: FEATURES = "!writable_page_tables|pae_pgdir_above_4gb|writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel"
(XEN) ELF: note: SUPPORTED_FEATURES = 0x90d
(XEN) ELF: note: PAE_MODE = "yes"
(XEN) ELF: note: LOADER = "generic"
(XEN) ELF: note: unknown (0xd)
(XEN) ELF: note: SUSPEND_CANCEL = 0x1
(XEN) ELF: note: MOD_START_PFN = 0x1
(XEN) ELF: note: HV_START_LOW = 0xffff800000000000
(XEN) ELF: note: PADDR_OFFSET = 0
(XEN) ELF: addresses:
(XEN)     virt_base        = 0xffffffff80000000
(XEN)     elf_paddr_offset = 0x0
(XEN)     virt_offset      = 0xffffffff80000000
(XEN)     virt_kstart      = 0xffffffff81000000
(XEN)     virt_kend        = 0xffffffff8220a000
(XEN)     virt_entry       = 0xffffffff81d731f0
(XEN)     p2m_base         = 0x8000000000
(XEN)  Xen  kernel: 64-bit, lsb, compat32
(XEN)  Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x220a000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   0000000774000000->0000000778000000 (8368588 pages to be allocated)
(XEN)  Init. ramdisk: 0000001c7f1cc000->0000001c7ffff788
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: ffffffff81000000->ffffffff8220a000
(XEN)  Init. ramdisk: 0000000000000000->0000000000000000
(XEN)  Phys-Mach map: 0000008000000000->0000008004000000
(XEN)  Start info:    ffffffff8220a000->ffffffff8220a4b4
(XEN)  Xenstore ring: 0000000000000000->0000000000000000
(XEN)  Console ring:  0000000000000000->0000000000000000
(XEN)  Page tables:   ffffffff8220b000->ffffffff82220000
(XEN)  Boot stack:    ffffffff82220000->ffffffff82221000
(XEN)  TOTAL:         ffffffff80000000->ffffffff82400000
(XEN)  ENTRY ADDRESS: ffffffff81d731f0
(XEN) Dom0 has maximum 30 VCPUs
(XEN) ELF: phdr 0 at 0xffffffff81000000 -> 0xffffffff81abf000
(XEN) ELF: phdr 1 at 0xffffffff81c00000 -> 0xffffffff81d5b000
(XEN) ELF: phdr 2 at 0xffffffff81d5b000 -> 0xffffffff81d72518
(XEN) ELF: phdr 3 at 0xffffffff81d73000 -> 0xffffffff81f66000
(XEN) Initial low memory virq threshold set at 0x4000 pages.
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 2048kB init memory
(XEN) d0: Forcing write emulation on MFNs 80000-8ffff
(XEN) PCI add device 0000:ff:08.0
(XEN) PCI add device 0000:ff:08.2
(XEN) PCI add device 0000:ff:09.0
(XEN) PCI add device 0000:ff:09.2
(XEN) PCI add device 0000:ff:0a.0
(XEN) PCI add device 0000:ff:0a.2
(XEN) PCI add device 0000:ff:0b.0
(XEN) PCI add device 0000:ff:0b.1
(XEN) PCI add device 0000:ff:0b.2
(XEN) PCI add device 0000:ff:0b.4
(XEN) PCI add device 0000:ff:0b.5
(XEN) PCI add device 0000:ff:0b.6
(XEN) PCI add device 0000:ff:0c.0
(XEN) PCI add device 0000:ff:0c.1
(XEN) PCI add device 0000:ff:0c.2
(XEN) PCI add device 0000:ff:0c.3
(XEN) PCI add device 0000:ff:0c.4
(XEN) PCI add device 0000:ff:0c.5
(XEN) PCI add device 0000:ff:0c.6
(XEN) PCI add device 0000:ff:0c.7
(XEN) PCI add device 0000:ff:0d.0
(XEN) PCI add device 0000:ff:0d.1
(XEN) PCI add device 0000:ff:0d.2
(XEN) PCI add device 0000:ff:0d.3
(XEN) PCI add device 0000:ff:0d.4
(XEN) PCI add device 0000:ff:0d.5
(XEN) PCI add device 0000:ff:0d.6
(XEN) PCI add device 0000:ff:0d.7
(XEN) PCI add device 0000:ff:0e.0
(XEN) PCI add device 0000:ff:0e.1
(XEN) PCI add device 0000:ff:0f.0
(XEN) PCI add device 0000:ff:0f.1
(XEN) PCI add device 0000:ff:0f.2
(XEN) PCI add device 0000:ff:0f.3
(XEN) PCI add device 0000:ff:0f.4
(XEN) PCI add device 0000:ff:0f.5
(XEN) PCI add device 0000:ff:0f.6
(XEN) PCI add device 0000:ff:10.0
(XEN) PCI add device 0000:ff:10.1
(XEN) PCI add device 0000:ff:10.5
(XEN) PCI add device 0000:ff:10.7
(XEN) PCI add device 0000:ff:12.0
(XEN) PCI add device 0000:ff:12.1
(XEN) PCI add device 0000:ff:12.4
(XEN) PCI add device 0000:ff:12.5
(XEN) PCI add device 0000:ff:13.0
(XEN) PCI add device 0000:ff:13.1
(XEN) PCI add device 0000:ff:13.2
(XEN) PCI add device 0000:ff:13.3
(XEN) PCI add device 0000:ff:13.4
(XEN) PCI add device 0000:ff:13.5
(XEN) PCI add device 0000:ff:13.6
(XEN) PCI add device 0000:ff:13.7
(XEN) PCI add device 0000:ff:14.0
(XEN) PCI add device 0000:ff:14.1
(XEN) PCI add device 0000:ff:14.2
(XEN) PCI add device 0000:ff:14.3
(XEN) PCI add device 0000:ff:14.4
(XEN) PCI add device 0000:ff:14.5
(XEN) PCI add device 0000:ff:14.6
(XEN) PCI add device 0000:ff:14.7
(XEN) PCI add device 0000:ff:15.0
(XEN) PCI add device 0000:ff:15.1
(XEN) PCI add device 0000:ff:15.2
(XEN) PCI add device 0000:ff:15.3
(XEN) PCI add device 0000:ff:16.0
(XEN) PCI add device 0000:ff:16.1
(XEN) PCI add device 0000:ff:16.2
(XEN) PCI add device 0000:ff:16.3
(XEN) PCI add device 0000:ff:16.4
(XEN) PCI add device 0000:ff:16.5
(XEN) PCI add device 0000:ff:16.6
(XEN) PCI add device 0000:ff:16.7
(XEN) PCI add device 0000:ff:17.0
(XEN) PCI add device 0000:ff:17.1
(XEN) PCI add device 0000:ff:17.2
(XEN) PCI add device 0000:ff:17.3
(XEN) PCI add device 0000:ff:17.4
(XEN) PCI add device 0000:ff:17.5
(XEN) PCI add device 0000:ff:17.6
(XEN) PCI add device 0000:ff:17.7
(XEN) PCI add device 0000:ff:18.0
(XEN) PCI add device 0000:ff:18.1
(XEN) PCI add device 0000:ff:18.2
(XEN) PCI add device 0000:ff:18.3
(XEN) PCI add device 0000:ff:1e.0
(XEN) PCI add device 0000:ff:1e.1
(XEN) PCI add device 0000:ff:1e.2
(XEN) PCI add device 0000:ff:1e.3
(XEN) PCI add device 0000:ff:1e.4
(XEN) PCI add device 0000:ff:1f.0
(XEN) PCI add device 0000:ff:1f.2
(XEN) PCI add device 0000:bf:08.0
(XEN) PCI add device 0000:bf:08.2
(XEN) PCI add device 0000:bf:09.0
(XEN) PCI add device 0000:bf:09.2
(XEN) PCI add device 0000:bf:0a.0
(XEN) PCI add device 0000:bf:0a.2
(XEN) PCI add device 0000:bf:0b.0
(XEN) PCI add device 0000:bf:0b.1
(XEN) PCI add device 0000:bf:0b.2
(XEN) PCI add device 0000:bf:0b.4
(XEN) PCI add device 0000:bf:0b.5
(XEN) PCI add device 0000:bf:0b.6
(XEN) PCI add device 0000:bf:0c.0
(XEN) PCI add device 0000:bf:0c.1
(XEN) PCI add device 0000:bf:0c.2
(XEN) PCI add device 0000:bf:0c.3
(XEN) PCI add device 0000:bf:0c.4
(XEN) PCI add device 0000:bf:0c.5
(XEN) PCI add device 0000:bf:0c.6
(XEN) PCI add device 0000:bf:0c.7
(XEN) PCI add device 0000:bf:0d.0
(XEN) PCI add device 0000:bf:0d.1
(XEN) PCI add device 0000:bf:0d.2
(XEN) PCI add device 0000:bf:0d.3
(XEN) PCI add device 0000:bf:0d.4
(XEN) PCI add device 0000:bf:0d.5
(XEN) PCI add device 0000:bf:0d.6
(XEN) PCI add device 0000:bf:0d.7
(XEN) PCI add device 0000:bf:0e.0
(XEN) PCI add device 0000:bf:0e.1
(XEN) PCI add device 0000:bf:0f.0
(XEN) PCI add device 0000:bf:0f.1
(XEN) PCI add device 0000:bf:0f.2
(XEN) PCI add device 0000:bf:0f.3
(XEN) PCI add device 0000:bf:0f.4
(XEN) PCI add device 0000:bf:0f.5
(XEN) PCI add device 0000:bf:0f.6
(XEN) PCI add device 0000:bf:10.0
(XEN) PCI add device 0000:bf:10.1
(XEN) PCI add device 0000:bf:10.5
(XEN) PCI add device 0000:bf:10.7
(XEN) PCI add device 0000:bf:12.0
(XEN) PCI add device 0000:bf:12.1
(XEN) PCI add device 0000:bf:12.4
(XEN) PCI add device 0000:bf:12.5
(XEN) PCI add device 0000:bf:13.0
(XEN) PCI add device 0000:bf:13.1
(XEN) PCI add device 0000:bf:13.2
(XEN) PCI add device 0000:bf:13.3
(XEN) PCI add device 0000:bf:13.4
(XEN) PCI add device 0000:bf:13.5
(XEN) PCI add device 0000:bf:13.6
(XEN) PCI add device 0000:bf:13.7
(XEN) PCI add device 0000:bf:14.0
(XEN) PCI add device 0000:bf:14.1
(XEN) PCI add device 0000:bf:14.2
(XEN) PCI add device 0000:bf:14.3
(XEN) PCI add device 0000:bf:14.4
(XEN) PCI add device 0000:bf:14.5
(XEN) PCI add device 0000:bf:14.6
(XEN) PCI add device 0000:bf:14.7
(XEN) PCI add device 0000:bf:15.0
(XEN) PCI add device 0000:bf:15.1
(XEN) PCI add device 0000:bf:15.2
(XEN) PCI add device 0000:bf:15.3
(XEN) PCI add device 0000:bf:16.0
(XEN) PCI add device 0000:bf:16.1
(XEN) PCI add device 0000:bf:16.2
(XEN) PCI add device 0000:bf:16.3
(XEN) PCI add device 0000:bf:16.4
(XEN) PCI add device 0000:bf:16.5
(XEN) PCI add device 0000:bf:16.6
(XEN) PCI add device 0000:bf:16.7
(XEN) PCI add device 0000:bf:17.0
(XEN) PCI add device 0000:bf:17.1
(XEN) PCI add device 0000:bf:17.2
(XEN) PCI add device 0000:bf:17.3
(XEN) PCI add device 0000:bf:17.4
(XEN) PCI add device 0000:bf:17.5
(XEN) PCI add device 0000:bf:17.6
(XEN) PCI add device 0000:bf:17.7
(XEN) PCI add device 0000:bf:18.0
(XEN) PCI add device 0000:bf:18.1
(XEN) PCI add device 0000:bf:18.2
(XEN) PCI add device 0000:bf:18.3
(XEN) PCI add device 0000:bf:1e.0
(XEN) PCI add device 0000:bf:1e.1
(XEN) PCI add device 0000:bf:1e.2
(XEN) PCI add device 0000:bf:1e.3
(XEN) PCI add device 0000:bf:1e.4
(XEN) PCI add device 0000:bf:1f.0
(XEN) PCI add device 0000:bf:1f.2
(XEN) PCI add device 0000:7f:08.0
(XEN) PCI add device 0000:7f:08.2
(XEN) PCI add device 0000:7f:09.0
(XEN) PCI add device 0000:7f:09.2
(XEN) PCI add device 0000:7f:0a.0
(XEN) PCI add device 0000:7f:0a.2
(XEN) PCI add device 0000:7f:0b.0
(XEN) PCI add device 0000:7f:0b.1
(XEN) PCI add device 0000:7f:0b.2
(XEN) PCI add device 0000:7f:0b.4
(XEN) PCI add device 0000:7f:0b.5
(XEN) PCI add device 0000:7f:0b.6
(XEN) PCI add device 0000:7f:0c.0
(XEN) PCI add device 0000:7f:0c.1
(XEN) PCI add device 0000:7f:0c.2
(XEN) PCI add device 0000:7f:0c.3
(XEN) PCI add device 0000:7f:0c.4
(XEN) PCI add device 0000:7f:0c.5
(XEN) PCI add device 0000:7f:0c.6
(XEN) PCI add device 0000:7f:0c.7
(XEN) PCI add device 0000:7f:0d.0
(XEN) PCI add device 0000:7f:0d.1
(XEN) PCI add device 0000:7f:0d.2
(XEN) PCI add device 0000:7f:0d.3
(XEN) PCI add device 0000:7f:0d.4
(XEN) PCI add device 0000:7f:0d.5
(XEN) PCI add device 0000:7f:0d.6
(XEN) PCI add device 0000:7f:0d.7
(XEN) PCI add device 0000:7f:0e.0
(XEN) PCI add device 0000:7f:0e.1
(XEN) PCI add device 0000:7f:0f.0
(XEN) PCI add device 0000:7f:0f.1
(XEN) PCI add device 0000:7f:0f.2
(XEN) PCI add device 0000:7f:0f.3
(XEN) PCI add device 0000:7f:0f.4
(XEN) PCI add device 0000:7f:0f.5
(XEN) PCI add device 0000:7f:0f.6
(XEN) PCI add device 0000:7f:10.0
(XEN) PCI add device 0000:7f:10.1
(XEN) PCI add device 0000:7f:10.5
(XEN) PCI add device 0000:7f:10.7
(XEN) PCI add device 0000:7f:12.0
(XEN) PCI add device 0000:7f:12.1
(XEN) PCI add device 0000:7f:12.4
(XEN) PCI add device 0000:7f:12.5
(XEN) PCI add device 0000:7f:13.0
(XEN) PCI add device 0000:7f:13.1
(XEN) PCI add device 0000:7f:13.2
(XEN) PCI add device 0000:7f:13.3
(XEN) PCI add device 0000:7f:13.4
(XEN) PCI add device 0000:7f:13.5
(XEN) PCI add device 0000:7f:13.6
(XEN) PCI add device 0000:7f:13.7
(XEN) PCI add device 0000:7f:14.0
(XEN) PCI add device 0000:7f:14.1
(XEN) PCI add device 0000:7f:14.2
(XEN) PCI add device 0000:7f:14.3
(XEN) PCI add device 0000:7f:14.4
(XEN) PCI add device 0000:7f:14.5
(XEN) PCI add device 0000:7f:14.6
(XEN) PCI add device 0000:7f:14.7
(XEN) PCI add device 0000:7f:15.0
(XEN) PCI add device 0000:7f:15.1
(XEN) PCI add device 0000:7f:15.2
(XEN) PCI add device 0000:7f:15.3
(XEN) PCI add device 0000:7f:16.0
(XEN) PCI add device 0000:7f:16.1
(XEN) PCI add device 0000:7f:16.2
(XEN) PCI add device 0000:7f:16.3
(XEN) PCI add device 0000:7f:16.4
(XEN) PCI add device 0000:7f:16.5
(XEN) PCI add device 0000:7f:16.6
(XEN) PCI add device 0000:7f:16.7
(XEN) PCI add device 0000:7f:17.0
(XEN) PCI add device 0000:7f:17.1
(XEN) PCI add device 0000:7f:17.2
(XEN) PCI add device 0000:7f:17.3
(XEN) PCI add device 0000:7f:17.4
(XEN) PCI add device 0000:7f:17.5
(XEN) PCI add device 0000:7f:17.6
(XEN) PCI add device 0000:7f:17.7
(XEN) PCI add device 0000:7f:18.0
(XEN) PCI add device 0000:7f:18.1
(XEN) PCI add device 0000:7f:18.2
(XEN) PCI add device 0000:7f:18.3
(XEN) PCI add device 0000:7f:1e.0
(XEN) PCI add device 0000:7f:1e.1
(XEN) PCI add device 0000:7f:1e.2
(XEN) PCI add device 0000:7f:1e.3
(XEN) PCI add device 0000:7f:1e.4
(XEN) PCI add device 0000:7f:1f.0
(XEN) PCI add device 0000:7f:1f.2
(XEN) PCI add device 0000:3f:08.0
(XEN) PCI add device 0000:3f:08.2
(XEN) PCI add device 0000:3f:09.0
(XEN) PCI add device 0000:3f:09.2
(XEN) PCI add device 0000:3f:0a.0
(XEN) PCI add device 0000:3f:0a.2
(XEN) PCI add device 0000:3f:0b.0
(XEN) PCI add device 0000:3f:0b.1
(XEN) PCI add device 0000:3f:0b.2
(XEN) PCI add device 0000:3f:0b.4
(XEN) PCI add device 0000:3f:0b.5
(XEN) PCI add device 0000:3f:0b.6
(XEN) PCI add device 0000:3f:0c.0
(XEN) PCI add device 0000:3f:0c.1
(XEN) PCI add device 0000:3f:0c.2
(XEN) PCI add device 0000:3f:0c.3
(XEN) PCI add device 0000:3f:0c.4
(XEN) PCI add device 0000:3f:0c.5
(XEN) PCI add device 0000:3f:0c.6
(XEN) PCI add device 0000:3f:0c.7
(XEN) PCI add device 0000:3f:0d.0
(XEN) PCI add device 0000:3f:0d.1
(XEN) PCI add device 0000:3f:0d.2
(XEN) PCI add device 0000:3f:0d.3
(XEN) PCI add device 0000:3f:0d.4
(XEN) PCI add device 0000:3f:0d.5
(XEN) PCI add device 0000:3f:0d.6
(XEN) PCI add device 0000:3f:0d.7
(XEN) PCI add device 0000:3f:0e.0
(XEN) PCI add device 0000:3f:0e.1
(XEN) PCI add device 0000:3f:0f.0
(XEN) PCI add device 0000:3f:0f.1
(XEN) PCI add device 0000:3f:0f.2
(XEN) PCI add device 0000:3f:0f.3
(XEN) PCI add device 0000:3f:0f.4
(XEN) PCI add device 0000:3f:0f.5
(XEN) PCI add device 0000:3f:0f.6
(XEN) PCI add device 0000:3f:10.0
(XEN) PCI add device 0000:3f:10.1
(XEN) PCI add device 0000:3f:10.5
(XEN) PCI add device 0000:3f:10.7
(XEN) PCI add device 0000:3f:12.0
(XEN) PCI add device 0000:3f:12.1
(XEN) PCI add device 0000:3f:12.4
(XEN) PCI add device 0000:3f:12.5
(XEN) PCI add device 0000:3f:13.0
(XEN) PCI add device 0000:3f:13.1
(XEN) PCI add device 0000:3f:13.2
(XEN) PCI add device 0000:3f:13.3
(XEN) PCI add device 0000:3f:13.4
(XEN) PCI add device 0000:3f:13.5
(XEN) PCI add device 0000:3f:13.6
(XEN) PCI add device 0000:3f:13.7
(XEN) PCI add device 0000:3f:14.0
(XEN) PCI add device 0000:3f:14.1
(XEN) PCI add device 0000:3f:14.2
(XEN) PCI add device 0000:3f:14.3
(XEN) PCI add device 0000:3f:14.4
(XEN) PCI add device 0000:3f:14.5
(XEN) PCI add device 0000:3f:14.6
(XEN) PCI add device 0000:3f:14.7
(XEN) PCI add device 0000:3f:15.0
(XEN) PCI add device 0000:3f:15.1
(XEN) PCI add device 0000:3f:15.2
(XEN) PCI add device 0000:3f:15.3
(XEN) PCI add device 0000:3f:16.0
(XEN) PCI add device 0000:3f:16.1
(XEN) PCI add device 0000:3f:16.2
(XEN) PCI add device 0000:3f:16.3
(XEN) PCI add device 0000:3f:16.4
(XEN) PCI add device 0000:3f:16.5
(XEN) PCI add device 0000:3f:16.6
(XEN) PCI add device 0000:3f:16.7
(XEN) PCI add device 0000:3f:17.0
(XEN) PCI add device 0000:3f:17.1
(XEN) PCI add device 0000:3f:17.2
(XEN) PCI add device 0000:3f:17.3
(XEN) PCI add device 0000:3f:17.4
(XEN) PCI add device 0000:3f:17.5
(XEN) PCI add device 0000:3f:17.6
(XEN) PCI add device 0000:3f:17.7
(XEN) PCI add device 0000:3f:18.0
(XEN) PCI add device 0000:3f:18.1
(XEN) PCI add device 0000:3f:18.2
(XEN) PCI add device 0000:3f:18.3
(XEN) PCI add device 0000:3f:1e.0
(XEN) PCI add device 0000:3f:1e.1
(XEN) PCI add device 0000:3f:1e.2
(XEN) PCI add device 0000:3f:1e.3
(XEN) PCI add device 0000:3f:1e.4
(XEN) PCI add device 0000:3f:1f.0
(XEN) PCI add device 0000:3f:1f.2
(XEN) PCI add device 0000:00:00.0
(XEN) PCI add device 0000:00:02.0
(XEN) PCI add device 0000:00:03.0
(XEN) PCI add device 0000:00:03.2
(XEN) PCI add device 0000:00:03.3
(XEN) PCI add device 0000:00:05.0
(XEN) PCI add device 0000:00:05.1
(XEN) PCI add device 0000:00:05.2
(XEN) PCI add device 0000:00:05.4
(XEN) PCI add device 0000:00:11.0
(XEN) PCI add device 0000:00:16.0
(XEN) PCI add device 0000:00:16.1
(XEN) PCI add device 0000:00:1a.0
(XEN) PCI add device 0000:00:1c.0
(XEN) PCI add device 0000:00:1c.7
(XEN) PCI add device 0000:00:1d.0
(XEN) PCI add device 0000:00:1e.0
(XEN) PCI add device 0000:00:1f.0
(XEN) PCI add device 0000:00:1f.2
(XEN) PCI add device 0000:00:1f.3
(XEN) PCI add device 0000:01:00.0
(XEN) PCI add device 0000:03:00.0
(XEN) PCI add device 0000:03:00.1
(XEN) PCI add device 0000:08:00.0
(XEN) PCI add device 0000:40:02.0
(XEN) PCI add device 0000:40:02.2
(XEN) PCI add device 0000:40:03.0
(XEN) PCI add device 0000:40:05.0
(XEN) PCI add device 0000:40:05.1
(XEN) PCI add device 0000:40:05.2
(XEN) PCI add device 0000:40:05.4
(XEN) PCI add device 0000:80:02.0
(XEN) PCI add device 0000:80:02.2
(XEN) PCI add device 0000:80:03.0
(XEN) PCI add device 0000:80:05.0
(XEN) PCI add device 0000:80:05.1
(XEN) PCI add device 0000:80:05.2
(XEN) PCI add device 0000:80:05.4
(XEN) PCI add device 0000:c0:02.0
(XEN) PCI add device 0000:c0:02.2
(XEN) PCI add device 0000:c0:03.0
(XEN) PCI add device 0000:c0:05.0
(XEN) PCI add device 0000:c0:05.1
(XEN) PCI add device 0000:c0:05.2
(XEN) PCI add device 0000:c0:05.4
(XEN) PCI add device 0000:c2:00.0
(XEN) PCI add device 0000:c2:00.1
(XEN) emul-priv-op.c:1179:d0v0 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v4 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v5 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v6 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v7 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v8 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v9 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v10 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v11 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v12 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v13 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v14 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v15 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v16 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v17 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v18 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v19 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v20 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v21 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v22 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v23 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v24 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v25 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v26 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v27 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v28 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v29 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) d0: Forcing read-only access to MFN fed00
(XEN) traps.c:1569: GPF (0000): ffff82d0803684d5 [emul-priv-op.c#read_msr+0x462/0x4a5] -> ffff82d080375bb0
(XEN) emul-priv-op.c:1179:d0v0 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v4 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v5 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v6 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v7 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v8 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v9 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v10 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v11 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v12 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v13 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v14 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v15 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v16 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v17 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v18 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v19 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v20 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v21 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v22 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v23 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v24 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v25 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v26 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v27 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v28 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v29 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) Monitor-Mwait will be used to enter C1 state
(XEN) Monitor-Mwait will be used to enter C2 state
(XEN) No CPU ID for APIC ID 0x24

[-- Attachment #1.1.4: xl-debugkeys-u.txt --]
[-- Type: text/plain, Size: 47620 bytes --]

(XEN) ACPI: LAPIC_NMI (acpi_id[0x09] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x0a] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x0b] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x0c] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x0d] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x0e] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x0f] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x10] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x11] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x12] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x13] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x14] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x15] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x16] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x17] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x18] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x19] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x1a] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x1b] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x1c] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x1d] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x1e] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x1f] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x20] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x21] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x22] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x23] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x24] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x25] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x26] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x27] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x28] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x29] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x2a] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x2b] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x2c] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x2d] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x2e] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x2f] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x30] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x31] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x32] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x33] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x34] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x35] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x36] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x37] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x38] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x39] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x3a] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x3b] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x3c] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x3d] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x3e] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x3f] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x40] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x41] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x42] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x43] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x44] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x45] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x46] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x47] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x48] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x49] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x4a] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x4b] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x4c] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x4d] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x4e] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x4f] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x50] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x51] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x52] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x53] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x54] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x55] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x56] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x57] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x58] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x59] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x5a] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x5b] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x5c] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x5d] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x5e] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x5f] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x60] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x61] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x62] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x63] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x64] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x65] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x66] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x67] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x68] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x69] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x6a] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x6b] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x6c] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x6d] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x6e] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x6f] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x70] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x71] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x72] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x73] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x74] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x75] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x76] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x77] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x78] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x79] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x7a] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x7b] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x7c] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x7d] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x7e] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x7f] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x80] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x81] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x82] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x83] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x84] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x85] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x86] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x87] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x88] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x89] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x8a] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x8b] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x8c] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x8d] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x8f] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x90] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x91] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x92] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x93] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x94] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x95] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x96] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x97] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x98] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x99] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x9a] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x9b] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x9c] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x9d] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x9e] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x9f] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xa0] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xa1] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xa2] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xa3] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xa4] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xa5] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xa6] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xa7] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xa8] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xa9] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xaa] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xab] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xac] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xad] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xae] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xaf] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xb0] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xb1] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xb2] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xb3] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xb4] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xb5] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xb6] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xb7] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xb8] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xb9] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xba] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xbb] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xbc] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xbd] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xbe] high level lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0xbf] high level lint[0x1])
(XEN) Overriding APIC driver with bigsmp
(XEN) ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
(XEN) ACPI: IOAPIC (id[0x09] address[0xfec01000] gsi_base[24])
(XEN) IOAPIC[1]: apic_id 9, version 32, address 0xfec01000, GSI 24-47
(XEN) ACPI: IOAPIC (id[0x0a] address[0xfec40000] gsi_base[48])
(XEN) IOAPIC[2]: apic_id 10, version 32, address 0xfec40000, GSI 48-71
(XEN) ACPI: IOAPIC (id[0x0b] address[0xfec80000] gsi_base[72])
(XEN) IOAPIC[3]: apic_id 11, version 32, address 0xfec80000, GSI 72-95
(XEN) ACPI: IOAPIC (id[0x0c] address[0xfecc0000] gsi_base[96])
(XEN) IOAPIC[4]: apic_id 12, version 32, address 0xfecc0000, GSI 96-119
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) Enabling APIC mode:  Phys.  Using 5 I/O APICs
(XEN) ACPI: HPET id: 0x8086a301 base: 0xfed00000
(XEN) ERST table was not found
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) SMP: Allowing 192 CPUs (48 hotplug CPUs)
(XEN) IRQ limits: 120 GSI, 27544 MSI/MSI-X
(XEN) Not enabling x2APIC: depends on iommu_supports_eim.
(XEN) xstate: size: 0x340 and states: 0x7
(XEN) mce_intel.c:782: MCA Capability: firstbank 0, extended MCE MSR 0, BCAST, SER, CMCI
(XEN) CMCI: threshold 0x2 too large for CPU0 bank 17, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU0 bank 18, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU0 bank 19, using 0x1
(XEN) CPU0: Intel machine check reporting enabled
(XEN) Speculative mitigation facilities:
(XEN)   Hardware features:
(XEN)   Compiled-in support: INDIRECT_THUNK
(XEN) BTI mitigations: Thunk RETPOLINE, Others: RSB_NATIVE RSB_VMEXIT
(XEN) XPTI: enabled
(XEN) Using scheduler: SMP Credit Scheduler rev2 (credit2)
(XEN) Initializing Credit2 scheduler
(XEN)  load_precision_shift: 18
(XEN)  load_window_shift: 30
(XEN)  underload_balance_tolerance: 0
(XEN)  overload_balance_tolerance: -3
(XEN)  runqueues arrangement: socket
(XEN)  cap enforcement granularity: 10ms
(XEN) load tracking window length 1073741824 ns
(XEN) Adding cpu 0 to runqueue 0
(XEN)  First cpu on runqueue, activating
(XEN) Platform timer is 14.318MHz HPET
(XEN) Detected 2493.993 MHz processor.
(XEN) EFI memory map:
(XEN)  0000000000000-000000008dfff type=7 attr=000000000000000f
(XEN)  000000008e000-000000008ffff type=0 attr=000000000000000f
(XEN)  0000000090000-000000009dfff type=7 attr=000000000000000f
(XEN)  000000009e000-000000009ffff type=2 attr=000000000000000f
(XEN)  0000000100000-00000003fffff type=7 attr=000000000000000f
(XEN)  0000000400000-0000000505fff type=3 attr=000000000000000f
(XEN)  0000000506000-000002e54efff type=7 attr=000000000000000f
(XEN)  000002e54f000-0000048289fff type=2 attr=000000000000000f
(XEN)  000004828a000-0000048309fff type=4 attr=000000000000000f
(XEN)  000004830a000-00000484d1fff type=7 attr=000000000000000f
(XEN)  00000484d2000-0000048a7afff type=2 attr=000000000000000f
(XEN)  0000048a7b000-0000049c7afff type=1 attr=000000000000000f
(XEN)  0000049c7b000-000005ba84fff type=4 attr=000000000000000f
(XEN)  000005ba85000-000005ba85fff type=3 attr=000000000000000f
(XEN)  000005ba86000-000005bc9dfff type=7 attr=000000000000000f
(XEN)  000005bc9e000-000005be88fff type=2 attr=000000000000000f
(XEN)  000005be89000-000005beb8fff type=7 attr=000000000000000f
(XEN)  000005beb9000-000005c288fff type=1 attr=000000000000000f
(XEN)  000005c289000-000005cdbffff type=7 attr=000000000000000f
(XEN)  000005cdc0000-000005d288fff type=3 attr=000000000000000f
(XEN)  000005d289000-000005d688fff type=6 attr=800000000000000f
(XEN)  000005d689000-000005de88fff type=5 attr=800000000000000f
(XEN)  000005de89000-000005e785fff type=0 attr=000000000000000f
(XEN)  000005e786000-0000060952fff type=10 attr=000000000000000f
(XEN)  0000060953000-0000060adbfff type=9 attr=000000000000000f
(XEN)  0000060adc000-0000079c4efff type=7 attr=000000000000000f
(XEN)  0000079c4f000-000007a22dfff type=4 attr=000000000000000f
(XEN)  000007a22e000-000007a22ffff type=7 attr=000000000000000f
(XEN)  000007a230000-000007a28afff type=4 attr=000000000000000f
(XEN)  000007a28b000-000007a2d3fff type=7 attr=000000000000000f
(XEN)  000007a2d4000-000007bafffff type=4 attr=000000000000000f
(XEN)  0000100000000-0001c7fffffff type=7 attr=000000000000000f
(XEN)  00000000a0000-00000000bffff type=0 attr=0000000000000001
(XEN)  00000000c0000-00000000dffff type=0 attr=0000000000000000
(XEN)  00000000e0000-00000000fffff type=0 attr=0000000000000001
(XEN)  000007bb00000-000007bffffff type=0 attr=0000000000000008
(XEN)  000007c000000-000007fbfffff type=0 attr=0000000000000001
(XEN)  000007fc00000-000007fffffff type=0 attr=0000000000000008
(XEN)  0000080000000-000008fffffff type=11 attr=8000000000000001
(XEN)  00000fed1c000-00000fed1ffff type=11 attr=8000000000000001
(XEN) Initing memory sharing.
(XEN) alt table ffff82d0806717f0 -> ffff82d080673658
(XEN) PCI: MCFG configuration 0: base 80000000 segment 0000 buses 00 - ff
(XEN) PCI: MCFG area at 80000000 reserved in E820
(XEN) PCI: Using MCFG for segment 0000 bus 00-ff
(XEN) I/O virtualisation disabled
(XEN) nr_sockets: 5
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using old ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) TSC deadline timer enabled
(XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
(XEN) Defaulting to alternative key handling; send 'A' to switch to normal mode.
(XEN) Allocated console ring of 2048 KiB.
(XEN) mwait-idle: disabled
(XEN) VMX: Supported advanced features:
(XEN)  - APIC MMIO access virtualisation
(XEN)  - APIC TPR shadow
(XEN)  - Extended Page Tables (EPT)
(XEN)  - Virtual-Processor Identifiers (VPID)
(XEN)  - Virtual NMI
(XEN)  - MSR direct-access bitmap
(XEN)  - Unrestricted Guest
(XEN)  - APIC Register Virtualization
(XEN)  - Virtual Interrupt Delivery
(XEN)  - Posted Interrupt Processing
(XEN)  - VMCS shadowing
(XEN)  - VM Functions
(XEN) HVM: ASIDs enabled.
(XEN) HVM: VMX enabled
(XEN) HVM: Hardware Assisted Paging (HAP) detected
(XEN) HVM: HAP page sizes: 4kB, 2MB, 1GB
(XEN) Adding cpu 1 to runqueue 0
(XEN) Adding cpu 2 to runqueue 0
(XEN) Adding cpu 3 to runqueue 0
(XEN) Adding cpu 4 to runqueue 0
(XEN) Adding cpu 5 to runqueue 0
(XEN) Adding cpu 6 to runqueue 0
(XEN) Adding cpu 7 to runqueue 0
(XEN) Adding cpu 8 to runqueue 0
(XEN) Adding cpu 9 to runqueue 0
(XEN) Adding cpu 10 to runqueue 0
(XEN) Adding cpu 11 to runqueue 0
(XEN) Adding cpu 12 to runqueue 0
(XEN) Adding cpu 13 to runqueue 0
(XEN) Adding cpu 14 to runqueue 0
(XEN) Adding cpu 15 to runqueue 0
(XEN) Adding cpu 16 to runqueue 0
(XEN) Adding cpu 17 to runqueue 0
(XEN) Adding cpu 18 to runqueue 0
(XEN) Adding cpu 19 to runqueue 0
(XEN) Adding cpu 20 to runqueue 0
(XEN) Adding cpu 21 to runqueue 0
(XEN) Adding cpu 22 to runqueue 0
(XEN) Adding cpu 23 to runqueue 0
(XEN) Adding cpu 24 to runqueue 0
(XEN) Adding cpu 25 to runqueue 0
(XEN) Adding cpu 26 to runqueue 0
(XEN) Adding cpu 27 to runqueue 0
(XEN) Adding cpu 28 to runqueue 0
(XEN) Adding cpu 29 to runqueue 0
(XEN) Adding cpu 30 to runqueue 0
(XEN) Adding cpu 31 to runqueue 0
(XEN) Adding cpu 32 to runqueue 0
(XEN) Adding cpu 33 to runqueue 0
(XEN) Adding cpu 34 to runqueue 0
(XEN) Adding cpu 35 to runqueue 0
(XEN) CMCI: threshold 0x2 too large for CPU36 bank 17, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU36 bank 18, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU36 bank 19, using 0x1
(XEN) Adding cpu 36 to runqueue 1
(XEN)  First cpu on runqueue, activating
(XEN) Adding cpu 37 to runqueue 1
(XEN) Adding cpu 38 to runqueue 1
(XEN) Adding cpu 39 to runqueue 1
(XEN) Adding cpu 40 to runqueue 1
(XEN) Adding cpu 41 to runqueue 1
(XEN) Adding cpu 42 to runqueue 1
(XEN) Adding cpu 43 to runqueue 1
(XEN) Adding cpu 44 to runqueue 1
(XEN) Adding cpu 45 to runqueue 1
(XEN) Adding cpu 46 to runqueue 1
(XEN) Adding cpu 47 to runqueue 1
(XEN) Adding cpu 48 to runqueue 1
(XEN) Adding cpu 49 to runqueue 1
(XEN) Adding cpu 50 to runqueue 1
(XEN) Adding cpu 51 to runqueue 1
(XEN) Adding cpu 52 to runqueue 1
(XEN) Adding cpu 53 to runqueue 1
(XEN) Adding cpu 54 to runqueue 1
(XEN) Adding cpu 55 to runqueue 1
(XEN) Adding cpu 56 to runqueue 1
(XEN) Adding cpu 57 to runqueue 1
(XEN) Adding cpu 58 to runqueue 1
(XEN) Adding cpu 59 to runqueue 1
(XEN) Adding cpu 60 to runqueue 1
(XEN) Adding cpu 61 to runqueue 1
(XEN) Adding cpu 62 to runqueue 1
(XEN) Adding cpu 63 to runqueue 1
(XEN) Adding cpu 64 to runqueue 1
(XEN) Adding cpu 65 to runqueue 1
(XEN) Adding cpu 66 to runqueue 1
(XEN) Adding cpu 67 to runqueue 1
(XEN) Adding cpu 68 to runqueue 1
(XEN) Adding cpu 69 to runqueue 1
(XEN) Adding cpu 70 to runqueue 1
(XEN) Adding cpu 71 to runqueue 1
(XEN) CMCI: threshold 0x2 too large for CPU72 bank 17, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU72 bank 18, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU72 bank 19, using 0x1
(XEN) Adding cpu 72 to runqueue 2
(XEN)  First cpu on runqueue, activating
(XEN) Adding cpu 73 to runqueue 2
(XEN) Adding cpu 74 to runqueue 2
(XEN) Adding cpu 75 to runqueue 2
(XEN) Adding cpu 76 to runqueue 2
(XEN) Adding cpu 77 to runqueue 2
(XEN) Adding cpu 78 to runqueue 2
(XEN) Adding cpu 79 to runqueue 2
(XEN) Adding cpu 80 to runqueue 2
(XEN) Adding cpu 81 to runqueue 2
(XEN) Adding cpu 82 to runqueue 2
(XEN) Adding cpu 83 to runqueue 2
(XEN) Adding cpu 84 to runqueue 2
(XEN) Adding cpu 85 to runqueue 2
(XEN) Adding cpu 86 to runqueue 2
(XEN) Adding cpu 87 to runqueue 2
(XEN) Adding cpu 88 to runqueue 2
(XEN) Adding cpu 89 to runqueue 2
(XEN) Adding cpu 90 to runqueue 2
(XEN) Adding cpu 91 to runqueue 2
(XEN) Adding cpu 92 to runqueue 2
(XEN) Adding cpu 93 to runqueue 2
(XEN) Adding cpu 94 to runqueue 2
(XEN) Adding cpu 95 to runqueue 2
(XEN) Adding cpu 96 to runqueue 2
(XEN) Adding cpu 97 to runqueue 2
(XEN) Adding cpu 98 to runqueue 2
(XEN) Adding cpu 99 to runqueue 2
(XEN) Adding cpu 100 to runqueue 2
(XEN) Adding cpu 101 to runqueue 2
(XEN) Adding cpu 102 to runqueue 2
(XEN) Adding cpu 103 to runqueue 2
(XEN) Adding cpu 104 to runqueue 2
(XEN) Adding cpu 105 to runqueue 2
(XEN) Adding cpu 106 to runqueue 2
(XEN) Adding cpu 107 to runqueue 2
(XEN) CMCI: threshold 0x2 too large for CPU108 bank 17, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU108 bank 18, using 0x1
(XEN) CMCI: threshold 0x2 too large for CPU108 bank 19, using 0x1
(XEN) Adding cpu 108 to runqueue 3
(XEN)  First cpu on runqueue, activating
(XEN) Adding cpu 109 to runqueue 3
(XEN) Adding cpu 110 to runqueue 3
(XEN) Adding cpu 111 to runqueue 3
(XEN) Adding cpu 112 to runqueue 3
(XEN) Adding cpu 113 to runqueue 3
(XEN) Adding cpu 114 to runqueue 3
(XEN) Adding cpu 115 to runqueue 3
(XEN) Adding cpu 116 to runqueue 3
(XEN) Adding cpu 117 to runqueue 3
(XEN) Adding cpu 118 to runqueue 3
(XEN) Adding cpu 119 to runqueue 3
(XEN) Adding cpu 120 to runqueue 3
(XEN) Adding cpu 121 to runqueue 3
(XEN) Adding cpu 122 to runqueue 3
(XEN) Adding cpu 123 to runqueue 3
(XEN) Adding cpu 124 to runqueue 3
(XEN) Adding cpu 125 to runqueue 3
(XEN) Adding cpu 126 to runqueue 3
(XEN) Adding cpu 127 to runqueue 3
(XEN) Adding cpu 128 to runqueue 3
(XEN) Adding cpu 129 to runqueue 3
(XEN) Adding cpu 130 to runqueue 3
(XEN) Adding cpu 131 to runqueue 3
(XEN) Adding cpu 132 to runqueue 3
(XEN) Adding cpu 133 to runqueue 3
(XEN) Adding cpu 134 to runqueue 3
(XEN) Adding cpu 135 to runqueue 3
(XEN) Adding cpu 136 to runqueue 3
(XEN) Adding cpu 137 to runqueue 3
(XEN) Adding cpu 138 to runqueue 3
(XEN) Adding cpu 139 to runqueue 3
(XEN) Adding cpu 140 to runqueue 3
(XEN) Adding cpu 141 to runqueue 3
(XEN) Adding cpu 142 to runqueue 3
(XEN) Adding cpu 143 to runqueue 3
(XEN) Brought up 144 CPUs
(XEN) build-id: 2137921bc738bc97e99c82c05988f698
(XEN) Running stub recovery selftests...
(XEN) traps.c:1569: GPF (0000): ffff82d0bffff041 [ffff82d0bffff041] -> ffff82d0803753f2
(XEN) traps.c:754: Trap 12: ffff82d0bffff040 [ffff82d0bffff040] -> ffff82d0803753f2
(XEN) traps.c:1096: Trap 3: ffff82d0bffff041 [ffff82d0bffff041] -> ffff82d0803753f2
(XEN) ACPI sleep modes: S3
(XEN) VPMU: disabled
(XEN) mcheck_poll: Machine check polling timer started.
(XEN) Dom0 has maximum 1656 PIRQs
(XEN) grant_table.c:1769:IDLEv0 Expanding d0 grant table from 0 to 1 frames
(XEN) NX (Execute Disable) protection active
(XEN) *** Building a PV Dom0 ***
(XEN) ELF: phdr: paddr=0x1000000 memsz=0xabf000
(XEN) ELF: phdr: paddr=0x1c00000 memsz=0x15b000
(XEN) ELF: phdr: paddr=0x1d5b000 memsz=0x17518
(XEN) ELF: phdr: paddr=0x1d73000 memsz=0x497000
(XEN) ELF: memory: 0x1000000 -> 0x220a000
(XEN) ELF: note: GUEST_OS = "linux"
(XEN) ELF: note: GUEST_VERSION = "2.6"
(XEN) ELF: note: XEN_VERSION = "xen-3.0"
(XEN) ELF: note: VIRT_BASE = 0xffffffff80000000
(XEN) ELF: note: INIT_P2M = 0x8000000000
(XEN) ELF: note: ENTRY = 0xffffffff81d731f0
(XEN) ELF: note: HYPERCALL_PAGE = 0xffffffff81001000
(XEN) ELF: note: FEATURES = "!writable_page_tables|pae_pgdir_above_4gb|writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel"
(XEN) ELF: note: SUPPORTED_FEATURES = 0x90d
(XEN) ELF: note: PAE_MODE = "yes"
(XEN) ELF: note: LOADER = "generic"
(XEN) ELF: note: unknown (0xd)
(XEN) ELF: note: SUSPEND_CANCEL = 0x1
(XEN) ELF: note: MOD_START_PFN = 0x1
(XEN) ELF: note: HV_START_LOW = 0xffff800000000000
(XEN) ELF: note: PADDR_OFFSET = 0
(XEN) ELF: addresses:
(XEN)     virt_base        = 0xffffffff80000000
(XEN)     elf_paddr_offset = 0x0
(XEN)     virt_offset      = 0xffffffff80000000
(XEN)     virt_kstart      = 0xffffffff81000000
(XEN)     virt_kend        = 0xffffffff8220a000
(XEN)     virt_entry       = 0xffffffff81d731f0
(XEN)     p2m_base         = 0x8000000000
(XEN)  Xen  kernel: 64-bit, lsb, compat32
(XEN)  Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x220a000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   0000000774000000->0000000778000000 (8368588 pages to be allocated)
(XEN)  Init. ramdisk: 0000001c7f1cc000->0000001c7ffff788
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: ffffffff81000000->ffffffff8220a000
(XEN)  Init. ramdisk: 0000000000000000->0000000000000000
(XEN)  Phys-Mach map: 0000008000000000->0000008004000000
(XEN)  Start info:    ffffffff8220a000->ffffffff8220a4b4
(XEN)  Xenstore ring: 0000000000000000->0000000000000000
(XEN)  Console ring:  0000000000000000->0000000000000000
(XEN)  Page tables:   ffffffff8220b000->ffffffff82220000
(XEN)  Boot stack:    ffffffff82220000->ffffffff82221000
(XEN)  TOTAL:         ffffffff80000000->ffffffff82400000
(XEN)  ENTRY ADDRESS: ffffffff81d731f0
(XEN) Dom0 has maximum 30 VCPUs
(XEN) ELF: phdr 0 at 0xffffffff81000000 -> 0xffffffff81abf000
(XEN) ELF: phdr 1 at 0xffffffff81c00000 -> 0xffffffff81d5b000
(XEN) ELF: phdr 2 at 0xffffffff81d5b000 -> 0xffffffff81d72518
(XEN) ELF: phdr 3 at 0xffffffff81d73000 -> 0xffffffff81f66000
(XEN) Initial low memory virq threshold set at 0x4000 pages.
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 2048kB init memory
(XEN) d0: Forcing write emulation on MFNs 80000-8ffff
(XEN) PCI add device 0000:ff:08.0
(XEN) PCI add device 0000:ff:08.2
(XEN) PCI add device 0000:ff:09.0
(XEN) PCI add device 0000:ff:09.2
(XEN) PCI add device 0000:ff:0a.0
(XEN) PCI add device 0000:ff:0a.2
(XEN) PCI add device 0000:ff:0b.0
(XEN) PCI add device 0000:ff:0b.1
(XEN) PCI add device 0000:ff:0b.2
(XEN) PCI add device 0000:ff:0b.4
(XEN) PCI add device 0000:ff:0b.5
(XEN) PCI add device 0000:ff:0b.6
(XEN) PCI add device 0000:ff:0c.0
(XEN) PCI add device 0000:ff:0c.1
(XEN) PCI add device 0000:ff:0c.2
(XEN) PCI add device 0000:ff:0c.3
(XEN) PCI add device 0000:ff:0c.4
(XEN) PCI add device 0000:ff:0c.5
(XEN) PCI add device 0000:ff:0c.6
(XEN) PCI add device 0000:ff:0c.7
(XEN) PCI add device 0000:ff:0d.0
(XEN) PCI add device 0000:ff:0d.1
(XEN) PCI add device 0000:ff:0d.2
(XEN) PCI add device 0000:ff:0d.3
(XEN) PCI add device 0000:ff:0d.4
(XEN) PCI add device 0000:ff:0d.5
(XEN) PCI add device 0000:ff:0d.6
(XEN) PCI add device 0000:ff:0d.7
(XEN) PCI add device 0000:ff:0e.0
(XEN) PCI add device 0000:ff:0e.1
(XEN) PCI add device 0000:ff:0f.0
(XEN) PCI add device 0000:ff:0f.1
(XEN) PCI add device 0000:ff:0f.2
(XEN) PCI add device 0000:ff:0f.3
(XEN) PCI add device 0000:ff:0f.4
(XEN) PCI add device 0000:ff:0f.5
(XEN) PCI add device 0000:ff:0f.6
(XEN) PCI add device 0000:ff:10.0
(XEN) PCI add device 0000:ff:10.1
(XEN) PCI add device 0000:ff:10.5
(XEN) PCI add device 0000:ff:10.7
(XEN) PCI add device 0000:ff:12.0
(XEN) PCI add device 0000:ff:12.1
(XEN) PCI add device 0000:ff:12.4
(XEN) PCI add device 0000:ff:12.5
(XEN) PCI add device 0000:ff:13.0
(XEN) PCI add device 0000:ff:13.1
(XEN) PCI add device 0000:ff:13.2
(XEN) PCI add device 0000:ff:13.3
(XEN) PCI add device 0000:ff:13.4
(XEN) PCI add device 0000:ff:13.5
(XEN) PCI add device 0000:ff:13.6
(XEN) PCI add device 0000:ff:13.7
(XEN) PCI add device 0000:ff:14.0
(XEN) PCI add device 0000:ff:14.1
(XEN) PCI add device 0000:ff:14.2
(XEN) PCI add device 0000:ff:14.3
(XEN) PCI add device 0000:ff:14.4
(XEN) PCI add device 0000:ff:14.5
(XEN) PCI add device 0000:ff:14.6
(XEN) PCI add device 0000:ff:14.7
(XEN) PCI add device 0000:ff:15.0
(XEN) PCI add device 0000:ff:15.1
(XEN) PCI add device 0000:ff:15.2
(XEN) PCI add device 0000:ff:15.3
(XEN) PCI add device 0000:ff:16.0
(XEN) PCI add device 0000:ff:16.1
(XEN) PCI add device 0000:ff:16.2
(XEN) PCI add device 0000:ff:16.3
(XEN) PCI add device 0000:ff:16.4
(XEN) PCI add device 0000:ff:16.5
(XEN) PCI add device 0000:ff:16.6
(XEN) PCI add device 0000:ff:16.7
(XEN) PCI add device 0000:ff:17.0
(XEN) PCI add device 0000:ff:17.1
(XEN) PCI add device 0000:ff:17.2
(XEN) PCI add device 0000:ff:17.3
(XEN) PCI add device 0000:ff:17.4
(XEN) PCI add device 0000:ff:17.5
(XEN) PCI add device 0000:ff:17.6
(XEN) PCI add device 0000:ff:17.7
(XEN) PCI add device 0000:ff:18.0
(XEN) PCI add device 0000:ff:18.1
(XEN) PCI add device 0000:ff:18.2
(XEN) PCI add device 0000:ff:18.3
(XEN) PCI add device 0000:ff:1e.0
(XEN) PCI add device 0000:ff:1e.1
(XEN) PCI add device 0000:ff:1e.2
(XEN) PCI add device 0000:ff:1e.3
(XEN) PCI add device 0000:ff:1e.4
(XEN) PCI add device 0000:ff:1f.0
(XEN) PCI add device 0000:ff:1f.2
(XEN) PCI add device 0000:bf:08.0
(XEN) PCI add device 0000:bf:08.2
(XEN) PCI add device 0000:bf:09.0
(XEN) PCI add device 0000:bf:09.2
(XEN) PCI add device 0000:bf:0a.0
(XEN) PCI add device 0000:bf:0a.2
(XEN) PCI add device 0000:bf:0b.0
(XEN) PCI add device 0000:bf:0b.1
(XEN) PCI add device 0000:bf:0b.2
(XEN) PCI add device 0000:bf:0b.4
(XEN) PCI add device 0000:bf:0b.5
(XEN) PCI add device 0000:bf:0b.6
(XEN) PCI add device 0000:bf:0c.0
(XEN) PCI add device 0000:bf:0c.1
(XEN) PCI add device 0000:bf:0c.2
(XEN) PCI add device 0000:bf:0c.3
(XEN) PCI add device 0000:bf:0c.4
(XEN) PCI add device 0000:bf:0c.5
(XEN) PCI add device 0000:bf:0c.6
(XEN) PCI add device 0000:bf:0c.7
(XEN) PCI add device 0000:bf:0d.0
(XEN) PCI add device 0000:bf:0d.1
(XEN) PCI add device 0000:bf:0d.2
(XEN) PCI add device 0000:bf:0d.3
(XEN) PCI add device 0000:bf:0d.4
(XEN) PCI add device 0000:bf:0d.5
(XEN) PCI add device 0000:bf:0d.6
(XEN) PCI add device 0000:bf:0d.7
(XEN) PCI add device 0000:bf:0e.0
(XEN) PCI add device 0000:bf:0e.1
(XEN) PCI add device 0000:bf:0f.0
(XEN) PCI add device 0000:bf:0f.1
(XEN) PCI add device 0000:bf:0f.2
(XEN) PCI add device 0000:bf:0f.3
(XEN) PCI add device 0000:bf:0f.4
(XEN) PCI add device 0000:bf:0f.5
(XEN) PCI add device 0000:bf:0f.6
(XEN) PCI add device 0000:bf:10.0
(XEN) PCI add device 0000:bf:10.1
(XEN) PCI add device 0000:bf:10.5
(XEN) PCI add device 0000:bf:10.7
(XEN) PCI add device 0000:bf:12.0
(XEN) PCI add device 0000:bf:12.1
(XEN) PCI add device 0000:bf:12.4
(XEN) PCI add device 0000:bf:12.5
(XEN) PCI add device 0000:bf:13.0
(XEN) PCI add device 0000:bf:13.1
(XEN) PCI add device 0000:bf:13.2
(XEN) PCI add device 0000:bf:13.3
(XEN) PCI add device 0000:bf:13.4
(XEN) PCI add device 0000:bf:13.5
(XEN) PCI add device 0000:bf:13.6
(XEN) PCI add device 0000:bf:13.7
(XEN) PCI add device 0000:bf:14.0
(XEN) PCI add device 0000:bf:14.1
(XEN) PCI add device 0000:bf:14.2
(XEN) PCI add device 0000:bf:14.3
(XEN) PCI add device 0000:bf:14.4
(XEN) PCI add device 0000:bf:14.5
(XEN) PCI add device 0000:bf:14.6
(XEN) PCI add device 0000:bf:14.7
(XEN) PCI add device 0000:bf:15.0
(XEN) PCI add device 0000:bf:15.1
(XEN) PCI add device 0000:bf:15.2
(XEN) PCI add device 0000:bf:15.3
(XEN) PCI add device 0000:bf:16.0
(XEN) PCI add device 0000:bf:16.1
(XEN) PCI add device 0000:bf:16.2
(XEN) PCI add device 0000:bf:16.3
(XEN) PCI add device 0000:bf:16.4
(XEN) PCI add device 0000:bf:16.5
(XEN) PCI add device 0000:bf:16.6
(XEN) PCI add device 0000:bf:16.7
(XEN) PCI add device 0000:bf:17.0
(XEN) PCI add device 0000:bf:17.1
(XEN) PCI add device 0000:bf:17.2
(XEN) PCI add device 0000:bf:17.3
(XEN) PCI add device 0000:bf:17.4
(XEN) PCI add device 0000:bf:17.5
(XEN) PCI add device 0000:bf:17.6
(XEN) PCI add device 0000:bf:17.7
(XEN) PCI add device 0000:bf:18.0
(XEN) PCI add device 0000:bf:18.1
(XEN) PCI add device 0000:bf:18.2
(XEN) PCI add device 0000:bf:18.3
(XEN) PCI add device 0000:bf:1e.0
(XEN) PCI add device 0000:bf:1e.1
(XEN) PCI add device 0000:bf:1e.2
(XEN) PCI add device 0000:bf:1e.3
(XEN) PCI add device 0000:bf:1e.4
(XEN) PCI add device 0000:bf:1f.0
(XEN) PCI add device 0000:bf:1f.2
(XEN) PCI add device 0000:7f:08.0
(XEN) PCI add device 0000:7f:08.2
(XEN) PCI add device 0000:7f:09.0
(XEN) PCI add device 0000:7f:09.2
(XEN) PCI add device 0000:7f:0a.0
(XEN) PCI add device 0000:7f:0a.2
(XEN) PCI add device 0000:7f:0b.0
(XEN) PCI add device 0000:7f:0b.1
(XEN) PCI add device 0000:7f:0b.2
(XEN) PCI add device 0000:7f:0b.4
(XEN) PCI add device 0000:7f:0b.5
(XEN) PCI add device 0000:7f:0b.6
(XEN) PCI add device 0000:7f:0c.0
(XEN) PCI add device 0000:7f:0c.1
(XEN) PCI add device 0000:7f:0c.2
(XEN) PCI add device 0000:7f:0c.3
(XEN) PCI add device 0000:7f:0c.4
(XEN) PCI add device 0000:7f:0c.5
(XEN) PCI add device 0000:7f:0c.6
(XEN) PCI add device 0000:7f:0c.7
(XEN) PCI add device 0000:7f:0d.0
(XEN) PCI add device 0000:7f:0d.1
(XEN) PCI add device 0000:7f:0d.2
(XEN) PCI add device 0000:7f:0d.3
(XEN) PCI add device 0000:7f:0d.4
(XEN) PCI add device 0000:7f:0d.5
(XEN) PCI add device 0000:7f:0d.6
(XEN) PCI add device 0000:7f:0d.7
(XEN) PCI add device 0000:7f:0e.0
(XEN) PCI add device 0000:7f:0e.1
(XEN) PCI add device 0000:7f:0f.0
(XEN) PCI add device 0000:7f:0f.1
(XEN) PCI add device 0000:7f:0f.2
(XEN) PCI add device 0000:7f:0f.3
(XEN) PCI add device 0000:7f:0f.4
(XEN) PCI add device 0000:7f:0f.5
(XEN) PCI add device 0000:7f:0f.6
(XEN) PCI add device 0000:7f:10.0
(XEN) PCI add device 0000:7f:10.1
(XEN) PCI add device 0000:7f:10.5
(XEN) PCI add device 0000:7f:10.7
(XEN) PCI add device 0000:7f:12.0
(XEN) PCI add device 0000:7f:12.1
(XEN) PCI add device 0000:7f:12.4
(XEN) PCI add device 0000:7f:12.5
(XEN) PCI add device 0000:7f:13.0
(XEN) PCI add device 0000:7f:13.1
(XEN) PCI add device 0000:7f:13.2
(XEN) PCI add device 0000:7f:13.3
(XEN) PCI add device 0000:7f:13.4
(XEN) PCI add device 0000:7f:13.5
(XEN) PCI add device 0000:7f:13.6
(XEN) PCI add device 0000:7f:13.7
(XEN) PCI add device 0000:7f:14.0
(XEN) PCI add device 0000:7f:14.1
(XEN) PCI add device 0000:7f:14.2
(XEN) PCI add device 0000:7f:14.3
(XEN) PCI add device 0000:7f:14.4
(XEN) PCI add device 0000:7f:14.5
(XEN) PCI add device 0000:7f:14.6
(XEN) PCI add device 0000:7f:14.7
(XEN) PCI add device 0000:7f:15.0
(XEN) PCI add device 0000:7f:15.1
(XEN) PCI add device 0000:7f:15.2
(XEN) PCI add device 0000:7f:15.3
(XEN) PCI add device 0000:7f:16.0
(XEN) PCI add device 0000:7f:16.1
(XEN) PCI add device 0000:7f:16.2
(XEN) PCI add device 0000:7f:16.3
(XEN) PCI add device 0000:7f:16.4
(XEN) PCI add device 0000:7f:16.5
(XEN) PCI add device 0000:7f:16.6
(XEN) PCI add device 0000:7f:16.7
(XEN) PCI add device 0000:7f:17.0
(XEN) PCI add device 0000:7f:17.1
(XEN) PCI add device 0000:7f:17.2
(XEN) PCI add device 0000:7f:17.3
(XEN) PCI add device 0000:7f:17.4
(XEN) PCI add device 0000:7f:17.5
(XEN) PCI add device 0000:7f:17.6
(XEN) PCI add device 0000:7f:17.7
(XEN) PCI add device 0000:7f:18.0
(XEN) PCI add device 0000:7f:18.1
(XEN) PCI add device 0000:7f:18.2
(XEN) PCI add device 0000:7f:18.3
(XEN) PCI add device 0000:7f:1e.0
(XEN) PCI add device 0000:7f:1e.1
(XEN) PCI add device 0000:7f:1e.2
(XEN) PCI add device 0000:7f:1e.3
(XEN) PCI add device 0000:7f:1e.4
(XEN) PCI add device 0000:7f:1f.0
(XEN) PCI add device 0000:7f:1f.2
(XEN) PCI add device 0000:3f:08.0
(XEN) PCI add device 0000:3f:08.2
(XEN) PCI add device 0000:3f:09.0
(XEN) PCI add device 0000:3f:09.2
(XEN) PCI add device 0000:3f:0a.0
(XEN) PCI add device 0000:3f:0a.2
(XEN) PCI add device 0000:3f:0b.0
(XEN) PCI add device 0000:3f:0b.1
(XEN) PCI add device 0000:3f:0b.2
(XEN) PCI add device 0000:3f:0b.4
(XEN) PCI add device 0000:3f:0b.5
(XEN) PCI add device 0000:3f:0b.6
(XEN) PCI add device 0000:3f:0c.0
(XEN) PCI add device 0000:3f:0c.1
(XEN) PCI add device 0000:3f:0c.2
(XEN) PCI add device 0000:3f:0c.3
(XEN) PCI add device 0000:3f:0c.4
(XEN) PCI add device 0000:3f:0c.5
(XEN) PCI add device 0000:3f:0c.6
(XEN) PCI add device 0000:3f:0c.7
(XEN) PCI add device 0000:3f:0d.0
(XEN) PCI add device 0000:3f:0d.1
(XEN) PCI add device 0000:3f:0d.2
(XEN) PCI add device 0000:3f:0d.3
(XEN) PCI add device 0000:3f:0d.4
(XEN) PCI add device 0000:3f:0d.5
(XEN) PCI add device 0000:3f:0d.6
(XEN) PCI add device 0000:3f:0d.7
(XEN) PCI add device 0000:3f:0e.0
(XEN) PCI add device 0000:3f:0e.1
(XEN) PCI add device 0000:3f:0f.0
(XEN) PCI add device 0000:3f:0f.1
(XEN) PCI add device 0000:3f:0f.2
(XEN) PCI add device 0000:3f:0f.3
(XEN) PCI add device 0000:3f:0f.4
(XEN) PCI add device 0000:3f:0f.5
(XEN) PCI add device 0000:3f:0f.6
(XEN) PCI add device 0000:3f:10.0
(XEN) PCI add device 0000:3f:10.1
(XEN) PCI add device 0000:3f:10.5
(XEN) PCI add device 0000:3f:10.7
(XEN) PCI add device 0000:3f:12.0
(XEN) PCI add device 0000:3f:12.1
(XEN) PCI add device 0000:3f:12.4
(XEN) PCI add device 0000:3f:12.5
(XEN) PCI add device 0000:3f:13.0
(XEN) PCI add device 0000:3f:13.1
(XEN) PCI add device 0000:3f:13.2
(XEN) PCI add device 0000:3f:13.3
(XEN) PCI add device 0000:3f:13.4
(XEN) PCI add device 0000:3f:13.5
(XEN) PCI add device 0000:3f:13.6
(XEN) PCI add device 0000:3f:13.7
(XEN) PCI add device 0000:3f:14.0
(XEN) PCI add device 0000:3f:14.1
(XEN) PCI add device 0000:3f:14.2
(XEN) PCI add device 0000:3f:14.3
(XEN) PCI add device 0000:3f:14.4
(XEN) PCI add device 0000:3f:14.5
(XEN) PCI add device 0000:3f:14.6
(XEN) PCI add device 0000:3f:14.7
(XEN) PCI add device 0000:3f:15.0
(XEN) PCI add device 0000:3f:15.1
(XEN) PCI add device 0000:3f:15.2
(XEN) PCI add device 0000:3f:15.3
(XEN) PCI add device 0000:3f:16.0
(XEN) PCI add device 0000:3f:16.1
(XEN) PCI add device 0000:3f:16.2
(XEN) PCI add device 0000:3f:16.3
(XEN) PCI add device 0000:3f:16.4
(XEN) PCI add device 0000:3f:16.5
(XEN) PCI add device 0000:3f:16.6
(XEN) PCI add device 0000:3f:16.7
(XEN) PCI add device 0000:3f:17.0
(XEN) PCI add device 0000:3f:17.1
(XEN) PCI add device 0000:3f:17.2
(XEN) PCI add device 0000:3f:17.3
(XEN) PCI add device 0000:3f:17.4
(XEN) PCI add device 0000:3f:17.5
(XEN) PCI add device 0000:3f:17.6
(XEN) PCI add device 0000:3f:17.7
(XEN) PCI add device 0000:3f:18.0
(XEN) PCI add device 0000:3f:18.1
(XEN) PCI add device 0000:3f:18.2
(XEN) PCI add device 0000:3f:18.3
(XEN) PCI add device 0000:3f:1e.0
(XEN) PCI add device 0000:3f:1e.1
(XEN) PCI add device 0000:3f:1e.2
(XEN) PCI add device 0000:3f:1e.3
(XEN) PCI add device 0000:3f:1e.4
(XEN) PCI add device 0000:3f:1f.0
(XEN) PCI add device 0000:3f:1f.2
(XEN) PCI add device 0000:00:00.0
(XEN) PCI add device 0000:00:02.0
(XEN) PCI add device 0000:00:03.0
(XEN) PCI add device 0000:00:03.2
(XEN) PCI add device 0000:00:03.3
(XEN) PCI add device 0000:00:05.0
(XEN) PCI add device 0000:00:05.1
(XEN) PCI add device 0000:00:05.2
(XEN) PCI add device 0000:00:05.4
(XEN) PCI add device 0000:00:11.0
(XEN) PCI add device 0000:00:16.0
(XEN) PCI add device 0000:00:16.1
(XEN) PCI add device 0000:00:1a.0
(XEN) PCI add device 0000:00:1c.0
(XEN) PCI add device 0000:00:1c.7
(XEN) PCI add device 0000:00:1d.0
(XEN) PCI add device 0000:00:1e.0
(XEN) PCI add device 0000:00:1f.0
(XEN) PCI add device 0000:00:1f.2
(XEN) PCI add device 0000:00:1f.3
(XEN) PCI add device 0000:01:00.0
(XEN) PCI add device 0000:03:00.0
(XEN) PCI add device 0000:03:00.1
(XEN) PCI add device 0000:08:00.0
(XEN) PCI add device 0000:40:02.0
(XEN) PCI add device 0000:40:02.2
(XEN) PCI add device 0000:40:03.0
(XEN) PCI add device 0000:40:05.0
(XEN) PCI add device 0000:40:05.1
(XEN) PCI add device 0000:40:05.2
(XEN) PCI add device 0000:40:05.4
(XEN) PCI add device 0000:80:02.0
(XEN) PCI add device 0000:80:02.2
(XEN) PCI add device 0000:80:03.0
(XEN) PCI add device 0000:80:05.0
(XEN) PCI add device 0000:80:05.1
(XEN) PCI add device 0000:80:05.2
(XEN) PCI add device 0000:80:05.4
(XEN) PCI add device 0000:c0:02.0
(XEN) PCI add device 0000:c0:02.2
(XEN) PCI add device 0000:c0:03.0
(XEN) PCI add device 0000:c0:05.0
(XEN) PCI add device 0000:c0:05.1
(XEN) PCI add device 0000:c0:05.2
(XEN) PCI add device 0000:c0:05.4
(XEN) PCI add device 0000:c2:00.0
(XEN) PCI add device 0000:c2:00.1
(XEN) emul-priv-op.c:1179:d0v0 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v4 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v5 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v6 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v7 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v8 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v9 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v10 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v11 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v12 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v13 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v14 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v15 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v16 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v17 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v18 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v19 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v20 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v21 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v22 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v23 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v24 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v25 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v26 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v27 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v28 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v29 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 000001fc from 0x0000000021040043 to 0x0000000021040041
(XEN) d0: Forcing read-only access to MFN fed00
(XEN) traps.c:1569: GPF (0000): ffff82d0803684d5 [emul-priv-op.c#read_msr+0x462/0x4a5] -> ffff82d080375bb0
(XEN) emul-priv-op.c:1179:d0v0 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v4 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v5 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v6 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v7 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v8 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v9 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v10 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v11 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v12 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v13 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v14 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v15 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v16 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v17 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v18 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v19 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v20 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v21 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v22 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v23 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v24 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v25 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v26 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v27 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v28 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) emul-priv-op.c:1179:d0v29 Domain attempted WRMSR 0000017f from 0x0000000000000000 to 0x0000000000000002
(XEN) Monitor-Mwait will be used to enter C1 state
(XEN) Monitor-Mwait will be used to enter C2 state
(XEN) No CPU ID for APIC ID 0x24
(XEN) 'u' pressed -> dumping numa info (now-0x32E6:7CBB2A5F)
(XEN) NODE0 start->0 size->7864320 free->473713
(XEN) NODE1 start->7864320 size->7340032 free->5739479
(XEN) NODE2 start->15204352 size->6291456 free->6272619
(XEN) NODE3 start->21495808 size->8388608 free->8130612
(XEN) CPU0...35 -> NODE0
(XEN) CPU36...71 -> NODE1
(XEN) CPU72...107 -> NODE2
(XEN) CPU108...143 -> NODE3
(XEN) Memory location of each domain:
(XEN) Domain 0 (total: 8388608):
(XEN)     Node 0: 6806494
(XEN)     Node 1: 1578478
(XEN)     Node 2: 0
(XEN)     Node 3: 3636
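
[Note: a dump like the one above is typically collected from dom0 with the xl
toolstack; a minimal sketch, assuming the attachment name xl-debugkeys-u.txt
reflects how it was gathered here:

    xl debug-keys u                  # ask Xen to dump NUMA/memory info to its console
    xl dmesg > xl-debugkeys-u.txt    # save the hypervisor console ring to a file
]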

[-- Attachment #1.1.5: xl-info-n.txt --]
[-- Type: text/plain, Size: 7078 bytes --]

host                   : xen91
release                : 4.4.120-92.70-default
version                : #1 SMP Wed Mar 14 15:59:43 UTC 2018 (52a83de)
machine                : x86_64
nr_cpus                : 144
max_cpu_id             : 191
nr_nodes               : 4
cores_per_socket       : 18
threads_per_core       : 2
cpu_mhz                : 2493
hw_caps                : bfebfbff:77fef3ff:2c100800:00000021:00000001:00003fbb:00000000:00000100
virt_caps              : hvm
total_memory           : 114562
free_memory            : 80532
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
cpu_topology           :
cpu:    core    socket     node
  0:       0        0        0
  1:       0        0        0
  2:       1        0        0
  3:       1        0        0
  4:       2        0        0
  5:       2        0        0
  6:       3        0        0
  7:       3        0        0
  8:       4        0        0
  9:       4        0        0
 10:       8        0        0
 11:       8        0        0
 12:       9        0        0
 13:       9        0        0
 14:      10        0        0
 15:      10        0        0
 16:      11        0        0
 17:      11        0        0
 18:      16        0        0
 19:      16        0        0
 20:      17        0        0
 21:      17        0        0
 22:      18        0        0
 23:      18        0        0
 24:      19        0        0
 25:      19        0        0
 26:      20        0        0
 27:      20        0        0
 28:      24        0        0
 29:      24        0        0
 30:      25        0        0
 31:      25        0        0
 32:      26        0        0
 33:      26        0        0
 34:      27        0        0
 35:      27        0        0
 36:       0        1        1
 37:       0        1        1
 38:       1        1        1
 39:       1        1        1
 40:       2        1        1
 41:       2        1        1
 42:       3        1        1
 43:       3        1        1
 44:       4        1        1
 45:       4        1        1
 46:       8        1        1
 47:       8        1        1
 48:       9        1        1
 49:       9        1        1
 50:      10        1        1
 51:      10        1        1
 52:      11        1        1
 53:      11        1        1
 54:      16        1        1
 55:      16        1        1
 56:      17        1        1
 57:      17        1        1
 58:      18        1        1
 59:      18        1        1
 60:      19        1        1
 61:      19        1        1
 62:      20        1        1
 63:      20        1        1
 64:      24        1        1
 65:      24        1        1
 66:      25        1        1
 67:      25        1        1
 68:      26        1        1
 69:      26        1        1
 70:      27        1        1
 71:      27        1        1
 72:       0        2        2
 73:       0        2        2
 74:       1        2        2
 75:       1        2        2
 76:       2        2        2
 77:       2        2        2
 78:       3        2        2
 79:       3        2        2
 80:       4        2        2
 81:       4        2        2
 82:       8        2        2
 83:       8        2        2
 84:       9        2        2
 85:       9        2        2
 86:      10        2        2
 87:      10        2        2
 88:      11        2        2
 89:      11        2        2
 90:      16        2        2
 91:      16        2        2
 92:      17        2        2
 93:      17        2        2
 94:      18        2        2
 95:      18        2        2
 96:      19        2        2
 97:      19        2        2
 98:      20        2        2
 99:      20        2        2
100:      24        2        2
101:      24        2        2
102:      25        2        2
103:      25        2        2
104:      26        2        2
105:      26        2        2
106:      27        2        2
107:      27        2        2
108:       0        3        3
109:       0        3        3
110:       1        3        3
111:       1        3        3
112:       2        3        3
113:       2        3        3
114:       3        3        3
115:       3        3        3
116:       4        3        3
117:       4        3        3
118:       8        3        3
119:       8        3        3
120:       9        3        3
121:       9        3        3
122:      10        3        3
123:      10        3        3
124:      11        3        3
125:      11        3        3
126:      16        3        3
127:      16        3        3
128:      17        3        3
129:      17        3        3
130:      18        3        3
131:      18        3        3
132:      19        3        3
133:      19        3        3
134:      20        3        3
135:      20        3        3
136:      24        3        3
137:      24        3        3
138:      25        3        3
139:      25        3        3
140:      26        3        3
141:      26        3        3
142:      27        3        3
143:      27        3        3
device topology        :
device           node
0000:80:02.0      2
0000:80:02.2      2
0000:80:03.0      2
0000:80:05.0      2
0000:80:05.1      2
0000:80:05.2      2
0000:80:05.4      2
0000:c0:02.0      3
0000:c0:02.2      3
0000:c0:03.0      3
0000:c0:05.0      3
0000:c0:05.1      3
0000:c0:05.2      3
0000:c0:05.4      3
0000:c2:00.0      3
0000:c2:00.1      3
0000:40:02.0      1
0000:40:02.2      1
0000:40:03.0      1
0000:40:05.0      1
0000:40:05.1      1
0000:40:05.2      1
0000:40:05.4      1
numa_info              :
node:    memsize    memfree    distances
   0:     30720       1850      10,21,21,21
   1:     28672      22419      21,10,21,21
   2:     24576      24502      21,21,10,21
   3:     32768      31760      21,21,21,10
xen_major              : 4
xen_minor              : 11
xen_extra              : .20180410T12570
xen_version            : 4.11.20180410T12570
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64 
xen_scheduler          : credit2
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : 2018-04-10 13:57:09 +0100 git:50f8ba84a5
xen_commandline        : bootscrub=0 consoleblank=0 nomodeset com1=115200,8n1 console=com1,vga Xrashkernel=1024M<4G dom0_mem=32G dom0_max_vcpus=30 dom0_vcpus_pin max_cstate=0 loglvl=all guest_loglvl=all suse_vtsc_tolerance=110000 sched=credit2
cc_compiler            : gcc-4.8 (SUSE Linux) 4.8.5
cc_compile_by          : debug=y
cc_compile_domain      : olh:bug1087289:sle12sp2_411
cc_compile_date        : Tue Apr 10 13:57:09 UTC 2018
build_id               : 2137921bc738bc97e99c82c05988f698
xend_config_format     : 4

[-- Attachment #1.1.6: xl-list-n.txt --]
[-- Type: text/plain, Size: 169 bytes --]

Name                                        ID   Mem VCPUs	State	Time(s) NODE Affinity
Domain-0                                     0 32768    30     r-----     118.9 0

[-- Attachment #1.1.7: xl-vcp-list.txt --]
[-- Type: text/plain, Size: 2360 bytes --]

Name                                ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
Domain-0                             0     0    0   -b-      10.1  0 / all
Domain-0                             0     1    1   -b-       3.8  1 / all
Domain-0                             0     2    2   -b-       8.0  2 / all
Domain-0                             0     3    3   -b-       3.2  3 / all
Domain-0                             0     4    4   -b-       3.6  4 / all
Domain-0                             0     5    5   -b-       2.8  5 / all
Domain-0                             0     6    6   -b-       4.9  6 / all
Domain-0                             0     7    7   -b-       3.8  7 / all
Domain-0                             0     8    8   -b-       3.4  8 / all
Domain-0                             0     9    9   -b-       3.5  9 / all
Domain-0                             0    10   10   -b-       3.1  10 / all
Domain-0                             0    11   11   -b-       3.0  11 / all
Domain-0                             0    12   12   -b-       2.3  12 / all
Domain-0                             0    13   13   -b-       3.8  13 / all
Domain-0                             0    14   14   -b-       3.9  14 / all
Domain-0                             0    15   15   -b-       3.3  15 / all
Domain-0                             0    16   16   -b-       4.5  16 / all
Domain-0                             0    17   17   -b-       4.3  17 / all
Domain-0                             0    18   18   -b-       3.8  18 / all
Domain-0                             0    19   19   -b-       3.3  19 / all
Domain-0                             0    20   20   -b-       3.0  20 / all
Domain-0                             0    21   21   -b-       2.8  21 / all
Domain-0                             0    22   22   -b-       2.7  22 / all
Domain-0                             0    23   23   -b-       2.5  23 / all
Domain-0                             0    24   24   -b-       2.4  24 / all
Domain-0                             0    25   25   -b-       2.1  25 / all
Domain-0                             0    26   26   r--       4.3  26 / all
Domain-0                             0    27   27   -b-       9.5  27 / all
Domain-0                             0    28   28   -b-       3.5  28 / all
Domain-0                             0    29   29   -b-       4.2  29 / all

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-11 10:00                   ` Olaf Hering
       [not found]                     ` <298ec681a9c38eb7618e6b3e226486691e9eab4d.camel@suse.com>
@ 2018-04-11 15:03                     ` Olaf Hering
  2018-04-11 15:27                       ` Olaf Hering
       [not found]                       ` <5ACE23DF0200002D03781F6A@prv1-mh.provo.novell.com>
  1 sibling, 2 replies; 58+ messages in thread
From: Olaf Hering @ 2018-04-11 15:03 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Andrew Cooper, George Dunlap, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 3347 bytes --]

On Wed, Apr 11, Olaf Hering wrote:

> On Wed, Apr 11, Dario Faggioli wrote:
> 
> > Olaf, can you give it a try? It should be fine to run it on top of the
> > last debug patch (the one that produced this crash).
> 
> Yes, with both changes it did >4k iterations already. Thanks.

That was with sched=credit2, sorry for that.
Now with just that second patch I got this after a few iterations, in __vmread().
We have seen such crashes a few times with 4.7 already.


(XEN) Xen BUG at ...0f8ba84a5/non-dbg/xen/include/asm/hvm/vmx/vmx.h:390
(XEN) ----[ Xen-4.11.20180410T125709.50f8ba84a5-6.bug1087289_411  x86_64  debug=n   Not tainted ]----
(XEN) CPU:    71
(XEN) RIP:    e008:[<ffff82d08030aa55>] vmx.c#arch/x86/hvm/vmx/vmx.o.unlikely+0/0x15b
(XEN) RFLAGS: 0000000000010203   CONTEXT: hypervisor (d16v0)
(XEN) rax: 0000000000004824   rbx: ffff83007ba44000   rcx: ffffffffffffef76
(XEN) rdx: ffff830e7aa77fff   rsi: 000000000000f305   rdi: ffff83007ba44000
(XEN) rbp: 000000000000f305   rsp: ffff830e7aa77e60   r8:  0000000015c23047
(XEN) r9:  000004489c4a4a69   r10: 000001b7ca057c00   r11: 0000000000000000
(XEN) r12: 000000000000f305   r13: 0000000000004016   r14: ffff830779e92180
(XEN) r15: 00000000ffffffff   cr0: 000000008005003b   cr4: 00000000001526e0
(XEN) cr3: 000000067083a000   cr2: 00007fedb9f6c000
(XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen code around <ffff82d08030aa55> (vmx.c#arch/x86/hvm/vmx/vmx.o.unlikely):
(XEN)  44 24 0c e9 82 fd ff ff <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b
(XEN) Xen stack trace from rsp=ffff830e7aa77e60:
(XEN)    ffff82d0802e1442 ffff83007ba44000 000000000000f305 000000000000f305
(XEN)    ffff82d0802ff477 ffff82d08030f9ab 000000f37aa77ef8 ffffffffffffffff
(XEN)    ffff830e7aa77fff ffff82d080933c00 ffff830779e92180 ffff82d08026d870
(XEN)    ffff83007ba44000 ffff83007ba44000 ffff830779e92188 000001b7c8fec61b
(XEN)    ffff830779e92180 ffff82d08094a480 ffff82d08030f9e7 ffffffff81c00000
(XEN)    ffffffff81c00000 ffffffff81c00000 0000000000000000 0000000000000000
(XEN)    ffffffff81d4c180 0000000000000400 0000000000000400 0000000000000000
(XEN)    0000000000000000 ffffffff81020e50 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 000000fc00000000 ffffffff81060182
(XEN)    0000000000000000 0000000000000246 ffffffff81c03f00 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000047 ffff83007ba44000 00000036f9533080 00000000001526e0
(XEN)    0000000000000000 0000000779e90000 0000040000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82d08030aa55>] vmx.c#arch/x86/hvm/vmx/vmx.o.unlikely+0/0x15b
(XEN)    [<ffff82d0802e1442>] hvm_interrupt_blocked+0x82/0xd0
(XEN)    [<ffff82d0802ff477>] vmx_intr_assist+0x137/0x490
(XEN)    [<ffff82d08030f9ab>] vmx_asm_vmexit_handler+0xab/0x240
(XEN)    [<ffff82d08026d870>] domain.c#vcpu_kick_softirq+0/0x10
(XEN)    [<ffff82d08030f9e7>] vmx_asm_vmexit_handler+0xe7/0x240
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 71:
(XEN) Xen BUG at ...0f8ba84a5/non-dbg/xen/include/asm/hvm/vmx/vmx.h:390
(XEN) ****************************************


Olaf

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-11 15:03                     ` Olaf Hering
@ 2018-04-11 15:27                       ` Olaf Hering
  2018-04-11 17:20                         ` Dario Faggioli
       [not found]                       ` <5ACE23DF0200002D03781F6A@prv1-mh.provo.novell.com>
  1 sibling, 1 reply; 58+ messages in thread
From: Olaf Hering @ 2018-04-11 15:27 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Andrew Cooper, George Dunlap, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 3479 bytes --]

On Wed, Apr 11, Olaf Hering wrote:

> On Wed, Apr 11, Olaf Hering wrote:
> > On Wed, Apr 11, Dario Faggioli wrote:
> > > Olaf, can you give it a try? It should be fine to run it on top of the
> > > last debug patch (the one that produced this crash).
> > Yes, with both changes it did >4k iterations already. Thanks.
> That was with sched=credit2, sorry for that.
> Now with just that second patch ...

Still BUG in csched_load_balance.

(XEN) Xen BUG at sched_credit.c:1694
(XEN) ----[ Xen-4.11.20180410T125709.50f8ba84a5-6.bug1087289_411  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    135
(XEN) RIP:    e008:[<ffff82d08022ae34>] sched_credit.c#csched_schedule+0x44a/0xd42
(XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
(XEN) rax: ffff83077ffe76d0   rbx: ffff830779c26da0   rcx: ffff8309d55ff290
(XEN) rdx: ffff83007ba44000   rsi: 0000000000000087   rdi: ffff8309d55ff290
(XEN) rbp: ffff831c3f77fdf0   rsp: ffff831c3f77fcf0   r8:  ffff8309d55ff290
(XEN) r9:  0000000000000010   r10: 0000000000000001   r11: 0000ffff0000ffff
(XEN) r12: ffff830779c26e00   r13: 000002effe563d4d   r14: 000002effe56f067
(XEN) r15: 0000000000000087   cr0: 000000008005003b   cr4: 00000000001526e0
(XEN) cr3: 00000015a5932000   cr2: 00007f889333f000
(XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen code around <ffff82d08022ae34> (sched_credit.c#csched_schedule+0x44a/0xd42):
(XEN)  8b 51 28 3b 72 04 74 02 <0f> 0b 48 8d 15 43 d9 5d 00 48 85 c0 74 04 48 8b
(XEN) Xen stack trace from rsp=ffff831c3f77fcf0:
(XEN)    ffff830890972000 000000033f77fd28 ffff830779c1a188 ffff830779c1a188
(XEN)    000002effe56f067 0000000000000087 ffff83007ba44000 ffff831c3f77fda8
(XEN)    0000000001c9c380 ffff831c3f77fe38 ffff82d08095f0e0 ffff82d08095f100
(XEN)    ffff83077a6c59e0 ffff83007ba44000 0000008700000000 ffff8309d55ff290
(XEN)    0000000000000000 0000000000000000 0000000000000086 ffff82d000000087
(XEN)    ffff830060aff000 ffff830779c1a1a0 ffff831c3f77fdf0 0000000000000046
(XEN)    ffff830890972000 ffff830779c1a1c8 0000000000000082 ffff830060aff000
(XEN)    ffff82d08095f100 ffff830779c1a188 000002effe56f067 0000000000000087
(XEN)    ffff831c3f77fe80 ffff82d080236406 ffff830779c1a188 ffff830779c1a1a0
(XEN)    0000008700000001 ffff830779c1a180 ffff82d0802368f3 ffff82d08031f3c1
(XEN)    ffff830779c1a1a0 0000008700972000 ffff830779c1a180 ffff831c3f77fe68
(XEN)    ffff82d0802f8f83 ffff82d080937f80 ffff82d080933c00 ffffffffffffffff
(XEN)    ffff831c3f77ffff ffff830890972000 ffff831c3f77feb0 ffff82d080239ec5
(XEN)    0000000000000087 ffff82d080933c00 ffff82d08095f1f0 ffff831c3f77ffff
(XEN)    ffff831c3f77fec0 ffff82d080239f1a ffff831c3f77fef0 ffff82d0802738f0
(XEN)    ffff830060aff000 ffff83007ba44000 ffff83077a6c4000 0000000000000087
(XEN)    ffff831c3f77fdc8 ffffffff81c00000 ffffffff81c00000 ffffffff81c00000
(XEN)    0000000000000000 0000000000000000 ffffffff81d4c180 0000000000000005
(XEN)    0000000000000000 ffff88011da42660 0000000000000000 ffffffff81020e50
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82d08022ae34>] sched_credit.c#csched_schedule+0x44a/0xd42
(XEN)    [<ffff82d080236406>] schedule.c#schedule+0x107/0x627
(XEN)    [<ffff82d080239ec5>] softirq.c#__do_softirq+0x85/0x90
(XEN)    [<ffff82d080239f1a>] do_softirq+0x13/0x15
(XEN)    [<ffff82d0802738f0>] domain.c#idle_loop+0xac/0xbe


Olaf

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
       [not found]                       ` <5ACE23DF0200002D03781F6A@prv1-mh.provo.novell.com>
@ 2018-04-11 15:38                         ` Jan Beulich
  2018-04-11 15:48                           ` Olaf Hering
  0 siblings, 1 reply; 58+ messages in thread
From: Jan Beulich @ 2018-04-11 15:38 UTC (permalink / raw)
  To: Olaf Hering, Jun Nakajima, Kevin Tian
  Cc: Andrew Cooper, Dario Faggioli, George Dunlap, xen-devel

>>> On 11.04.18 at 17:03, <olaf@aepfle.de> wrote:
> On Wed, Apr 11, Olaf Hering wrote:
> 
>> On Wed, Apr 11, Dario Faggioli wrote:
>> 
>> > Olaf, can you give it a try? It should be fine to run it on top of the
>> > last debug patch (the one that produced this crash).
>> 
>> Yes, with both changes it did >4k iterations already. Thanks.
> 
> That was with sched=credit2, sorry for that.
> Now with just that second patch I got this after a few iterations, in 
> __vmread().
> We have seen such crashes a few times with 4.7 already.

And until now I had assumed we'd taken care of them with earlier
fixes (all 4.7 reports were with old packages, like 4.7.2-based
ones). Can you repro this with a debug hypervisor (so we can
both trust the stack trace and know whether any earlier
assertion would trigger)?

Is this also tied to those frequent affinity changes? Are there
multiple guests, or is there any non-default activity (like last time
such an issue was found to be triggered by a guest being
destroyed in parallel)?

Kevin, Jun, I'm adding you early here as it would be really nice if
this time round we could get some help from you (being the VMX
maintainers after all).

Jan

> (XEN) Xen BUG at ...0f8ba84a5/non-dbg/xen/include/asm/hvm/vmx/vmx.h:390
> (XEN) ----[ Xen-4.11.20180410T125709.50f8ba84a5-6.bug1087289_411  x86_64  
> debug=n   Not tainted ]----
> (XEN) CPU:    71
> (XEN) RIP:    e008:[<ffff82d08030aa55>] 
> vmx.c#arch/x86/hvm/vmx/vmx.o.unlikely+0/0x15b
> (XEN) RFLAGS: 0000000000010203   CONTEXT: hypervisor (d16v0)
> (XEN) rax: 0000000000004824   rbx: ffff83007ba44000   rcx: ffffffffffffef76
> (XEN) rdx: ffff830e7aa77fff   rsi: 000000000000f305   rdi: ffff83007ba44000
> (XEN) rbp: 000000000000f305   rsp: ffff830e7aa77e60   r8:  0000000015c23047
> (XEN) r9:  000004489c4a4a69   r10: 000001b7ca057c00   r11: 0000000000000000
> (XEN) r12: 000000000000f305   r13: 0000000000004016   r14: ffff830779e92180
> (XEN) r15: 00000000ffffffff   cr0: 000000008005003b   cr4: 00000000001526e0
> (XEN) cr3: 000000067083a000   cr2: 00007fedb9f6c000
> (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) Xen code around <ffff82d08030aa55> 
> (vmx.c#arch/x86/hvm/vmx/vmx.o.unlikely):
> (XEN)  44 24 0c e9 82 fd ff ff <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 
> 0b
> (XEN) Xen stack trace from rsp=ffff830e7aa77e60:
> (XEN)    ffff82d0802e1442 ffff83007ba44000 000000000000f305 000000000000f305
> (XEN)    ffff82d0802ff477 ffff82d08030f9ab 000000f37aa77ef8 ffffffffffffffff
> (XEN)    ffff830e7aa77fff ffff82d080933c00 ffff830779e92180 ffff82d08026d870
> (XEN)    ffff83007ba44000 ffff83007ba44000 ffff830779e92188 000001b7c8fec61b
> (XEN)    ffff830779e92180 ffff82d08094a480 ffff82d08030f9e7 ffffffff81c00000
> (XEN)    ffffffff81c00000 ffffffff81c00000 0000000000000000 0000000000000000
> (XEN)    ffffffff81d4c180 0000000000000400 0000000000000400 0000000000000000
> (XEN)    0000000000000000 ffffffff81020e50 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 000000fc00000000 ffffffff81060182
> (XEN)    0000000000000000 0000000000000246 ffffffff81c03f00 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000047 ffff83007ba44000 00000036f9533080 00000000001526e0
> (XEN)    0000000000000000 0000000779e90000 0000040000000000 0000000000000000
> (XEN) Xen call trace:
> (XEN)    [<ffff82d08030aa55>] vmx.c#arch/x86/hvm/vmx/vmx.o.unlikely+0/0x15b
> (XEN)    [<ffff82d0802e1442>] hvm_interrupt_blocked+0x82/0xd0
> (XEN)    [<ffff82d0802ff477>] vmx_intr_assist+0x137/0x490
> (XEN)    [<ffff82d08030f9ab>] vmx_asm_vmexit_handler+0xab/0x240
> (XEN)    [<ffff82d08026d870>] domain.c#vcpu_kick_softirq+0/0x10
> (XEN)    [<ffff82d08030f9e7>] vmx_asm_vmexit_handler+0xe7/0x240
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 71:
> (XEN) Xen BUG at ...0f8ba84a5/non-dbg/xen/include/asm/hvm/vmx/vmx.h:390
> (XEN) ****************************************
> 
> 
> Olaf



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-11 15:38                         ` Jan Beulich
@ 2018-04-11 15:48                           ` Olaf Hering
  0 siblings, 0 replies; 58+ messages in thread
From: Olaf Hering @ 2018-04-11 15:48 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Andrew Cooper, George Dunlap, Dario Faggioli,
	Jun Nakajima, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 420 bytes --]

On Wed, 11 Apr 2018 09:38:59 -0600,
"Jan Beulich" <JBeulich@suse.com> wrote:

> And till now I had assumed we've taken care of them with earlier
> fixes (all 4.7 reports were with old packages, like 4.7.2 based
> ones). Can you repro this with a debug hypervisor (so we can
> both trust the stack trace and know whether any earlier
> assertion would trigger)?

I have seen it only once, with debug=n.

Olaf

[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-11 15:27                       ` Olaf Hering
@ 2018-04-11 17:20                         ` Dario Faggioli
  2018-04-11 20:43                           ` Olaf Hering
  0 siblings, 1 reply; 58+ messages in thread
From: Dario Faggioli @ 2018-04-11 17:20 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Andrew Cooper, George Dunlap, xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 2156 bytes --]

On Wed, 2018-04-11 at 17:27 +0200, Olaf Hering wrote:
> On Wed, Apr 11, Olaf Hering wrote:
> 
> > That was with sched=credit2, sorry for that.
> > Now with just that second patch ...
> 
> Still BUG in csched_load_balance.
> 
> (XEN) Xen BUG at sched_credit.c:1694
> (XEN) ----[ Xen-4.11.20180410T125709.50f8ba84a5-
> 6.bug1087289_411  x86_64  debug=y   Not tainted ]----
> (XEN) CPU:    135
> (XEN) RIP:    e008:[<ffff82d08022ae34>]
> sched_credit.c#csched_schedule+0x44a/0xd42
> ...
> (XEN) Xen call trace:
> (XEN)    [<ffff82d08022ae34>]
> sched_credit.c#csched_schedule+0x44a/0xd42
> (XEN)    [<ffff82d080236406>] schedule.c#schedule+0x107/0x627
> (XEN)    [<ffff82d080239ec5>] softirq.c#__do_softirq+0x85/0x90
> (XEN)    [<ffff82d080239f1a>] do_softirq+0x13/0x15
> (XEN)    [<ffff82d0802738f0>] domain.c#idle_loop+0xac/0xbe
> 
Ok, back to square 1. :-/

A data point is that Credit2 works. In Credit2, vcpu_move_locked()
(called by vcpu_migrate()) calls a function called migrate() which
--for Credit2-specific reasons-- considers it legitimate to find the
vcpu in a runqueue... So that's what I think "saves" us, and that is why
this data point does not help much (sorry Olaf for not realizing this
earlier, and asking you to try Credit2). :-(

On the other hand, in Credit1, there should be no good reason why
vcpu_migrate() would be called on a vcpu which is on a runqueue, and
the fact that we're still crashing proves that there is at least
another race, causing that to happen.
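
For illustration, the difference can be pictured roughly like this (a
simplified sketch only, with hypothetical helper names, not the real
sched_credit2.c code; the Credit1 side is exactly what the attached
debug patch asserts):

/* Credit2-style migrate(): tolerates finding the vcpu queued. */
static void credit2_migrate_sketch(struct vcpu *v, unsigned int new_cpu)
{
    if ( on_runq_sketch(v) )          /* hypothetical helper */
        runq_remove_sketch(v);        /* dequeue from the old runqueue */
    v->processor = new_cpu;
    if ( vcpu_runnable(v) )
        runq_insert_sketch(v);        /* requeue on the new one */
}

/*
 * Credit1 has no such hook today and assumes the vcpu is never on a
 * runqueue when it gets moved, which is the invariant the BUG_ON()s
 * in the attached patch check.
 */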

So, the debug patch I posted previously in this thread was wrong. I'm
attaching a new one to this email. Olaf, if you're trying again, please
do it with both: the "fix" (xen-sched-debug-vcpumigrate-race.patch)
and this one.

Debug hypervisor, as usual, if possible. :-)

It will crash, again, possibly with the same stack trace, but I think
it's worth a try.

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.1.2: csched_migrate_debug.patch --]
[-- Type: text/x-patch, Size: 1297 bytes --]

commit 5fb7ad8d1220101e69a87014d5a485d19aea9917
Author: Dario Faggioli <dfaggioli@suse.com>
Date:   Wed Apr 11 09:04:33 2018 +0200

    xen: credit: implement SCHED_OP(migrate)
    
    with just sanity checking in it, to catch a race.
    
    Signed-off-by: Dario Faggioli <dfaggioli@suse.com>

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 9bc638c09c..7a909376e6 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -867,6 +867,17 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
     return cpu;
 }
 
+static void
+csched_vcpu_migrate(const struct scheduler *ops, struct vcpu *vc,
+		    unsigned int new_cpu)
+{
+    BUG_ON(vc->is_running);
+    BUG_ON(test_bit(_VPF_migrating, &vc->pause_flags));
+    BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc)));
+    BUG_ON(CSCHED_VCPU(vc) == CSCHED_VCPU(curr_on_cpu(vc->processor)));
+    vc->processor = new_cpu;
+}
+
 static int
 csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
 {
@@ -2278,6 +2289,7 @@ static const struct scheduler sched_credit_def = {
     .adjust_global  = csched_sys_cntl,
 
     .pick_cpu       = csched_cpu_pick,
+    .migrate        = csched_vcpu_migrate,
     .do_schedule    = csched_schedule,
 
     .dump_cpu_state = csched_dump_pcpu,

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-11 17:20                         ` Dario Faggioli
@ 2018-04-11 20:43                           ` Olaf Hering
  2018-04-11 21:31                             ` Dario Faggioli
  0 siblings, 1 reply; 58+ messages in thread
From: Olaf Hering @ 2018-04-11 20:43 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Andrew Cooper, George Dunlap, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 4477 bytes --]

On Wed, Apr 11, Dario Faggioli wrote:

> It will crash, again, possibly with the same stack trace, but I think
> it's worth a try.

    BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc)));

(XEN) grant_table.c:1769:d15v18 Expanding d15 grant table from 12 to 13 frames
(XEN) grant_table.c:1769:d15v20 Expanding d15 grant table from 13 to 14 frames
(XEN) grant_table.c:1769:d15v21 Expanding d15 grant table from 14 to 15 frames
(XEN) traps.c:1569: GPF (0000): ffff82d080315d5f [vmx.c#vmx_msr_read_intercept+0x375/0x3e2] -> ffff82d08037594a
(XEN) grant_table.c:1769:d16v19 Expanding d16 grant table from 9 to 10 frames
(XEN) grant_table.c:1769:d16v22 Expanding d16 grant table from 10 to 11 frames
(XEN) grant_table.c:1769:d16v27 Expanding d16 grant table from 11 to 12 frames
(XEN) grant_table.c:1769:d16v21 Expanding d16 grant table from 12 to 13 frames
(XEN) grant_table.c:1769:d16v21 Expanding d16 grant table from 13 to 14 frames
(XEN) grant_table.c:1769:d16v20 Expanding d16 grant table from 14 to 15 frames
(XEN) Xen BUG at sched_credit.c:876
(XEN) ----[ Xen-4.11.20180410T125709.50f8ba84a5-7.bug1087289_411  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    108
(XEN) RIP:    e008:[<ffff82d080229ab4>] sched_credit.c#csched_vcpu_migrate+0x27/0x54
(XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
(XEN) rax: ffff8308990cb230   rbx: ffff830779d28188   rcx: ffff82d080803640
(XEN) rdx: 000000000000006e   rsi: ffff83007ba44000   rdi: ffff82d080803640
(XEN) rbp: ffff831c7d80fd18   rsp: ffff831c7d80fd18   r8:  0000000000000004
(XEN) r9:  0000000000000000   r10: 0000000000000001   r11: 0000ffff0000ffff
(XEN) r12: ffff830779d28188   r13: 000000000000006e   r14: 000000000000006c
(XEN) r15: ffff83007ba44000   cr0: 000000008005003b   cr4: 00000000001526e0
(XEN) cr3: 00000015a589c000   cr2: 00007f163e153d00
(XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen code around <ffff82d080229ab4> (sched_credit.c#csched_vcpu_migrate+0x27/0x54):
(XEN)  00 00 00 48 3b 00 74 02 <0f> 0b 48 8d 0d 43 56 73 00 4c 63 46 04 48 8d 3d
(XEN) Xen stack trace from rsp=ffff831c7d80fd18:
(XEN)    ffff831c7d80fd28 ffff82d080236348 ffff831c7d80fda8 ffff82d08023764c
(XEN)    ffff831c7d80fd58 ffff82d08095f0e0 ffff82d08095f100 ffff830779d12188
(XEN)    ffff83007ba44000 0000006e01236fbe ffff83077ffe7720 0000000000000296
(XEN)    ffff83077a6c59e0 ffff83007ba44000 ffff83007ba44000 ffff83077a6c4000
(XEN)    000000000000006c ffff8308bc78e000 ffff831c7d80fdc8 ffff82d080239367
(XEN)    ffff83077a6c4000 ffff83005d1da000 ffff831c7d80fe18 ffff82d08027797d
(XEN)    ffff831c7d80fde8 ffff82d0802a4f50 ffff831c7d80fe18 ffff83007ba44000
(XEN)    ffff83005d1da000 ffff830779d28188 0000112defdf0971 0000000000000003
(XEN)    ffff831c7d80fea8 ffff82d080236943 ffff831c7d80fe68 ffff830779d281a0
(XEN)    0000006c0080fe68 ffff830779d28180 000000000000f305 000000000000f305
(XEN)    ffff83007ba44000 ffff83005d1da000 ffffffffffffffff ffff83005d1da000
(XEN)    ffff831c7d80fee8 ffff82d080937200 ffff82d080933c00 ffffffffffffffff
(XEN)    ffff831c7d80ffff ffff83077a6c4000 ffff831c7d80fed8 ffff82d080239f15
(XEN)    ffff83007ba44000 ffff83005d1da000 ffff8308bc78e000 000000000000006c
(XEN)    ffff831c7d80fee8 ffff82d080239f6a ffff831c7d80fda0 ffff82d08031f5db
(XEN)    ffffffff81c00000 ffffffff81c00000 ffffffff81c00000 0000000000000000
(XEN)    0000000000000000 ffffffff81d4c180 ffffffff821cf188 00000005da10c197
(XEN)    0000000000000001 0000000000000000 ffffffff81020e50 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000beef0000beef
(XEN)    ffffffff81060182 000000bf0000beef 0000000000000246 ffffffff81c03f00
(XEN) Xen call trace:
(XEN)    [<ffff82d080229ab4>] sched_credit.c#csched_vcpu_migrate+0x27/0x54
(XEN)    [<ffff82d080236348>] schedule.c#vcpu_move_locked+0xbb/0xc2
(XEN)    [<ffff82d08023764c>] schedule.c#vcpu_migrate+0x226/0x25b
(XEN)    [<ffff82d080239367>] context_saved+0x95/0x9c
(XEN)    [<ffff82d08027797d>] context_switch+0xe66/0xeb0
(XEN)    [<ffff82d080236943>] schedule.c#schedule+0x5f4/0x627
(XEN)    [<ffff82d080239f15>] softirq.c#__do_softirq+0x85/0x90
(XEN)    [<ffff82d080239f6a>] do_softirq+0x13/0x15
(XEN)    [<ffff82d08031f5db>] vmx_asm_do_vmentry+0x2b/0x30
(XEN) ****************************************
(XEN) Panic on CPU 108:
(XEN) Xen BUG at sched_credit.c:876
(XEN) ****************************************
(XEN) Reboot in five seconds...


Olaf

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-11 20:43                           ` Olaf Hering
@ 2018-04-11 21:31                             ` Dario Faggioli
       [not found]                               ` <5ACE29E00200005B03782666@prv1-mh.provo.novell.com>
  2018-04-12  9:38                               ` George Dunlap
  0 siblings, 2 replies; 58+ messages in thread
From: Dario Faggioli @ 2018-04-11 21:31 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Andrew Cooper, George Dunlap, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1844 bytes --]

On Wed, 11 Apr 2018 at 22:48, Olaf Hering <olaf@aepfle.de> wrote:

> On Wed, Apr 11, Dario Faggioli wrote:
>
> > It will crash, again, possibly with the same stack trace, but I think
> > it's worth a try.
>
>     BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc)));
>
> (XEN) Xen BUG at sched_credit.c:876
> (XEN) ----[ Xen-4.11.20180410T125709.50f8ba84a5-7.bug1087289_411  x86_64
> debug=y   Not tainted ]----
> (XEN) CPU:    108
> (XEN) RIP:    e008:[<ffff82d080229ab4>]
> sched_credit.c#csched_vcpu_migrate+0x27/0x54
> (XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
> ...
> (XEN) Xen call trace:
> (XEN)    [<ffff82d080229ab4>] sched_credit.c#csched_vcpu_migrate+0x27/0x54
> (XEN)    [<ffff82d080236348>] schedule.c#vcpu_move_locked+0xbb/0xc2
> (XEN)    [<ffff82d08023764c>] schedule.c#vcpu_migrate+0x226/0x25b
> (XEN)    [<ffff82d080239367>] context_saved+0x95/0x9c
> (XEN)    [<ffff82d08027797d>] context_switch+0xe66/0xeb0
> (XEN)    [<ffff82d080236943>] schedule.c#schedule+0x5f4/0x627
> (XEN)    [<ffff82d080239f15>] softirq.c#__do_softirq+0x85/0x90
> (XEN)    [<ffff82d080239f6a>] do_softirq+0x13/0x15
> (XEN)    [<ffff82d08031f5db>] vmx_asm_do_vmentry+0x2b/0x30
>

So, really *exactly* the same. Ok, thanks.

I think that from "CONTEXT: hypervisor", we can tell that the current vcpu
is the idle one, and I'm starting to wonder whether the lazy context switch
logic may play a role in all this.

But, for now, it's just a gut feeling. I'll investigate tomorrow.

Another thing we could do would be to try George's migration refactoring
series. I haven't reviewed it in detail yet, but it seemed reasonable at
first glance.

Not that that could be the solution (backportability, etc.), but, if it
works, it might give us ideas on where to look, and on how to produce a
stopgap patch, "just" solving the issue.

Thanks and regards,
Dario

[-- Attachment #1.2: Type: text/html, Size: 2982 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
       [not found]                                     ` <5ACE7F370200002F0378782F@prv1-mh.provo.novell.com>
@ 2018-04-12  7:18                                       ` Jan Beulich
  0 siblings, 0 replies; 58+ messages in thread
From: Jan Beulich @ 2018-04-12  7:18 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Andrew Cooper, Olaf Hering, George Dunlap, xen-devel

>>> On 11.04.18 at 23:31, <raistlin@linux.it> wrote:
> On Wed, 11 Apr 2018 at 22:48, Olaf Hering <olaf@aepfle.de> wrote:
> 
>> On Wed, Apr 11, Dario Faggioli wrote:
>>
>> > It will crash, again, possibly with the same stack trace, but I think
>> > it's worth a try.
>>
>>     BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc)));
>>
>> (XEN) Xen BUG at sched_credit.c:876
>> (XEN) ----[ Xen-4.11.20180410T125709.50f8ba84a5-7.bug1087289_411  x86_64
>> debug=y   Not tainted ]----
>> (XEN) CPU:    108
>> (XEN) RIP:    e008:[<ffff82d080229ab4>]
>> sched_credit.c#csched_vcpu_migrate+0x27/0x54
>> (XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
>> ...
>> (XEN) Xen call trace:
>> (XEN)    [<ffff82d080229ab4>] sched_credit.c#csched_vcpu_migrate+0x27/0x54
>> (XEN)    [<ffff82d080236348>] schedule.c#vcpu_move_locked+0xbb/0xc2
>> (XEN)    [<ffff82d08023764c>] schedule.c#vcpu_migrate+0x226/0x25b
>> (XEN)    [<ffff82d080239367>] context_saved+0x95/0x9c
>> (XEN)    [<ffff82d08027797d>] context_switch+0xe66/0xeb0
>> (XEN)    [<ffff82d080236943>] schedule.c#schedule+0x5f4/0x627
>> (XEN)    [<ffff82d080239f15>] softirq.c#__do_softirq+0x85/0x90
>> (XEN)    [<ffff82d080239f6a>] do_softirq+0x13/0x15
>> (XEN)    [<ffff82d08031f5db>] vmx_asm_do_vmentry+0x2b/0x30
>>
> 
> So, really *exactly* the same. Ok, thanks.
> 
> I think that from "CONTEXT: hypervisor", we can tell that the current vcpu
> is the idle one,

No, not that fact - the context shown is strictly dependent on where
the exception occurred (that'll always be in the hypervisor for
BUG() and WARN()). The absence of a dXvY is telling you that it's
the idle vCPU we're on. See _show_registers().
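
To make that concrete, the distinction is roughly the following (a
sketch only, not the actual _show_registers() code; treat the exact
conditions as an assumption):

/*
 * The CONTEXT tag reflects where the exception hit (always the
 * hypervisor for BUG()/WARN()), while the "(dXvY)" suffix reflects
 * what 'current' is at that moment.
 */
printk("CONTEXT: %s", guest_mode(regs) ? "guest" : "hypervisor");
if ( !is_idle_vcpu(current) )
    printk(" (d%dv%d)", current->domain->domain_id, current->vcpu_id);
printk("\n");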

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-11 21:31                             ` Dario Faggioli
       [not found]                               ` <5ACE29E00200005B03782666@prv1-mh.provo.novell.com>
@ 2018-04-12  9:38                               ` George Dunlap
  2018-04-12 10:16                                 ` Dario Faggioli
  1 sibling, 1 reply; 58+ messages in thread
From: George Dunlap @ 2018-04-12  9:38 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Andrew Cooper, Olaf Hering, xen-devel



> On Apr 11, 2018, at 10:31 PM, Dario Faggioli <raistlin@linux.it> wrote:
> 
> On Wed, 11 Apr 2018 at 22:48, Olaf Hering <olaf@aepfle.de> wrote:
> On Wed, Apr 11, Dario Faggioli wrote:
> 
> > It will crash, again, possibly with the same stack trace, but I think
> > it's worth a try.
> 
>     BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc)));
> 
> (XEN) Xen BUG at sched_credit.c:876
> (XEN) ----[ Xen-4.11.20180410T125709.50f8ba84a5-7.bug1087289_411  x86_64  debug=y   Not tainted ]----
> (XEN) CPU:    108
> (XEN) RIP:    e008:[<ffff82d080229ab4>] sched_credit.c#csched_vcpu_migrate+0x27/0x54
> (XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
> ...
> (XEN) Xen call trace:
> (XEN)    [<ffff82d080229ab4>] sched_credit.c#csched_vcpu_migrate+0x27/0x54
> (XEN)    [<ffff82d080236348>] schedule.c#vcpu_move_locked+0xbb/0xc2
> (XEN)    [<ffff82d08023764c>] schedule.c#vcpu_migrate+0x226/0x25b
> (XEN)    [<ffff82d080239367>] context_saved+0x95/0x9c
> (XEN)    [<ffff82d08027797d>] context_switch+0xe66/0xeb0
> (XEN)    [<ffff82d080236943>] schedule.c#schedule+0x5f4/0x627
> (XEN)    [<ffff82d080239f15>] softirq.c#__do_softirq+0x85/0x90
> (XEN)    [<ffff82d080239f6a>] do_softirq+0x13/0x15
> (XEN)    [<ffff82d08031f5db>] vmx_asm_do_vmentry+0x2b/0x30
> 
> So, really *exactly* the same. Ok, thanks.

But this doesn’t make any sense.  If you applied Dario’s ‘fix’ patch, then context_saved() should have *just* called vcpu_sleep_nosync() before calling vcpu_migrate().  The VPF_migrating flag should still be set, so it should have called csched_vcpu_sleep(); and sd->curr should have been changed to be != prev way back in schedule(), so csched_vcpu_sleep() should have called runq_remove().
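
Spelled out as a call chain (just a sketch of the sequence, with the names used in this thread, not verbatim code):

/*
 * context_saved(prev)               // VPF_migrating still set
 *   vcpu_sleep_nosync(prev)         // added by the 'fix' patch
 *     SCHED_OP(sleep)               // -> csched_vcpu_sleep()
 *       runq_remove(svc)            // sd->curr != prev by now
 *   vcpu_migrate(prev)              // vcpu is off the runqueue,
 *                                   // so no BUG_ON() should fire
 */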

It’s probably worth asking the obvious question: Are you sure the “fix” patch is actually applied (in addition to the new “debug” patch)? :-)

If so, then maybe it’s time to open-code vcpu_sleep_nosync() there in context_saved(), to try to figure out where our understanding of what *should* happen is incorrect.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-12  9:38                               ` George Dunlap
@ 2018-04-12 10:16                                 ` Dario Faggioli
  2018-04-12 12:45                                   ` Olaf Hering
  0 siblings, 1 reply; 58+ messages in thread
From: Dario Faggioli @ 2018-04-12 10:16 UTC (permalink / raw)
  To: George Dunlap; +Cc: Andrew Cooper, Olaf Hering, xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 2667 bytes --]

On Thu, 2018-04-12 at 09:38 +0000, George Dunlap wrote:
> > On Apr 11, 2018, at 10:31 PM, Dario Faggioli <raistlin@linux.it>
> > wrote:
> > (XEN) Xen BUG at sched_credit.c:876
> > (XEN) ----[ Xen-4.11.20180410T125709.50f8ba84a5-
> > 7.bug1087289_411  x86_64  debug=y   Not tainted ]----
> > (XEN) CPU:    108
> > (XEN) RIP:    e008:[<ffff82d080229ab4>]
> > sched_credit.c#csched_vcpu_migrate+0x27/0x54
> > (XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
> > ...
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82d080229ab4>]
> > sched_credit.c#csched_vcpu_migrate+0x27/0x54
> > (XEN)    [<ffff82d080236348>] schedule.c#vcpu_move_locked+0xbb/0xc2
> > (XEN)    [<ffff82d08023764c>] schedule.c#vcpu_migrate+0x226/0x25b
> > (XEN)    [<ffff82d080239367>] context_saved+0x95/0x9c
> > (XEN)    [<ffff82d08027797d>] context_switch+0xe66/0xeb0
> > (XEN)    [<ffff82d080236943>] schedule.c#schedule+0x5f4/0x627
> > (XEN)    [<ffff82d080239f15>] softirq.c#__do_softirq+0x85/0x90
> > (XEN)    [<ffff82d080239f6a>] do_softirq+0x13/0x15
> > (XEN)    [<ffff82d08031f5db>] vmx_asm_do_vmentry+0x2b/0x30
> > 
> > So, really *exactly* the same. Ok, thanks.
> 
> But this doesn’t make any sense.  If you applied Dario’s ‘fix’ patch,
> then context_saved() should have *just* called vcpu_sleep_nosync()
> before calling vcpu_migrate().  The VPF_migrating flag should still
> be set, so it should have called csched_vcpu_sleep(); and sd->curr
> should have been changed to be != prev way back in schedule(), so
> csched_vcpu_sleep() should have called runq_remove().
> 
Well, you've just described me, banging my head on my desk, since
yesterday afternoon. :-P

> It’s probably worth asking the obvious question: Are you sure the
> “fix” patch is actually applied (in addition to the new “debug”
> patch)? :-)
> 
> If so, then maybe it’s time to open-code vcpu_sleep_nosync() there in
> context_saved(), to try to figure out where our understanding of what
> *should* happen is incorrect.
> 
Ehm... Can you please stop reading my mind? It's annoying. :-D
Well, I guess we can say: "great minds think alike". :-P

Olaf, new patch. Please, remove _everything_ and apply _only_ this one.

As George is saying, the vcpu just can't be in the runqueue, unless:
 1) vcpu_sleep_nosync() did not remove it
 2) someone is putting it back there

Let's check 1 first.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.1.2: context-save-race-debug.patch --]
[-- Type: text/x-patch, Size: 2345 bytes --]

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 9bc638c09c..67628a1f95 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -867,6 +867,17 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
     return cpu;
 }
 
+static void
+csched_vcpu_migrate(const struct scheduler *ops, struct vcpu *vc,
+		    unsigned int new_cpu)
+{
+    BUG_ON(vc->is_running);
+    BUG_ON(test_bit(_VPF_migrating, &vc->pause_flags));
+    BUG_ON(CSCHED_VCPU(vc) == CSCHED_VCPU(curr_on_cpu(vc->processor)));
+    BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc)));
+    vc->processor = new_cpu;
+}
+
 static int
 csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
 {
@@ -2278,6 +2289,7 @@ static const struct scheduler sched_credit_def = {
     .adjust_global  = csched_sys_cntl,
 
     .pick_cpu       = csched_cpu_pick,
+    .migrate        = csched_vcpu_migrate,
     .do_schedule    = csched_schedule,
 
     .dump_cpu_state = csched_dump_pcpu,
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 343ab6306e..7be62efa33 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1554,7 +1554,29 @@ void context_saved(struct vcpu *prev)
     SCHED_OP(vcpu_scheduler(prev), context_saved, prev);
 
     if ( unlikely(prev->pause_flags & VPF_migrating) )
+    {
+        /*
+         * If someone (e.g., vcpu_set_affinity()) has set VPF_migrating
+         * on prev in between when schedule() releases the scheduler
+         * lock and here, we need to make sure we properly mark the
+         * vcpu as not runnable (and all it comes with that), with
+         * vcpu_sleep_nosync(), before calling vcpu_migrate().
+         */
+        //vcpu_sleep_nosync(prev);
+        unsigned long flags;
+        spinlock_t *lock = vcpu_schedule_lock_irqsave(prev, &flags);
+
+        BUG_ON(vcpu_runnable(prev));
+        BUG_ON(!test_bit(_VPF_migrating, &prev->pause_flags));
+        if ( prev->runstate.state == RUNSTATE_runnable )
+            vcpu_runstate_change(prev, RUNSTATE_offline, NOW());
+        BUG_ON(curr_on_cpu(prev->processor) == prev);
+        SCHED_OP(vcpu_scheduler(prev), sleep, prev);
+
+        vcpu_schedule_unlock_irqrestore(lock, flags, prev);
+
         vcpu_migrate(prev);
+    }
 }
 
 /* The scheduler timer: force a run through the scheduler */

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-12 10:16                                 ` Dario Faggioli
@ 2018-04-12 12:45                                   ` Olaf Hering
  2018-04-12 13:15                                     ` Dario Faggioli
  0 siblings, 1 reply; 58+ messages in thread
From: Olaf Hering @ 2018-04-12 12:45 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Andrew Cooper, George Dunlap, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 3350 bytes --]

On Thu, 12 Apr 2018 12:16:34 +0200,
Dario Faggioli <dfaggioli@suse.com> wrote:

> Olaf, new patch. Please, remove _everything_ and apply _only_ this one.

dies after the first iteration.

        BUG_ON(!test_bit(_VPF_migrating, &prev->pause_flags));

(XEN) Xen BUG at schedule.c:1570
(XEN) ----[ Xen-4.11.20180411T100655.82540b66ce-1.xen_unstable  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    29
(XEN) RIP:    e008:[<ffff82d08023c6f4>] context_saved+0x1a3/0x32c
(XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
(XEN) rax: 0000000000000001   rbx: ffff8300779b3000   rcx: ffff83047fe04188
(XEN) rdx: 0000000000000000   rsi: 0000000000006218   rdi: ffff83047fe0418e
(XEN) rbp: ffff830880057db8   rsp: ffff830880057d78   r8:  0000000000000001
(XEN) r9:  0000000000000000   r10: 00000000ffffffc0   r11: ffff83047fe8e0a0
(XEN) r12: ffff83047fe04188   r13: 0000000000000292   r14: ffff82d0805c7180
(XEN) r15: ffff82d0805b2520   cr0: 0000000080050033   cr4: 00000000000026e0
(XEN) cr3: 0000000b62f19000   cr2: 00000000006af6e8
(XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen code around <ffff82d08023c6f4> (context_saved+0x1a3/0x32c):
(XEN)  00 e9 09 ff ff ff 0f 0b <0f> 0b e8 a7 bc 06 00 49 89 c6 83 bb c8 00 00 00
(XEN) Xen stack trace from rsp=ffff830880057d78:
(XEN)    ffff83047cde2f40 ffff8300779b3000 ffff830880057db8 ffff830077bea000
(XEN)    ffff8300779b3000 ffff83047ffe7000 000000000000001d ffff83052b234000
(XEN)    ffff830880057e08 ffff82d08027a3d8 ffff830880057dd8 ffff82d0802a83b0
(XEN)    ffff830880057e08 ffff8300779b3000 ffff830077bea000 ffff83047fe04188
(XEN)    00000056c1375e97 0000000000000002 ffff830880057e98 ffff82d080239783
(XEN)    ffff8300779b3560 ffff83047fe041a0 0000001d00057e58 ffff83047fe04180
(XEN)    ffff82d080328a41 ffff8300779b3000 ffff83052b234000 ffff830077bea000
(XEN)    ffffffffffffffff ffff82d080301f00 ffff8300779b3000 ffff82d08059cb00
(XEN)    ffff82d08059bc80 ffffffffffffffff ffff830880057fff ffff82d0805a3c80
(XEN)    ffff830880057ed8 ffff82d08023d3f7 ffff82d080328a41 ffff8300779b3000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    ffff830880057ee8 ffff82d08023d46a 00007cf77ffa80e7 ffff82d080328c0b
(XEN)    ffff880086960000 ffff880086960000 ffff880086960000 0000000000000000
(XEN)    0000000000000002 ffffffff81d4c180 0000000000000008 0000000a7c976ba7
(XEN)    0000000000000001 0000000000000000 ffffffff81020e50 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000beef0000beef
(XEN)    ffffffff81060182 000000bf0000beef 0000000000000246 ffff880086963ed8
(XEN)    000000000000beef 000000000000beef 000000000000beef 000000000000beef
(XEN)    000000000000beef 000000000000001d ffff830077bea000 00000033ff83d000
(XEN)    00000000000026e0 0000000000000000 000000047fe06000 0000040000000000
(XEN) Xen call trace:
(XEN)    [<ffff82d08023c6f4>] context_saved+0x1a3/0x32c
(XEN)    [<ffff82d08027a3d8>] context_switch+0xe9/0xf67
(XEN)    [<ffff82d080239783>] schedule.c#schedule+0x306/0x6ab
(XEN)    [<ffff82d08023d3f7>] softirq.c#__do_softirq+0x71/0x9a
(XEN)    [<ffff82d08023d46a>] do_softirq+0x13/0x15
(XEN)    [<ffff82d080328c0b>] vmx_asm_do_vmentry+0x2b/0x30



Olaf

[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-12 12:45                                   ` Olaf Hering
@ 2018-04-12 13:15                                     ` Dario Faggioli
  2018-04-12 15:38                                       ` Dario Faggioli
  0 siblings, 1 reply; 58+ messages in thread
From: Dario Faggioli @ 2018-04-12 13:15 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Andrew Cooper, George Dunlap, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1629 bytes --]

On Thu, 2018-04-12 at 14:45 +0200, Olaf Hering wrote:
> On Thu, 12 Apr 2018 12:16:34 +0200,
> Dario Faggioli <dfaggioli@suse.com> wrote:
> 
> > Olaf, new patch. Please, remove _everything_ and apply _only_ this
> > one.
> 
> dies after the first iteration.
> 
>         BUG_ON(!test_bit(_VPF_migrating, &prev->pause_flags));
> 
So, VPF_migrating is set when we enter the if() and decide to call
vcpu_sleep_nosync() and vcpu_migrate(), but it is not set here, once we
have taken the lock.

Interestingly, we did not hit BUG_ON(vcpu_runnable(prev)), right before
that...

Anyway, there is only one place where VPF_migrating is reset, and that
is in vcpu_migrate().

So, based on our theory that we are running concurrently with
vcpu_set_affinity(), it's the call to vcpu_migrate() from
vcpu_set_affinity() that resets it.

I need to think a bit more (I'm trying to picture the exact scenario),
but as of now it still does not make sense... It looks to me that, by
now, it should have been the call to vcpu_sleep_nosync(), also from
vcpu_set_affinity(), that removed prev from the runqueue.

True, vcpu_migrate() ends with vcpu_wake(), which puts it back in a
runqueue, but then again our vcpu_migrate(), here in context_saved(),
finding that VPF_migrating is off, should *not* call
vcpu_move_locked().
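
For reference, the interleaving I'm suspecting would look more or less
like this (an assumption, not a confirmed trace; function names are the
ones we have been discussing):

/*
 *  CPU A: context_saved(prev)         CPU B: vcpu_set_affinity(prev)
 *  ------------------------------     ------------------------------
 *  sees VPF_migrating set,
 *  enters the if()
 *                                     vcpu_sleep_nosync(prev)
 *                                     vcpu_migrate(prev)
 *                                       test_and_clear_bit(_VPF_migrating)
 *                                       vcpu_move_locked() -> v->processor
 *                                       vcpu_wake() -> back on a runqueue
 *  takes the scheduler lock
 *  BUG_ON(!test_bit(_VPF_migrating))  <- fires: CPU B has just cleared
 *                                        the flag
 */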

This is getting insane (or I am)... :-O

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-12 13:15                                     ` Dario Faggioli
@ 2018-04-12 15:38                                       ` Dario Faggioli
  2018-04-12 17:25                                         ` Dario Faggioli
  0 siblings, 1 reply; 58+ messages in thread
From: Dario Faggioli @ 2018-04-12 15:38 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Andrew Cooper, George Dunlap, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1913 bytes --]

On Thu, 2018-04-12 at 15:15 +0200, Dario Faggioli wrote:
> On Thu, 2018-04-12 at 14:45 +0200, Olaf Hering wrote:
> > 
> > dies after the first iteration.
> > 
> >         BUG_ON(!test_bit(_VPF_migrating, &prev->pause_flags));
> > 
> 
Update. I replaced this:

+        BUG_ON(vcpu_runnable(prev));
+        BUG_ON(!test_bit(_VPF_migrating, &prev->pause_flags));

with this, in the patch:

+        if (vcpu_runnable(prev) || !test_bit(_VPF_migrating, &prev->pause_flags))
+            printk("d%uv%d runnbl=%d proc=%d pf=%lu\n", prev->domain->domain_id, prev->vcpu_id,
+                   vcpu_runnable(prev), prev->processor, prev->pause_flags);
+        BUG_ON(!test_bit(_VPF_migrating, &prev->pause_flags));

Output is:

(XEN) d10v0 runnbl=1 proc=31 pf=0
(XEN) Xen BUG at schedule.c:1572

On CPU 16.

It is still the BUG_ON(!test_bit(VPF_migrating)) which is triggering (I
actually meant to get rid of that as well, but I forgot.)

So, it looks like before, we did not hit BUG_ON(vcpu_runnable(prev)),
while in this run, vcpu_runnable(prev) is 1. I mean, I know it's a
race, but... wow...

We are in here because VPF_migrating was set, but it must be getting
cleared, concurrently with us, at about this time.

We are on CPU 16, inside context_saved(), and our 'prev' is d10v0. This
means its 'processor' should still be 16. But it's 31, so someone has
changed it already. I'm assuming it has been the vcpu_migrate() from
vcpu_set_affinity(). And this could very well be fine, but then, why do
we also, when inside vcpu_migrate(), find VPF_migrating set?

I'll add more debugging to check if the vcpu is in a runqueue...

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-12 15:38                                       ` Dario Faggioli
@ 2018-04-12 17:25                                         ` Dario Faggioli
  2018-04-13  6:23                                           ` Olaf Hering
  2018-04-13  9:03                                           ` George Dunlap
  0 siblings, 2 replies; 58+ messages in thread
From: Dario Faggioli @ 2018-04-12 17:25 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Andrew Cooper, George Dunlap, xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 6420 bytes --]

On Thu, 2018-04-12 at 17:38 +0200, Dario Faggioli wrote:
> On Thu, 2018-04-12 at 15:15 +0200, Dario Faggioli wrote:
> > On Thu, 2018-04-12 at 14:45 +0200, Olaf Hering wrote:
> > > 
> > > dies after the first iteration.
> > > 
> > >         BUG_ON(!test_bit(_VPF_migrating, &prev->pause_flags));
> > > 
> 
> Update. I replaced this:
> 
Olaf, new patch! :-)

FTR, a previous version of this (where I was not printing
smp_processor_id() and prev->is_running), produced the output that I am
attaching below.

Looks to me like, while on the crashing CPU, we are here [*]:

void context_saved(struct vcpu *prev)
{
    ...
    if ( unlikely(prev->pause_flags & VPF_migrating) )
    {
        unsigned long flags;
        spinlock_t *lock = vcpu_schedule_lock_irqsave(prev, &flags);

        if (vcpu_runnable(prev) || !test_bit(_VPF_migrating, &prev->pause_flags))
            printk("CPU %u: d%uv%d isr=%u runnbl=%d proc=%d pf=%lu orq=%d csf=%u\n",
                   smp_processor_id(), prev->domain->domain_id, prev->vcpu_id,
                   prev->is_running, vcpu_runnable(prev),
                   prev->processor, prev->pause_flags,
                   SCHED_OP(vcpu_scheduler(prev), onrunq, prev),
                   SCHED_OP(vcpu_scheduler(prev), csflags, prev));

        [*]

        if ( prev->runstate.state == RUNSTATE_runnable )
            vcpu_runstate_change(prev, RUNSTATE_offline, NOW());
        BUG_ON(curr_on_cpu(prev->processor) == prev);
        SCHED_OP(vcpu_scheduler(prev), sleep, prev);

        vcpu_schedule_unlock_irqrestore(lock, flags, prev);

        vcpu_migrate(prev);
    }
}

On the "other CPU", we might be around here [**]:

static void vcpu_migrate(struct vcpu *v)
{
    ...
    if ( v->is_running ||
         !test_and_clear_bit(_VPF_migrating, &v->pause_flags) )
    {
        sched_spin_unlock_double(old_lock, new_lock, flags); 
        return; 
    } 
 
    vcpu_move_locked(v, new_cpu); 
 
    sched_spin_unlock_double(old_lock, new_lock, flags); 

    [**] 

    if ( old_cpu != new_cpu ) 
        sched_move_irqs(v); 
 
    /* Wake on new CPU. */ 
    vcpu_wake(v); 
}

(XEN) d10v1 runnbl=0 proc=22 pf=1 orq=0 csf=4
(XEN) d10v0 runnbl=1 proc=20 pf=0 orq=0 csf=4
(XEN) d10v0 runnbl=1 proc=25 pf=0 orq=0 csf=4
(XEN) d10v2 runnbl=1 proc=31 pf=0 orq=0 csf=4
(XEN) d10v2 runnbl=1 proc=10 pf=0 orq=1 csf=0
(XEN) d10v0 runnbl=1 proc=30 pf=0 orq=0 csf=4
(XEN) d10v0 runnbl=1 proc=15 pf=0 orq=0 csf=4
(XEN) d10v3 runnbl=1 proc=13 pf=0 orq=1 csf=0
(XEN) d10v2 runnbl=1 proc=39 pf=0 orq=0 csf=4
(XEN) d10v3 runnbl=1 proc=32 pf=0 orq=0 csf=4
(XEN) d10v2 runnbl=1 proc=20 pf=0 orq=0 csf=4
(XEN) d10v2 runnbl=1 proc=20 pf=0 orq=0 csf=4
(XEN) d10v1 runnbl=0 proc=26 pf=1 orq=0 csf=4
(XEN) d10v3 runnbl=1 proc=16 pf=0 orq=0 csf=4
(XEN) Xen BUG at sched_credit.c:877
(XEN) ----[ Xen-4.11.20180411T100655.82540b66ce-180412155659  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    16
(XEN) RIP:    e008:[<ffff82d08022c84d>] sched_credit.c#csched_vcpu_migrate+0x52/0x54
(XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor (d6v0)
(XEN) rax: ffff8300779c9000   rbx: 0000000000000012   rcx: ffff830adac719f0
(XEN) rdx: 0000000000000012   rsi: ffff8300779b2000   rdi: 00000033ff8bb000
(XEN) rbp: ffff83087cfb7ce8   rsp: ffff83087cfb7ce8   r8:  0000000000000010
(XEN) r9:  0000ffff0000ffff   r10: 00ff00ff00ff00ff   r11: 0f0f0f0f0f0f0f0f
(XEN) r12: ffff83047fe82188   r13: ffff83047fe70188   r14: ffff82d0805c7180
(XEN) r15: ffff8300779b2000   cr0: 000000008005003b   cr4: 00000000000026e0
(XEN) cr3: 0000000f8404b000   cr2: 00007f18dfeca000
(XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen code around <ffff82d08022c84d> (sched_credit.c#csched_vcpu_migrate+0x52/0x54):
(XEN)  5d c3 0f 0b 0f 0b 0f 0b <0f> 0b 55 48 89 e5 48 8d 05 26 a9 39 00 48 8b 57
(XEN) Xen stack trace from rsp=ffff83087cfb7ce8:
(XEN)    ffff83087cfb7cf8 ffff82d080239419 ffff83087cfb7d68 ffff82d08023a8d8
(XEN)    ffff82d0805c7160 ffff82d0805c7180 01ff83087cfb7d78 0000001200000010
(XEN)    0000000000000092 0000000000000296 0000000000000003 ffff8300779b2000
(XEN)    ffff83047fe82188 0000000000000292 0000000000000004 ffff82d0805b2520
(XEN)    ffff83087cfb7db8 ffff82d08023c795 ffff83087cfb7d98 ffff8300779b2000
(XEN)    ffff83087cfb7db8 ffff8300779c9000 ffff8300779b2000 ffff830ad6463000
(XEN)    0000000000000010 ffff830adad26000 ffff83087cfb7e08 ffff82d08027a538
(XEN)    ffff83087cfb7dd8 ffff82d0802a8510 ffff83087cfb7e08 ffff8300779b2000
(XEN)    ffff8300779c9000 ffff83047fe82188 0000008405ba3022 0000000000000003
(XEN)    ffff83087cfb7e98 ffff82d0802397a9 ffff8300779b2560 ffff83047fe821a0
(XEN)    0000001000fb7e58 ffff83047fe82180 ffff82d080328ba1 ffff8300779b2000
(XEN)    ffff830adad26000 ffff8300779c9000 0000000001c9c380 ffff82d080302000
(XEN)    ffff8300779b2000 ffff82d08059c480 ffff82d08059bc80 ffffffffffffffff
(XEN)    ffff83087cfb7fff ffff82d0805a3c80 ffff83087cfb7ed8 ffff82d08023d552
(XEN)    ffff82d080328ba1 ffff8300779b2000 ffff8300779c9000 ffff830adad26000
(XEN)    0000000000000010 ffff830ad6463000 ffff83087cfb7ee8 ffff82d08023d5c5
(XEN)    ffff83087cfb7db8 ffff82d080328d6b ffffffff81c00000 ffffffff81c00000
(XEN)    ffffffff81c00000 0000000000000000 0000000000000000 ffffffff81d4c180
(XEN)    0000000000000008 000000470cb96de6 0000000000000001 0000000000000000
(XEN)    ffffffff81020e50 0000000000000000 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82d08022c84d>] sched_credit.c#csched_vcpu_migrate+0x52/0x54
(XEN)    [<ffff82d080239419>] schedule.c#vcpu_move_locked+0x42/0xcc
(XEN)    [<ffff82d08023a8d8>] schedule.c#vcpu_migrate+0x210/0x23b
(XEN)    [<ffff82d08023c795>] context_saved+0x21e/0x461
(XEN)    [<ffff82d08027a538>] context_switch+0xe9/0xf67
(XEN)    [<ffff82d0802397a9>] schedule.c#schedule+0x306/0x6ab
(XEN)    [<ffff82d08023d552>] softirq.c#__do_softirq+0x71/0x9a
(XEN)    [<ffff82d08023d5c5>] do_softirq+0x13/0x15
(XEN)    [<ffff82d080328d6b>] vmx_asm_do_vmentry+0x2b/0x30

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.1.2: context-save-race-debug.patch --]
[-- Type: text/x-patch, Size: 4127 bytes --]

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 9bc638c09c..6e886bdfbb 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -867,6 +867,17 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
     return cpu;
 }
 
+static void
+csched_vcpu_migrate(const struct scheduler *ops, struct vcpu *vc,
+		    unsigned int new_cpu)
+{
+    BUG_ON(vc->is_running);
+    BUG_ON(test_bit(_VPF_migrating, &vc->pause_flags));
+    BUG_ON(CSCHED_VCPU(vc) == CSCHED_VCPU(curr_on_cpu(vc->processor)));
+    BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc)));
+    vc->processor = new_cpu;
+}
+
 static int
 csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
 {
@@ -1086,6 +1097,18 @@ csched_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
     BUG_ON( sdom == NULL );
 }
 
+static int
+csched_vcpu_onrunq(const struct scheduler *ops, struct vcpu *vc)
+{
+    return __vcpu_on_runq(CSCHED_VCPU(vc));
+}
+
+static int
+csched_vcpu_csflags(const struct scheduler *ops, struct vcpu *vc)
+{
+    return CSCHED_VCPU(vc)->flags;
+}
+
 static void
 csched_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
 {
@@ -2278,8 +2301,12 @@ static const struct scheduler sched_credit_def = {
     .adjust_global  = csched_sys_cntl,
 
     .pick_cpu       = csched_cpu_pick,
+    .migrate        = csched_vcpu_migrate,
     .do_schedule    = csched_schedule,
 
+    .onrunq         = csched_vcpu_onrunq,
+    .csflags        = csched_vcpu_csflags,
+
     .dump_cpu_state = csched_dump_pcpu,
     .dump_settings  = csched_dump,
     .init           = csched_init,
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 343ab6306e..2b98b38e6b 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1554,7 +1554,34 @@ void context_saved(struct vcpu *prev)
     SCHED_OP(vcpu_scheduler(prev), context_saved, prev);
 
     if ( unlikely(prev->pause_flags & VPF_migrating) )
+    {
+        /*
+         * If someone (e.g., vcpu_set_affinity()) has set VPF_migrating
+         * on prev in between when schedule() releases the scheduler
+         * lock and here, we need to make sure we properly mark the
+         * vcpu as not runnable (and all it comes with that), with
+         * vcpu_sleep_nosync(), before calling vcpu_migrate().
+         */
+        //vcpu_sleep_nosync(prev);
+        unsigned long flags;
+        spinlock_t *lock = vcpu_schedule_lock_irqsave(prev, &flags);
+
+        if (vcpu_runnable(prev) || !test_bit(_VPF_migrating, &prev->pause_flags))
+            printk("CPU %u: d%uv%d isr=%u runnbl=%d proc=%d pf=%lu orq=%d csf=%u\n",
+                   smp_processor_id(), prev->domain->domain_id, prev->vcpu_id,
+                   prev->is_running, vcpu_runnable(prev),
+                   prev->processor, prev->pause_flags,
+                   SCHED_OP(vcpu_scheduler(prev), onrunq, prev),
+                   SCHED_OP(vcpu_scheduler(prev), csflags, prev));
+        if ( prev->runstate.state == RUNSTATE_runnable )
+            vcpu_runstate_change(prev, RUNSTATE_offline, NOW());
+        BUG_ON(curr_on_cpu(prev->processor) == prev);
+        SCHED_OP(vcpu_scheduler(prev), sleep, prev);
+
+        vcpu_schedule_unlock_irqrestore(lock, flags, prev);
+
         vcpu_migrate(prev);
+    }
 }
 
 /* The scheduler timer: force a run through the scheduler */
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 9596eae1e2..97b6461106 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -160,6 +160,9 @@ struct scheduler {
     void         (*insert_vcpu)    (const struct scheduler *, struct vcpu *);
     void         (*remove_vcpu)    (const struct scheduler *, struct vcpu *);
 
+    int          (*onrunq)         (const struct scheduler *, struct vcpu *);
+    int          (*csflags)        (const struct scheduler *, struct vcpu *);
+
     void         (*sleep)          (const struct scheduler *, struct vcpu *);
     void         (*wake)           (const struct scheduler *, struct vcpu *);
     void         (*yield)          (const struct scheduler *, struct vcpu *);

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-12 17:25                                         ` Dario Faggioli
@ 2018-04-13  6:23                                           ` Olaf Hering
  2018-04-13  9:01                                             ` Dario Faggioli
  2018-04-13  9:03                                           ` George Dunlap
  1 sibling, 1 reply; 58+ messages in thread
From: Olaf Hering @ 2018-04-13  6:23 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Andrew Cooper, George Dunlap, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 4102 bytes --]

Am Thu, 12 Apr 2018 19:25:43 +0200
schrieb Dario Faggioli <dfaggioli@suse.com>:

> Olaf, new patch! :-)

    BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc)));

(XEN) CPU 36: d10v1 isr=0 runnbl=1 proc=36 pf=0 orq=0 csf=4
(XEN) CPU 33: d10v2 isr=0 runnbl=0 proc=33 pf=1 orq=0 csf=4
(XEN) CPU 20: d10v2 isr=0 runnbl=1 proc=20 pf=0 orq=0 csf=4
(XEN) CPU 32: d10v0 isr=0 runnbl=1 proc=32 pf=0 orq=0 csf=4
(XEN) CPU 33: d10v0 isr=0 runnbl=1 proc=12 pf=0 orq=0 csf=4
(XEN) CPU 36: d10v0 isr=0 runnbl=1 proc=36 pf=0 orq=0 csf=4
(XEN) CPU 31: d10v0 isr=0 runnbl=1 proc=31 pf=0 orq=0 csf=4
(XEN) Xen BUG at sched_credit.c:877
(XEN) ----[ Xen-4.11.20180411T100655.82540b66ce-180413055758  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    31
(XEN) RIP:    e008:[<ffff82d08022c84d>] sched_credit.c#csched_vcpu_migrate+0x52/0x54
(XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
(XEN) rax: ffff830077be8000   rbx: 0000000000000020   rcx: ffff830adaca7d30
(XEN) rdx: 0000000000000020   rsi: ffff8300779b5000   rdi: 00000033fc629000
(XEN) rbp: ffff83107d44fce8   rsp: ffff83107d44fce8   r8:  000000000000001f
(XEN) r9:  0000ffff0000ffff   r10: 00ff00ff00ff00ff   r11: 0f0f0f0f0f0f0f0f
(XEN) r12: ffff83047cbf0188   r13: ffff83047cbe6188   r14: ffff82d0805c7180
(XEN) r15: ffff8300779b5000   cr0: 0000000080050033   cr4: 00000000000026e0
(XEN) cr3: 0000000eb8239000   cr2: 00007f867ef9835c
(XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen code around <ffff82d08022c84d> (sched_credit.c#csched_vcpu_migrate+0x52/0x54):
(XEN)  5d c3 0f 0b 0f 0b 0f 0b <0f> 0b 55 48 89 e5 48 8d 05 26 a9 39 00 48 8b 57
(XEN) Xen stack trace from rsp=ffff83107d44fce8:
(XEN)    ffff83107d44fcf8 ffff82d080239419 ffff83107d44fd68 ffff82d08023a8d8
(XEN)    ffff82d0805c7160 ffff82d0805c7180 01ff83107d44fd38 000000200000001f
(XEN)    000000000000000a 0000000000000296 0000000000000000 ffff8300779b5000
(XEN)    ffff83047cbf0188 0000000000000292 0000000000000004 ffff82d0805b2520
(XEN)    ffff83107d44fdb8 ffff82d08023c7ad ffff83047cbf0188 ffff8300779b5000
(XEN)    ffff83107d44fdb8 ffff830077be8000 ffff8300779b5000 ffff83047ffe7000
(XEN)    000000000000001f ffff830adad2f000 ffff83107d44fe08 ffff82d08027a558
(XEN)    ffff83107d44fdd8 ffff82d0802a8530 ffff83107d44fe08 ffff8300779b5000
(XEN)    ffff830077be8000 ffff83047cbf0188 0000007c1960d213 0000000000000003
(XEN)    ffff83107d44fe98 ffff82d0802397a9 ffff8300779b5560 ffff83047cbf01a0
(XEN)    0000001f0044fe58 ffff83047cbf0180 ffff8300779b5000 ffff8300779b5568
(XEN)    ffff83107d44fe78 ffff830077be8000 ffffffffffffffff ffff8300779b5000
(XEN)    ffff8300779b5000 ffff82d08059cc00 ffff82d08059bc80 ffffffffffffffff
(XEN)    ffff83107d44ffff ffff82d0805a3c80 ffff83107d44fed8 ffff82d08023d56a
(XEN)    ffff82d080328bc1 ffff8300779b5000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 ffff83107d44fee8 ffff82d08023d5dd
(XEN)    00007cef82bb00e7 ffff82d080328d8b ffff88007d86d680 0000000000000000
(XEN)    ffff88007d86d680 ffff88007ea60000 0000000000000001 0000000000000000
(XEN)    0000000000000000 ffff8800870009c0 ffff880087000908 0000000000000000
(XEN)    0000000000000000 00000000fffffffa 0000000000000000 deadbeefdeadf00d
(XEN) Xen call trace:
(XEN)    [<ffff82d08022c84d>] sched_credit.c#csched_vcpu_migrate+0x52/0x54
(XEN)    [<ffff82d080239419>] schedule.c#vcpu_move_locked+0x42/0xcc
(XEN)    [<ffff82d08023a8d8>] schedule.c#vcpu_migrate+0x210/0x23b
(XEN)    [<ffff82d08023c7ad>] context_saved+0x236/0x479
(XEN)    [<ffff82d08027a558>] context_switch+0xe9/0xf67
(XEN)    [<ffff82d0802397a9>] schedule.c#schedule+0x306/0x6ab
(XEN)    [<ffff82d08023d56a>] softirq.c#__do_softirq+0x71/0x9a
(XEN)    [<ffff82d08023d5dd>] do_softirq+0x13/0x15
(XEN)    [<ffff82d080328d8b>] vmx_asm_do_vmentry+0x2b/0x30
(XEN) ****************************************
(XEN) Panic on CPU 31:
(XEN) Xen BUG at sched_credit.c:877
(XEN) ****************************************
(XEN) Reboot in five seconds...

[-- Attachment #1.2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-13  6:23                                           ` Olaf Hering
@ 2018-04-13  9:01                                             ` Dario Faggioli
  0 siblings, 0 replies; 58+ messages in thread
From: Dario Faggioli @ 2018-04-13  9:01 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Andrew Cooper, George Dunlap, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 5267 bytes --]

On Fri, 2018-04-13 at 08:23 +0200, Olaf Hering wrote:
> Am Thu, 12 Apr 2018 19:25:43 +0200
> schrieb Dario Faggioli <dfaggioli@suse.com>:
> 
> > Olaf, new patch! :-)
> 
>     BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc)));
> 
Thanks!

> (XEN) CPU 36: d10v1 isr=0 runnbl=1 proc=36 pf=0 orq=0 csf=4
>
So, FTR:
- CPU is smp_processor_id()
- dXvY is prev, in context_saved()
- isr is prev->is_running
- runnbl is vcpu_runnable(prev)
- proc is prev->processor
- pf is prev->pause_flags
- orq is __vcpu_on_runq(CSCHED_VCPU(prev)) (coming from sched_credit.c)
- csf is CSCHED_VCPU(prev)->flags

csf = 4 is CSCHED_FLAG_VCPU_MIGRATING, which means someone is calling
vcpu_migrate() on prev, on some other processor (presumably via
vcpu_set_affinity()) and is around here:

static void vcpu_migrate(struct vcpu *v)
{
    ...
    if ( v->is_running ||
         !test_and_clear_bit(_VPF_migrating, &v->pause_flags) )
    {
        sched_spin_unlock_double(old_lock, new_lock, flags); 
        return; 
    } 
 
    vcpu_move_locked(v, new_cpu); 
 
    sched_spin_unlock_double(old_lock, new_lock, flags); 

    [**] 

    if ( old_cpu != new_cpu ) 
        sched_move_irqs(v); 
 
    /* Wake on new CPU. */ 
    vcpu_wake(v); 
}

I.e., SCHED_OP(pick_cpu) has been called already, but not vcpu_wake().

We must be past the sched_spin_unlock_double() because, on this
processor (i.e., CPU 31 in this crash), we are, while printing,
_inside_ a critical section on prev's scheduler lock.

> (XEN) CPU 33: d10v2 isr=0 runnbl=0 proc=33 pf=1 orq=0 csf=4
> (XEN) CPU 20: d10v2 isr=0 runnbl=1 proc=20 pf=0 orq=0 csf=4
> (XEN) CPU 32: d10v0 isr=0 runnbl=1 proc=32 pf=0 orq=0 csf=4
> (XEN) CPU 33: d10v0 isr=0 runnbl=1 proc=12 pf=0 orq=0 csf=4
> (XEN) CPU 36: d10v0 isr=0 runnbl=1 proc=36 pf=0 orq=0 csf=4
> (XEN) CPU 31: d10v0 isr=0 runnbl=1 proc=31 pf=0 orq=0 csf=4
> (XEN) Xen BUG at sched_credit.c:877
> (XEN) ----[ Xen-4.11.20180411T100655.82540b66ce-
> 180413055758  x86_64  debug=y   Not tainted ]----
> (XEN) CPU:    31
>
Right, so, in this case, the vcpu_migrate()->SCHED_OP(pick_cpu) did not
change prev->processor. That could very well have happened. This just
means that, if it weren't for the BUG_ON added in csched_vcpu_migrate()
by this patch, this iteration would not have crashed in
csched_load_balance().

However, in the previous report, we have seen a situation where
prev->processor was 31 on CPU 16.

Fact is, VPF_migrating is 0 right now, for prev, which corroborates the
theory that we are at point [**], in vcpu_migrate(), on the other CPU. In
fact, it was 1, but test_and_clear_bit() has been called to reset it.

However, in order for us, on this CPU, to actually execute
vcpu_move_locked(), like we do:

> (XEN) Xen call trace:
> (XEN)    [<ffff82d08022c84d>]
> sched_credit.c#csched_vcpu_migrate+0x52/0x54
> (XEN)    [<ffff82d080239419>] schedule.c#vcpu_move_locked+0x42/0xcc
>
It means that someone raised VPF_migrating again!

> (XEN)    [<ffff82d08023a8d8>] schedule.c#vcpu_migrate+0x210/0x23b
> (XEN)    [<ffff82d08023c7ad>] context_saved+0x236/0x479
> (XEN)    [<ffff82d08027a558>] context_switch+0xe9/0xf67
> (XEN)    [<ffff82d0802397a9>] schedule.c#schedule+0x306/0x6ab
> (XEN)    [<ffff82d08023d56a>] softirq.c#__do_softirq+0x71/0x9a
> (XEN)    [<ffff82d08023d5dd>] do_softirq+0x13/0x15
> (XEN)    [<ffff82d080328d8b>] vmx_asm_do_vmentry+0x2b/0x30
> (XEN) ****************************************
> (XEN) Panic on CPU 31:
> (XEN) Xen BUG at sched_credit.c:877
> (XEN) ****************************************
>
Now, VPF_migrating is raised in the following circumstances:

* in __runq_tickle(): I actually was about to pinpoint this as the 
  problem, but then I realized that, when calling __runq_tickle(prev),
  in vcpu_wake() (called by vcpu_migrate()), we do not set the bit on
  prev itself, but on the currently running vcpu of prev->processor.
  And a vcpu that is in per_cpu(schedule_data, <CPU>).curr can't
  also be prev in (any) context_saved(), I think.

* in csched_vcpu_acct(): we set the flag on CSCHED_VCPU(current). I
   may be wrong, but I don't immediately see why we use current here,
   instead of curr_on_cpu(cpu). Yet, I think that, similarly to
   above, current can't be prev. Still, I may send a "Just in case"^TM
   patch... :-P

* in vcpu_force_reschedule(): it's used in shim code (well... :-) and
  in VCPUOP_set_periodic_timer(). But it only sets the flag if
  prev->is_running is 1, which it is not. Besides, don't most guests use
  only the singleshot timer these days?

* in cpu_disable_scheduler(): no. Just no.

* in vcpu_set_affinity(): well, it looks to me that either a) we use
  the setting of the bit in here to actually enter the if() in
  context_saved(), which is a precondition for the race, and then we
  are already past that, or b) things just work. Will think more...

* in vcpu_pin_override(): again, no.... I think?

So, thoughts? :-)

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-12 17:25                                         ` Dario Faggioli
  2018-04-13  6:23                                           ` Olaf Hering
@ 2018-04-13  9:03                                           ` George Dunlap
  2018-04-13  9:25                                             ` Dario Faggioli
  2018-04-13 10:47                                             ` Dario Faggioli
  1 sibling, 2 replies; 58+ messages in thread
From: George Dunlap @ 2018-04-13  9:03 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Andrew Cooper, Olaf Hering, xen-devel



> On Apr 12, 2018, at 6:25 PM, Dario Faggioli <dfaggioli@suse.com> wrote:
> 
> On Thu, 2018-04-12 at 17:38 +0200, Dario Faggioli wrote:
>> On Thu, 2018-04-12 at 15:15 +0200, Dario Faggioli wrote:
>>> On Thu, 2018-04-12 at 14:45 +0200, Olaf Hering wrote:
>>>> 
>>>> dies after the first iteration.
>>>> 
>>>>        BUG_ON(!test_bit(_VPF_migrating, &prev->pause_flags));
>>>> 
>> 
>> Update. I replaced this:
>> 
> Olaf, new patch! :-)
> 
> FTR, a previous version of this (where I was not printing
> smp_processor_id() and prev->is_running), produced the output that I am
> attaching below.
> 
> Looks to me like, while on the crashing CPU, we are here [*]:
> 
> void context_saved(struct vcpu *prev)
> {
>    ...
>    if ( unlikely(prev->pause_flags & VPF_migrating) )
>    {
>        unsigned long flags;
>        spinlock_t *lock = vcpu_schedule_lock_irqsave(prev, &flags);
> 
>        if (vcpu_runnable(prev) || !test_bit(_VPF_migrating, &prev->pause_flags))
>            printk("CPU %u: d%uv%d isr=%u runnbl=%d proc=%d pf=%lu orq=%d csf=%u\n",
>                   smp_processor_id(), prev->domain->domain_id, prev->vcpu_id,
>                   prev->is_running, vcpu_runnable(prev),
>                   prev->processor, prev->pause_flags,
>                   SCHED_OP(vcpu_scheduler(prev), onrunq, prev),
>                   SCHED_OP(vcpu_scheduler(prev), csflags, prev));
> 
>        [*]
> 
>        if ( prev->runstate.state == RUNSTATE_runnable )
>            vcpu_runstate_change(prev, RUNSTATE_offline, NOW());
>        BUG_ON(curr_on_cpu(prev->processor) == prev);
>        SCHED_OP(vcpu_scheduler(prev), sleep, prev);
> 
>        vcpu_schedule_unlock_irqrestore(lock, flags, prev);
> 
>        vcpu_migrate(prev);
>    }
> }
> 
> On the "other CPU", we might be around here [**]:
> 
> static void vcpu_migrate(struct vcpu *v)
> {
>    ...
>    if ( v->is_running ||
>         !test_and_clear_bit(_VPF_migrating, &v->pause_flags) )

I think the bottom line is that, for this test to be valid, test_bit(VPF_migrating) *must* imply !vcpu_on_runqueue(v) at this point, but it doesn’t: if someone else has come by and cleared the bit, done the migration, and woken it up, and then someone *else* set the bit again without taking it off the runqueue, it may still be on the runqueue.

My series which calls vcpu_sleep_nosync_locked() after setting VPF_migrating should help with this.

Or, alternatively, instead of baking all this implicit knowledge about credit into the scheduler, we should just implement credit_vcpu_migrate(), and have it remove the vcpu from one runqueue and put it on another.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-13  9:03                                           ` George Dunlap
@ 2018-04-13  9:25                                             ` Dario Faggioli
  2018-04-13 10:38                                               ` Olaf Hering
  2018-04-13 11:29                                               ` George Dunlap
  2018-04-13 10:47                                             ` Dario Faggioli
  1 sibling, 2 replies; 58+ messages in thread
From: Dario Faggioli @ 2018-04-13  9:25 UTC (permalink / raw)
  To: George Dunlap; +Cc: Andrew Cooper, Olaf Hering, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 2074 bytes --]

On Fri, 2018-04-13 at 09:03 +0000, George Dunlap wrote:
> > On Apr 12, 2018, at 6:25 PM, Dario Faggioli <dfaggioli@suse.com>
> > wrote:
> > 
> I think the bottom line is, for this test to be valid, then at this
> point test_bit(VPF_migrating) *must* imply !vcpu_on_runqueue(v), but
> at this point it doesn’t: If someone else has come by and cleared the
> bit, done migration, and woken it up, and then someone *else* set the
> bit again without taking it off the runqueue, it may still be on the
> runqueue.
> 
> My series which calls vcpu_sleep_nosync_locked() after setting
> VPF_migrating should help with this.
> 
Yes. In fact, Olaf, I still think that doing a run with George's RFC
applied would be useful, if only as a data point.

> Or, alternately, instead of baking all this implicit  knowledge about
> credit into the scheduler, we should just implement
> credit_vcpu_migrate(), and have it remove it from one runqueue and
> put it on another.
> 
But it's not really "baking Credit implicit knowledge", IMO. It is that
we have an invariant which we are failing to enforce.

That's why your series goes in the right direction: by calling
sleep() in the same critical section where the bit is set, it
improves how we enforce the invariant.
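
Just to make sure we are looking at it the same way, here is, very
roughly, how I read the core idea of your series (a sketch only, with
all the details of the real vcpu_set_affinity() elided):

    lock = vcpu_schedule_lock_irq(v);

    /* ... update the affinity mask(s) ... */

    set_bit(_VPF_migrating, &v->pause_flags);
    /*
     * Take v off its runqueue *before* dropping the lock, so that
     * nobody can ever observe VPF_migrating set while v is still
     * sitting on a runqueue.
     */
    vcpu_sleep_nosync_locked(v);

    vcpu_schedule_unlock_irq(lock, v);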

Implementing a csched_vcpu_migrate() looks to me like "relaxing" the
invariant, which goes in exactly the opposite direction. :-)

We may well decide to _get_rid_ of the invariant, but I'm not sure that
implementing csched_vcpu_migrate() would be all that it takes and, in
general, I don't think that something like this:
 - is an appropriate thing to do at this point of the 4.11 cycle;
 - will be easy to backport (while, despite the look of it,
   backporting patches 1 and 2 of your series might not be too terrible).

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-13  9:25                                             ` Dario Faggioli
@ 2018-04-13 10:38                                               ` Olaf Hering
  2018-04-13 11:29                                               ` George Dunlap
  1 sibling, 0 replies; 58+ messages in thread
From: Olaf Hering @ 2018-04-13 10:38 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Andrew Cooper, George Dunlap, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 223 bytes --]

On Fri, Apr 13, Dario Faggioli wrote:

> Yes. In fact, Olaf, I still think that doing a run with George's RFC
> applied, would be useful, if only as a data point.

First tests indicate that this series fixes the bug.

Olaf

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-13  9:03                                           ` George Dunlap
  2018-04-13  9:25                                             ` Dario Faggioli
@ 2018-04-13 10:47                                             ` Dario Faggioli
  1 sibling, 0 replies; 58+ messages in thread
From: Dario Faggioli @ 2018-04-13 10:47 UTC (permalink / raw)
  To: George Dunlap; +Cc: Andrew Cooper, Olaf Hering, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 5828 bytes --]

On Fri, 2018-04-13 at 09:03 +0000, George Dunlap wrote:
> > On Apr 12, 2018, at 6:25 PM, Dario Faggioli <dfaggioli@suse.com>
> > wrote:
> > 
> > On the "other CPU", we might be around here [**]:
> > 
> > static void vcpu_migrate(struct vcpu *v)
> > {
> >    ...
> >    if ( v->is_running ||
> >         !test_and_clear_bit(_VPF_migrating, &v->pause_flags) )
> 
> I think the bottom line is, for this test to be valid, then at this
> point test_bit(VPF_migrating) *must* imply !vcpu_on_runqueue(v), but
> at this point it doesn’t: If someone else has come by and cleared the
> bit, done migration, and woken it up, and then someone *else* set the
> bit again without taking it off the runqueue, it may still be on the
> runqueue.
> 
BTW, I suddenly realized that Olaf, in his reproducer, is changing both
hard and soft-affinity.

That means two calls to vcpu_set_affinity(). And here's the race
(beware, it's a bit of a long chain of events! :-P):

 CPU A                                  CPU B
 .                                      .
 schedule(current == v)                 vcpu_set_affinity(v) <-- for hard affinity
  prev = current     // == v             .
  schedule_lock(CPU A)                   .
   csched_schedule()                     schedule_lock(CPU A)
   if (runnable(v))  //YES               x
    runq_insert(v)                       x
   return next != v                      x
  schedule_unlock(CPU A)                 x // takes the lock
  context_switch(prev,next)              set_bit(v, VPF_migrating)
   context_saved(prev) // still == v     .
    v->is_running = 0                    schedule_unlock(CPU A)
    SMP_MB                               domain_update_node_affinity(v->d)
    .                                    if (test_bit(v, VPF_migrating) // YES
    .                                     vcpu_sleep_nosync(v)
    .                                      schedule_lock(CPU A)
    .                                      if (!vcpu_runnable(v)) // YES
    .                                       SCHED_OP(v, sleep)
    .                                        if (curr_on_cpu(v, CPU A)) // NO
    .                                         ---
    .                                        else if (__vcpu_on_runq(v)) // YES
    .                                         runq_remove(v)
    .                                       schedule_unlock(CPU A)
    .                                     vcpu_migrate(v)
    .                                      for {
    .                                       schedule_lock(CPU A)
    .                                       SCHED_OP(v, pick_cpu)
    .                                        set_bit(v, CSCHED_MIGRATING)
    .                                        return CPU D
    .                                       pick_called = 1
    .                                       schedule_unlock(CPU A)
    if (test_bit(v, VPF_migrating)) // YES  schedule_lock(CPU A + CPU D)
     vcpu_sleep_nosync(v)                   if (pick_called && ) // YES
      schedule_lock(CPU A)                   break
      x                                    }
      x // CPU B clears VPF_migrating!     if (v->is_running || !test_and_clear(v, VPF_migrating)) // NO
      x                                     ---
      x                                    vcpu_move_locked(v)
      x                                     v->processor = CPU D
      x                                    schedule_unlock(CPU A + CPU D)
      x // takes *CPU D* lock              .
      if (!vcpu_runnable(v)) // FALSE, as VPF_migrating is now clear
       ---                                 vcpu_wake(v)
      schedule_unlock(CPU D)                .
      vcpu_migrate(v)                       schedule_lock(CPU D)
        for {                               if (vcpu_runnable(v)) // YES
         schedule_lock(CPU D)                SCHED_OP(v, wake)
         x                                    runq_insert(v) // v is now in CPU D's runqueue
         x                                    runq_tickle(v)
         x                                  schedule_unlock(CPU D)
         x // takes the lock                .
         SCHED_OP(v, pick_cpu)              .
          set_bit(v, CSCHED_MIGRATING)      .
          return CPU C                      .
         pick_called = 1                    .
         schedule_unlock(CPU D)             .
         .                                  vcpu_set_affinity(v) <-- for soft-affinity
         .                                  schedule_lock(CPU D)
         schedule_lock(CPU D + CPU C)       set_bit(v, VPF_migrating)
         x                                  schedule_unlock(CPU D)
         x // takes the lock                .
         if (pick_called && ...) // YES     .
          break                             .
        }                                   .
        if ( v->is_running || !test_and_clear(v, VPF_migrating)) // FALSE !!
         vcpu_move_locked(v, CPU C)         .
         BUG_ON(__vcpu_on_runq(v))          .

It appears that changing only the hard-affinity does not trigger the
bug, which would mean this analysis is correct.

Also, as Olaf just reported, running with your series (and changing
both hard and soft-affinity) works.

Now we have to decide whether to take your series and backport it
(which is what I'm leaning toward), or do something else.

But if you don't mind, we'd have to do it on Monday, as I have to run
right now. :-P

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-13  9:25                                             ` Dario Faggioli
  2018-04-13 10:38                                               ` Olaf Hering
@ 2018-04-13 11:29                                               ` George Dunlap
  2018-04-13 11:41                                                 ` Dario Faggioli
  1 sibling, 1 reply; 58+ messages in thread
From: George Dunlap @ 2018-04-13 11:29 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Andrew Cooper, Olaf Hering, xen-devel



> On Apr 13, 2018, at 10:25 AM, Dario Faggioli <dfaggioli@suse.com> wrote:
> 
> On Fri, 2018-04-13 at 09:03 +0000, George Dunlap wrote:
>>> On Apr 12, 2018, at 6:25 PM, Dario Faggioli <dfaggioli@suse.com>
>>> wrote:
>>> 
>> I think the bottom line is, for this test to be valid, then at this
>> point test_bit(VPF_migrating) *must* imply !vcpu_on_runqueue(v), but
>> at this point it doesn’t: If someone else has come by and cleared the
>> bit, done migration, and woken it up, and then someone *else* set the
>> bit again without taking it off the runqueue, it may still be on the
>> runqueue.
>> 
>> My series which calls vcpu_sleep_nosync_locked() after setting
>> VPF_migrating should help with this.
>> 
> Yes. In fact, Olaf, I still think that doing a run with George's RFC
> applied, would be useful, if only as a data point.
> 
>> Or, alternately, instead of baking all this implicit  knowledge about
>> credit into the scheduler, we should just implement
>> credit_vcpu_migrate(), and have it remove it from one runqueue and
>> put it on another.
>> 
> But it's not really "baking Credit implicit knowledge", IMO. It is that
> we have an invariant which we are failing to enforce.

Which invariant is that?  That a vcpu is not on a runqueue when switching v->processor.  But “on a runqueue” is a scheduler-specific construct that the main scheduling code doesn’t know about.  Otherwise we could make the late bail-out clause in vcpu_migrate() something like this:

if ( v->is_running ||
     vcpu_on_runq(v) ||
     !test_and_clear_bit(_VPF_migrating, &v->pause_flags) )
{
    /* Still running, still on a runqueue, or flag already cleared:
     * unlock and return. */
}

All this stuff with vcpu_sleep_nosync() and vcpu_wake() is just indirectly making sure that the Credit1-specific invariant — that switching v->processor removes it from one runqueue and adds it to another — actually happens; but it does it in an opaque way.  And the main reason the migrate() callback was introduced (IIRC) is because credit2’s migration invariants didn’t really correspond to the invariants implicitly defined by schedule.c for credit1.

I think as far as backports go, my current RFC would be fine.  Another possibility, though, would be to simply add a migrate() callback to remove the vcpu from the runqueue before switching v->processor, *without* removing any of the current song and dance about vcpu_sleep_nosync().  That should be fairly simple and straightforward to backport, and won’t make anything worse (since in theory it should have been removed by that point anyway).  Then for 4.12 we can figure out what we want to do going forward.
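
A rough, uncompiled sketch of that minimal migrate() callback (reusing
the __vcpu_on_runq() / __runq_remove() helpers that sched_credit.c
already has; this only illustrates the shape, it is not a tested patch):

static void
csched_vcpu_migrate(const struct scheduler *ops, struct vcpu *vc,
                    unsigned int new_cpu)
{
    struct csched_vcpu * const svc = CSCHED_VCPU(vc);

    /*
     * vcpu_move_locked() is called with the vcpu's runqueue lock(s)
     * already held, so dequeueing here should be safe.
     */
    if ( __vcpu_on_runq(svc) )
        __runq_remove(svc);

    vc->processor = new_cpu;
}

plus a ".migrate = csched_vcpu_migrate," line in sched_credit_def, as in
the debugging patch earlier in this thread.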

 -George
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-13 11:29                                               ` George Dunlap
@ 2018-04-13 11:41                                                 ` Dario Faggioli
  0 siblings, 0 replies; 58+ messages in thread
From: Dario Faggioli @ 2018-04-13 11:41 UTC (permalink / raw)
  To: George Dunlap; +Cc: Andrew Cooper, Olaf Hering, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1425 bytes --]

On Fri, 2018-04-13 at 11:29 +0000, George Dunlap wrote:
> I think as far as backports go, my current RFC would be
> fine.  Another possibility, though, would be to simply add a
> migrate() callback to remove the vcpu from the runqueue before
> switching v->processor, *without* removing any of the current song
> and dance about vcpu_sleep_nosync().  That should be fairly simple
> and straightforward to backport, and won’t make anything worse (since
> in theory it should have been removed by that point anyway).  Then
> for 4.12 we can figure out what we want to do going forward.
> 
FYI, adapting the first two patches back as far as 4.7 (though I have
not even compiled them) was rather easy.

And, modulo the fact that I still have to properly review them (which
I'll do... but I looked at them, and they seem fine), I do prefer the
series to the Credit1 migrate callback.

*Especially* if you are right, and the invariant is entirely Credit1
specific. In fact, that means there might be other code paths, in
sched_credit.c, that rely on it, and hence I'd prefer for it to be
enforced better, rather than relaxed, at this point in the cycle.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: crash in csched_load_balance after xl vcpu-pin
  2018-04-11 12:45                       ` Olaf Hering
@ 2018-04-17 12:39                         ` Dario Faggioli
  0 siblings, 0 replies; 58+ messages in thread
From: Dario Faggioli @ 2018-04-17 12:39 UTC (permalink / raw)
  To: Olaf Hering; +Cc: George Dunlap, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 2338 bytes --]

On Wed, 2018-04-11 at 14:45 +0200, Olaf Hering wrote:
> On Wed, Apr 11, Dario Faggioli wrote:
> 
> > If you're interested in figuring out, I'd like to see:
> > - full output of `xl info -n'
> > - output of `xl debug-key u'
> > - xl vcpu-list
> > - xl list -n
> 
> Logs for this .cfg attached:
> 
> name='fv_sles12sp1.0'
> vif=[ 'mac=00:18:3e:58:00:c1,bridge=br0' ]
> memory=4444
> vcpus=36
> serial="pty"
> builder="hvm"
> kernel="/xen100.migration/olh/bug1088498/nfsroot_sles12sp2.bug1088498
> /boot/vmlinuz"
> ramdisk="/xen100.migration/olh/bug1088498/nfsroot_sles12sp2.bug108849
> 8/boot/initrd"
> cmdline="quiet panic=9
> root=nfs:xen100:/share/migration/olh/bug1088498/nfsroot_sles12sp2.bug
> 1088498,vers=3,tcp,actimeo=1,nolock readonlyroot ro Xignore_loglevel
> Xdebug Xsystemd.log_target=kmsg    Xsystemd.log_level=debug Xrd.debug
> Xrd.shell Xrd.udev.debug Xudev.log-priority=debug Xrd.udev.log-
> priority=debug console=ttyS0"
> cpus="node:2"
> #pus="nodes:2"
> #pus="nodes:2,^node:0"
> #pus_soft="nodes:2,^node:0"
>
So, I do not really know what the problem could be here.

In fact, vcpu_hard_affinity is being defined, and numa_placement is
being set to false, which are both correct.

However, vcpu_hard_affinity seems to be empty:

"vcpu_hard_affinity": [
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            [

            ],
            ...
            ...
            ...
        ],
        "numa_placement": "False",

Judging from the output of the other xl commands, though, retrieving the
cpus from node 2 seems to work, and the fact that "node:2" behaves
differently from "node:1" is quite weird.

If we still have access to this system, it would be interesting to
instrument, e.g., update_cpumap_range() in xl_parse.c, and see what
libxl_node_to_cpumap() actually does in this case...
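
A rough sketch of the kind of probe I mean (standalone, rather than an
exact diff against update_cpumap_range(); 'node' stands for the node
number just parsed, and ctx is xl's global libxl_ctx):

    libxl_bitmap node_cpus;
    int rc, i;

    libxl_bitmap_init(&node_cpus);
    rc = libxl_node_to_cpumap(ctx, node, &node_cpus);
    fprintf(stderr, "node %d -> rc=%d, cpus:", node, rc);
    for (i = 0; i < node_cpus.size * 8; i++)
        if (libxl_bitmap_test(&node_cpus, i))
            fprintf(stderr, " %d", i);
    fprintf(stderr, "\n");
    libxl_bitmap_dispose(&node_cpus);

If that prints an empty cpu list for node 2, the problem is below xl,
in libxl (or in what the hypervisor reports); if the list looks right,
then it is the parsing/merging in xl itself that loses it.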

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2018-04-17 12:39 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-10  8:57 crash in csched_load_balance after xl vcpu-pin Olaf Hering
2018-04-10  9:34 ` George Dunlap
2018-04-10 10:33   ` Dario Faggioli
2018-04-10 10:59     ` George Dunlap
2018-04-10 11:29       ` Dario Faggioli
2018-04-10 15:25         ` George Dunlap
2018-04-10 15:36           ` Dario Faggioli
     [not found]           ` <960702b6d9dfb67bfae72ae02ae502210695416b.camel@suse.com>
2018-04-10 20:37             ` Olaf Hering
2018-04-10 22:59               ` Dario Faggioli
2018-04-11  7:31                 ` Dario Faggioli
2018-04-11  7:39                   ` Juergen Gross
2018-04-11  7:42                     ` Dario Faggioli
2018-04-11 10:00                   ` Olaf Hering
     [not found]                     ` <298ec681a9c38eb7618e6b3e226486691e9eab4d.camel@suse.com>
2018-04-11 11:02                       ` George Dunlap
2018-04-11 12:31                         ` Jan Beulich
2018-04-11 15:03                     ` Olaf Hering
2018-04-11 15:27                       ` Olaf Hering
2018-04-11 17:20                         ` Dario Faggioli
2018-04-11 20:43                           ` Olaf Hering
2018-04-11 21:31                             ` Dario Faggioli
     [not found]                               ` <5ACE29E00200005B03782666@prv1-mh.provo.novell.com>
     [not found]                                 ` <5ACE443D020000E603784221@prv1-mh.provo.novell.com>
     [not found]                                   ` <5ACE73C402000076046FD1E0@prv1-mh.provo.novell.com>
     [not found]                                     ` <5ACE7F370200002F0378782F@prv1-mh.provo.novell.com>
2018-04-12  7:18                                       ` Jan Beulich
2018-04-12  9:38                               ` George Dunlap
2018-04-12 10:16                                 ` Dario Faggioli
2018-04-12 12:45                                   ` Olaf Hering
2018-04-12 13:15                                     ` Dario Faggioli
2018-04-12 15:38                                       ` Dario Faggioli
2018-04-12 17:25                                         ` Dario Faggioli
2018-04-13  6:23                                           ` Olaf Hering
2018-04-13  9:01                                             ` Dario Faggioli
2018-04-13  9:03                                           ` George Dunlap
2018-04-13  9:25                                             ` Dario Faggioli
2018-04-13 10:38                                               ` Olaf Hering
2018-04-13 11:29                                               ` George Dunlap
2018-04-13 11:41                                                 ` Dario Faggioli
2018-04-13 10:47                                             ` Dario Faggioli
     [not found]                       ` <5ACE23DF0200002D03781F6A@prv1-mh.provo.novell.com>
2018-04-11 15:38                         ` Jan Beulich
2018-04-11 15:48                           ` Olaf Hering
     [not found]                 ` <9c857d1a-d592-8db5-827c-30fbc97477e0@citrix.com>
2018-04-11 11:00                   ` Dario Faggioli
2018-04-10 11:30       ` Dario Faggioli
2018-04-10 11:31       ` Dario Faggioli
2018-04-10 11:32       ` Dario Faggioli
2018-04-10 11:33       ` Dario Faggioli
2018-04-10 15:18 ` Olaf Hering
2018-04-10 15:29   ` George Dunlap
2018-04-10 15:59 ` Olaf Hering
2018-04-10 16:28   ` Dario Faggioli
2018-04-10 19:03     ` Olaf Hering
2018-04-10 20:02       ` Dario Faggioli
2018-04-10 20:09         ` Olaf Hering
2018-04-10 20:13           ` Olaf Hering
2018-04-10 20:41             ` Dario Faggioli
2018-04-11  6:23               ` Olaf Hering
2018-04-11  8:42                 ` Dario Faggioli
2018-04-11  8:48                   ` Olaf Hering
2018-04-11 10:20                     ` Dario Faggioli
2018-04-11 12:45                       ` Olaf Hering
2018-04-17 12:39                         ` Dario Faggioli
2018-04-11 10:20                     ` Dario Faggioli
