xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* NULL scheduler DoS
@ 2021-08-09 16:19 Ahmed, Daniele
  2021-08-09 16:47 ` Ahmed, Daniele
  2021-08-09 17:35 ` Julien Grall
  0 siblings, 2 replies; 8+ messages in thread
From: Ahmed, Daniele @ 2021-08-09 16:19 UTC (permalink / raw)
  To: xen-devel
  Cc: Dario Faggioli, Stefano Stabellini, Grall, Julien, Doebel,
	Bjoern, Pohlack, Martin

[-- Attachment #1: Type: text/plain, Size: 4716 bytes --]

Hi all,
The NULL scheduler is affected by an issue that triggers an assertion and reboots the hypervisor.

This issue arise when:

  *   a guest is being created with a configuration specifying a file that does not exist
  *   the hypervisor boots with the null scheduler
4.16 is affected and 4.15 also.

This is the stack trace from 4.16:

(XEN) Assertion 'npc->unit == unit' failed at null.c:377
(XEN) ----[ Xen-4.16-unstable x86_64 debug=y Not tainted ]----
(XEN) CPU: 3
(XEN) RIP: e008:[<ffff82d04024f577>] common/sched/null.c#unit_deassign+0x1c3/0x2ec
(XEN) RFLAGS: 0000000000010006 CONTEXT: hypervisor
(XEN) rax: ffff83005ce1c850 rbx: 0000000000000001 rcx: 0000000000000001
(XEN) rdx: ffff83007fde6fc0 rsi: ffff83005ce1c790 rdi: ffff83007ffb7850
(XEN) rbp: ffff83007ffdfda0 rsp: ffff83007ffdfd48 r8: 0000000000000000
(XEN) r9: 0000000000048fee r10: 0000000000000000 r11: 0000000000000000
(XEN) r12: ffff82d0405c9298 r13: ffff83007f7fd508 r14: ffff83005ce1c850
(XEN) r15: ffff82d0405e2680 cr0: 000000008005003b cr4: 00000000003526e0
(XEN) cr3: 000000007f6b3000 cr2: ffff888072e79dc0
(XEN) fsb: 0000000000000000 gsb: ffff888071ac0000 gss: 0000000000000000
(XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen code around <ffff82d04024f577> (common/sched/null.c#unit_deassign+0x1c3/0x2ec):
(XEN) 41 5e 41 5f 5d c3 0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 49 8b 04 24 0f b7 00 66
(XEN) Xen stack trace from rsp=ffff83007ffdfd48:
(XEN) ffff83007ffdfd88 ffff82d04023961c 0000000400000000 ffff83005ce1cc50
(XEN) 0000000000000002 ffff83007ffdfd90 ffff83005ce1c790 ffff82d0405c9298
(XEN) ffff83007f7fd508 ffff83005ce1c850 ffff82d0405e2680 ffff83007ffdfde0
(XEN) ffff82d04024f889 ffff83007ffb7850 ffff83005dd63000 ffff83005ce1c790
(XEN) ffff83005845ab28 ffff83005845a000 0000000000000000 ffff83007ffdfe00
(XEN) ffff82d040253326 ffff83005dd63000 0000000000000000 ffff83007ffdfe38
(XEN) ffff82d04020506b ffff83007a881080 0000000000000000 0000000000000000
(XEN) 0000000000000000 ffff82d0405d6f80 ffff83007ffdfe70 ffff82d04022d9e5
(XEN) 0000001100000003 ffff82d0405cf100 ffff82d0405cf100 ffffffffffffffff
(XEN) ffff82d0405cef80 ffff83007ffdfea8 ffff82d04022e14b 0000000000000003
(XEN) ffff82d0405cf100 0000000000007fff 0000000000000003 0000000000000003
(XEN) ffff83007ffdfeb8 ffff82d04022e1e6 ffff83007ffdfef0 ffff82d0403172b4
(XEN) ffff82d04031721d ffff83007fec1000 ffff83007ffb6000 0000000000000003
(XEN) ffff83007ffcc000 ffff83007ffdfe18 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000003 0000000000000003
(XEN) 0000000000000246 0000000000000003 0000000000000000 000000001bf9dde5
(XEN) 0000000000000000 ffffffff810023aa 0000000000000003 deadbeefdeadf00d
(XEN) deadbeefdeadf00d 0000010000000000 ffffffff810023aa 000000000000e033
(XEN) 0000000000000246 ffffc900400a3ea8 000000000000e02b 7ffdff707fffd140
(XEN) 000000017fe37a6c 000000007ffe8010 0000000000000000 0000e01000000003
(XEN) Xen call trace:
(XEN) [<ffff82d04024f577>] R common/sched/null.c#unit_deassign+0x1c3/0x2ec
(XEN) [<ffff82d04024f889>] F common/sched/null.c#null_unit_remove+0xfc/0x136
(XEN) [<ffff82d040253326>] F sched_destroy_vcpu+0xca/0x199
(XEN) [<ffff82d04020506b>] F common/domain.c#complete_domain_destroy+0x68/0x13f
(XEN) [<ffff82d04022d9e5>] F common/rcupdate.c#rcu_process_callbacks+0xdb/0x24b
(XEN) [<ffff82d04022e14b>] F common/softirq.c#__do_softirq+0x8a/0xbc
(XEN) [<ffff82d04022e1e6>] F do_softirq+0x13/0x15
(XEN) [<ffff82d0403172b4>] F arch/x86/domain.c#idle_loop+0x97/0xee
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 3:
(XEN) Assertion 'npc->unit == unit' failed at null.c:377
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...

This is the line of the assertion that triggers the reboot: https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/common/sched/null.c;h=82d5d1baab853d24fcbb455fb3f3e8263c871277;hb=HEAD#l377

To reproduce the vulnerability, I took the following steps:

  *   Install XEN; only 4.15+ seem to be vulnerable
  *   Use the null scheduler (depends on your setup): edit /etc/default/grub adding at the end of the file: GRUB_CMDLINE_XEN="sched=null" and update grub
  *   Reboot into xen
  *   Create a file guest.cfg with the following contents
name="guest"
builder="hvm"
memory=512

serial = [ 'file:/tmp/log', 'pty' ]

disk = [ '/home/user/boot.iso,,hdc,cdrom' ]

on_reboot = "destroy"

vcpus=1

Make sure that the file /home/user/boot.iso does not exist

  *   Create a guest with this configuration: xl create -c guest.cfg
CC’ing Dario, Stefano and Julien to whom I’ve shown this.

Thanks

Daniele

[-- Attachment #2: Type: text/html, Size: 15733 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: NULL scheduler DoS
  2021-08-09 16:19 NULL scheduler DoS Ahmed, Daniele
@ 2021-08-09 16:47 ` Ahmed, Daniele
  2021-08-09 17:35 ` Julien Grall
  1 sibling, 0 replies; 8+ messages in thread
From: Ahmed, Daniele @ 2021-08-09 16:47 UTC (permalink / raw)
  To: xen-devel
  Cc: Dario Faggioli, Stefano Stabellini, Grall, Julien, Doebel,
	Bjoern, Pohlack, Martin, julien

[-- Attachment #1: Type: text/plain, Size: 5133 bytes --]

CC’ing julien@xen.org

From: "Ahmed, Daniele" <ahmeddan@amazon.co.uk>
Date: Monday, 9 August 2021 at 17:19
To: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Cc: Dario Faggioli <dfaggioli@suse.com>, Stefano Stabellini <sstabellini@kernel.org>, "Grall, Julien" <jgrall@amazon.co.uk>, "Doebel, Bjoern" <doebel@amazon.de>, "Pohlack, Martin" <mpohlack@amazon.de>
Subject: NULL scheduler DoS

Hi all,
The NULL scheduler is affected by an issue that triggers an assertion and reboots the hypervisor.

This issue arise when:

  *   a guest is being created with a configuration specifying a file that does not exist
  *   the hypervisor boots with the null scheduler
4.16 is affected and 4.15 also.

This is the stack trace from 4.16:

(XEN) Assertion 'npc->unit == unit' failed at null.c:377
(XEN) ----[ Xen-4.16-unstable x86_64 debug=y Not tainted ]----
(XEN) CPU: 3
(XEN) RIP: e008:[<ffff82d04024f577>] common/sched/null.c#unit_deassign+0x1c3/0x2ec
(XEN) RFLAGS: 0000000000010006 CONTEXT: hypervisor
(XEN) rax: ffff83005ce1c850 rbx: 0000000000000001 rcx: 0000000000000001
(XEN) rdx: ffff83007fde6fc0 rsi: ffff83005ce1c790 rdi: ffff83007ffb7850
(XEN) rbp: ffff83007ffdfda0 rsp: ffff83007ffdfd48 r8: 0000000000000000
(XEN) r9: 0000000000048fee r10: 0000000000000000 r11: 0000000000000000
(XEN) r12: ffff82d0405c9298 r13: ffff83007f7fd508 r14: ffff83005ce1c850
(XEN) r15: ffff82d0405e2680 cr0: 000000008005003b cr4: 00000000003526e0
(XEN) cr3: 000000007f6b3000 cr2: ffff888072e79dc0
(XEN) fsb: 0000000000000000 gsb: ffff888071ac0000 gss: 0000000000000000
(XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen code around <ffff82d04024f577> (common/sched/null.c#unit_deassign+0x1c3/0x2ec):
(XEN) 41 5e 41 5f 5d c3 0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 49 8b 04 24 0f b7 00 66
(XEN) Xen stack trace from rsp=ffff83007ffdfd48:
(XEN) ffff83007ffdfd88 ffff82d04023961c 0000000400000000 ffff83005ce1cc50
(XEN) 0000000000000002 ffff83007ffdfd90 ffff83005ce1c790 ffff82d0405c9298
(XEN) ffff83007f7fd508 ffff83005ce1c850 ffff82d0405e2680 ffff83007ffdfde0
(XEN) ffff82d04024f889 ffff83007ffb7850 ffff83005dd63000 ffff83005ce1c790
(XEN) ffff83005845ab28 ffff83005845a000 0000000000000000 ffff83007ffdfe00
(XEN) ffff82d040253326 ffff83005dd63000 0000000000000000 ffff83007ffdfe38
(XEN) ffff82d04020506b ffff83007a881080 0000000000000000 0000000000000000
(XEN) 0000000000000000 ffff82d0405d6f80 ffff83007ffdfe70 ffff82d04022d9e5
(XEN) 0000001100000003 ffff82d0405cf100 ffff82d0405cf100 ffffffffffffffff
(XEN) ffff82d0405cef80 ffff83007ffdfea8 ffff82d04022e14b 0000000000000003
(XEN) ffff82d0405cf100 0000000000007fff 0000000000000003 0000000000000003
(XEN) ffff83007ffdfeb8 ffff82d04022e1e6 ffff83007ffdfef0 ffff82d0403172b4
(XEN) ffff82d04031721d ffff83007fec1000 ffff83007ffb6000 0000000000000003
(XEN) ffff83007ffcc000 ffff83007ffdfe18 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000003 0000000000000003
(XEN) 0000000000000246 0000000000000003 0000000000000000 000000001bf9dde5
(XEN) 0000000000000000 ffffffff810023aa 0000000000000003 deadbeefdeadf00d
(XEN) deadbeefdeadf00d 0000010000000000 ffffffff810023aa 000000000000e033
(XEN) 0000000000000246 ffffc900400a3ea8 000000000000e02b 7ffdff707fffd140
(XEN) 000000017fe37a6c 000000007ffe8010 0000000000000000 0000e01000000003
(XEN) Xen call trace:
(XEN) [<ffff82d04024f577>] R common/sched/null.c#unit_deassign+0x1c3/0x2ec
(XEN) [<ffff82d04024f889>] F common/sched/null.c#null_unit_remove+0xfc/0x136
(XEN) [<ffff82d040253326>] F sched_destroy_vcpu+0xca/0x199
(XEN) [<ffff82d04020506b>] F common/domain.c#complete_domain_destroy+0x68/0x13f
(XEN) [<ffff82d04022d9e5>] F common/rcupdate.c#rcu_process_callbacks+0xdb/0x24b
(XEN) [<ffff82d04022e14b>] F common/softirq.c#__do_softirq+0x8a/0xbc
(XEN) [<ffff82d04022e1e6>] F do_softirq+0x13/0x15
(XEN) [<ffff82d0403172b4>] F arch/x86/domain.c#idle_loop+0x97/0xee
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 3:
(XEN) Assertion 'npc->unit == unit' failed at null.c:377
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...

This is the line of the assertion that triggers the reboot: https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/common/sched/null.c;h=82d5d1baab853d24fcbb455fb3f3e8263c871277;hb=HEAD#l377

To reproduce the vulnerability, I took the following steps:

  *   Install XEN; only 4.15+ seem to be vulnerable
  *   Use the null scheduler (depends on your setup): edit /etc/default/grub adding at the end of the file: GRUB_CMDLINE_XEN="sched=null" and update grub
  *   Reboot into xen
  *   Create a file guest.cfg with the following contents
name="guest"
builder="hvm"
memory=512

serial = [ 'file:/tmp/log', 'pty' ]

disk = [ '/home/user/boot.iso,,hdc,cdrom' ]

on_reboot = "destroy"

vcpus=1

Make sure that the file /home/user/boot.iso does not exist

  *   Create a guest with this configuration: xl create -c guest.cfg
CC’ing Dario, Stefano and Julien to whom I’ve shown this.

Thanks

Daniele

[-- Attachment #2: Type: text/html, Size: 22602 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: NULL scheduler DoS
  2021-08-09 16:19 NULL scheduler DoS Ahmed, Daniele
  2021-08-09 16:47 ` Ahmed, Daniele
@ 2021-08-09 17:35 ` Julien Grall
  2021-08-09 20:38   ` Julien Grall
  2021-08-11  9:59   ` Dario Faggioli
  1 sibling, 2 replies; 8+ messages in thread
From: Julien Grall @ 2021-08-09 17:35 UTC (permalink / raw)
  To: Ahmed, Daniele, xen-devel
  Cc: Dario Faggioli, Stefano Stabellini, Grall, Julien, Doebel,
	Bjoern, Pohlack, Martin



On 09/08/2021 17:19, Ahmed, Daniele wrote:
> Hi all,

Hi Daniele,

Thank you for the report!

> The NULL scheduler is affected by an issue that triggers an assertion 
> and reboots the hypervisor.
> 
> This issue arise when:
> 
>   * a guest is being created with a configuration specifying a file that
>     does not exist
>   * the hypervisor boots with the null scheduler
> 
> 4.16 is affected and 4.15 also.
> 
> This is the stack trace from 4.16:
> 
> (XEN) Assertion 'npc->unit == unit' failed at null.c:377
> (XEN) ----[ Xen-4.16-unstable x86_64 debug=y Not tainted ]----
> (XEN) CPU: 3
> (XEN) RIP: e008:[<ffff82d04024f577>] 
> common/sched/null.c#unit_deassign+0x1c3/0x2ec
> (XEN) RFLAGS: 0000000000010006 CONTEXT: hypervisor
> (XEN) rax: ffff83005ce1c850 rbx: 0000000000000001 rcx: 0000000000000001
> (XEN) rdx: ffff83007fde6fc0 rsi: ffff83005ce1c790 rdi: ffff83007ffb7850
> (XEN) rbp: ffff83007ffdfda0 rsp: ffff83007ffdfd48 r8: 0000000000000000
> (XEN) r9: 0000000000048fee r10: 0000000000000000 r11: 0000000000000000
> (XEN) r12: ffff82d0405c9298 r13: ffff83007f7fd508 r14: ffff83005ce1c850
> (XEN) r15: ffff82d0405e2680 cr0: 000000008005003b cr4: 00000000003526e0
> (XEN) cr3: 000000007f6b3000 cr2: ffff888072e79dc0
> (XEN) fsb: 0000000000000000 gsb: ffff888071ac0000 gss: 0000000000000000
> (XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
> (XEN) Xen code around <ffff82d04024f577> 
> (common/sched/null.c#unit_deassign+0x1c3/0x2ec):
> (XEN) 41 5e 41 5f 5d c3 0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 49 8b 04 24 0f 
> b7 00 66
> (XEN) Xen stack trace from rsp=ffff83007ffdfd48:
> (XEN) ffff83007ffdfd88 ffff82d04023961c 0000000400000000 ffff83005ce1cc50
> (XEN) 0000000000000002 ffff83007ffdfd90 ffff83005ce1c790 ffff82d0405c9298
> (XEN) ffff83007f7fd508 ffff83005ce1c850 ffff82d0405e2680 ffff83007ffdfde0
> (XEN) ffff82d04024f889 ffff83007ffb7850 ffff83005dd63000 ffff83005ce1c790
> (XEN) ffff83005845ab28 ffff83005845a000 0000000000000000 ffff83007ffdfe00
> (XEN) ffff82d040253326 ffff83005dd63000 0000000000000000 ffff83007ffdfe38
> (XEN) ffff82d04020506b ffff83007a881080 0000000000000000 0000000000000000
> (XEN) 0000000000000000 ffff82d0405d6f80 ffff83007ffdfe70 ffff82d04022d9e5
> (XEN) 0000001100000003 ffff82d0405cf100 ffff82d0405cf100 ffffffffffffffff
> (XEN) ffff82d0405cef80 ffff83007ffdfea8 ffff82d04022e14b 0000000000000003
> (XEN) ffff82d0405cf100 0000000000007fff 0000000000000003 0000000000000003
> (XEN) ffff83007ffdfeb8 ffff82d04022e1e6 ffff83007ffdfef0 ffff82d0403172b4
> (XEN) ffff82d04031721d ffff83007fec1000 ffff83007ffb6000 0000000000000003
> (XEN) ffff83007ffcc000 ffff83007ffdfe18 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000003 0000000000000003
> (XEN) 0000000000000246 0000000000000003 0000000000000000 000000001bf9dde5
> (XEN) 0000000000000000 ffffffff810023aa 0000000000000003 deadbeefdeadf00d
> (XEN) deadbeefdeadf00d 0000010000000000 ffffffff810023aa 000000000000e033
> (XEN) 0000000000000246 ffffc900400a3ea8 000000000000e02b 7ffdff707fffd140
> (XEN) 000000017fe37a6c 000000007ffe8010 0000000000000000 0000e01000000003
> (XEN) Xen call trace:
> (XEN) [<ffff82d04024f577>] R common/sched/null.c#unit_deassign+0x1c3/0x2ec
> (XEN) [<ffff82d04024f889>] F common/sched/null.c#null_unit_remove+0xfc/0x136
> (XEN) [<ffff82d040253326>] F sched_destroy_vcpu+0xca/0x199
> (XEN) [<ffff82d04020506b>] F 
> common/domain.c#complete_domain_destroy+0x68/0x13f
> (XEN) [<ffff82d04022d9e5>] F 
> common/rcupdate.c#rcu_process_callbacks+0xdb/0x24b
> (XEN) [<ffff82d04022e14b>] F common/softirq.c#__do_softirq+0x8a/0xbc
> (XEN) [<ffff82d04022e1e6>] F do_softirq+0x13/0x15
> (XEN) [<ffff82d0403172b4>] F arch/x86/domain.c#idle_loop+0x97/0xee
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 3:
> (XEN) Assertion 'npc->unit == unit' failed at null.c:377
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
> 
> This is the line of the assertion that triggers the reboot: 
> https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/common/sched/null.c;h=82d5d1baab853d24fcbb455fb3f3e8263c871277;hb=HEAD#l377 
> <https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/common/sched/null.c;h=82d5d1baab853d24fcbb455fb3f3e8263c871277;hb=HEAD#l377>
> 
> To reproduce the vulnerability, I took the following steps:

Just to make clear for the others in the thread, per SUPPORT.MD, the 
NULL scheduler is not security supported. Hence why this is sent to 
xen-devel directly.

Also, for completeness, debug build are also not security supported. On 
production build, the ASSERT() would be turned to a NOP which could 
result to potentially more interesting issue. Anyway, that's not a 
problem here. :)

> 
>   * Install XEN; only 4.15+ seem to be vulnerable
>   * Use the null scheduler (depends on your setup): edit
>     /etc/default/grub adding at the end of the file:
>     GRUB_CMDLINE_XEN="sched=null" and update grub
>   * Reboot into xen
>   * Create a file guest.cfg with the following contents
> 
> name="guest"
> builder="hvm"
> memory=512
> 
> serial = [ 'file:/tmp/log', 'pty' ]
> 
> disk = [ '/home/user/boot.iso,,hdc,cdrom' ]
> 
> on_reboot = "destroy"
> 
> vcpus=1
> 
> 
> Make sure that the file //home/user/boot.iso/ does not exist
> 
>   * Create a guest with this configuration: xl create -c guest.cfg
> 
> CC’ing Dario, Stefano and Julien to whom I’ve shown this.

I am not quite too sure where the problem lies yet but adding some more 
information of the debugging we discussed together.

The ASSERT() is triggered because the pCPU was already assigned to one 
of the dom0 vCPU. This problem is happening regardless whether there is 
free pCPU.

I have added some debugging in sched_set_res():

diff --git a/xen/common/sched/private.h b/xen/common/sched/private.h
index a870320146ef..2355f531dc13 100644
--- a/xen/common/sched/private.h
+++ b/xen/common/sched/private.h
@@ -150,6 +150,10 @@ static inline void sched_set_res(struct sched_unit 
*unit,
      unsigned int cpu = cpumask_first(res->cpus);
      struct vcpu *v;

+    printk("%s: res->master_cpu %u unit %p %pd %pv\n", __func__,
+           res->master_cpu, unit, unit->domain, unit->vcpu_list);
+    WARN();
+
      for_each_sched_unit_vcpu ( unit, v )
      {
          ASSERT(cpu < nr_cpu_ids);

This traced the problem to null_unit_migrate():


(XEN) sched_set_res: res->master_cpu 0 unit ffff830200887f00 d1 d1v0
(XEN) Xen WARN at private.h:155
(XEN) ----[ Xen-4.16-unstable  x86_64  debug=y  Tainted:   C   ]----
(XEN) CPU:    1
(XEN) RIP:    e008:[<ffff82d04023fd9f>] core.c#sched_set_res+0x5b/0xc6
(XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor (d0v1)
(XEN) rax: ffff83027bf55038   rbx: 0000000000000000   rcx: 0000000000000000
(XEN) rdx: ffff83027bf4ffff   rsi: 000000000000000a   rdi: ffff82d0404944b8
(XEN) rbp: ffff83027bf4fc70   rsp: ffff83027bf4fc40   r8:  0000000000000004
(XEN) r9:  0000000000000030   r10: ffff83027bf4fcf8   r11: 00000000fffffffd
(XEN) r12: ffff830275e83000   r13: ffff830275e8d000   r14: ffff830200887f00
(XEN) r15: ffff83027bf850a0   cr0: 0000000080050033   cr4: 00000000003526e0
(XEN) cr3: 00000001f1e3d000   cr2: 0000563f71516088
(XEN) fsb: 00007f6561cda780   gsb: ffff88817fe80000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <ffff82d04023fd9f> (core.c#sched_set_res+0x5b/0xc6):
(XEN)  14 18 00 e8 e7 7f 00 00 <0f> 0b 4d 8b 66 08 4d 85 e4 75 28 4d 89 
7e 20 48
(XEN) Xen stack trace from rsp=ffff83027bf4fc40:
(XEN)    ffff83027bf85118 ffff830200887f00 ffff830275e83000 ffff830275e8d000
(XEN)    0000000000000000 ffff83027bf552a0 ffff83027bf4fce0 ffff82d040241614
(XEN)    ffff82d040226393 0000000000000286 ffff83027bf822e8 0000000175e8d000
(XEN)    ffff830200887f00 ffff83027bf552a0 ffff830275e83000 ffff83027bf4fcf8
(XEN)    ffff830275e8d000 ffff830275e8d000 0000000000000001 0000000000000000
(XEN)    ffff83027bf4fd40 ffff82d04020527d ffff82d04020527d 0000000000000000
(XEN)    0000000000000000 ffff83027bf4fd30 0000000000000000 0000000000000000
(XEN)    ffff830275e8d000 00007f65620f6010 0000000000000001 ffff82d040238319
(XEN)    ffff83027bf4fe58 ffff82d040238dd9 00000000001f1eae 0000000000000004
(XEN)    ffff83027bee4001 8000000000000000 ffff83027bf4fdc0 ffff82d04032e6df
(XEN)    000000044032e6df 0000000000000000 ffff82e003e3e120 000000140000000f
(XEN)    00007f6561d90001 0000559a00000001 0000000000000014 0000559ad9c303e0
(XEN)    0000000000000008 0000559ad9c303e0 0000559ad9c31170 0000559ad9c303c0
(XEN)    0000000000000000 00007ffd4ed54b60 0000559ad9c309a0 00007ffd4ed54c50
(XEN)    0000000000000000 0000559ad9c38240 0000559ad9c32570 00007ffd4ed54f00
(XEN)    0000559ad9c31170 ffff83027bf4fef8 0000000000000000 0000000000000001
(XEN)    deadbeefdeadf00d ffff83027bec0000 ffff82d040238319 ffff83027bf4fee8
(XEN)    ffff82d04030d8bc 00007f65620f6010 deadbeefdeadf00d deadbeefdeadf00d
(XEN)    deadbeefdeadf00d deadbeefdeadf00d ffff82d04038821c ffff82d040388228
(XEN)    ffff82d04038821c ffff82d040388228 ffff82d04038821c ffff82d040388228
(XEN) Xen call trace:
(XEN)    [<ffff82d04023fd9f>] R core.c#sched_set_res+0x5b/0xc6
(XEN)    [<ffff82d040241614>] F sched_init_vcpu+0x3dc/0x5d7
(XEN)    [<ffff82d04020527d>] F vcpu_create+0xfb/0x37a
(XEN)    [<ffff82d040238dd9>] F do_domctl+0xac0/0x184a
(XEN)    [<ffff82d04030d8bc>] F pv_hypercall+0x10d/0x2b8
(XEN)    [<ffff82d04038829d>] F lstar_enter+0x12d/0x140
(XEN)

The end of the function contains an interesting comment:

     /*
      * Whatever all the above, we always at least override v->processor.
      * This is especially important for shutdown or suspend/resume paths,
      * when it is important to let our caller (cpu_disable_scheduler())
      * know that the migration did happen, to the best of our 
possibilities,
      * at least. In case of suspend, any temporary inconsistency caused
      * by this, will be fixed-up during resume.
      */

This implies that a pCPU may temporarily be assigned to two vCPUs and we 
expect to be fixed up afterwards. However, a domain may be destroyed 
before this is happening.

So it looks like that unit_deassign() is not able to cope with this 
case. From a brief look, I think we may want to check if the pCPU is in 
the wait list. If it is, then we should bail out.

Dario, Stefano, what do you think?

Cheers,

-- 
Julien Grall


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: NULL scheduler DoS
  2021-08-09 17:35 ` Julien Grall
@ 2021-08-09 20:38   ` Julien Grall
  2021-08-11 10:09     ` Dario Faggioli
  2021-08-11  9:59   ` Dario Faggioli
  1 sibling, 1 reply; 8+ messages in thread
From: Julien Grall @ 2021-08-09 20:38 UTC (permalink / raw)
  To: Ahmed, Daniele, xen-devel
  Cc: Dario Faggioli, Stefano Stabellini, Grall, Julien, Doebel,
	Bjoern, Pohlack, Martin



On 09/08/2021 18:35, Julien Grall wrote:
> 
> 
> On 09/08/2021 17:19, Ahmed, Daniele wrote:
>> Hi all,
> 
> Hi Daniele,
> 
> Thank you for the report!
> 
>> The NULL scheduler is affected by an issue that triggers an assertion 
>> and reboots the hypervisor.
>>
>> This issue arise when:
>>
>>   * a guest is being created with a configuration specifying a file that
>>     does not exist
>>   * the hypervisor boots with the null scheduler
>>
>> 4.16 is affected and 4.15 also.
>>
>> This is the stack trace from 4.16:
>>
>> (XEN) Assertion 'npc->unit == unit' failed at null.c:377
>> (XEN) ----[ Xen-4.16-unstable x86_64 debug=y Not tainted ]----
>> (XEN) CPU: 3
>> (XEN) RIP: e008:[<ffff82d04024f577>] 
>> common/sched/null.c#unit_deassign+0x1c3/0x2ec
>> (XEN) RFLAGS: 0000000000010006 CONTEXT: hypervisor
>> (XEN) rax: ffff83005ce1c850 rbx: 0000000000000001 rcx: 0000000000000001
>> (XEN) rdx: ffff83007fde6fc0 rsi: ffff83005ce1c790 rdi: ffff83007ffb7850
>> (XEN) rbp: ffff83007ffdfda0 rsp: ffff83007ffdfd48 r8: 0000000000000000
>> (XEN) r9: 0000000000048fee r10: 0000000000000000 r11: 0000000000000000
>> (XEN) r12: ffff82d0405c9298 r13: ffff83007f7fd508 r14: ffff83005ce1c850
>> (XEN) r15: ffff82d0405e2680 cr0: 000000008005003b cr4: 00000000003526e0
>> (XEN) cr3: 000000007f6b3000 cr2: ffff888072e79dc0
>> (XEN) fsb: 0000000000000000 gsb: ffff888071ac0000 gss: 0000000000000000
>> (XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
>> (XEN) Xen code around <ffff82d04024f577> 
>> (common/sched/null.c#unit_deassign+0x1c3/0x2ec):
>> (XEN) 41 5e 41 5f 5d c3 0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 49 8b 04 24 0f 
>> b7 00 66
>> (XEN) Xen stack trace from rsp=ffff83007ffdfd48:
>> (XEN) ffff83007ffdfd88 ffff82d04023961c 0000000400000000 ffff83005ce1cc50
>> (XEN) 0000000000000002 ffff83007ffdfd90 ffff83005ce1c790 ffff82d0405c9298
>> (XEN) ffff83007f7fd508 ffff83005ce1c850 ffff82d0405e2680 ffff83007ffdfde0
>> (XEN) ffff82d04024f889 ffff83007ffb7850 ffff83005dd63000 ffff83005ce1c790
>> (XEN) ffff83005845ab28 ffff83005845a000 0000000000000000 ffff83007ffdfe00
>> (XEN) ffff82d040253326 ffff83005dd63000 0000000000000000 ffff83007ffdfe38
>> (XEN) ffff82d04020506b ffff83007a881080 0000000000000000 0000000000000000
>> (XEN) 0000000000000000 ffff82d0405d6f80 ffff83007ffdfe70 ffff82d04022d9e5
>> (XEN) 0000001100000003 ffff82d0405cf100 ffff82d0405cf100 ffffffffffffffff
>> (XEN) ffff82d0405cef80 ffff83007ffdfea8 ffff82d04022e14b 0000000000000003
>> (XEN) ffff82d0405cf100 0000000000007fff 0000000000000003 0000000000000003
>> (XEN) ffff83007ffdfeb8 ffff82d04022e1e6 ffff83007ffdfef0 ffff82d0403172b4
>> (XEN) ffff82d04031721d ffff83007fec1000 ffff83007ffb6000 0000000000000003
>> (XEN) ffff83007ffcc000 ffff83007ffdfe18 0000000000000000 0000000000000000
>> (XEN) 0000000000000000 0000000000000000 0000000000000003 0000000000000003
>> (XEN) 0000000000000246 0000000000000003 0000000000000000 000000001bf9dde5
>> (XEN) 0000000000000000 ffffffff810023aa 0000000000000003 deadbeefdeadf00d
>> (XEN) deadbeefdeadf00d 0000010000000000 ffffffff810023aa 000000000000e033
>> (XEN) 0000000000000246 ffffc900400a3ea8 000000000000e02b 7ffdff707fffd140
>> (XEN) 000000017fe37a6c 000000007ffe8010 0000000000000000 0000e01000000003
>> (XEN) Xen call trace:
>> (XEN) [<ffff82d04024f577>] R 
>> common/sched/null.c#unit_deassign+0x1c3/0x2ec
>> (XEN) [<ffff82d04024f889>] F 
>> common/sched/null.c#null_unit_remove+0xfc/0x136
>> (XEN) [<ffff82d040253326>] F sched_destroy_vcpu+0xca/0x199
>> (XEN) [<ffff82d04020506b>] F 
>> common/domain.c#complete_domain_destroy+0x68/0x13f
>> (XEN) [<ffff82d04022d9e5>] F 
>> common/rcupdate.c#rcu_process_callbacks+0xdb/0x24b
>> (XEN) [<ffff82d04022e14b>] F common/softirq.c#__do_softirq+0x8a/0xbc
>> (XEN) [<ffff82d04022e1e6>] F do_softirq+0x13/0x15
>> (XEN) [<ffff82d0403172b4>] F arch/x86/domain.c#idle_loop+0x97/0xee
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 3:
>> (XEN) Assertion 'npc->unit == unit' failed at null.c:377
>> (XEN) ****************************************
>> (XEN)
>> (XEN) Reboot in five seconds...
>>
>> This is the line of the assertion that triggers the reboot: 
>> https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/common/sched/null.c;h=82d5d1baab853d24fcbb455fb3f3e8263c871277;hb=HEAD#l377 
>> <https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/common/sched/null.c;h=82d5d1baab853d24fcbb455fb3f3e8263c871277;hb=HEAD#l377> 
>>
>>
>> To reproduce the vulnerability, I took the following steps:
> 
> Just to make clear for the others in the thread, per SUPPORT.MD, the 
> NULL scheduler is not security supported. Hence why this is sent to 
> xen-devel directly.
> 
> Also, for completeness, debug build are also not security supported. On 
> production build, the ASSERT() would be turned to a NOP which could 
> result to potentially more interesting issue. Anyway, that's not a 
> problem here. :)
> 
>>
>>   * Install XEN; only 4.15+ seem to be vulnerable
>>   * Use the null scheduler (depends on your setup): edit
>>     /etc/default/grub adding at the end of the file:
>>     GRUB_CMDLINE_XEN="sched=null" and update grub
>>   * Reboot into xen
>>   * Create a file guest.cfg with the following contents
>>
>> name="guest"
>> builder="hvm"
>> memory=512
>>
>> serial = [ 'file:/tmp/log', 'pty' ]
>>
>> disk = [ '/home/user/boot.iso,,hdc,cdrom' ]
>>
>> on_reboot = "destroy"
>>
>> vcpus=1
>>
>>
>> Make sure that the file //home/user/boot.iso/ does not exist
>>
>>   * Create a guest with this configuration: xl create -c guest.cfg
>>
>> CC’ing Dario, Stefano and Julien to whom I’ve shown this.
> 
> I am not quite too sure where the problem lies yet but adding some more 
> information of the debugging we discussed together.
> 
> The ASSERT() is triggered because the pCPU was already assigned to one 
> of the dom0 vCPU. This problem is happening regardless whether there is 
> free pCPU.
> 
> I have added some debugging in sched_set_res():
> 
> diff --git a/xen/common/sched/private.h b/xen/common/sched/private.h
> index a870320146ef..2355f531dc13 100644
> --- a/xen/common/sched/private.h
> +++ b/xen/common/sched/private.h
> @@ -150,6 +150,10 @@ static inline void sched_set_res(struct sched_unit 
> *unit,
>       unsigned int cpu = cpumask_first(res->cpus);
>       struct vcpu *v;
> 
> +    printk("%s: res->master_cpu %u unit %p %pd %pv\n", __func__,
> +           res->master_cpu, unit, unit->domain, unit->vcpu_list);
> +    WARN();
> +
>       for_each_sched_unit_vcpu ( unit, v )
>       {
>           ASSERT(cpu < nr_cpu_ids);
> 
> This traced the problem to null_unit_migrate():
> 
> 
> (XEN) sched_set_res: res->master_cpu 0 unit ffff830200887f00 d1 d1v0
> (XEN) Xen WARN at private.h:155
> (XEN) ----[ Xen-4.16-unstable  x86_64  debug=y  Tainted:   C   ]----
> (XEN) CPU:    1
> (XEN) RIP:    e008:[<ffff82d04023fd9f>] core.c#sched_set_res+0x5b/0xc6
> (XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor (d0v1)
> (XEN) rax: ffff83027bf55038   rbx: 0000000000000000   rcx: 0000000000000000
> (XEN) rdx: ffff83027bf4ffff   rsi: 000000000000000a   rdi: ffff82d0404944b8
> (XEN) rbp: ffff83027bf4fc70   rsp: ffff83027bf4fc40   r8:  0000000000000004
> (XEN) r9:  0000000000000030   r10: ffff83027bf4fcf8   r11: 00000000fffffffd
> (XEN) r12: ffff830275e83000   r13: ffff830275e8d000   r14: ffff830200887f00
> (XEN) r15: ffff83027bf850a0   cr0: 0000000080050033   cr4: 00000000003526e0
> (XEN) cr3: 00000001f1e3d000   cr2: 0000563f71516088
> (XEN) fsb: 00007f6561cda780   gsb: ffff88817fe80000   gss: 0000000000000000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen code around <ffff82d04023fd9f> (core.c#sched_set_res+0x5b/0xc6):
> (XEN)  14 18 00 e8 e7 7f 00 00 <0f> 0b 4d 8b 66 08 4d 85 e4 75 28 4d 89 
> 7e 20 48
> (XEN) Xen stack trace from rsp=ffff83027bf4fc40:
> (XEN)    ffff83027bf85118 ffff830200887f00 ffff830275e83000 
> ffff830275e8d000
> (XEN)    0000000000000000 ffff83027bf552a0 ffff83027bf4fce0 
> ffff82d040241614
> (XEN)    ffff82d040226393 0000000000000286 ffff83027bf822e8 
> 0000000175e8d000
> (XEN)    ffff830200887f00 ffff83027bf552a0 ffff830275e83000 
> ffff83027bf4fcf8
> (XEN)    ffff830275e8d000 ffff830275e8d000 0000000000000001 
> 0000000000000000
> (XEN)    ffff83027bf4fd40 ffff82d04020527d ffff82d04020527d 
> 0000000000000000
> (XEN)    0000000000000000 ffff83027bf4fd30 0000000000000000 
> 0000000000000000
> (XEN)    ffff830275e8d000 00007f65620f6010 0000000000000001 
> ffff82d040238319
> (XEN)    ffff83027bf4fe58 ffff82d040238dd9 00000000001f1eae 
> 0000000000000004
> (XEN)    ffff83027bee4001 8000000000000000 ffff83027bf4fdc0 
> ffff82d04032e6df
> (XEN)    000000044032e6df 0000000000000000 ffff82e003e3e120 
> 000000140000000f
> (XEN)    00007f6561d90001 0000559a00000001 0000000000000014 
> 0000559ad9c303e0
> (XEN)    0000000000000008 0000559ad9c303e0 0000559ad9c31170 
> 0000559ad9c303c0
> (XEN)    0000000000000000 00007ffd4ed54b60 0000559ad9c309a0 
> 00007ffd4ed54c50
> (XEN)    0000000000000000 0000559ad9c38240 0000559ad9c32570 
> 00007ffd4ed54f00
> (XEN)    0000559ad9c31170 ffff83027bf4fef8 0000000000000000 
> 0000000000000001
> (XEN)    deadbeefdeadf00d ffff83027bec0000 ffff82d040238319 
> ffff83027bf4fee8
> (XEN)    ffff82d04030d8bc 00007f65620f6010 deadbeefdeadf00d 
> deadbeefdeadf00d
> (XEN)    deadbeefdeadf00d deadbeefdeadf00d ffff82d04038821c 
> ffff82d040388228
> (XEN)    ffff82d04038821c ffff82d040388228 ffff82d04038821c 
> ffff82d040388228
> (XEN) Xen call trace:
> (XEN)    [<ffff82d04023fd9f>] R core.c#sched_set_res+0x5b/0xc6
> (XEN)    [<ffff82d040241614>] F sched_init_vcpu+0x3dc/0x5d7
> (XEN)    [<ffff82d04020527d>] F vcpu_create+0xfb/0x37a
> (XEN)    [<ffff82d040238dd9>] F do_domctl+0xac0/0x184a
> (XEN)    [<ffff82d04030d8bc>] F pv_hypercall+0x10d/0x2b8
> (XEN)    [<ffff82d04038829d>] F lstar_enter+0x12d/0x140
> (XEN)
> 
> The end of the function contains an interesting comment:
> 
>      /*
>       * Whatever all the above, we always at least override v->processor.
>       * This is especially important for shutdown or suspend/resume paths,
>       * when it is important to let our caller (cpu_disable_scheduler())
>       * know that the migration did happen, to the best of our 
> possibilities,
>       * at least. In case of suspend, any temporary inconsistency caused
>       * by this, will be fixed-up during resume.
>       */
> 
> This implies that a pCPU may temporarily be assigned to two vCPUs and we 
> expect to be fixed up afterwards. However, a domain may be destroyed 
> before this is happening.
> 
> So it looks like that unit_deassign() is not able to cope with this 
> case. From a brief look, I think we may want to check if the pCPU is in 
> the wait list. If it is, then we should bail out.

Actually, I was wrong. It looks like null_unit_remove() is already 
checking this condition. Also, the vCPU should be offline (and the unit 
as well) because they haven't come online yet:

     /*
      * If a unit is (going) offline, we want it to be neither assigned
      * to a pCPU, nor in the waitqueue.
      *
      * If it was on a cpu, we've removed it from there above. If it is
      * in the waitqueue, we remove it from there now. And then we bail.
      */
     if ( unlikely(!is_unit_online(unit)) )
     {
         spin_lock(&prv->waitq_lock);
         list_del_init(&nvc->waitq_elem);
         spin_unlock(&prv->waitq_lock);
         goto out;
     }

[...]

  out:
     sched_set_res(unit, get_sched_res(new_cpu));

The jump to out looks suspicious because the comment on top says "We 
want it to be neither assigned to a pCPU, nor in the waitqueue". So 
maybe we should replace the goto out by a return?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: NULL scheduler DoS
  2021-08-09 17:35 ` Julien Grall
  2021-08-09 20:38   ` Julien Grall
@ 2021-08-11  9:59   ` Dario Faggioli
  1 sibling, 0 replies; 8+ messages in thread
From: Dario Faggioli @ 2021-08-11  9:59 UTC (permalink / raw)
  To: julien, ahmeddan, xen-devel; +Cc: sstabellini, mpohlack, jgrall, doebel

[-- Attachment #1: Type: text/plain, Size: 2642 bytes --]

On Mon, 2021-08-09 at 18:35 +0100, Julien Grall wrote:
> On 09/08/2021 17:19, Ahmed, Daniele wrote:
> > Hi all,
> 
> Hi Daniele,
> 
Hello everyone from me as well,

> Thank you for the report!
> 
Indeed. :-)

> 
> The ASSERT() is triggered because the pCPU was already assigned to
> one 
> of the dom0 vCPU. This problem is happening regardless whether there
> is 
> free pCPU.
> 
Right. Can we raise the appropriate log level, so that we can see these
messages:

dprintk(XENLOG_G_INFO, "%d <-- %pdv%d\n", cpu, unit->domain, unit->unit_id);

(and then see a full `xl dmesg`, or even better, a serial console dump,
since we crash! :-P)

> I have added some debugging in sched_set_res():
> 
> diff --git a/xen/common/sched/private.h b/xen/common/sched/private.h
> index a870320146ef..2355f531dc13 100644
> --- a/xen/common/sched/private.h
> +++ b/xen/common/sched/private.h
> @@ -150,6 +150,10 @@ static inline void sched_set_res(struct
> sched_unit 
> *unit,
>       unsigned int cpu = cpumask_first(res->cpus);
>       struct vcpu *v;
> 
> +    printk("%s: res->master_cpu %u unit %p %pd %pv\n", __func__,
> +           res->master_cpu, unit, unit->domain, unit->vcpu_list);
> +    WARN();
> +
>       for_each_sched_unit_vcpu ( unit, v )
>       {
>           ASSERT(cpu < nr_cpu_ids);
> 
> This traced the problem to null_unit_migrate():
> 
> (XEN) sched_set_res: res->master_cpu 0 unit ffff830200887f00 d1 d1v0
> (XEN) Xen WARN at private.h:155
> (XEN) ----[ Xen-4.16-unstable  x86_64  debug=y  Tainted:   C   ]----
> (XEN) CPU:    1
> (XEN) RIP:    e008:[<ffff82d04023fd9f>]
> core.c#sched_set_res+0x5b/0xc6
> [...]
> (XEN) Xen call trace:
> (XEN)    [<ffff82d04023fd9f>] R core.c#sched_set_res+0x5b/0xc6
> (XEN)    [<ffff82d040241614>] F sched_init_vcpu+0x3dc/0x5d7
> (XEN)    [<ffff82d04020527d>] F vcpu_create+0xfb/0x37a
> (XEN)    [<ffff82d040238dd9>] F do_domctl+0xac0/0x184a
> (XEN)    [<ffff82d04030d8bc>] F pv_hypercall+0x10d/0x2b8
> (XEN)    [<ffff82d04038829d>] F lstar_enter+0x12d/0x140
> (XEN)
> 
So, it's entirely possible that I'm missing something obvious here, but
what it is that is making you think that we're in null_unit_migrate() ?

Does that come from a different instance of this WARN() ?

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: NULL scheduler DoS
  2021-08-09 20:38   ` Julien Grall
@ 2021-08-11 10:09     ` Dario Faggioli
  2021-09-08  8:01       ` Ahmed, Daniele
  2021-12-17 15:03       ` Ahmed, Daniele
  0 siblings, 2 replies; 8+ messages in thread
From: Dario Faggioli @ 2021-08-11 10:09 UTC (permalink / raw)
  To: julien, ahmeddan, xen-devel; +Cc: sstabellini, mpohlack, jgrall, doebel

[-- Attachment #1: Type: text/plain, Size: 1310 bytes --]

On Mon, 2021-08-09 at 21:38 +0100, Julien Grall wrote:
> On 09/08/2021 18:35, Julien Grall wrote:
> > 
> > This implies that a pCPU may temporarily be assigned to two vCPUs
> > and we 
> > expect to be fixed up afterwards. However, a domain may be
> > destroyed 
> > before this is happening.
> > 
> > So it looks like that unit_deassign() is not able to cope with this
> > case. From a brief look, I think we may want to check if the pCPU
> > is in 
> > the wait list. If it is, then we should bail out.
> 
> Actually, I was wrong. It looks like null_unit_remove() is already 
> checking this condition. Also, the vCPU should be offline (and the
> unit 
> as well) because they haven't come online yet:
> 
That's what is currently puzzling me.

I mean, the crash happens in unit_deassign(), called by
null_unit_remove(), called by sched_destroy_vcpu().

And I agree that the unit should be offline, but null_unit_remove()
calls unit_deassign() only if the unit *is* online, so... What's going
on? :-/

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: NULL scheduler DoS
  2021-08-11 10:09     ` Dario Faggioli
@ 2021-09-08  8:01       ` Ahmed, Daniele
  2021-12-17 15:03       ` Ahmed, Daniele
  1 sibling, 0 replies; 8+ messages in thread
From: Ahmed, Daniele @ 2021-09-08  8:01 UTC (permalink / raw)
  To: Dario Faggioli, julien, xen-devel
  Cc: sstabellini, Pohlack, Martin, Grall, Julien, Doebel, Bjoern

Hi, sorry for the late reply.

Here is the full log from the serial console: https://pastebin.com/raw/g9EFtk3y

This still happens on the current unstable as of today.

Thank you
Daniele

On 11/08/2021, 12:09, "Dario Faggioli" <dfaggioli@suse.com> wrote:

    On Mon, 2021-08-09 at 21:38 +0100, Julien Grall wrote:
    > On 09/08/2021 18:35, Julien Grall wrote:
    > > 
    > > This implies that a pCPU may temporarily be assigned to two vCPUs
    > > and we 
    > > expect to be fixed up afterwards. However, a domain may be
    > > destroyed 
    > > before this is happening.
    > > 
    > > So it looks like that unit_deassign() is not able to cope with this
    > > case. From a brief look, I think we may want to check if the pCPU
    > > is in 
    > > the wait list. If it is, then we should bail out.
    > 
    > Actually, I was wrong. It looks like null_unit_remove() is already 
    > checking this condition. Also, the vCPU should be offline (and the
    > unit 
    > as well) because they haven't come online yet:
    > 
    That's what is currently puzzling me.

    I mean, the crash happens in unit_deassign(), called by
    null_unit_remove(), called by sched_destroy_vcpu().

    And I agree that the unit should be offline, but null_unit_remove()
    calls unit_deassign() only if the unit *is* online, so... What's going
    on? :-/

    Regards
    -- 
    Dario Faggioli, Ph.D
    http://about.me/dario.faggioli
    Virtualization Software Engineer
    SUSE Labs, SUSE https://www.suse.com/
    -------------------------------------------------------------------
    <<This happens because _I_ choose it to happen!>> (Raistlin Majere)


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: NULL scheduler DoS
  2021-08-11 10:09     ` Dario Faggioli
  2021-09-08  8:01       ` Ahmed, Daniele
@ 2021-12-17 15:03       ` Ahmed, Daniele
  1 sibling, 0 replies; 8+ messages in thread
From: Ahmed, Daniele @ 2021-12-17 15:03 UTC (permalink / raw)
  To: Dario Faggioli, julien, xen-devel; +Cc: sstabellini, Grall, Julien

Hi!

Is there any update on this issue? How should we move forward?

Thank you,


On 11/08/2021, 12:09, "Dario Faggioli" <dfaggioli@suse.com> wrote:

    On Mon, 2021-08-09 at 21:38 +0100, Julien Grall wrote:
    > On 09/08/2021 18:35, Julien Grall wrote:
    > > 
    > > This implies that a pCPU may temporarily be assigned to two vCPUs
    > > and we 
    > > expect to be fixed up afterwards. However, a domain may be
    > > destroyed 
    > > before this is happening.
    > > 
    > > So it looks like that unit_deassign() is not able to cope with this
    > > case. From a brief look, I think we may want to check if the pCPU
    > > is in 
    > > the wait list. If it is, then we should bail out.
    > 
    > Actually, I was wrong. It looks like null_unit_remove() is already 
    > checking this condition. Also, the vCPU should be offline (and the
    > unit 
    > as well) because they haven't come online yet:
    > 
    That's what is currently puzzling me.

    I mean, the crash happens in unit_deassign(), called by
    null_unit_remove(), called by sched_destroy_vcpu().

    And I agree that the unit should be offline, but null_unit_remove()
    calls unit_deassign() only if the unit *is* online, so... What's going
    on? :-/

    Regards
    -- 
    Dario Faggioli, Ph.D
    http://about.me/dario.faggioli
    Virtualization Software Engineer
    SUSE Labs, SUSE https://www.suse.com/
    -------------------------------------------------------------------
    <<This happens because _I_ choose it to happen!>> (Raistlin Majere)




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-12-17 15:07 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-09 16:19 NULL scheduler DoS Ahmed, Daniele
2021-08-09 16:47 ` Ahmed, Daniele
2021-08-09 17:35 ` Julien Grall
2021-08-09 20:38   ` Julien Grall
2021-08-11 10:09     ` Dario Faggioli
2021-09-08  8:01       ` Ahmed, Daniele
2021-12-17 15:03       ` Ahmed, Daniele
2021-08-11  9:59   ` Dario Faggioli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).