Subject: Re: NULL scheduler DoS
To: "Ahmed, Daniele", "xen-devel@lists.xenproject.org"
Cc: Dario Faggioli, Stefano Stabellini, "Grall, Julien", "Doebel, Bjoern", "Pohlack, Martin"
From: Julien Grall
Message-ID: <8193a685-3ab1-9407-75f3-e335ea4406e4@xen.org>
Date: Mon, 9 Aug 2021 18:35:12 +0100

On 09/08/2021 17:19, Ahmed, Daniele wrote:
> Hi all,

Hi Daniele,

Thank you for the report!

> The NULL scheduler is affected by an issue that triggers an assertion
> and reboots the hypervisor.
>
> This issue arise when:
>
>   * a guest is being created with a configuration specifying a file that
>     does not exist
>   * the hypervisor boots with the null scheduler
>
> 4.16 is affected and 4.15 also.
>
> This is the stack trace from 4.16:
>
> (XEN) Assertion 'npc->unit == unit' failed at null.c:377
> (XEN) ----[ Xen-4.16-unstable x86_64 debug=y Not tainted ]----
> (XEN) CPU: 3
> (XEN) RIP: e008:[] common/sched/null.c#unit_deassign+0x1c3/0x2ec
> (XEN) RFLAGS: 0000000000010006 CONTEXT: hypervisor
> (XEN) rax: ffff83005ce1c850 rbx: 0000000000000001 rcx: 0000000000000001
> (XEN) rdx: ffff83007fde6fc0 rsi: ffff83005ce1c790 rdi: ffff83007ffb7850
> (XEN) rbp: ffff83007ffdfda0 rsp: ffff83007ffdfd48 r8: 0000000000000000
> (XEN) r9: 0000000000048fee r10: 0000000000000000 r11: 0000000000000000
> (XEN) r12: ffff82d0405c9298 r13: ffff83007f7fd508 r14: ffff83005ce1c850
> (XEN) r15: ffff82d0405e2680 cr0: 000000008005003b cr4: 00000000003526e0
> (XEN) cr3: 000000007f6b3000 cr2: ffff888072e79dc0
> (XEN) fsb: 0000000000000000 gsb: ffff888071ac0000 gss: 0000000000000000
> (XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
> (XEN) Xen code around (common/sched/null.c#unit_deassign+0x1c3/0x2ec):
> (XEN) 41 5e 41 5f 5d c3 0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 49 8b 04 24 0f b7 00 66
> (XEN) Xen stack trace from rsp=ffff83007ffdfd48:
> (XEN) ffff83007ffdfd88 ffff82d04023961c 0000000400000000 ffff83005ce1cc50
> (XEN) 0000000000000002 ffff83007ffdfd90 ffff83005ce1c790 ffff82d0405c9298
> (XEN) ffff83007f7fd508 ffff83005ce1c850 ffff82d0405e2680 ffff83007ffdfde0
> (XEN) ffff82d04024f889 ffff83007ffb7850 ffff83005dd63000 ffff83005ce1c790
> (XEN) ffff83005845ab28 ffff83005845a000 0000000000000000 ffff83007ffdfe00
> (XEN) ffff82d040253326 ffff83005dd63000 0000000000000000 ffff83007ffdfe38
> (XEN) ffff82d04020506b ffff83007a881080 0000000000000000 0000000000000000
> (XEN) 0000000000000000 ffff82d0405d6f80 ffff83007ffdfe70 ffff82d04022d9e5
> (XEN) 0000001100000003 ffff82d0405cf100 ffff82d0405cf100 ffffffffffffffff
> (XEN) ffff82d0405cef80 ffff83007ffdfea8 ffff82d04022e14b 0000000000000003
> (XEN) ffff82d0405cf100 0000000000007fff 0000000000000003 0000000000000003
> (XEN) ffff83007ffdfeb8 ffff82d04022e1e6 ffff83007ffdfef0 ffff82d0403172b4
> (XEN) ffff82d04031721d ffff83007fec1000 ffff83007ffb6000 0000000000000003
> (XEN) ffff83007ffcc000 ffff83007ffdfe18 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000003 0000000000000003
> (XEN) 0000000000000246 0000000000000003 0000000000000000 000000001bf9dde5
> (XEN) 0000000000000000 ffffffff810023aa 0000000000000003 deadbeefdeadf00d
> (XEN) deadbeefdeadf00d 0000010000000000 ffffffff810023aa 000000000000e033
> (XEN) 0000000000000246 ffffc900400a3ea8 000000000000e02b 7ffdff707fffd140
> (XEN) 000000017fe37a6c 000000007ffe8010 0000000000000000 0000e01000000003
> (XEN) Xen call trace:
> (XEN) [] R common/sched/null.c#unit_deassign+0x1c3/0x2ec
> (XEN) [] F common/sched/null.c#null_unit_remove+0xfc/0x136
> (XEN) [] F sched_destroy_vcpu+0xca/0x199
> (XEN) [] F common/domain.c#complete_domain_destroy+0x68/0x13f
> (XEN) [] F common/rcupdate.c#rcu_process_callbacks+0xdb/0x24b
> (XEN) [] F common/softirq.c#__do_softirq+0x8a/0xbc
> (XEN) [] F do_softirq+0x13/0x15
> (XEN) [] F arch/x86/domain.c#idle_loop+0x97/0xee
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 3:
> (XEN) Assertion 'npc->unit == unit' failed at null.c:377
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
>
> This is the line of the assertion that triggers the reboot:
> https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/common/sched/null.c;h=82d5d1baab853d24fcbb455fb3f3e8263c871277;hb=HEAD#l377
>
>
> To reproduce the vulnerability, I took the following steps:

Just to make it clear for the others in the thread: per SUPPORT.md, the
NULL scheduler is not security supported, which is why this was sent to
xen-devel directly.

Also, for completeness, debug builds are not security supported either.
On a production build, the ASSERT() would be turned into a NOP, which
could potentially result in a more interesting issue.
Anyway, that's not a problem here. :)

>
>   * Install XEN; only 4.15+ seem to be vulnerable
>   * Use the null scheduler (depends on your setup): edit
>     /etc/default/grub adding at the end of the file:
>     GRUB_CMDLINE_XEN="sched=null" and update grub
>   * Reboot into xen
>   * Create a file guest.cfg with the following contents
>
> name="guest"
> builder="hvm"
> memory=512
>
> serial = [ 'file:/tmp/log', 'pty' ]
>
> disk = [ '/home/user/boot.iso,,hdc,cdrom' ]
>
> on_reboot = "destroy"
>
> vcpus=1
>
>
> Make sure that the file //home/user/boot.iso/ does not exist
>
>   * Create a guest with this configuration: xl create -c guest.cfg
>
> CC’ing Dario, Stefano and Julien to whom I’ve shown this.

I am not quite sure where the problem lies yet, but I am adding some
more information from the debugging we discussed together.

The ASSERT() is triggered because the pCPU was already assigned to one
of the dom0 vCPUs. The problem happens regardless of whether there is a
free pCPU.

I have added some debugging in sched_set_res():

diff --git a/xen/common/sched/private.h b/xen/common/sched/private.h
index a870320146ef..2355f531dc13 100644
--- a/xen/common/sched/private.h
+++ b/xen/common/sched/private.h
@@ -150,6 +150,10 @@ static inline void sched_set_res(struct sched_unit *unit,
     unsigned int cpu = cpumask_first(res->cpus);
     struct vcpu *v;
 
+    printk("%s: res->master_cpu %u unit %p %pd %pv\n", __func__,
+           res->master_cpu, unit, unit->domain, unit->vcpu_list);
+    WARN();
+
     for_each_sched_unit_vcpu ( unit, v )
     {
         ASSERT(cpu < nr_cpu_ids);

This traced the problem to null_unit_migrate():

(XEN) sched_set_res: res->master_cpu 0 unit ffff830200887f00 d1 d1v0
(XEN) Xen WARN at private.h:155
(XEN) ----[ Xen-4.16-unstable x86_64 debug=y Tainted: C ]----
(XEN) CPU: 1
(XEN) RIP: e008:[] core.c#sched_set_res+0x5b/0xc6
(XEN) RFLAGS: 0000000000010286 CONTEXT: hypervisor (d0v1)
(XEN) rax: ffff83027bf55038 rbx: 0000000000000000 rcx: 0000000000000000
(XEN) rdx: ffff83027bf4ffff rsi: 000000000000000a rdi: ffff82d0404944b8
(XEN) rbp: ffff83027bf4fc70 rsp: ffff83027bf4fc40 r8: 0000000000000004
(XEN) r9: 0000000000000030 r10: ffff83027bf4fcf8 r11: 00000000fffffffd
(XEN) r12: ffff830275e83000 r13: ffff830275e8d000 r14: ffff830200887f00
(XEN) r15: ffff83027bf850a0 cr0: 0000000080050033 cr4: 00000000003526e0
(XEN) cr3: 00000001f1e3d000 cr2: 0000563f71516088
(XEN) fsb: 00007f6561cda780 gsb: ffff88817fe80000 gss: 0000000000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen code around (core.c#sched_set_res+0x5b/0xc6):
(XEN) 14 18 00 e8 e7 7f 00 00 <0f> 0b 4d 8b 66 08 4d 85 e4 75 28 4d 89 7e 20 48
(XEN) Xen stack trace from rsp=ffff83027bf4fc40:
(XEN) ffff83027bf85118 ffff830200887f00 ffff830275e83000 ffff830275e8d000
(XEN) 0000000000000000 ffff83027bf552a0 ffff83027bf4fce0 ffff82d040241614
(XEN) ffff82d040226393 0000000000000286 ffff83027bf822e8 0000000175e8d000
(XEN) ffff830200887f00 ffff83027bf552a0 ffff830275e83000 ffff83027bf4fcf8
(XEN) ffff830275e8d000 ffff830275e8d000 0000000000000001 0000000000000000
(XEN) ffff83027bf4fd40 ffff82d04020527d ffff82d04020527d 0000000000000000
(XEN) 0000000000000000 ffff83027bf4fd30 0000000000000000 0000000000000000
(XEN) ffff830275e8d000 00007f65620f6010 0000000000000001 ffff82d040238319
(XEN) ffff83027bf4fe58 ffff82d040238dd9 00000000001f1eae 0000000000000004
(XEN) ffff83027bee4001 8000000000000000 ffff83027bf4fdc0 ffff82d04032e6df
(XEN) 000000044032e6df 0000000000000000 ffff82e003e3e120 000000140000000f
(XEN) 00007f6561d90001 0000559a00000001 0000000000000014 0000559ad9c303e0
(XEN) 0000000000000008 0000559ad9c303e0 0000559ad9c31170 0000559ad9c303c0
(XEN) 0000000000000000 00007ffd4ed54b60 0000559ad9c309a0 00007ffd4ed54c50
(XEN) 0000000000000000 0000559ad9c38240 0000559ad9c32570 00007ffd4ed54f00
(XEN) 0000559ad9c31170 ffff83027bf4fef8 0000000000000000 0000000000000001
(XEN) deadbeefdeadf00d ffff83027bec0000 ffff82d040238319 ffff83027bf4fee8
(XEN) ffff82d04030d8bc 00007f65620f6010 deadbeefdeadf00d deadbeefdeadf00d
(XEN) deadbeefdeadf00d deadbeefdeadf00d ffff82d04038821c ffff82d040388228
(XEN) ffff82d04038821c ffff82d040388228 ffff82d04038821c ffff82d040388228
(XEN) Xen call trace:
(XEN) [] R core.c#sched_set_res+0x5b/0xc6
(XEN) [] F sched_init_vcpu+0x3dc/0x5d7
(XEN) [] F vcpu_create+0xfb/0x37a
(XEN) [] F do_domctl+0xac0/0x184a
(XEN) [] F pv_hypercall+0x10d/0x2b8
(XEN) [] F lstar_enter+0x12d/0x140
(XEN)

The end of null_unit_migrate() contains an interesting comment:

    /*
     * Whatever all the above, we always at least override v->processor.
     * This is especially important for shutdown or suspend/resume paths,
     * when it is important to let our caller (cpu_disable_scheduler())
     * know that the migration did happen, to the best of our possibilities,
     * at least. In case of suspend, any temporary inconsistency caused
     * by this, will be fixed-up during resume.
     */

This implies that a pCPU may temporarily be assigned to two vCPUs, and
we expect this to be fixed up afterwards. However, a domain may be
destroyed before that happens, so it looks like unit_deassign() is not
able to cope with this case.

From a brief look, I think we may want to check if the pCPU is in the
wait list. If it is, then we should bail out.

Dario, Stefano, what do you think?

Cheers,

-- 
Julien Grall
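
The check suggested in the message above can be sketched as a small, self-contained C model. This is not Xen code and not a proposed patch: `model_unit`, `model_sched`, `model_unit_deassign()`, the `on_waitq` flag and the return codes are all hypothetical names made up for illustration. The sketch assumes that "in the wait list" refers to the unit being removed, and that such a unit was never actually given a pCPU even though its recorded CPU points at one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PCPUS 4

/* Hypothetical stand-in for a scheduling unit (not Xen's struct sched_unit). */
struct model_unit {
    int id;
    unsigned int cpu;   /* pCPU the unit claims to be on (cf. v->processor) */
    bool on_waitq;      /* true while the unit is still in the wait list */
};

/* Hypothetical scheduler state: the unit really assigned to each pCPU. */
struct model_sched {
    struct model_unit *assigned[NR_PCPUS];
};

enum deassign_result {
    DEASSIGN_DONE,      /* the pCPU assignment was torn down */
    DEASSIGN_WAITING,   /* unit was only waiting: bailed out early */
    DEASSIGN_MISMATCH,  /* the pCPU belongs to another unit (the ASSERT case) */
};

enum deassign_result model_unit_deassign(struct model_sched *s,
                                         struct model_unit *u)
{
    assert(s != NULL && u != NULL);

    /*
     * Assumed form of the suggested check: a unit that is still waiting
     * owns no pCPU, so take it off the wait list and bail out before
     * looking at the per-pCPU assignment at all.
     */
    if (u->on_waitq) {
        u->on_waitq = false;
        return DEASSIGN_WAITING;
    }

    /* Model of ASSERT(npc->unit == unit), reported instead of crashing. */
    if (u->cpu >= NR_PCPUS || s->assigned[u->cpu] != u)
        return DEASSIGN_MISMATCH;

    s->assigned[u->cpu] = NULL;
    return DEASSIGN_DONE;
}
```

In this model, if pCPU 0 is assigned to a dom0 unit and a guest unit also records CPU 0 while still waiting, calling `model_unit_deassign()` on the guest unit returns `DEASSIGN_WAITING` and leaves the dom0 assignment untouched. Without the early return, the same call would reach the mismatch branch, which corresponds to the assertion in the report.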