Date: Wed, 23 Sep 2020 15:57:21 -0700 (PDT)
From: Stefano Stabellini
To: Julien Grall
cc: "Durrant, Paul", Jan Beulich, Stefano Stabellini,
    "andrew.cooper3@citrix.com", George Dunlap, "Xia, Hongyan",
    "xen-devel@lists.xenproject.org"
Subject: Re: Memory ordering question in the shutdown deferral code

On Mon, 21 Sep 2020, Julien Grall wrote:
> On 21/09/2020 13:55, Durrant, Paul wrote:
> > > (+ Xen-devel)
> > >
> > > Sorry I forgot to CC xen-devel.
> > >
> > > On 21/09/2020 12:38, Julien Grall wrote:
> > > > Hi all,
> > > >
> > > > I have started to look at the deferral code (see
> > > > vcpu_start_shutdown_deferral()) because we need it for LiveUpdate and
> > > > Arm will soon use it.
> > > >
> > > > The current implementation is using an smp_mb() to ensure ordering
> > > > between a write then a read.
> > > > The code looks roughly (I have slightly
> > > > adapted it to make my question more obvious):
> > > >
> > > > domain_shutdown()
> > > >      d->is_shutting_down = 1;
> > > >      smp_mb();
> > > >      if ( !vcpu0->defer_shutdown )
> > > >      {
> > > >        vcpu_pause_nosync(v);
> > > >        v->paused_for_shutdown = 1;
> > > >      }
> > > >
> > > > vcpu_start_shutdown_deferral()
> > > >      vcpu0->defer_shutdown = 1;
> > > >      smp_mb();
> > > >      if ( unlikely(d->is_shutting_down) )
> > > >        vcpu_check_shutdown(v);
> > > >
> > > >      return vcpu0->defer_shutdown;
> > > >
> > > > smp_mb() should only guarantee ordering (this may be stronger on some
> > > > arch), so I think there is a race between the two functions.
> > > >
> > > > It would be possible to pause the vCPU in domain_shutdown() because
> > > > vcpu0->defer_shutdown wasn't yet seen.
> > > >
> > > > Equally, vcpu_start_shutdown_deferral() may not see d->is_shutting_down
> > > > and therefore Xen may continue to send the I/O. Yet the vCPU will be
> > > > paused so the I/O will never complete.
> > > >
> >
> > The barrier enforces global order, right?
>
> It is not clear to me what you mean by "global ordering". This seems to
> suggest a very expensive synchronization barrier between all the processors.
>
> From an arch-agnostic PoV, smp_mb() will enforce an ordering between
> loads/stores but it doesn't guarantee *when* they will be observed.
>
> > So, if domain_shutdown() pauses the vcpu then is_shutting_down must
> > necessarily be visible all cpus.
>
> That's not the guarantee provided by smp_mb() (see above).

I simplified the code further to help us reason about it:

   thread#1 | thread#2
            |
1) WRITE A  | WRITE B
2) BARRIER  | BARRIER
3) READ B   | READ A

I think it is (theoretically) possible for thread#1 to be at 1) and
about to do 2), while thread#2 goes ahead and does 1) 2) 3). By the time
thread#1 does 2), thread#2 has already completed the entire sequence.
If thread#2 has already done 2), and thread#1 is about to do 3), then I
think we are guaranteed that thread#1 will see the new value of B. Or is
this the core of the issue we are discussing? If it works the way I
wrote, then it would confirm Paul's view.

For your information I went to check what the Linux memory model has to
say about this. It says:

"smp_mb() guarantees to restore sequential consistency among accesses
that use READ_ONCE, WRITE_ONCE(), or stronger. For example, the
following Linux-kernel code would forbid non-SC outcomes"

It is interesting that they chose the words "restore sequential
consistency". It would be difficult to come up with an example that has
"sequential consistency" but doesn't work the way described earlier.
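
The "difficult to come up with an example" remark can be checked
mechanically for the three-step table above. What follows is a minimal
sketch in C, not Xen or Linux code: it models sequential consistency by
brute force, running every interleaving of the two threads against a
single shared memory and counting the runs in which both reads return
the old value (thread#1 misses B and thread#2 misses A). The names
sb_explore, sb_enumerate and struct sb_result are made up for
illustration, and the BARRIER step is left out on the assumption that a
sequentially consistent model already preserves program order. It says
nothing about what smp_mb() guarantees on a particular architecture.

```c
#include <assert.h>

/* Hypothetical result record: how many interleavings were explored and
 * in how many of them BOTH reads returned the initial value 0. */
struct sb_result {
    int interleavings;
    int both_stale;
};

/*
 * Explore every sequentially consistent interleaving of:
 *
 *   thread#1: WRITE A (pc1 0 -> 1), READ B (pc1 1 -> 2)
 *   thread#2: WRITE B (pc2 0 -> 1), READ A (pc2 1 -> 2)
 *
 * a and b hold the current shared values, r1 and r2 the values read so
 * far by thread#1 and thread#2 (-1 means "not read yet").
 */
static void sb_explore(int pc1, int pc2, int a, int b,
                       int r1, int r2, struct sb_result *res)
{
    if (pc1 == 2 && pc2 == 2) {
        /* Both threads finished: record this interleaving. */
        res->interleavings++;
        if (r1 == 0 && r2 == 0)
            res->both_stale++;
        return;
    }

    /* Let thread#1 take the next step, if it has one left. */
    if (pc1 == 0)
        sb_explore(1, pc2, 1, b, r1, r2, res);   /* WRITE A */
    else if (pc1 == 1)
        sb_explore(2, pc2, a, b, b, r2, res);    /* READ B  */

    /* Let thread#2 take the next step, if it has one left. */
    if (pc2 == 0)
        sb_explore(pc1, 1, a, 1, r1, r2, res);   /* WRITE B */
    else if (pc2 == 1)
        sb_explore(pc1, 2, a, b, r1, a, res);    /* READ A  */
}

/* Run the exploration from the initial state A == 0, B == 0. */
struct sb_result sb_enumerate(void)
{
    struct sb_result res = { 0, 0 };

    sb_explore(0, 0, 0, 0, -1, -1, &res);
    assert(res.interleavings > 0);
    return res;
}
```

sb_enumerate() reports 6 interleavings and 0 with both reads stale, i.e.
at least one side always sees the other's write as long as the execution
is sequentially consistent. Whether smp_mb() on both sides is enough to
get that behaviour on every arch is the question being discussed in this
thread.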