From: Paolo Bonzini
Date: Mon, 05 Sep 2011 09:41:19 +0200
Subject: Re: [Qemu-devel] [PATCH] virtio: Make memory barriers be memory barriers
To: "Michael S. Tsirkin"
Cc: aik@ozlabs.ru, aliguori@us.ibm.com, rusty@rustcorp.com.au, qemu-devel@nongnu.org, agraf@suse.de
Message-ID: <4E647D1F.6050307@redhat.com>
In-Reply-To: <20110904091643.GA20795@redhat.com>
References: <20110901163359.GB11620@redhat.com> <786649703.1049386.1314909069542.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com> <20110902154549.GA18368@redhat.com> <20110903144635.GD12965@yookeroo.fritz.box> <20110904091643.GA20795@redhat.com>

On 09/04/2011 11:16 AM, Michael S. Tsirkin wrote:
>> > I mean argue for a richer set of barriers, with per-arch minimal
>> > implementations instead of the large but portable hammer of
>> > sync_synchronize, if you will.
>
> That's what I'm saying really.  On x86 the richer set of barriers
> need not insert code at all for both wmb and rmb macros.  All we
> might need is an 'optimization barrier' - e.g. linux does
>     __asm__ __volatile__("": : :"memory")
> ppc needs something like sync_synchronize there.

No, rmb and wmb need to generate code.  You are right that in some
places there will be some extra barriers.

If you want a richer set of barriers, that must be something like
{rr,rw,wr,ww}_mb{_acq,_rel,} (again not counting the Alpha).  On x86,
then, all the rr/rw/ww barriers will be compiler barriers because the
hardware already enforces that ordering.  The other three map to
lfence/sfence/mfence:

    barrier      assembly    why?
    ---------------------------------------------------------------------
    wr_mb_acq    lfence      prevents the read from moving up -> acquire
    wr_mb_rel    sfence      prevents the write from moving down -> release
    wr_mb        mfence      (full barrier)

But if you stick to rmb/wmb/mb, then the correct definition of rmb is
"the least strict barrier that provides all three of rr_mb(),
rw_mb_rel() and wr_mb_acq()".  This is, as expected, an lfence.
Similarly, wmb must provide all three of ww_mb(), wr_mb_rel() and
rw_mb_acq(), and this is an sfence.

So the right place to put an #ifdef is not wmb() itself, but the _uses_
of wmb() where you know you need a less strict barrier.  That's why I
say David's patch is correct; on top of it you may change particular
uses of wmb() in virtio.c to compiler barriers, for example where you
only care about ordering writes after writes.

Likewise, there may even be places in which you could #ifdef out a full
memory barrier.  For example, if you only care about ordering writes
after reads, x86 hardware already provides that and you could omit the
mb().
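
To make that mapping concrete, here is roughly what the x86 side of such
a header could look like, using the names from this mail (only a sketch,
not a proposal for qemu: other architectures would need their own
definitions, and barrier() is just a stand-in for the usual compiler-only
barrier):

    /* optimization barrier: stops the compiler from reordering */
    #define barrier()     __asm__ __volatile__("" : : : "memory")

    /* x86 already orders read-read, read-write and write-write in
     * hardware, so these only need to constrain the compiler */
    #define rr_mb()       barrier()
    #define rw_mb()       barrier()
    #define ww_mb()       barrier()

    /* write-read is the one ordering x86 does not give you for free */
    #define wr_mb_acq()   __asm__ __volatile__("lfence" : : : "memory")
    #define wr_mb_rel()   __asm__ __volatile__("sfence" : : : "memory")
    #define wr_mb()       __asm__ __volatile__("mfence" : : : "memory")

    /* the classic names, defined as the least strict barrier that
     * covers the combinations listed above */
    #define rmb()         wr_mb_acq()   /* rr + rw_rel + wr_acq -> lfence */
    #define wmb()         wr_mb_rel()   /* ww + wr_rel + rw_acq -> sfence */
    #define mb()          wr_mb()       /* full barrier -> mfence */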
I think in general it is premature optimization, though.

Regarding specific examples in virtio where lfence and sfence could be
used, there may be one when using event signaling.  In the backend you
first write the index of your response, then you check whether to
generate an event.  (I think) the following requirements hold:

* if you read the event-index too early, you might skip an event and
  deadlock.  So you need at least a read barrier.

* you can write the response-index after reading the event-index, as
  long as you write it before waking up the guest.

So, in that case an x86 lfence should be enough, though again without
more consideration I would use a full barrier just to be sure.

Paolo
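
To make the event-index case above concrete, here is a minimal
backend-side sketch.  The layout is reduced to the two indices that
matter, and guest_notify(), smp_mb() and the struct/field names are made
up for the example rather than taken from qemu; vring_need_event() is
the usual wrap-around comparison (the same one Linux uses):

    #include <stdint.h>

    struct vring_state {
        volatile uint16_t *used_idx;     /* response index we publish */
        volatile uint16_t *used_event;   /* event index the guest publishes */
        uint16_t last_notified;          /* used_idx value at the last event */
    };

    /* x86-only full barrier; the mail argues an lfence may already be
     * enough here, mfence is the "just to be sure" choice */
    #define smp_mb()   __asm__ __volatile__("mfence" : : : "memory")

    /* has new_idx moved past the event index since old_idx? */
    static inline int vring_need_event(uint16_t event, uint16_t new_idx,
                                       uint16_t old_idx)
    {
        return (uint16_t)(new_idx - event - 1) < (uint16_t)(new_idx - old_idx);
    }

    void guest_notify(void);             /* however the guest gets woken up */

    static void publish_response(struct vring_state *vr)
    {
        uint16_t old_idx = *vr->used_idx;
        uint16_t new_idx = old_idx + 1;

        /* 1. write the index of our response */
        *vr->used_idx = new_idx;

        /* 2. barrier before looking at the event index: reading it too
         *    early could miss the guest's request and deadlock */
        smp_mb();

        /* 3. check whether the guest asked to be notified at this point */
        if (vring_need_event(*vr->used_event, new_idx, vr->last_notified)) {
            vr->last_notified = new_idx;
            guest_notify();
        }
    }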