From: Paolo Bonzini
Date: Fri, 18 May 2012 10:57:35 +0200
Subject: Re: [Qemu-devel] [PATCH 13/13] iommu: Add a memory barrier to DMA RW function
To: Benjamin Herrenschmidt
Cc: David Gibson, qemu-devel@nongnu.org, Anthony Liguori, "Michael S. Tsirkin"

On 18/05/2012 10:18, Benjamin Herrenschmidt wrote:
> On Fri, 2012-05-18 at 08:53 +0200, Paolo Bonzini wrote:
>
>> It depends on what semantics you attach to dma_mb.  In my opinion,
>> having a separate barrier for DMA is wrong, because you want the same
>> semantics on all architectures.
>>
>> The x86 requirements are roughly as follows:
>>
>> 1) it never needs explicit rmb and wmb (as long as you don't use
>> non-temporal stores etc.);
>>
>> 2) synchronized operations have an implicit mb before and after
>> (unlike LL/SC on PowerPC);
>>
>> 3) everywhere else, you need an mb.
>>
>> So, on x86 you have more or less an implicit wmb after each write and
>> an implicit rmb before a read.  This explains why these kinds of bugs
>> are very hard to see on x86 (or often impossible to see).
>
> So what you mean is that on x86, a read can pass a write (and
> vice-versa)?  Interesting... I didn't know that.

I have to look it up every time.  It takes several pages in the x86
manuals, but the important part is this:

0) Reads are not moved before older reads, and writes are not moved
before older writes.

1) Writes are not moved before older reads.

2) Reads may be moved before older writes to different locations, but
not before older writes to the same location.

3) Intra-processor forwarding is allowed: while a store is temporarily
held in a processor's store buffer, it can satisfy the processor's own
loads.

Rule 2 is why the classic store-buffering test (CPU0 does x = 1 and
then reads y, CPU1 does y = 1 and then reads x, both starting from
zero) can end with both CPUs reading zero on x86.

> I'm still tempted to add barriers in map and unmap as well, in the
> case where they don't bounce, to provide consistent semantics here,
> i.e. all accesses done between the map and unmap are ordered vs. all
> previous and subsequent accesses.  Ok with that?

[...]

> I will not add barriers to the various ld*/st* variants.

In theory you would need a memory barrier before the first ld/st and
one after the last...  Considering that virtio uses map/unmap, what
about leaving map/unmap and ld*_phys/st*_phys as the high-performance
unsafe API?  Then you can add barriers around ld*_pci_dma/st*_pci_dma.
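To make that split concrete, here is a minimal sketch of what the
barrier-wrapped accessors could look like, layered on top of the
"unsafe" ld*_phys/st*_phys routines.  The smp_mb() primitive and the
exact *_pci_dma names and signatures are assumptions for illustration,
not a committed API:

/* Sketch only: barrier-wrapped DMA accessors on top of the
 * ld*_phys/st*_phys routines.  smp_mb() stands in for whatever
 * full-barrier primitive this series ends up providing; the names and
 * signatures below are hypothetical.  (dev is unused here; the real
 * helpers would presumably route through the device's DMA context.)
 */
static inline uint32_t ldl_pci_dma(PCIDevice *dev, dma_addr_t addr)
{
    uint32_t val;

    smp_mb();                /* order against the CPU's earlier accesses */
    val = ldl_phys(addr);
    smp_mb();                /* order against the CPU's later accesses   */
    return val;
}

static inline void stl_pci_dma(PCIDevice *dev, dma_addr_t addr,
                               uint32_t val)
{
    smp_mb();                /* make earlier updates visible before a
                              * store the device may act upon            */
    stl_phys(addr, val);
}

That keeps the map/unmap path fast while giving device models that go
through the *_pci_dma helpers the ordering discussed above.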
>> x86 -> Load Seq_Cst:  mov           or  mfence; mov
>>        Store Seq_Cst: mov; mfence   or  mov
>>
>> ARM -> Load Seq_Cst:  ldr; dmb       or  dmb; ldr; dmb
>>        Store Seq_Cst: dmb; str; dmb  or  dmb; str
>>
>> PPC -> Load Seq_Cst:  sync; ld; cmp; bc; isync
>>        Store Seq_Cst: sync; st
>>
>> where cmp; bc; isync can be replaced by sync.
>
> Hrm, the cmp/bc/isync can be -very- expensive; we use a variant of
> that using twi to enforce complete execution of reads in our readX()
> accessors in the kernel, but I don't think I want to do that in qemu.

Ah, ok, thanks for explaining what cmp; bc; isync really is. :)

> The full sync should provide all the synchronization we need.

You mean "sync; ld; sync" for load and "sync; st" for store?  That
would do, yes.

Paolo
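For reference, that "sync; ld; sync" / "sync; st" discipline can be
approximated in portable C with GCC's __sync_synchronize() full
barrier, which compiles to sync on PPC, dmb on ARM, and typically
mfence on x86.  A sketch under those assumptions, not something this
thread settled on:

#include <stdint.h>

/* Sketch: sequentially consistent 32-bit load/store built from full
 * barriers, mirroring the mappings quoted above ("sync; ld; sync" for
 * loads, "sync; st" for stores on PPC).
 */
static inline uint32_t seq_cst_load32(const volatile uint32_t *p)
{
    uint32_t val;

    __sync_synchronize();    /* PPC: sync                                */
    val = *p;
    __sync_synchronize();    /* PPC: full sync instead of cmp; bc; isync */
    return val;
}

static inline void seq_cst_store32(volatile uint32_t *p, uint32_t val)
{
    __sync_synchronize();    /* PPC: sync                                */
    *p = val;
}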