From mboxrd@z Thu Jan 1 00:00:00 1970
From: Benjamin Herrenschmidt
Subject: Re: RFC on writel and writel_relaxed
Date: Wed, 28 Mar 2018 12:21:17 +1100
Message-ID: <1522200077.7364.85.camel@kernel.crashing.org>
References: <1521854626.16434.359.camel@kernel.crashing.org>
 <58ce5b83f40f4775bec1be8db66adb0d@AcuMS.aculab.com>
 <20180326165425.GA15554@ziepe.ca>
 <20180326202545.GB15554@ziepe.ca>
 <20180326210951.GD15554@ziepe.ca>
 <1522101616.7364.13.camel@kernel.crashing.org>
 <1e077f6a-90b6-cce9-6f0f-a8c003fec850@codeaurora.org>
 <20180327151029.GB17494@arm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <20180327151029.GB17494@arm.com>
Sender: netdev-owner@vger.kernel.org
To: Will Deacon, Sinan Kaya
Cc: Arnd Bergmann, Jason Gunthorpe, David Laight, Oliver,
 "open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)",
 "linux-rdma@vger.kernel.org", Alexander Duyck, "Paul E. McKenney",
 "netdev@vger.kernel.org", Alexander Duyck, torvalds@linux-foundation.org
List-Id: linux-rdma@vger.kernel.org

On Tue, 2018-03-27 at 16:10 +0100, Will Deacon wrote:
> To clarify: are you saying that on x86 you need a wmb() prior to a writel
> if you want that writel to be ordered after prior writes to memory? Is this
> specific to WC memory or some other non-standard attribute?
>
> The only reason we have wmb() inside writel() on arm, arm64 and power is for
> parity with x86 because Linus (CC'd) wanted architectures to order I/O vs
> memory by default so that it was easier to write portable drivers. The
> performance impact of that implicit barrier is non-trivial, but we want the
> driver portability and I went as far as adding generic _relaxed versions for
> the cases where ordering isn't required.
> You seem to be suggesting that none
> of this is necessary and drivers would already run into problems on x86 if
> they didn't use wmb() explicitly in conjunction with writel, which I find
> hard to believe and is in direct contradiction with the current Linux I/O
> memory model (modulo the broken example in the dma_*mb section of
> memory-barriers.txt).

Another clarification while we are at it...

All of this only applies to concurrent access by the CPU and the device
to memory allocated with dma_alloc_coherent().

For memory "mapped" into the DMA domain via dma_map_*, an extra
dma_sync_for_* is needed.

In most useful cases (servers etc.) the latter are NOPs, but on
architectures without full DMA cache coherency, or when swiotlb is in
use, dma_map_* might maintain bounce buffers or play additional cache
flushing tricks.

Cheers,
Ben.