From: Arnd Bergmann
To: Sinan Kaya
Cc: dmaengine@vger.kernel.org, timur@codeaurora.org, cov@codeaurora.org,
	jcm@redhat.com, Rob Herring, Pawel Moll, Mark Rutland, Ian Campbell,
	Kumar Gala, Vinod Koul, Dan Williams, devicetree@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] dma: add Qualcomm Technologies HIDMA channel driver
Date: Mon, 02 Nov 2015 17:33:34 +0100
Message-ID: <3962829.iR3I63FEm8@wuerfel>
In-Reply-To: <56365F0D.6010508@codeaurora.org>
References: <1446174501-8870-1-git-send-email-okaya@codeaurora.org>
	<4552697.VhjWnxQoIo@wuerfel> <56365F0D.6010508@codeaurora.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sunday 01 November 2015 13:50:53 Sinan Kaya wrote:
>
> >> The issue is not writel_relaxed vs. writel. After I issue reset, I need
> >> wait for some time to confirm reset was done. I can use readl_polling
> >> instead of mdelay if we don't like mdelay.
> >
> > I meant that both _relaxed() and mdelay() are probably wrong here.
>
> You are right about redundant writel_relaxed + wmb. They are effectively
> equal to writel.

Actually, writel() is wmb()+writel_relaxed(), not the other way round:

When sending a command to a device that can start a DMA transfer,
the barrier is required to ensure that the DMA happens after
previously written data has gone from the CPU write buffers
into the memory that is used as the source for the transfer.

A barrier after the writel() has no effect, as MMIO writes are posted
on the bus.

> However, after issuing the command; I still need to wait some amount of
> time until hardware acknowledges the commands like reset/enable/disable.
> These are relatively faster operations happening in microseconds. That's
> why, I have mdelay there.
>
> I'll take a look at workqueues but it could turn out to be an overkill
> for few microseconds.

Most devices are able to provide an interrupt for long-running commands.
Are you sure that yours is unable to do this? If so, is this a design
mistake or an implementation bug?

> >>> Reading the status probably requires a readl() rather than readl_relaxed()
> >>> to guarantee that the DMA data has arrived in memory by the time that the
> >>> register data is seen by the CPU.
> >>> If using readl_relaxed() here is a valid
> >>> and required optimization, please add a comment to explain why it works
> >>> and how much you gain.
> >>
> >> I will add some description. This is a high speed peripheral. I don't
> >> like spreading barriers as candies inside the readl and writel unless I
> >> have to.
> >>
> >> According to the barriers video, I watched on youtube this should be the
> >> rule for ordering.
> >>
> >> "if you do two relaxed reads and check the results of the returned
> >> variables, ARM architecture guarantees that these two relaxed variables
> >> will get observed during the check."
> >>
> >> this is called implied ordering or something of that sort.
> >
> > My point was a bit different: while it is guaranteed that the
> > result of the readl_relaxed() is observed in order, they do not
> > guarantee that a DMA from device to memory that was started by
> > the device before the readl_relaxed() has arrived in memory
> > by the time that the readl_relaxed() result is visible to the
> > CPU and it starts accessing the memory.
> >
> I checked with the hardware designers. Hardware guarantees that by the
> time interrupt is observed, all data transactions in flight are
> delivered to their respective places and are visible to the CPU. I'll
> add a comment in the code about this.

I'm curious about this. Does that mean the device is not meant for
high-performance transfers and just synchronizes the bus before
triggering the interrupt?

> > In other words, when the hardware sends you data followed by an
> > interrupt to tell you the data is there, your interrupt handler
> > can tell the driver that is waiting for this data that the DMA
> > is complete while the data itself is still in flight, e.g. waiting
> > for an IOMMU to fetch page table entries.
> >
> There is HW guarantee for ordering.
>
> On demand paging for IOMMU is only supported for PCIe via PRI (Page
> Request Interface) not for HIDMA.
> All other hardware instances work on
> pinned DMA addresses. I'll drop a note about this too to the code as well.

I wasn't talking about paging, just fetching the IOTLB from the
preloaded page tables in RAM. This can take several uncached memory
accesses, so it would generally be slow.

	Arnd
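
To illustrate the two points above (barrier before the store, and polling
a status register instead of a fixed mdelay()), here is a minimal userspace
sketch. It is not kernel code and not taken from the HIDMA driver: the
model_*() helpers, struct fake_dev, and the CTRL_RESET/STATUS_DONE bits are
all hypothetical names, and C11 fences only stand in for wmb() and the read
barrier; they are not equivalent to the real barriers on real hardware.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register bits, for illustration only. */
#define CTRL_RESET  0x1u
#define STATUS_DONE 0x1u

/* Stand-in for wmb(): a release fence (an assumption, not the kernel macro). */
static inline void model_wmb(void)
{
	atomic_thread_fence(memory_order_release);
}

/* Stand-in for writel_relaxed(): a plain volatile store, no ordering. */
static inline void model_writel_relaxed(uint32_t val, volatile uint32_t *reg)
{
	*reg = val;
}

/* writel() modeled as barrier first, then the relaxed store. */
static inline void model_writel(uint32_t val, volatile uint32_t *reg)
{
	model_wmb();
	model_writel_relaxed(val, reg);
}

/* Stand-in for readl(): load, then a fence ordering later reads after it. */
static inline uint32_t model_readl(const volatile uint32_t *reg)
{
	uint32_t val = *reg;

	atomic_thread_fence(memory_order_acquire);
	return val;
}

/* Fake device: reports STATUS_DONE only after a number of status reads. */
struct fake_dev {
	volatile uint32_t ctrl;
	volatile uint32_t status;
	unsigned int reads_until_done;
};

static uint32_t fake_read_status(struct fake_dev *dev)
{
	if (dev->ctrl & CTRL_RESET) {
		if (dev->reads_until_done == 0)
			dev->status |= STATUS_DONE;
		else
			dev->reads_until_done--;
	}
	return model_readl(&dev->status);
}

/* Issue a reset, then poll with a bound instead of sleeping a fixed time.
 * Returns 0 once STATUS_DONE is seen, -1 if max_polls reads were not enough. */
static int reset_and_poll(struct fake_dev *dev, unsigned int max_polls)
{
	assert(dev != NULL);

	model_writel(CTRL_RESET, &dev->ctrl);
	for (unsigned int i = 0; i < max_polls; i++) {
		if (fake_read_status(dev) & STATUS_DONE)
			return 0;
	}
	return -1;
}
```

In a real driver the bounded loop would more likely be one of the helpers
from <linux/iopoll.h> such as readl_poll_timeout(), or an interrupt if the
hardware can raise one for command completion.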