From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754141AbbKBQeB (ORCPT <rfc822;w@1wt.eu>);
	Mon, 2 Nov 2015 11:34:01 -0500
Received: from mout.kundenserver.de ([212.227.17.10]:52795 "EHLO
	mout.kundenserver.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752157AbbKBQd6 (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Mon, 2 Nov 2015 11:33:58 -0500
From: Arnd Bergmann <arnd@arndb.de>
To: Sinan Kaya <okaya@codeaurora.org>
Cc: dmaengine@vger.kernel.org, timur@codeaurora.org, cov@codeaurora.org,
        jcm@redhat.com, Rob Herring <robh+dt@kernel.org>,
        Pawel Moll <pawel.moll@arm.com>, Mark Rutland <mark.rutland@arm.com>,
        Ian Campbell <ijc+devicetree@hellion.org.uk>,
        Kumar Gala <galak@codeaurora.org>, Vinod Koul <vinod.koul@intel.com>,
        Dan Williams <dan.j.williams@intel.com>, devicetree@vger.kernel.org,
        linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] dma: add Qualcomm Technologies HIDMA channel driver
Date: Mon, 02 Nov 2015 17:33:34 +0100
Message-ID: <3962829.iR3I63FEm8@wuerfel>
User-Agent: KMail/4.11.5 (Linux/3.16.0-10-generic; KDE/4.11.5; x86_64; ; )
In-Reply-To: <56365F0D.6010508@codeaurora.org>
References: <1446174501-8870-1-git-send-email-okaya@codeaurora.org> <4552697.VhjWnxQoIo@wuerfel> <56365F0D.6010508@codeaurora.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sunday 01 November 2015 13:50:53 Sinan Kaya wrote:
> 
> >> The issue is not writel_relaxed vs. writel. After I issue reset, I need
> >> wait for some time to confirm reset was done. I can use readl_polling
> >> instead of mdelay if we don't like mdelay.
> >
> > I meant that both _relaxed() and mdelay() are probably wrong here.
> 
> You are right about redundant writel_relaxed + wmb. They are effectively 
> equal to writel.

Actually, writel() is wmb()+writel_relaxed(), not the other way round:

When sending a command to a device that can start a DMA transfer,
the barrier is required to ensure that the DMA happens after previously
written data has gone from the CPU write buffers into the memory that
is used as the source for the transfer.

A barrier after the writel() has no effect, as MMIO writes are posted
on the bus.

> However, after issuing the command; I still need to wait some amount of 
> time until hardware acknowledges the commands like reset/enable/disable. 
> These are relatively faster operations happening in microseconds. That's 
> why, I have mdelay there.
> 
> I'll take a look at workqueues but it could turn out to be an overkill 
> for few microseconds.

Most devices are able to provide an interrupt for long-running commands.
Are you sure that yours is unable to do this? If so, is this a design
mistake or an implementation bug?
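
If polling really is the only option, a bounded poll on the status register
is still better than a fixed mdelay(). The kernel has readl_poll_timeout()
in <linux/iopoll.h> for this. Below is a rough userspace model of the idea,
with made-up model_* names, a fake device, and a poll count instead of a
real timeout:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*model_read_fn)(void *ctx);

/*
 * Poll until (value & mask) == want or the budget of max_polls reads is
 * used up. Returns 0 on success, -ETIMEDOUT otherwise. The last value
 * read is stored in *last when last is not NULL.
 */
static int model_poll_status(model_read_fn read_reg, void *ctx,
			     uint32_t mask, uint32_t want,
			     unsigned int max_polls, uint32_t *last)
{
	unsigned int i;

	assert(read_reg != NULL);
	for (i = 0; i < max_polls; i++) {
		uint32_t v = read_reg(ctx);

		if (last)
			*last = v;
		if ((v & mask) == want)
			return 0;
	}
	return -ETIMEDOUT;
}

/* Fake device: reports 0 for a number of reads, then 1 (done). */
struct model_fake_dev {
	unsigned int reads_until_done;
};

static uint32_t model_fake_read(void *ctx)
{
	struct model_fake_dev *d = ctx;

	if (d->reads_until_done == 0)
		return 1;
	d->reads_until_done--;
	return 0;
}
```

The point is that the wait ends as soon as the hardware acknowledges the
command, and a missing acknowledgement turns into an error rather than
being silently ignored.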

> >>> Reading the status probably requires a readl() rather than readl_relaxed()
> >>> to guarantee that the DMA data has arrived in memory by the time that the
> >>> register data is seen by the CPU. If using readl_relaxed() here is a valid
> >>> and required optimization, please add a comment to explain why it works
> >>> and how much you gain.
> >>
> >> I will add some description. This is a high speed peripheral. I don't
> >> like spreading barriers as candies inside the readl and writel unless I
> >> have to.
> >>
> >> According to the barriers video, I watched on youtube this should be the
> >> rule for ordering.
> >>
> >> "if you do two relaxed reads and check the results of the returned
> >> variables, ARM architecture guarantees that these two relaxed variables
> >> will get observed during the check."
> >>
> >> this is called implied ordering or something of that sort.
> >
> > My point was a bit different: while it is guaranteed that the
> > result of the readl_relaxed() is observed in order, they do not
> > guarantee that a DMA from device to memory that was started by
> > the device before the readl_relaxed() has arrived in memory
> > by the time that the readl_relaxed() result is visible to the
> > CPU and it starts accessing the memory.
> >
> I checked with the hardware designers. Hardware guarantees that by the 
> time interrupt is observed, all data transactions in flight are 
> delivered to their respective places and are visible to the CPU. I'll 
> add a comment in the code about this.

I'm curious about this. Does that mean the device is not meant for
high-performance transfers and just synchronizes the bus before
triggering the interrupt?

> > In other words, when the hardware sends you data followed by an
> > interrupt to tell you the data is there, your interrupt handler
> > can tell the driver that is waiting for this data that the DMA
> > is complete while the data itself is still in flight, e.g. waiting
> > for an IOMMU to fetch page table entries.
> >
> There is HW guarantee for ordering.
> 
> On demand paging for IOMMU is only supported for PCIe via PRI (Page 
> Request Interface) not for HIDMA. All other hardware instances work on 
> pinned DMA addresses. I'll drop a note about this too to the code as well.

I wasn't talking about paging, just fetching the IOTLB entries from the
preloaded page tables in RAM. This can take several uncached memory
accesses, so it would generally be slow.


	Arnd