From mboxrd@z Thu Jan 1 00:00:00 1970
From: Daniel Vetter
Date: Wed, 25 Apr 2018 23:35:13 +0200
Subject: Re: noveau vs arm dma ops
To: Christoph Hellwig
Cc: Thierry Reding, Christian König,
 "moderated list:DMA BUFFER SHARING FRAMEWORK",
 Linux Kernel Mailing List, amd-gfx list, Jerome Glisse, dri-devel,
 Dan Williams, Logan Gunthorpe,
 "open list:DMA BUFFER SHARING FRAMEWORK",
 iommu@lists.linux-foundation.org, Linux ARM
In-Reply-To: <20180425153312.GD27076@infradead.org>
References: <20180420152111.GR31310@phenom.ffwll.local>
 <20180424184847.GA3247@infradead.org>
 <20180425054855.GA17038@infradead.org>
 <20180425064335.GB28100@infradead.org>
 <20180425074151.GA2271@ulmo>
 <20180425085439.GA29996@infradead.org>
 <20180425100429.GR25142@phenom.ffwll.local>
 <20180425153312.GD27076@infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 25, 2018 at 5:33 PM, Christoph Hellwig wrote:
> On Wed, Apr 25, 2018 at 12:04:29PM +0200, Daniel Vetter wrote:
>> > Coordinating the backport of a trivial helper in the arm tree is not
>> > the end of the world. Really, this cowboy attitude is a good reason
>> > why graphics folks have such a bad rep. You keep poking into random
>> > kernel internals, don't talk to anyone and then complain if people
>> > are upset. This shouldn't be surprising.
>>
>> Not really agreeing on the cowboy thing. The fundamental problem is that
>> the dma api provides abstractions that seriously get in the way of
>> writing a gpu driver. Some examples:
>
> So talk to other people. Maybe people share your frustration. Or maybe
> other people have a way to help.
>
>> - We never want bounce buffers, ever. dma_map_sg gives us that, so
>>   there's hacks to fall back to a cache of pages allocated using
>>   dma_alloc_coherent if you build a kernel with bounce buffers.
>
> get_required_mask() is supposed to tell you if you are safe. However
> we are missing lots of implementations of it for iommus, so you might
> get some false negatives; improvements welcome. It's been on my list
> of things to fix in the DMA API, but it is nowhere near the top.

It hasn't come up in any fireworks in a while, so I honestly don't
remember exactly what the issues have been. But

commit d766ef53006c2c38a7fe2bef0904105a793383f2
Author: Chris Wilson
Date:   Mon Dec 19 12:43:45 2016 +0000

    drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping

and the various bits of code that a

$ git grep SWIOTLB -- drivers/gpu

turns up are what we're doing to hack around that stuff. And in general
(there are some exceptions) gpus should be able to address everything,
so I never fully understood where that's even coming from.
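For illustration, a minimal sketch of the check Christoph means, as a
hypothetical driver helper (not actual i915 code), assuming the iommu
in question actually implements get_required_mask():

#include <linux/dma-mapping.h>

/* Hypothetical helper: figure out up front whether dma_map_sg could
 * ever hand this device a bounce buffer, instead of finding out at
 * map time. */
static bool my_gpu_might_bounce(struct device *dev)
{
	/* Everything the platform could conceivably ask us to map ... */
	u64 required = dma_get_required_mask(dev);

	/* ... versus what the device can actually address. If the
	 * device mask covers the required mask, dma_map_sg should
	 * never need to bounce (modulo the false negatives from
	 * iommus that don't implement get_required_mask yet). */
	return required > dma_get_mask(dev);
}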
>> - dma api hides the cache flushing requirements from us. GPUs love
>>   non-snooped access, and worse, give userspace control over that. We
>>   want a strict separation between mapping stuff and flushing stuff.
>>   With the IOMMU api we mostly have the former, but for the latter,
>>   arch maintainers regularly tell us they won't allow that. So we have
>>   drm_clflush.c.
>
> The problem is that an entirely separate cache flushing API is hard.
> That being said, if you look at my generic dma-noncoherent API series,
> it tries to move that way. So far it is in early stages and apparently
> rather buggy, unfortunately.

I'm assuming this stuff here?

https://lkml.org/lkml/2018/4/20/146

Anyway, I got a bit lost in all that work, but it looks really nice.

>> - dma api hides how/where memory is allocated. Kinda similar problem,
>>   except now for CMA or address limits. So either we roll our own
>>   allocators and then dma_map_sg (and pray it doesn't bounce buffer),
>>   or we use dma_alloc_coherent and then grab the sgt to get at the CMA
>>   allocations, because that's the only way. Which sucks, because we
>>   can't directly tell CMA how to back off if there's some way to make
>>   CMA memory available through other means (gpus love to hog all of
>>   memory, so we have shrinkers and everything).
>
> If you really care about doing explicit cache flushing anyway (see
> above), allocating your own memory and mapping it where needed is by
> far the superior solution. On cache coherent architectures
> dma_alloc_coherent is nothing but allocate memory + dma_map_single.
> For non-coherent allocations the memory might come from a special
> pool or must be used through a special virtual address mapping that
> is set up either statically or dynamically. For that case splitting
> allocation and mapping is a good idea in many ways, and I plan to move
> towards that once the number of dma mapping implementations is down
> to a reasonable number, so that it can actually be done.

Yeah, the above is pretty much what we do on x86. dma-api believes
everything is coherent, so dma_map_sg does the mapping we want and
nothing else (minus swiotlb fun). Cache flushing, allocations: all
done by the driver.

On arm that doesn't work. The iommu api seems like a good fit, except
the dma-api tends to get in the way a bit (drm/msm apparently has
similar problems to tegra), and if you need contiguous memory,
dma_alloc_coherent is the only way to get at it. There was a huge
discussion years ago about that, and direct cma access was shot down
because it would have exposed too much of the caching attribute
mangling required (most arm platforms apparently need wc pages to not
be in the kernel's linear map).

Anything that separates these three things more (allocation pools,
mapping through IOMMUs and flushing cpu caches) sounds like the right
direction to me, roughly along the lines of the sketch below. Even if
that throws some portability across platforms away - drivers that want
to control things in this much detail aren't really portable (without
some serious work) anyway.
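To make the three-way split concrete, a purely hypothetical sketch of
what a driver-side page bind could look like with allocation, flushing
and mapping fully separated (uses the 2018-era 5-argument iommu_map;
the allocation-pool step is hand-waved):

#include <linux/io.h>
#include <linux/iommu.h>
#include <drm/drm_cache.h>

/* Hypothetical helper: the driver already allocated @page from
 * whatever pool it owns (CMA, a shrinker-managed cache, ...). */
static int my_gpu_bind_page(struct iommu_domain *domain,
			    struct page *page, unsigned long iova)
{
	/* Flush cpu caches explicitly, since the gpu will access the
	 * page non-snooped - this is the drm_clflush.c part. */
	drm_clflush_pages(&page, 1);

	/* Map through the iommu api directly, with no dma-api (and
	 * hence no bounce buffering or hidden flushing) involved. */
	return iommu_map(domain, iova, page_to_phys(page),
			 PAGE_SIZE, IOMMU_READ | IOMMU_WRITE);
}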
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch