From mboxrd@z Thu Jan 1 00:00:00 1970
From: Daniel Vetter
Date: Wed, 25 Apr 2018 23:35:13 +0200
Subject: Re: noveau vs arm dma ops
To: Christoph Hellwig
Cc: Thierry Reding, Christian König,
 "moderated list:DMA BUFFER SHARING FRAMEWORK",
 Linux Kernel Mailing List, amd-gfx list, Jerome Glisse, dri-devel,
 Dan Williams, Logan Gunthorpe,
 "open list:DMA BUFFER SHARING FRAMEWORK",
 iommu@lists.linux-foundation.org, Linux ARM
In-Reply-To: <20180425153312.GD27076@infradead.org>
References: <20180420152111.GR31310@phenom.ffwll.local>
 <20180424184847.GA3247@infradead.org>
 <20180425054855.GA17038@infradead.org>
 <20180425064335.GB28100@infradead.org>
 <20180425074151.GA2271@ulmo>
 <20180425085439.GA29996@infradead.org>
 <20180425100429.GR25142@phenom.ffwll.local>
 <20180425153312.GD27076@infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 25, 2018 at 5:33 PM, Christoph Hellwig wrote:
> On Wed, Apr 25, 2018 at 12:04:29PM +0200, Daniel Vetter wrote:
>> > Coordinating the backport of a trivial helper in the arm tree is not
>> > the end of the world. Really, this cowboy attitude is a good reason
>> > why graphics folks have such a bad rep. You keep poking into random
>> > kernel internals, don't talk to anyone and then complain if people
>> > are upset. This shouldn't be surprising.
>>
>> Not really agreeing on the cowboy thing. The fundamental problem is that
>> the dma api provides abstractions that seriously get in the way of
>> writing a gpu driver. Some examples:
>
> So talk to other people. Maybe people share your frustration. Or maybe
> other people have a way to help.
>
>> - We never want bounce buffers, ever. dma_map_sg gives us that, so
>>   there's hacks to fall back to a cache of pages allocated using
>>   dma_alloc_coherent if you build a kernel with bounce buffers.
>
> get_required_mask() is supposed to tell you if you are safe. However
> we are missing lots of implementations of it for iommus, so you might
> get some false negatives; improvements welcome. It's been on my list
> of things to fix in the DMA API, but it is nowhere near the top.

It hasn't come up in any fireworks in a while, so I honestly don't
remember exactly what the issues have been. But

commit d766ef53006c2c38a7fe2bef0904105a793383f2
Author: Chris Wilson
Date:   Mon Dec 19 12:43:45 2016 +0000

    drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping

and the various bits of code that a

$ git grep SWIOTLB -- drivers/gpu

turns up are what we're doing to hack around that stuff. And in general
(there are some exceptions) gpus should be able to address everything,
so I never fully understood where that's even coming from.
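For illustration, a minimal sketch of the check Christoph means, as a
hypothetical driver helper (not actual i915 code), assuming the iommu
in question actually implements get_required_mask():

#include <linux/dma-mapping.h>

/* Hypothetical helper: figure out up front whether dma_map_sg could
 * ever hand this device a bounce buffer, instead of finding out at
 * map time. */
static bool my_gpu_might_bounce(struct device *dev)
{
	/* Everything the platform could conceivably ask us to map ... */
	u64 required = dma_get_required_mask(dev);

	/* ... versus what the device can actually address. If the
	 * device mask covers the required mask, dma_map_sg should
	 * never need to bounce (modulo the false negatives from
	 * iommus that don't implement get_required_mask yet). */
	return required > dma_get_mask(dev);
}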
>> - dma api hides the cache flushing requirements from us. GPUs love
>>   non-snooped access, and worse, give userspace control over that. We
>>   want a strict separation between mapping stuff and flushing stuff.
>>   With the IOMMU api we mostly have the former, but for the latter,
>>   arch maintainers regularly tell us they won't allow that. So we have
>>   drm_clflush.c.
>
> The problem is that an entirely separate cache flushing API is hard.
> That being said, if you look at my generic dma-noncoherent API series,
> it tries to move that way. So far it is in early stages and apparently
> rather buggy, unfortunately.

I'm assuming this stuff here?

https://lkml.org/lkml/2018/4/20/146

Anyway, I got a bit lost in all that work, but it looks really nice.

>> - dma api hides how/where memory is allocated. Kinda similar problem,
>>   except now for CMA or address limits. So either we roll our own
>>   allocators and then dma_map_sg (and pray it doesn't bounce buffer),
>>   or we use dma_alloc_coherent and then grab the sgt to get at the CMA
>>   allocations, because that's the only way. Which sucks, because we
>>   can't directly tell CMA how to back off if there's some way to make
>>   CMA memory available through other means (gpus love to hog all of
>>   memory, so we have shrinkers and everything).
>
> If you really care about doing explicit cache flushing anyway (see
> above), allocating your own memory and mapping it where needed is by
> far the superior solution. On cache coherent architectures
> dma_alloc_coherent is nothing but allocate memory + dma_map_single.
> For non-coherent allocations the memory might come from a special
> pool or must be used through a special virtual address mapping that
> is set up either statically or dynamically. For that case splitting
> allocation and mapping is a good idea in many ways, and I plan to move
> towards that once the number of dma mapping implementations is down
> to a reasonable number, so that it can actually be done.

Yeah, the above is pretty much what we do on x86. dma-api believes
everything is coherent, so dma_map_sg does the mapping we want and
nothing else (minus swiotlb fun). Cache flushing, allocations: all
done by the driver.

On arm that doesn't work. The iommu api seems like a good fit, except
the dma-api tends to get in the way a bit (drm/msm apparently has
similar problems to tegra), and if you need contiguous memory,
dma_alloc_coherent is the only way to get at it. There was a huge
discussion years ago about that, and direct cma access was shot down
because it would have exposed too much of the caching attribute
mangling required (most arm platforms apparently need wc pages to not
be in the kernel's linear map).

Anything that separates these three things more (allocation pools,
mapping through IOMMUs and flushing cpu caches) sounds like the right
direction to me, roughly along the lines of the sketch below. Even if
that throws some portability across platforms away - drivers that want
to control things in this much detail aren't really portable (without
some serious work) anyway.
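To make the three-way split concrete, a purely hypothetical sketch of
what a driver-side page bind could look like with allocation, flushing
and mapping fully separated (uses the 2018-era 5-argument iommu_map;
the allocation-pool step is hand-waved):

#include <linux/io.h>
#include <linux/iommu.h>
#include <drm/drm_cache.h>

/* Hypothetical helper: the driver already allocated @page from
 * whatever pool it owns (CMA, a shrinker-managed cache, ...). */
static int my_gpu_bind_page(struct iommu_domain *domain,
			    struct page *page, unsigned long iova)
{
	/* Flush cpu caches explicitly, since the gpu will access the
	 * page non-snooped - this is the drm_clflush.c part. */
	drm_clflush_pages(&page, 1);

	/* Map through the iommu api directly, with no dma-api (and
	 * hence no bounce buffering or hidden flushing) involved. */
	return iommu_map(domain, iova, page_to_phys(page),
			 PAGE_SIZE, IOMMU_READ | IOMMU_WRITE);
}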
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch