From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ot0-f194.google.com ([74.125.82.194]:36313 "EHLO mail-ot0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753641AbdCVU3L (ORCPT ); Wed, 22 Mar 2017 16:29:11 -0400 Received: by mail-ot0-f194.google.com with SMTP id i1so26838932ota.3 for ; Wed, 22 Mar 2017 13:29:11 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <20170322163237.GF21222@n2100.armlinux.org.uk> References: <1487682998-2564-1-git-send-email-m.szyprowski@samsung.com> <20170322111020.GD21222@n2100.armlinux.org.uk> <20170322163237.GF21222@n2100.armlinux.org.uk> From: Shuah Khan Date: Wed, 22 Mar 2017 14:29:09 -0600 Message-ID: Subject: Re: [PATCH] ARM: dma-mapping: Fix dma_get_sgtable() for regions without struct page To: Russell King - ARM Linux Cc: Marek Szyprowski , linux-arm-kernel@lists.infradead.org, Bartlomiej Zolnierkiewicz , stable , shuahkh@osg.samsung.com Content-Type: multipart/mixed; boundary=f403043c4fa0e9809f054b579c0a Sender: stable-owner@vger.kernel.org List-ID: --f403043c4fa0e9809f054b579c0a Content-Type: text/plain; charset=UTF-8 On Wed, Mar 22, 2017 at 10:32 AM, Russell King - ARM Linux wrote: > On Wed, Mar 22, 2017 at 09:46:05AM -0600, Shuah Khan wrote: >> This check doesn't really do any kind of validation. > > Exactly right. > >> I have been debugging >> a pagefault when the sgtable created by arm_dma_get_sgtable() runs into >> the following kernel error when vb2_dc_dmabuf_ops_map() tries to map >> the exported buffer from another driver. > > It is _completely_ incorrect to even try to pass memory allocated from > the coherent DMA allocator back into the _streaming_ DMA API. That is > just something that the streaming DMA API does not support. > > I am dismayed at dma_get_sgtable() ever having been proposed - I think > it's completely wrong. Looking back to when it was created in 2012, it > appears that it never got any review from architecture people - there > certainly are no architecture people in the attributations for it. > > There are two Samsung people and a DRM person in the sign-off, and IMHO > that's not good enough for a proposal which completely changes the way > DMA memory is dealt with - especially as _that_ is soo architecture > dependent. > > Annoyingly, when I was looking at the dma_buf API while I was implementing > the armada DRM driver, I did point out the problems with the dma_buf API > wrt passing contiguous regions from one driver to another, and I suggested > separating the DMA mapping operation somewhat, but all that fell on deaf > ears. I guess the results of that are now coming home to roost. > > The fundamental point about coherent DMA memory, or memory that is > provided via the declaration mechanisms is that there is _no_ gurantee > that there will be a "struct page" backing it, so creating a scatterlist > with "struct page" pointers, which is then subsequently dma_map_sg()'d > elsewhere is always going to be totally broken. In the case of some > coherent DMA allocators, there _may_ be a "struct page" backing the > memory, but that's not guaranteed, and since dma_map_sg() relies on > there being a valid "struct page", things will go wrong. > > And by "wrong", I mean it could oops the kernel because of trying to > dereference the "struct page" pointer, or if that's successful, we > could end up invalidating the cache lines for some random region of > memory, which then goes on to cause a completely different failure, or > maybe filesystem corruption (eg, what if it hits a page cache page > containing an inode table, and invalidates all updates to that page?) > The consequences really don't bare thinking about. > >> Just an an experiment, I added this hack: in this case __dc_alloc() >> returned values is the mapped page. This is by mo means a fix - I am >> not sure if we can rely on this page, since at DQBUF its get unmapped. >> >> /* In this case cpu_addr is the page - just use it */ >> if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) == 0) >> page = cpu_addr; >> >> At this point I dumped some data: >> >> [ 154.627505] platform 11000000.codec:left: >> debug_arm_dma_get_sgtable: bus_to_pfn 772608 dma_pfn_offset 0 >> PHYS_PFN_OFFSET 262144 >> [ 154.627512] platform 11000000.codec:left: >> debug_arm_dma_get_sgtable: dma_to_pfn 772608 virt_to_pfn 470272 >> virt_to_dma 1926234112 virt_to_page ef6ca000 page f0004000 >> >> cpu_addr is >> [ 154.627520] platform 11000000.codec:left: >> debug_arm_dma_get_sgtable: cpu_addr f2d00000 handle bca00000 page = >> f2d00000 size 946176 align size 946176 >> [ 154.627527] platform 11000000.codec:left: after >> dma_get_sgtable_attrs() vb2_dc_get_base_sgt: sgt ecdcec00 page >> f2d00000 buf ecdced80 cookie f2d00000 vaddr f2d00000 dc_cookie >> ecdced90 dma_addr bca00000 size 946176 >> >> cpu_addr is f2d00000 and the page you get from >> pfn_to_page(dma_to_pfn(dev, handle)) is f0004000. > > I'd like to see the diff that produced the above dumps to make better > sense of it... I added debug to track the buffer from allocation time to export buf time and all the way to the time map_sg fails. Please see the attached diff. It is somewhat cluttered with lots of debug messages. I also added a debug version (duplicate) arm_dma_get_sgtable() so I can limit the messages to when it gets called from vb2_dc_get_base_sgt() > >> The page you get with struct page *page = pfn_to_page(dma_to_pfn(dev, handle)); >> is bogus. So this check Merek added doesn't find a bogus page, does a reverse >> validation of the bogus page. > > Right - dma_to_pfn() will correctly translate the "handle" to a PFN > number, so we can just multiply that with PAGE_SIZE to get the physical > address, which if I'm interpreting your information above correctly, > would be 0x72d00000. Correct paddr is 0x72d00000. I forgot to add the allocation time debug dump: [ 154.619456] platform 11000000.codec:left: vb2_dc_alloc: buf ecdced80 cookie f2d00000 vaddr f2d00000 paddr 72d00000 dma_addr bca00000 size 946176 dc_cookie ecdced90 dma_to_pfn 772608 I added debug to track the buffer from allocation time to export buf time and all the way to the time map_sg fails. > > However, what you're saying is that there is no valid "struct page" > associated with that address. >>From what I can tell from the data I have, dma_to_pfn(dev, handle); doesn't return valid page at arm_dma_get_sgtable(), even though dma_to_pfn(dev, handle); itself stays the same from alloc time, as you can see in the above dump from vb2_dc_alloc(): dma_to_pfn 772608 I am guessing page might not be valid, even at the alloc time. That is next on debug list. Which is somewhat puzzling, because dma_attrs don't have the DMA_ATTR_NO_KERNEL_MAP set, __dma_alloc() passes in want_vaddr true. __alloc_from_contiguous() will map the page and return a valid vaddr __dma_alloc() then does *handle = pfn_to_dma(dev, page_to_pfn(page)); to return the handle. > > The correct way to validate this is: > > unsigned long pfn = dma_to_pfn(dev, handle); > struct page *page; > > if (!pfn_valid(pfn)) > return -ENXIO; > > page = pfn_to_page(pfn); > > However, I have a huge big problem with this - this can pass even for > memory obtained through the coherent allocator (because of the way CMA > works.) Right. DMA_ATTR_NO_KERNEL_MAP is another variable, if set, there won't be an kernel mapping. > > This can end up with the DMA streaming API kmapping memory using > cacheable attributes while CMA has it mapped with uncacheable attributes, > where (as mentioned on IRC this morning): Which irc channel?? I can get on there for the discussion if it makes it easier. > > you enter the realm of "mismatched memory attributes", > which don't blow up, but you can lose guarantees of both > mappings wrt coherence and ordering > > We really don't want to go there. The problem, however, is that we > can't distingish a CMA page allocated via the DMA coherent allocator > (and so potentially mapped uncacheable) with a normal allocation of > the same page, which would be valid for streaming DMA API use. > > I think the best we can do is as per the above for declared DMA memory > (iow, memory that isn't struct page backed.) It should _never_ be > passed into the streaming DMA API. We can see some of the evidence here in this case. One driver allocates the buffer and another driber tries to import. After buffer is exported, there is a window between DQBUF and QBUF, DQBUF unmaps the buffers. Then the importer comes along referencing the sg_table and looks to map them. Before mapping, looks like D-cache clean has to occur and that would need vaddr if I understand correctly. So there a lot of places, it can trip all over. > > I also think that the dma_get_sgtable() thing is a total trainwreck, > and we need to have a rethink about this. I don't think it works for the case I am running into. > > This isn't a case of "something changed in the architecture that broke > dma_get_sgtable()" - this is a case that it's always been broken in > this way ever since dma_get_sgtable() was introduced, and dma_get_sgtable() > was never (properly?) reviewed with this kind of use-case in mind. I think the use-case I am debugging has always been broken. This is a fairly involved gstreamer pipeline to my knowledge has never been tested. This page fault is a known issue at least for a couple of releases- I started looking into it a couple of releases ago towards the end of 4.9. Maybe there are some simpler use-cases that will work. I am also curious if this works well when DMA_ATTR_NO_KERNEL_MAPPING is set. It is not in my use-case. Thanks, -- Shuah > > -- > RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ > FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up > according to speedtest.net. --f403043c4fa0e9809f054b579c0a Content-Type: application/octet-stream; name="videobuf2-dma-contig.c_diff" Content-Disposition: attachment; filename="videobuf2-dma-contig.c_diff" Content-Transfer-Encoding: base64 X-Attachment-Id: f_j0lf91wi0 ZGlmZiAtLWdpdCBhL2RyaXZlcnMvbWVkaWEvdjRsMi1jb3JlL3ZpZGVvYnVmMi1kbWEtY29udGln LmMgYi9kcml2ZXJzL21lZGlhL3Y0bDItY29yZS92aWRlb2J1ZjItZG1hLWNvbnRpZy5jCmluZGV4 IGZiNmExNzcuLjhlMWQxZmEgMTAwNjQ0Ci0tLSBhL2RyaXZlcnMvbWVkaWEvdjRsMi1jb3JlL3Zp ZGVvYnVmMi1kbWEtY29udGlnLmMKKysrIGIvZHJpdmVycy9tZWRpYS92NGwyLWNvcmUvdmlkZW9i dWYyLWRtYS1jb250aWcuYwpAQCAtNzgsNiArNzgsNyBAQCBzdGF0aWMgdm9pZCAqdmIyX2RjX3Zh ZGRyKHZvaWQgKmJ1Zl9wcml2KQogCiAJaWYgKCFidWYtPnZhZGRyICYmIGJ1Zi0+ZGJfYXR0YWNo KQogCQlidWYtPnZhZGRyID0gZG1hX2J1Zl92bWFwKGJ1Zi0+ZGJfYXR0YWNoLT5kbWFidWYpOwor CXByX2luZm8oIiVzOiBidWYgJXAgdmFkZHIgJXBcbiIsIF9fZnVuY19fLCBidWYsIGJ1Zi0+dmFk ZHIpOwogCiAJcmV0dXJuIGJ1Zi0+dmFkZHI7CiB9CkBAIC0xNzIsNiArMTczLDkgQEAgc3RhdGlj IHZvaWQgKnZiMl9kY19hbGxvYyhzdHJ1Y3QgZGV2aWNlICpkZXYsIHVuc2lnbmVkIGxvbmcgYXR0 cnMsCiAKIAlhdG9taWNfaW5jKCZidWYtPnJlZmNvdW50KTsKIAorCWRldl9pbmZvKGRldiwgIiVz OiBidWYgJXAgY29va2llICVwIHZhZGRyICVwIHBhZGRyICVwIGRtYV9hZGRyICUweCBzaXplICVs dSBkY19jb29raWUgJXAgZG1hX3RvX3BmbiAlbHUgXG4iLAorCQkgX19mdW5jX18sIGJ1ZiwgYnVm LT5jb29raWUsIGJ1Zi0+dmFkZHIsICh2b2lkICopX19wYShidWYtPnZhZGRyKSwgYnVmLT5kbWFf YWRkciwgYnVmLT5zaXplLCAmYnVmLT5kbWFfYWRkciwgZG1hX3RvX3BmbihkZXYsIGJ1Zi0+ZG1h X2FkZHIpKTsKKwogCXJldHVybiBidWY7CiB9CiAKQEAgLTIwNSw3ICsyMDksNyBAQCBzdGF0aWMg aW50IHZiMl9kY19tbWFwKHZvaWQgKmJ1Zl9wcml2LCBzdHJ1Y3Qgdm1fYXJlYV9zdHJ1Y3QgKnZt YSkKIAogCXZtYS0+dm1fb3BzLT5vcGVuKHZtYSk7CiAKLQlwcl9kZWJ1ZygiJXM6IG1hcHBlZCBk bWEgYWRkciAweCUwOGx4IGF0IDB4JTA4bHgsIHNpemUgJWxkXG4iLAorCXByX2luZm8oIiVzOiBt YXBwZWQgZG1hIGFkZHIgMHglMDhseCBhdCAweCUwOGx4LCBzaXplICVsZFxuIiwKIAkJX19mdW5j X18sICh1bnNpZ25lZCBsb25nKWJ1Zi0+ZG1hX2FkZHIsIHZtYS0+dm1fc3RhcnQsCiAJCWJ1Zi0+ c2l6ZSk7CiAKQEAgLTI0OCw3ICsyNTIsMTEgQEAgc3RhdGljIGludCB2YjJfZGNfZG1hYnVmX29w c19hdHRhY2goc3RydWN0IGRtYV9idWYgKmRidWYsIHN0cnVjdCBkZXZpY2UgKmRldiwKIAlyZCA9 IGJ1Zi0+c2d0X2Jhc2UtPnNnbDsKIAl3ciA9IHNndC0+c2dsOwogCWZvciAoaSA9IDA7IGkgPCBz Z3QtPm9yaWdfbmVudHM7ICsraSkgeworCQlkZXZfaW5mbyhkZXYsICIlczogaSAlZCAlcCByZCAl cCBsZW5ndGggJWQgb2Zmc2V0ICVkIHBhZ2UgJXBcbiIsCisJCQlfX2Z1bmNfXywgaSwgYnVmLT5z Z3RfYmFzZSwgcmQsIHJkLT5sZW5ndGgsIHJkLT5vZmZzZXQsIHNnX3BhZ2UocmQpKTsKIAkJc2df c2V0X3BhZ2Uod3IsIHNnX3BhZ2UocmQpLCByZC0+bGVuZ3RoLCByZC0+b2Zmc2V0KTsKKwkJZGV2 X2luZm8oZGV2LCAiJXM6IGkgJWQgd3IgJXAgbGVuZ3RoICVkIG9mZnNldCAlZCBwYWdlICVwXG4i LAorCQkJX19mdW5jX18sIGksIHdyLCB3ci0+bGVuZ3RoLCB3ci0+b2Zmc2V0LCBzZ19wYWdlKHdy KSk7CiAJCXJkID0gc2dfbmV4dChyZCk7CiAJCXdyID0gc2dfbmV4dCh3cik7CiAJfQpAQCAtMjU2 LDYgKzI2NCwxMSBAQCBzdGF0aWMgaW50IHZiMl9kY19kbWFidWZfb3BzX2F0dGFjaChzdHJ1Y3Qg ZG1hX2J1ZiAqZGJ1Ziwgc3RydWN0IGRldmljZSAqZGV2LAogCWF0dGFjaC0+ZG1hX2RpciA9IERN QV9OT05FOwogCWRidWZfYXR0YWNoLT5wcml2ID0gYXR0YWNoOwogCisJZGV2X2luZm8oZGV2LCAi JXM6IGRidWYgJXAgZGJ1Zl9wcml2ICVwIGRidWZfYXR0YWNoICVwIGF0dGFjaCAlcFxuIiwKKwkJ IF9fZnVuY19fLCBkYnVmLCBkYnVmLT5wcml2LCBkYnVmX2F0dGFjaCwgYXR0YWNoKTsKKwlkZXZf aW5mbyhidWYtPmRldiwgIiVzOiBzZ3QgJXAgYnVmICVwIGNvb2tpZSAlcCB2YWRkciAlcCBkY19j b29raWUgJXAgZG1hX2FkZHIgJTB4IHNpemUgJWx1XG4iLAorCQkgX19mdW5jX18sIHNndCwgYnVm LCBidWYtPmNvb2tpZSwgYnVmLT52YWRkciwgJmJ1Zi0+ZG1hX2FkZHIsIGJ1Zi0+ZG1hX2FkZHIs IGJ1Zi0+c2l6ZSk7CisKIAlyZXR1cm4gMDsKIH0KIApAQCAtMzAxLDggKzMxNCwxMSBAQCBzdGF0 aWMgc3RydWN0IHNnX3RhYmxlICp2YjJfZGNfZG1hYnVmX29wc19tYXAoCiAJCWRtYV91bm1hcF9z ZyhkYl9hdHRhY2gtPmRldiwgc2d0LT5zZ2wsIHNndC0+b3JpZ19uZW50cywKIAkJCWF0dGFjaC0+ ZG1hX2Rpcik7CiAJCWF0dGFjaC0+ZG1hX2RpciA9IERNQV9OT05FOworCQlwcl9pbmZvKCJ2YjJf ZGNfZG1hYnVmX29wc19tYXAoKTogUmVsZWFzZWQgcHJldmlvdXMgY2FjaGVcbiIpOwogCX0KIAor CWRldl9pbmZvKGRiX2F0dGFjaC0+ZGV2LCAiJXM6IG9yaWdfbmVudHMgJXUsIGRiX2F0dGFjaCAl cCB2YjJfZGNfYXR0YWNoICVwIGRtYV9kaXIgJWRcbiIsCisJCV9fZnVuY19fLCBzZ3QtPm9yaWdf bmVudHMsIGRiX2F0dGFjaCwgYXR0YWNoLCBkbWFfZGlyKTsKIAkvKiBtYXBwaW5nIHRvIHRoZSBj bGllbnQgd2l0aCBuZXcgZGlyZWN0aW9uICovCiAJc2d0LT5uZW50cyA9IGRtYV9tYXBfc2coZGJf YXR0YWNoLT5kZXYsIHNndC0+c2dsLCBzZ3QtPm9yaWdfbmVudHMsCiAJCQkJZG1hX2Rpcik7CkBA IC0zNDQsNyArMzYwLDYgQEAgc3RhdGljIHZvaWQgKnZiMl9kY19kbWFidWZfb3BzX3ZtYXAoc3Ry dWN0IGRtYV9idWYgKmRidWYpCiAKIAlyZXR1cm4gYnVmLT52YWRkcjsKIH0KLQogc3RhdGljIGlu dCB2YjJfZGNfZG1hYnVmX29wc19tbWFwKHN0cnVjdCBkbWFfYnVmICpkYnVmLAogCXN0cnVjdCB2 bV9hcmVhX3N0cnVjdCAqdm1hKQogewpAQCAtMzYzLDYgKzM3OCw1NiBAQCBzdGF0aWMgc3RydWN0 IGRtYV9idWZfb3BzIHZiMl9kY19kbWFidWZfb3BzID0gewogCS5yZWxlYXNlID0gdmIyX2RjX2Rt YWJ1Zl9vcHNfcmVsZWFzZSwKIH07CiAKK3N0YXRpYyBpbnQgZGVidWdfYXJtX2RtYV9nZXRfc2d0 YWJsZShzdHJ1Y3QgZGV2aWNlICpkZXYsIHN0cnVjdCBzZ190YWJsZSAqc2d0LAorICAgICAgICAg ICAgICAgICB2b2lkICpjcHVfYWRkciwgZG1hX2FkZHJfdCBoYW5kbGUsIHNpemVfdCBzaXplLAor ICAgICAgICAgICAgICAgICB1bnNpZ25lZCBsb25nIGF0dHJzKQoreworICAgICAgICBzdHJ1Y3Qg cGFnZSAqcGFnZSA9IHBmbl90b19wYWdlKGRtYV90b19wZm4oZGV2LCBoYW5kbGUpKTsKKyAgICAg ICAgaW50IHJldDsKKworCWRldl9pbmZvKGRldiwgIiVzOiBidXNfdG9fcGZuICVsdSBkbWFfcGZu X29mZnNldCAlbHUgUEhZU19QRk5fT0ZGU0VUICVsdVxuIiwgX19mdW5jX18sIF9fYnVzX3RvX3Bm bihoYW5kbGUpLCBkZXYtPmRtYV9wZm5fb2Zmc2V0LCBQSFlTX1BGTl9PRkZTRVQpOworCisJZGV2 X2luZm8oZGV2LCAiJXM6IGRtYV90b19wZm4gJWx1IHZpcnRfdG9fcGZuICVsdSB2aXJ0X3RvX2Rt YSAlIGx1IHZpcnRfdG9fcGFnZSAlcCBwYWdlICVwXG4iLAorCQlfX2Z1bmNfXywgZG1hX3RvX3Bm bihkZXYsIGhhbmRsZSksIHZpcnRfdG9fcGZuKGNwdV9hZGRyKSwgdmlydF90b19kbWEoZGV2LCBj cHVfYWRkciksIHZpcnRfdG9fcGFnZShjcHVfYWRkciksIHBhZ2UpOworCisJaWYgKHBmbl90b19k bWEoZGV2LCBwYWdlX3RvX3BmbihwYWdlKSkgIT0gaGFuZGxlKSB7CisJCWRldl9pbmZvKGRldiwg IkludmFsaWQgc3RydWN0IHBhZ2UgcG9pbnRlclxuIik7CisJCXJldHVybiAtRU5YSU87CisJfQor CisgICAgICAgIHJldCA9IHNnX2FsbG9jX3RhYmxlKHNndCwgMSwgR0ZQX0tFUk5FTCk7CisgICAg ICAgIGlmICh1bmxpa2VseShyZXQpKQorICAgICAgICAgICAgICAgIHJldHVybiByZXQ7CisKKwkv KiBJbiB0aGlzIGNhc2UgY3B1X2FkZHIgaXMgdGhlIHBhZ2UgLSBqdXN0IHVzZSBpdCAqLworCWlm ICgoYXR0cnMgJiBETUFfQVRUUl9OT19LRVJORUxfTUFQUElORykgPT0gMCkKKwkJcGFnZSA9IGNw dV9hZGRyOworCisJZGV2X2luZm8oZGV2LCAiJXM6IGNwdV9hZGRyICVwIGhhbmRsZSAlMHggcGFn ZSA9ICVwIHNpemUgJWQgYWxpZ24gc2l6ZSAlZFxuIiwgX19mdW5jX18sIGNwdV9hZGRyLCBoYW5k bGUsIHBhZ2UsIHNpemUsIFBBR0VfQUxJR04oc2l6ZSkpOworICAgICAgICBzZ19zZXRfcGFnZShz Z3QtPnNnbCwgcGFnZSwgUEFHRV9BTElHTihzaXplKSwgMCk7CisgICAgICAgIHJldHVybiAwOwor fQorCitzdGF0aWMgaW5saW5lIGludAorZGVidWdfZG1hX2dldF9zZ3RhYmxlX2F0dHJzKHN0cnVj dCBkZXZpY2UgKmRldiwgc3RydWN0IHNnX3RhYmxlICpzZ3QsIHZvaWQgKmNwdV9hZGRyLAorICAg ICAgICAgICAgICAgICAgICAgIGRtYV9hZGRyX3QgZG1hX2FkZHIsIHNpemVfdCBzaXplLAorICAg ICAgICAgICAgICAgICAgICAgIHVuc2lnbmVkIGxvbmcgYXR0cnMpCit7CisgICAgICAgIHN0cnVj dCBkbWFfbWFwX29wcyAqb3BzID0gZ2V0X2RtYV9vcHMoZGV2KTsKKyAgICAgICAgQlVHX09OKCFv cHMpOworICAgICAgICBpZiAob3BzLT5nZXRfc2d0YWJsZSkgeworCQlkZXZfaW5mbyhkZXYsICIl czogY2FsbGluZyBnZXRfc2d0YWJsZSBhdCAlcFxuIiwKKwkJCV9fZnVuY19fLCBvcHMtPmdldF9z Z3RhYmxlKTsKKy8qCisgICAgICAgICAgICAgICAgcmV0dXJuIG9wcy0+Z2V0X3NndGFibGUoZGV2 LCBzZ3QsIGNwdV9hZGRyLCBkbWFfYWRkciwgc2l6ZSwKKyAgICAgICAgICAgICAgICAgICAgICAg ICAgICAgICAgICAgICAgICBhdHRycyk7CisqLworCQlyZXR1cm4gZGVidWdfYXJtX2RtYV9nZXRf c2d0YWJsZShkZXYsIHNndCwgY3B1X2FkZHIsIGRtYV9hZGRyLCBzaXplLAorICAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgIGF0dHJzKTsKKwl9CisgICAgICAgIHJldHVybiBk bWFfY29tbW9uX2dldF9zZ3RhYmxlKGRldiwgc2d0LCBjcHVfYWRkciwgZG1hX2FkZHIsIHNpemUp OworfQorCiBzdGF0aWMgc3RydWN0IHNnX3RhYmxlICp2YjJfZGNfZ2V0X2Jhc2Vfc2d0KHN0cnVj dCB2YjJfZGNfYnVmICpidWYpCiB7CiAJaW50IHJldDsKQEAgLTM3NCwxMyArNDM5LDI1IEBAIHN0 YXRpYyBzdHJ1Y3Qgc2dfdGFibGUgKnZiMl9kY19nZXRfYmFzZV9zZ3Qoc3RydWN0IHZiMl9kY19i dWYgKmJ1ZikKIAkJcmV0dXJuIE5VTEw7CiAJfQogCi0JcmV0ID0gZG1hX2dldF9zZ3RhYmxlX2F0 dHJzKGJ1Zi0+ZGV2LCBzZ3QsIGJ1Zi0+Y29va2llLCBidWYtPmRtYV9hZGRyLAorCWRldl9pbmZv KGJ1Zi0+ZGV2LCAiYmVmb3JlIGRtYV9nZXRfc2d0YWJsZV9hdHRycygpICVzOiBzZ3QgJXAgYnVm ICVwIGNvb2tpZSAlcCB2YWRkciAlcCBkY19jb29raWUgJXAgZG1hX2FkZHIgJTB4IHNpemUgJWx1 IGF0dHJzIDB4JWx4XG4iLAorCQkgX19mdW5jX18sIHNndCwgYnVmLCBidWYtPmNvb2tpZSwgYnVm LT52YWRkciwgJmJ1Zi0+ZG1hX2FkZHIsIGJ1Zi0+ZG1hX2FkZHIsIGJ1Zi0+c2l6ZSwgYnVmLT5h dHRycyk7CisKKwlpZiAoKGJ1Zi0+YXR0cnMgJiBETUFfQVRUUl9OT19LRVJORUxfTUFQUElORykg PT0gMCkKKwkJZGV2X2luZm8oYnVmLT5kZXYsICJJcyB2YWRkciA9PSBjb29raWUgYXR0cnMgMHgl bHhcbiIsCisJCQkgYnVmLT5hdHRycyk7CisJaWYgKChidWYtPmF0dHJzICYgRE1BX0FUVFJfU0tJ UF9DUFVfU1lOQykgPT0gMCkKKwkJZGV2X2luZm8oYnVmLT5kZXYsICJETUFfQVRUUl9TS0lQX0NQ VV9TWU5DIGF0dHJzIDB4JWx4XG4iLAorCQkJIGJ1Zi0+YXR0cnMpOworCisJcmV0ID0gZGVidWdf ZG1hX2dldF9zZ3RhYmxlX2F0dHJzKGJ1Zi0+ZGV2LCBzZ3QsIGJ1Zi0+Y29va2llLCBidWYtPmRt YV9hZGRyLAogCQlidWYtPnNpemUsIGJ1Zi0+YXR0cnMpOwogCWlmIChyZXQgPCAwKSB7CiAJCWRl dl9lcnIoYnVmLT5kZXYsICJmYWlsZWQgdG8gZ2V0IHNjYXR0ZXJsaXN0IGZyb20gRE1BIEFQSVxu Iik7CiAJCWtmcmVlKHNndCk7CiAJCXJldHVybiBOVUxMOwogCX0KKwlkZXZfaW5mbyhidWYtPmRl diwgImFmdGVyIGRtYV9nZXRfc2d0YWJsZV9hdHRycygpICVzOiBzZ3QgJXAgcGFnZSAlcCBidWYg JXAgY29va2llICVwIHZhZGRyICVwIGRjX2Nvb2tpZSAlcCBkbWFfYWRkciAlMHggc2l6ZSAlbHVc biIsCisJCSBfX2Z1bmNfXywgc2d0LCBzZ19wYWdlKHNndC0+c2dsKSwgYnVmLCBidWYtPmNvb2tp ZSwgYnVmLT52YWRkciwgJmJ1Zi0+ZG1hX2FkZHIsICBidWYtPmRtYV9hZGRyLCBidWYtPnNpemUp OwogCiAJcmV0dXJuIHNndDsKIH0KQEAgLTM5Niw2ICs0NzMsOSBAQCBzdGF0aWMgc3RydWN0IGRt YV9idWYgKnZiMl9kY19nZXRfZG1hYnVmKHZvaWQgKmJ1Zl9wcml2LCB1bnNpZ25lZCBsb25nIGZs YWdzKQogCWV4cF9pbmZvLmZsYWdzID0gZmxhZ3M7CiAJZXhwX2luZm8ucHJpdiA9IGJ1ZjsKIAor CWRldl9pbmZvKGJ1Zi0+ZGV2LCAiJXM6IGJ1ZiAlcCBjb29raWUgJXAgdmFkZHIgJXAgZGNfY29v a2llICVwIGRtYV9hZGRyICUweCBzaXplICVsdVxuIiwKKwkJIF9fZnVuY19fLCBidWYsIGJ1Zi0+ Y29va2llLCBidWYtPnZhZGRyLCAmYnVmLT5kbWFfYWRkciwgYnVmLT5kbWFfYWRkciwgYnVmLT5z aXplKTsKKwogCWlmICghYnVmLT5zZ3RfYmFzZSkKIAkJYnVmLT5zZ3RfYmFzZSA9IHZiMl9kY19n ZXRfYmFzZV9zZ3QoYnVmKTsKIApAQCAtNDUzLDIxICs1MzMsMjUgQEAgc3RhdGljIHZvaWQgdmIy X2RjX3B1dF91c2VycHRyKHZvaWQgKmJ1Zl9wcml2KQogI2lmZGVmIF9fYXJjaF9wZm5fdG9fZG1h CiBzdGF0aWMgaW5saW5lIGRtYV9hZGRyX3QgdmIyX2RjX3Bmbl90b19kbWEoc3RydWN0IGRldmlj ZSAqZGV2LCB1bnNpZ25lZCBsb25nIHBmbikKIHsKKwlkZXZfaW5mbyhkZXYsICIlczogcGZuICVs dVxuIiwgX19mdW5jX18sIHBmbik7CiAJcmV0dXJuIChkbWFfYWRkcl90KV9fYXJjaF9wZm5fdG9f ZG1hKGRldiwgcGZuKTsKIH0KICNlbGlmIGRlZmluZWQoX19wZm5fdG9fYnVzKQogc3RhdGljIGlu bGluZSBkbWFfYWRkcl90IHZiMl9kY19wZm5fdG9fZG1hKHN0cnVjdCBkZXZpY2UgKmRldiwgdW5z aWduZWQgbG9uZyBwZm4pCiB7CisJZGV2X2luZm8oZGV2LCAiJXM6IHBmbiAlbHVcbiIsIF9fZnVu Y19fLCBwZm4pOwogCXJldHVybiAoZG1hX2FkZHJfdClfX3Bmbl90b19idXMocGZuKTsKIH0KICNl bGlmIGRlZmluZWQoX19wZm5fdG9fcGh5cykKIHN0YXRpYyBpbmxpbmUgZG1hX2FkZHJfdCB2YjJf ZGNfcGZuX3RvX2RtYShzdHJ1Y3QgZGV2aWNlICpkZXYsIHVuc2lnbmVkIGxvbmcgcGZuKQogewor CWRldl9pbmZvKGRldiwgIiVzOiBwZm4gJWx1XG4iLCBfX2Z1bmNfXywgcGZuKTsKIAlyZXR1cm4g KGRtYV9hZGRyX3QpX19wZm5fdG9fcGh5cyhwZm4pOwogfQogI2Vsc2UKIHN0YXRpYyBpbmxpbmUg ZG1hX2FkZHJfdCB2YjJfZGNfcGZuX3RvX2RtYShzdHJ1Y3QgZGV2aWNlICpkZXYsIHVuc2lnbmVk IGxvbmcgcGZuKQogeworCWRldl9pbmZvKGRldiwgIiVzOiBwZm4gJWx1XG4iLCBfX2Z1bmNfXywg cGZuKTsKIAkvKiByZWFsbHksIHdlIGNhbm5vdCBkbyBhbnl0aGluZyBiZXR0ZXIgYXQgdGhpcyBw b2ludCAqLwogCXJldHVybiAoZG1hX2FkZHJfdCkocGZuKSA8PCBQQUdFX1NISUZUOwogfQpAQCAt NjA5LDYgKzY5Myw4IEBAIHN0YXRpYyBpbnQgdmIyX2RjX21hcF9kbWFidWYodm9pZCAqbWVtX3By aXYpCiAJCXJldHVybiAwOwogCX0KIAorCWRldl9pbmZvKGJ1Zi0+ZGV2LCAiJXM6IGJ1ZiAlcCBk Yl9hdHRhY2ggJXAgZG1hX2RpciAlZCBjb29raWUgJXAgdmFkZHIgJXAgZGNfY29va2llICVwIGRt YV9hZGRyICVseCBzaXplICVsdVxuIiwKKwkJIF9fZnVuY19fLCBidWYsIGJ1Zi0+ZGJfYXR0YWNo LCBidWYtPmRtYV9kaXIsIGJ1Zi0+Y29va2llLCBidWYtPnZhZGRyLCAmYnVmLT5kbWFfYWRkciwg YnVmLT5zaXplKTsKIAkvKiBnZXQgdGhlIGFzc29jaWF0ZWQgc2NhdHRlcmxpc3QgZm9yIHRoaXMg YnVmZmVyICovCiAJc2d0ID0gZG1hX2J1Zl9tYXBfYXR0YWNobWVudChidWYtPmRiX2F0dGFjaCwg YnVmLT5kbWFfZGlyKTsKIAlpZiAoSVNfRVJSKHNndCkpIHsKQEAgLTYxNiw2ICs3MDIsOSBAQCBz dGF0aWMgaW50IHZiMl9kY19tYXBfZG1hYnVmKHZvaWQgKm1lbV9wcml2KQogCQlyZXR1cm4gLUVJ TlZBTDsKIAl9CiAKKwlkZXZfaW5mbyhidWYtPmRldiwgIiVzOiBidWYgJXAgZGJfYXR0YWNoICVw IGRtYV9kaXIgJWQgY29va2llICVwIHZhZGRyICVwIGRjX2Nvb2tpZSAlcCBkbWFfYWRkciAlbHgg c2l6ZSAlbHVcbiIsCisJCSBfX2Z1bmNfXywgYnVmLCBidWYtPmRiX2F0dGFjaCwgYnVmLT5kbWFf ZGlyLCBidWYtPmNvb2tpZSwgYnVmLT52YWRkciwgJmJ1Zi0+ZG1hX2FkZHIsIGJ1Zi0+c2l6ZSk7 CisKIAkvKiBjaGVja2luZyBpZiBkbWFidWYgaXMgYmlnIGVub3VnaCB0byBzdG9yZSBjb250aWd1 b3VzIGNodW5rICovCiAJY29udGlnX3NpemUgPSB2YjJfZGNfZ2V0X2NvbnRpZ3VvdXNfc2l6ZShz Z3QpOwogCWlmIChjb250aWdfc2l6ZSA8IGJ1Zi0+c2l6ZSkgewpAQCAtNzUyLDEyICs4NDEsMTUg QEAgRVhQT1JUX1NZTUJPTF9HUEwodmIyX2RtYV9jb250aWdfbWVtb3BzKTsKIGludCB2YjJfZG1h X2NvbnRpZ19zZXRfbWF4X3NlZ19zaXplKHN0cnVjdCBkZXZpY2UgKmRldiwgdW5zaWduZWQgaW50 IHNpemUpCiB7CiAJaWYgKCFkZXYtPmRtYV9wYXJtcykgeworCQlkZXZfaW5mbyhkZXYsICJBbGxv Y2F0aW5nIGRtYV9wYXJtc1xuIik7CiAJCWRldi0+ZG1hX3Bhcm1zID0ga3phbGxvYyhzaXplb2Yo KmRldi0+ZG1hX3Bhcm1zKSwgR0ZQX0tFUk5FTCk7CiAJCWlmICghZGV2LT5kbWFfcGFybXMpCiAJ CQlyZXR1cm4gLUVOT01FTTsKIAl9Ci0JaWYgKGRtYV9nZXRfbWF4X3NlZ19zaXplKGRldikgPCBz aXplKQorCWlmIChkbWFfZ2V0X21heF9zZWdfc2l6ZShkZXYpIDwgc2l6ZSkgeworCQlkZXZfaW5m byhkZXYsICJTZXR0aW5nIG1heCBzZWcgc2l6ZSB0byAlZFxuIiwgc2l6ZSk7CiAJCXJldHVybiBk bWFfc2V0X21heF9zZWdfc2l6ZShkZXYsIHNpemUpOworCX0KIAogCXJldHVybiAwOwogfQo= --f403043c4fa0e9809f054b579c0a-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: shuahkhan@gmail.com (Shuah Khan) Date: Wed, 22 Mar 2017 14:29:09 -0600 Subject: [PATCH] ARM: dma-mapping: Fix dma_get_sgtable() for regions without struct page In-Reply-To: <20170322163237.GF21222@n2100.armlinux.org.uk> References: <1487682998-2564-1-git-send-email-m.szyprowski@samsung.com> <20170322111020.GD21222@n2100.armlinux.org.uk> <20170322163237.GF21222@n2100.armlinux.org.uk> Message-ID: To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org On Wed, Mar 22, 2017 at 10:32 AM, Russell King - ARM Linux wrote: > On Wed, Mar 22, 2017 at 09:46:05AM -0600, Shuah Khan wrote: >> This check doesn't really do any kind of validation. > > Exactly right. > >> I have been debugging >> a pagefault when the sgtable created by arm_dma_get_sgtable() runs into >> the following kernel error when vb2_dc_dmabuf_ops_map() tries to map >> the exported buffer from another driver. > > It is _completely_ incorrect to even try to pass memory allocated from > the coherent DMA allocator back into the _streaming_ DMA API. That is > just something that the streaming DMA API does not support. > > I am dismayed at dma_get_sgtable() ever having been proposed - I think > it's completely wrong. Looking back to when it was created in 2012, it > appears that it never got any review from architecture people - there > certainly are no architecture people in the attributations for it. > > There are two Samsung people and a DRM person in the sign-off, and IMHO > that's not good enough for a proposal which completely changes the way > DMA memory is dealt with - especially as _that_ is soo architecture > dependent. > > Annoyingly, when I was looking at the dma_buf API while I was implementing > the armada DRM driver, I did point out the problems with the dma_buf API > wrt passing contiguous regions from one driver to another, and I suggested > separating the DMA mapping operation somewhat, but all that fell on deaf > ears. I guess the results of that are now coming home to roost. > > The fundamental point about coherent DMA memory, or memory that is > provided via the declaration mechanisms is that there is _no_ gurantee > that there will be a "struct page" backing it, so creating a scatterlist > with "struct page" pointers, which is then subsequently dma_map_sg()'d > elsewhere is always going to be totally broken. In the case of some > coherent DMA allocators, there _may_ be a "struct page" backing the > memory, but that's not guaranteed, and since dma_map_sg() relies on > there being a valid "struct page", things will go wrong. > > And by "wrong", I mean it could oops the kernel because of trying to > dereference the "struct page" pointer, or if that's successful, we > could end up invalidating the cache lines for some random region of > memory, which then goes on to cause a completely different failure, or > maybe filesystem corruption (eg, what if it hits a page cache page > containing an inode table, and invalidates all updates to that page?) > The consequences really don't bare thinking about. > >> Just an an experiment, I added this hack: in this case __dc_alloc() >> returned values is the mapped page. This is by mo means a fix - I am >> not sure if we can rely on this page, since at DQBUF its get unmapped. >> >> /* In this case cpu_addr is the page - just use it */ >> if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) == 0) >> page = cpu_addr; >> >> At this point I dumped some data: >> >> [ 154.627505] platform 11000000.codec:left: >> debug_arm_dma_get_sgtable: bus_to_pfn 772608 dma_pfn_offset 0 >> PHYS_PFN_OFFSET 262144 >> [ 154.627512] platform 11000000.codec:left: >> debug_arm_dma_get_sgtable: dma_to_pfn 772608 virt_to_pfn 470272 >> virt_to_dma 1926234112 virt_to_page ef6ca000 page f0004000 >> >> cpu_addr is >> [ 154.627520] platform 11000000.codec:left: >> debug_arm_dma_get_sgtable: cpu_addr f2d00000 handle bca00000 page = >> f2d00000 size 946176 align size 946176 >> [ 154.627527] platform 11000000.codec:left: after >> dma_get_sgtable_attrs() vb2_dc_get_base_sgt: sgt ecdcec00 page >> f2d00000 buf ecdced80 cookie f2d00000 vaddr f2d00000 dc_cookie >> ecdced90 dma_addr bca00000 size 946176 >> >> cpu_addr is f2d00000 and the page you get from >> pfn_to_page(dma_to_pfn(dev, handle)) is f0004000. > > I'd like to see the diff that produced the above dumps to make better > sense of it... I added debug to track the buffer from allocation time to export buf time and all the way to the time map_sg fails. Please see the attached diff. It is somewhat cluttered with lots of debug messages. I also added a debug version (duplicate) arm_dma_get_sgtable() so I can limit the messages to when it gets called from vb2_dc_get_base_sgt() > >> The page you get with struct page *page = pfn_to_page(dma_to_pfn(dev, handle)); >> is bogus. So this check Merek added doesn't find a bogus page, does a reverse >> validation of the bogus page. > > Right - dma_to_pfn() will correctly translate the "handle" to a PFN > number, so we can just multiply that with PAGE_SIZE to get the physical > address, which if I'm interpreting your information above correctly, > would be 0x72d00000. Correct paddr is 0x72d00000. I forgot to add the allocation time debug dump: [ 154.619456] platform 11000000.codec:left: vb2_dc_alloc: buf ecdced80 cookie f2d00000 vaddr f2d00000 paddr 72d00000 dma_addr bca00000 size 946176 dc_cookie ecdced90 dma_to_pfn 772608 I added debug to track the buffer from allocation time to export buf time and all the way to the time map_sg fails. > > However, what you're saying is that there is no valid "struct page" > associated with that address. >>From what I can tell from the data I have, dma_to_pfn(dev, handle); doesn't return valid page at arm_dma_get_sgtable(), even though dma_to_pfn(dev, handle); itself stays the same from alloc time, as you can see in the above dump from vb2_dc_alloc(): dma_to_pfn 772608 I am guessing page might not be valid, even at the alloc time. That is next on debug list. Which is somewhat puzzling, because dma_attrs don't have the DMA_ATTR_NO_KERNEL_MAP set, __dma_alloc() passes in want_vaddr true. __alloc_from_contiguous() will map the page and return a valid vaddr __dma_alloc() then does *handle = pfn_to_dma(dev, page_to_pfn(page)); to return the handle. > > The correct way to validate this is: > > unsigned long pfn = dma_to_pfn(dev, handle); > struct page *page; > > if (!pfn_valid(pfn)) > return -ENXIO; > > page = pfn_to_page(pfn); > > However, I have a huge big problem with this - this can pass even for > memory obtained through the coherent allocator (because of the way CMA > works.) Right. DMA_ATTR_NO_KERNEL_MAP is another variable, if set, there won't be an kernel mapping. > > This can end up with the DMA streaming API kmapping memory using > cacheable attributes while CMA has it mapped with uncacheable attributes, > where (as mentioned on IRC this morning): Which irc channel?? I can get on there for the discussion if it makes it easier. > > you enter the realm of "mismatched memory attributes", > which don't blow up, but you can lose guarantees of both > mappings wrt coherence and ordering > > We really don't want to go there. The problem, however, is that we > can't distingish a CMA page allocated via the DMA coherent allocator > (and so potentially mapped uncacheable) with a normal allocation of > the same page, which would be valid for streaming DMA API use. > > I think the best we can do is as per the above for declared DMA memory > (iow, memory that isn't struct page backed.) It should _never_ be > passed into the streaming DMA API. We can see some of the evidence here in this case. One driver allocates the buffer and another driber tries to import. After buffer is exported, there is a window between DQBUF and QBUF, DQBUF unmaps the buffers. Then the importer comes along referencing the sg_table and looks to map them. Before mapping, looks like D-cache clean has to occur and that would need vaddr if I understand correctly. So there a lot of places, it can trip all over. > > I also think that the dma_get_sgtable() thing is a total trainwreck, > and we need to have a rethink about this. I don't think it works for the case I am running into. > > This isn't a case of "something changed in the architecture that broke > dma_get_sgtable()" - this is a case that it's always been broken in > this way ever since dma_get_sgtable() was introduced, and dma_get_sgtable() > was never (properly?) reviewed with this kind of use-case in mind. I think the use-case I am debugging has always been broken. This is a fairly involved gstreamer pipeline to my knowledge has never been tested. This page fault is a known issue at least for a couple of releases- I started looking into it a couple of releases ago towards the end of 4.9. Maybe there are some simpler use-cases that will work. I am also curious if this works well when DMA_ATTR_NO_KERNEL_MAPPING is set. It is not in my use-case. Thanks, -- Shuah > > -- > RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ > FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up > according to speedtest.net. -------------- next part -------------- A non-text attachment was scrubbed... Name: videobuf2-dma-contig.c_diff Type: application/octet-stream Size: 9116 bytes Desc: not available URL: