From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.0 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EA037C1B0F7 for ; Fri, 18 Jan 2019 20:46:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 9F52320896 for ; Fri, 18 Jan 2019 20:46:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729535AbfARUqh (ORCPT ); Fri, 18 Jan 2019 15:46:37 -0500 Received: from mail-qt1-f175.google.com ([209.85.160.175]:35999 "EHLO mail-qt1-f175.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729488AbfARUqh (ORCPT ); Fri, 18 Jan 2019 15:46:37 -0500 Received: by mail-qt1-f175.google.com with SMTP id t13so16733531qtn.3 for ; Fri, 18 Jan 2019 12:46:36 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=SfVK3e4SvCHLJ9vXT+j4zqH+UJhSUx5J4Aed1DcHXvY=; b=WaIP3uwtvPyRbtegaWq2QFUnVCsMRyR/XkNUjVNQ/SifbMD8emT1Hf8fPlqwL0ZjGM kz1xZxZgu55cWdl9Ro6OIIszSce7LKLSME6a4Zt2LbvFOHuK6aUyHQuJzRhLcG7bWZt9 VAuXcd7SvGG+tcZgBWbLsPxgHheP1bULT91sitDGFdbtkEjNzPssk5y8yIFGSlcaaRUP 0xGTdUCdrH7Z3aaBrPcOs0BkjjOUi/iPqRtf5UrTeZvHBLMPjOokH1UY6QUGA4TuX6B8 tT7bjxByFiHA/qEtwFkUXlhFHRwYtF2ph/l3fH8oAx3aizE2dg8ZsgM92ja/TKN5p44T Vvvg== X-Gm-Message-State: AJcUukd4iJMbHXdv24zHTm7hblZ5ifpj+gLW5ZEymmydG07PRoNmpgtA 2HzeaWJppUUiWHmXohFByT/4Lw== X-Google-Smtp-Source: ALg8bN42oyOlamURWl76K39ij6rrDklpaNByy+X3NXRiN0dNyc/qZ2U3awUG4nesjzqzN7uqW+xxJg== X-Received: by 2002:ac8:4258:: with SMTP id r24mr17844066qtm.213.1547844395838; Fri, 18 Jan 2019 12:46:35 -0800 (PST) Received: from ?IPv6:2601:602:9800:dae6::fb21? ([2601:602:9800:dae6::fb21]) by smtp.gmail.com with ESMTPSA id 5sm67808240qtw.50.2019.01.18.12.46.33 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 18 Jan 2019 12:46:35 -0800 (PST) Subject: Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap To: "Andrew F. Davis" , Liam Mark Cc: Sumit Semwal , Greg Kroah-Hartman , =?UTF-8?Q?Arve_Hj=c3=b8nnev=c3=a5g?= , devel@driverdev.osuosl.org, linux-kernel@vger.kernel.org, dri-devel References: <20190111180523.27862-1-afd@ti.com> <20190111180523.27862-14-afd@ti.com> <79eb70f6-00b0-2939-5ec9-65e196ab4987@ti.com> <99ca0b08-02bd-64fd-d43c-c330f0d11639@ti.com> <7620534f-b749-76f9-0f53-f73e3f12e9a9@ti.com> <748fdc07-1e0d-b933-f7f3-2e79cda20271@ti.com> From: Laura Abbott Message-ID: <41c5c506-8c45-9062-a26d-79a9e5428f65@redhat.com> Date: Fri, 18 Jan 2019 12:46:33 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0 MIME-Version: 1.0 In-Reply-To: <748fdc07-1e0d-b933-f7f3-2e79cda20271@ti.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 1/18/19 12:43 PM, Andrew F. Davis wrote: > On 1/18/19 2:31 PM, Laura Abbott wrote: >> On 1/17/19 8:13 AM, Andrew F. Davis wrote: >>> On 1/16/19 4:48 PM, Liam Mark wrote: >>>> On Wed, 16 Jan 2019, Andrew F. Davis wrote: >>>> >>>>> On 1/15/19 1:05 PM, Laura Abbott wrote: >>>>>> On 1/15/19 10:38 AM, Andrew F. Davis wrote: >>>>>>> On 1/15/19 11:45 AM, Liam Mark wrote: >>>>>>>> On Tue, 15 Jan 2019, Andrew F. Davis wrote: >>>>>>>> >>>>>>>>> On 1/14/19 11:13 AM, Liam Mark wrote: >>>>>>>>>> On Fri, 11 Jan 2019, Andrew F. Davis wrote: >>>>>>>>>> >>>>>>>>>>> Buffers may not be mapped from the CPU so skip cache maintenance >>>>>>>>>>> here. >>>>>>>>>>> Accesses from the CPU to a cached heap should be bracketed with >>>>>>>>>>> {begin,end}_cpu_access calls so maintenance should not be needed >>>>>>>>>>> anyway. >>>>>>>>>>> >>>>>>>>>>> Signed-off-by: Andrew F. Davis >>>>>>>>>>> --- >>>>>>>>>>>    drivers/staging/android/ion/ion.c | 7 ++++--- >>>>>>>>>>>    1 file changed, 4 insertions(+), 3 deletions(-) >>>>>>>>>>> >>>>>>>>>>> diff --git a/drivers/staging/android/ion/ion.c >>>>>>>>>>> b/drivers/staging/android/ion/ion.c >>>>>>>>>>> index 14e48f6eb734..09cb5a8e2b09 100644 >>>>>>>>>>> --- a/drivers/staging/android/ion/ion.c >>>>>>>>>>> +++ b/drivers/staging/android/ion/ion.c >>>>>>>>>>> @@ -261,8 +261,8 @@ static struct sg_table >>>>>>>>>>> *ion_map_dma_buf(struct >>>>>>>>>>> dma_buf_attachment *attachment, >>>>>>>>>>>          table = a->table; >>>>>>>>>>>    -    if (!dma_map_sg(attachment->dev, table->sgl, >>>>>>>>>>> table->nents, >>>>>>>>>>> -            direction)) >>>>>>>>>>> +    if (!dma_map_sg_attrs(attachment->dev, table->sgl, >>>>>>>>>>> table->nents, >>>>>>>>>>> +                  direction, DMA_ATTR_SKIP_CPU_SYNC)) >>>>>>>>>> >>>>>>>>>> Unfortunately I don't think you can do this for a couple reasons. >>>>>>>>>> You can't rely on {begin,end}_cpu_access calls to do cache >>>>>>>>>> maintenance. >>>>>>>>>> If the calls to {begin,end}_cpu_access were made before the >>>>>>>>>> call to >>>>>>>>>> dma_buf_attach then there won't have been a device attached so the >>>>>>>>>> calls >>>>>>>>>> to {begin,end}_cpu_access won't have done any cache maintenance. >>>>>>>>>> >>>>>>>>> >>>>>>>>> That should be okay though, if you have no attachments (or all >>>>>>>>> attachments are IO-coherent) then there is no need for cache >>>>>>>>> maintenance. Unless you mean a sequence where a non-io-coherent >>>>>>>>> device >>>>>>>>> is attached later after data has already been written. Does that >>>>>>>>> sequence need supporting? >>>>>>>> >>>>>>>> Yes, but also I think there are cases where CPU access can happen >>>>>>>> before >>>>>>>> in Android, but I will focus on later for now. >>>>>>>> >>>>>>>>> DMA-BUF doesn't have to allocate the backing >>>>>>>>> memory until map_dma_buf() time, and that should only happen >>>>>>>>> after all >>>>>>>>> the devices have attached so it can know where to put the >>>>>>>>> buffer. So we >>>>>>>>> shouldn't expect any CPU access to buffers before all the >>>>>>>>> devices are >>>>>>>>> attached and mapped, right? >>>>>>>>> >>>>>>>> >>>>>>>> Here is an example where CPU access can happen later in Android. >>>>>>>> >>>>>>>> Camera device records video -> software post processing -> video >>>>>>>> device >>>>>>>> (who does compression of raw data) and writes to a file >>>>>>>> >>>>>>>> In this example assume the buffer is cached and the devices are not >>>>>>>> IO-coherent (quite common). >>>>>>>> >>>>>>> >>>>>>> This is the start of the problem, having cached mappings of memory >>>>>>> that >>>>>>> is also being accessed non-coherently is going to cause issues one >>>>>>> way >>>>>>> or another. On top of the speculative cache fills that have to be >>>>>>> constantly fought back against with CMOs like below; some coherent >>>>>>> interconnects behave badly when you mix coherent and non-coherent >>>>>>> access >>>>>>> (snoop filters get messed up). >>>>>>> >>>>>>> The solution is to either always have the addresses marked >>>>>>> non-coherent >>>>>>> (like device memory, no-map carveouts), or if you really want to use >>>>>>> regular system memory allocated at runtime, then all cached >>>>>>> mappings of >>>>>>> it need to be dropped, even the kernel logical address (area as >>>>>>> painful >>>>>>> as that would be). >>>>>>> >>>>>> >>>>>> I agree it's broken, hence my desire to remove it :) >>>>>> >>>>>> The other problem is that uncached buffers are being used for >>>>>> performance reason so anything that would involve getting >>>>>> rid of the logical address would probably negate any performance >>>>>> benefit. >>>>>> >>>>> >>>>> I wouldn't go as far as to remove them just yet.. Liam seems pretty >>>>> adamant that they have valid uses. I'm just not sure performance is one >>>>> of them, maybe in the case of software locks between devices or >>>>> something where there needs to be a lot of back and forth interleaved >>>>> access on small amounts of data? >>>>> >>>> >>>> I wasn't aware that ARM considered this not supported, I thought it was >>>> supported but they advised against it because of the potential >>>> performance >>>> impact. >>>> >>> >>> Not sure what you mean by "this" being not supported, do you mean mixed >>> attribute mappings? If so, it will certainly cause problems, and the >>> problems will change from platform to platform, avoid at all costs is my >>> understanding of ARM's position. >>> >>>> This is after all supported in the DMA APIs and up until now devices >>>> have >>>> been successfully commercializing with this configurations, and I think >>>> they will continue to commercialize with these configurations for >>>> quite a >>>> while. >>>> >>> >>> Use of uncached memory mappings are almost always wrong in my experience >>> and are used to work around some bug or because the user doesn't want to >>> implement proper CMOs. Counter examples welcome. >>> >>>> It would be really unfortunate if support was removed as I think that >>>> would drive clients away from using upstream ION. >>>> >>> >>> I'm not petitioning to remove support, but at very least lets reverse >>> the ION_FLAG_CACHED flag. Ion should hand out cached normal memory by >>> default, to get uncached you should need to add a flag to your >>> allocation command pointing out you know what you are doing. >>> >> >> I thought about doing that, the problem is it becomes an ABI break for >> existing users which I really didn't want to do again. If it >> ends up being the last thing we do before moving out of staging, >> I'd consider doing it. >> >>>>>>>> ION buffer is allocated. >>>>>>>> >>>>>>>> //Camera device records video >>>>>>>> dma_buf_attach >>>>>>>> dma_map_attachment (buffer needs to be cleaned) >>>>>>> >>>>>>> Why does the buffer need to be cleaned here? I just got through >>>>>>> reading >>>>>>> the thread linked by Laura in the other reply. I do like +Brian's >>>>>>> suggestion of tracking if the buffer has had CPU access since the >>>>>>> last >>>>>>> time and only flushing the cache if it has. As unmapped heaps >>>>>>> never get >>>>>>> CPU mapped this would never be the case for unmapped heaps, it >>>>>>> solves my >>>>>>> problem. >>>>>>> >>>>>>>> [camera device writes to buffer] >>>>>>>> dma_buf_unmap_attachment (buffer needs to be invalidated) >>>>>>> >>>>>>> It doesn't know there will be any further CPU access, it could get >>>>>>> freed >>>>>>> after this for all we know, the invalidate can be saved until the CPU >>>>>>> requests access again. >>>>>>> >>>>>>>> dma_buf_detach  (device cannot stay attached because it is being >>>>>>>> sent >>>>>>>> down >>>>>>>> the pipeline and Camera doesn't know the end of the use case) >>>>>>>> >>>>>>> >>>>>>> This seems like a broken use-case, I understand the desire to keep >>>>>>> everything as modular as possible and separate the steps, but at this >>>>>>> point no one owns this buffers backing memory, not the CPU or any >>>>>>> device. I would go as far as to say DMA-BUF should be free now to >>>>>>> de-allocate the backing storage if it wants, that way it could get >>>>>>> ready >>>>>>> for the next attachment, which may change the required backing memory >>>>>>> completely. >>>>>>> >>>>>>> All devices should attach before the first mapping, and only let go >>>>>>> after the task is complete, otherwise this buffers data needs >>>>>>> copied off >>>>>>> to a different location or the CPU needs to take ownership >>>>>>> in-between. >>>>>>> >>>>>> >>>>>> Maybe it's broken but it's the status quo and we spent a good >>>>>> amount of time at plumbers concluding there isn't a great way >>>>>> to fix it :/ >>>>>> >>>>> >>>>> Hmm, guess that doesn't prove there is not a great way to fix it >>>>> either.. :/ >>>>> >>>>> Perhaps just stronger rules on sequencing of operations? I'm not saying >>>>> I have a good solution either, I just don't see any way forward without >>>>> some use-case getting broken, so better to fix now over later. >>>>> >>>> >>>> I can see the benefits of Android doing things the way they do, I would >>>> request that changes we make continue to support Android, or we find >>>> a way >>>> to convice them to change, as they are the main ION client and I assume >>>> other ION clients in the future will want to do this as well. >>>> >>> >>> Android may be the biggest user today (makes sense, Ion come out of the >>> Android project), but that can change, and getting changes into Android >>> will be easier that the upstream kernel once Ion is out of staging. >>> >>> Unlike some other big ARM vendors, we (TI) do not primarily build mobile >>> chips targeting Android, our core offerings target more traditional >>> Linux userspaces, and I'm guessing others will start to do the same as >>> ARM tries to push more into desktop, server, and other spaces again. >>> >>>> I am concerned that if you go with a solution which enforces what you >>>> mention above, and bring ION out of staging that way, it will make it >>>> that >>>> much harder to solve this for Android and therefore harder to get >>>> Android clients to move to the upstream ION (and get everybody off their >>>> vendor modified Android versions). >>>> >>> >>> That would be an Android problem, reducing functionality in upstream to >>> match what some evil vendor trees do to support Android is not the way >>> forward on this. At least for us we are going to try to make all our >>> software offerings follow proper buffer ownership (including our Android >>> offering). >>> >> >> I don't think this is reducing functionality, it's about not breaking >> what already works. There is some level of Android testing on a mainline >> tree (hikey boards). I would say if we can come to an agreement on >> a correct API, we could always merge the 'correct' version out of >> staging and keep a legacy driver around for some time as a transition. >> > > I'm not sure that is what staging should be for, but I can certainly see > why you would want that (I help maintain our Android offering and every > kernel migration I get to go fixup libion and all its users..). > > I'm sure we all know the API will get broken to get this out of staging, > so maybe we need to start a list (or update the TODO) with all the > things we agree need changed during the last step before destaging. > Sounds like you agree about the ION_FLAG_CACHED reversal for starters. I > think direct heap managed dma_buf_ops will be needed. > > What's left, do we have any current proposals for the heap query > floating around that can go up for review? > I was hoping the last time I broke the API would be the last time we would need to break it again. The query ioctl is already merged and I haven't seen any other counter proposals around for discussion. The TODO list could probably use some updating though. > Thanks, > Andrew > >> Thanks, >> Laura