From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.5 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE, SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 88A18C4332F for ; Thu, 2 Sep 2021 21:55:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7200660F12 for ; Thu, 2 Sep 2021 21:55:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1347898AbhIBV4p (ORCPT ); Thu, 2 Sep 2021 17:56:45 -0400 Received: from foss.arm.com ([217.140.110.172]:58246 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1347907AbhIBV4Q (ORCPT ); Thu, 2 Sep 2021 17:56:16 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 8F07431B; Thu, 2 Sep 2021 14:55:15 -0700 (PDT) Received: from [10.57.15.112] (unknown [10.57.15.112]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 1CC343F5A1; Thu, 2 Sep 2021 14:55:13 -0700 (PDT) Subject: Re: [GIT PULL] (swiotlb) stable/for-linus-5.15 To: Linus Torvalds , Konrad Rzeszutek Wilk , Christoph Hellwig , Marek Szyprowski , Claire Chang , Stefano Stabellini Cc: Will Deacon , pasic@linux.ibm.com, Linux Kernel Mailing List References: From: Robin Murphy Message-ID: Date: Thu, 2 Sep 2021 22:55:05 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; rv:78.0) Gecko/20100101 Thunderbird/78.13.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-GB Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Linus, On 2021-09-02 19:54, Linus Torvalds wrote: > On Wed, Sep 1, 2021 at 2:09 PM Konrad Rzeszutek Wilk > wrote: >> >> which has a new feature called restricted DMA pools. It allows SWIOTLB to utilize >> per-device (or per-platform) allocated memory pools instead of using the global one. >> >> The first big user of this is ARM Confidential Computing where the memory for DMA >> operations can be set per platform. > > Hmm. > > I was going to merge this, but I had already merged the dma-mapping > updates from Christoph Hellwig first. > > And it turns out that the conflicts are trivial to do the > straightforward way, that's not a problem. > > But I think that straightforward way is actually buggy, because it > makes the is_swiotlb_for_alloc() type allocations then use that new > global pool for allocations. > > Just to check, I looked into what linux-next did, in case this had > caused any discussion there. Nope. Linux-next does the obvious > straightforward merge. > > So I'm going to hold off merging until I can clarify what the rules > for these restricted DMA pools are, because I get the feeling that > people didn't actually think about the interaction with the new global > pool very much. Not to defend the state things have got into - frankly I've found it increasingly hard to follow for some time now too - but I believe the unwritten story is of the one true dma-direct trying to be all things to all users, and there are several aspects at play here which are mutually exclusive in practice. > Or if they did, it didn't get communicated to me. Christoph mentioned > the merge conflict, but he implied that I should just merge it the > obvious way. And the more I look at that code, the less I think that's > the right thing to do. > > So just to make this very concrete, look at dma_direct_alloc(). The > straightforward merge is to do this: > > if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) && > !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && > !IS_ENABLED(CONFIG_DMA_GLOBAL_POOL) && > !dev_is_dma_coherent(dev) && > !is_swiotlb_for_alloc(dev)) > return arch_dma_alloc(dev, size, dma_handle, gfp, attrs); > > if (IS_ENABLED(CONFIG_DMA_GLOBAL_POOL) && > !dev_is_dma_coherent(dev)) > return dma_alloc_from_global_coherent(dev, size, dma_handle); > > and that seems to be what linux-next did too. > > But that means that a is_swiotlb_for_alloc(dev) allocation can use > that new global coherent pool (see that second case). > > Maybe that's ok. But I get the feeling that it is not. It sure as heck > isn't obvious. If it's ok to use the global pool, then fine - but I > want to know why (and I want the code to say why). The global pool is essentially for noMMU, so should not be expected to overlap with SWIOTLB at all, much less systems using restricted DMA. > I think that the second conditional should probably get an all-new added > > && !is_swiotlb_for_alloc(dev) > > in it, but that's not what happened in linux-next. And all this code > is messy and illogical and crazy. > > Btw, what's the actual logic with the > > !dev_is_dma_coherent(dev) > > tests anyway? Those are pre-existing, but seem odd and senseless. The > function is called "dma_alloc_from_global_coherent()", but it is *not* > called when "dev_is_dma_coherent()". Heh, good terminology for this is hard to find. One way to make sense of it is to consider that either the device itself is coherent - i.e. can snoop caches such that we don't have to much in the way of special treatment - or otherwise it's the memory which has to be made coherent, i.e. remapped non-cacheable, such that CPUs and non-snooping devices can still maintain a consistent view of it (note: I'm focusing on caches here for simplicity, but in practice there's other stuff like memory encryption in the mix too). Hence a "coherent pool" is actually a pool of non-cacheably-mapped memory from which to allocate for non-hardware-coherent devices in contexts where we can't remap regular page allocations on-the-fly (also called an "atomic pool" in various places for that reason). For noMMU with caches, that's effectively all the time, which is where the global pool came from - unlike the atomic pool(s) for MMU systems which can just be allocated and remapped at boot, the global pool is often subject to platform-specific restrictions like being at a special physical alias, so starts life in a very different way. [ And the reason it's called the global coherent pool is from the implementation growing out of per-device coherent pools, which are yet another similar-but-slightly-different thing. For added fun, those may also intersect with restricted DMA, but that all forks off down the dma_alloc_from_dev_coherent() path before we even get to dma-direct. ] After the initial setup, though, I guess it might be possible with a bit more refactoring to streamline the global pool bits into the dma_alloc_from_pool() path, since by that point it's functionally equivalent to the atomic pool(s), just under different conditions. > Is it just me, or does anybody else also get the feeling that too many > recreational drugs may have been involved in the creation of this code > over the years? People possibly self-medicating to take away the pain? > > Anyway - I'm going to punt on this pull request, because I want > clarification that didn't happen despite the conflict in linux-next. > > I also have to say, looking at that code, you guys write collectively > some disgusting code. > > I think individually each change has been probably fairly small, but > taken together it's just too nasty for words. That code should be > taken out behind a shed and shot in the head, because it's entirely > impossible to read all those nasty conditionals over multiple lines. > > So regardless of what the merge resolution should be, I think this > code needs reorganizing. Because those entirely random conditionals > with several different config tests that DO NOT MAKE SENSE TOGETHER > should not exist. > > The merge conflict is just a symptom of the disease in this code, not > the problem itself. > > That code needs to be split up, and the logic needs to be made > understandable. Preferably *obvious*. Which it isn't now. I suppose one logical option would be to split out dedicated versions of dma-direct for at least the major differences - noMMU, CONFIG_DMA_DIRECT_REMAP, etc. - but that does run the risk of ending up as a lot of very similar looking code and being a rich source of bugs where things get added/fixed in one place but not the others. > In particular, if the rule is that "is_swiotlb_for_alloc()" means that > you *have* to use swiotlb_alloc() and absolutely nothing else", then > that should be the logic, and it should be at the top, and that should > be all it is. No m ixing with the other tests that are about entirely > different issues. Sadly it doesn't entirely mean that. A device using restricted DMA *should* usually end up allocating from its own dedicated SWIOTLB pool, but there may be cases where falling back to dma_direct_alloc_from_pool() is still OK, or maybe even necessary. > Because right now, that code is literally set up to use completely > random different allocation choices depending on completely random and > illogical combinations of config rules mixed with random device rules. > > Make each conditional be about *one* thing, and one thing only. Not > about three unrelated things that may or may have the same end result > rule. > > See what I'm saying? > > The solution may be something like > > if (is_swiotlb_for_alloc(dev)) > goto swiotlb_alloc_only; > > at the top of the allocation routines, or whatever. Don't make the > other conditionals - that have nothing to do with this thing - get > mixed up in the fundamental rule. > > Also, that's an odd name. "is_swiotlb_for_alloc()"? Those four words > do not make sense together in that order. Shouldn't it be > "use_swiotlb_for_alloc()"? > > So make the actual rules individual, and make them make sense. It's ok > to have five (or more) of those kinds of things, but they need to be > individually sensible, and not randomly tied to each other as one big > broken non-sensible thing. > > Please no more of this completely crazy "let's mix it all together in > a blender, and write code that makes no sense". > > In the immediate short term, please > > (a) explain what the rules actually are I hope some of the background above has helped a bit. > (b) so that I can do a merge that actually does the right thing (ie I > _think_ I should add that extra test above, but I want this thought > about and confirmed or be told why that isn't the case). > > and then the rewrite should be a future step. But that rewrite of > those disgusting "rules" absolutely NEEDS to happen, because looking > at that spaghetti of random config options and other random > conditionals, that's simply not acceptable. FWIW, no argument from me there - I believe it's actually ChromeOS and Android pKVM having the initial use-cases for getting this new SWIOTLB functionality landed, but in terms of Arm CCA we certainly still have some interest in it at the moment, so I might be able to wrangle some time to take on a bit of cleanup from that angle. Cheers, Robin. > It's particularly not acceptable since I suspect that it all probably > *works* even if you allocate from the wrong pool, but it then > presumably violates some security rule that it was set up so that > allocations would *not* preferably get mixed up in some global pool. > > Pls advise. I'm not asking anybody to do the merge for me - I'm > literally asking for that code and the rules to be clarified. > > Linus >