linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [GIT PULL] (swiotlb) stable/for-linus-5.15
@ 2021-09-01 21:08 Konrad Rzeszutek Wilk
  2021-09-02 18:54 ` Linus Torvalds
  2021-09-03 18:45 ` pr-tracker-bot
  0 siblings, 2 replies; 8+ messages in thread
From: Konrad Rzeszutek Wilk @ 2021-09-01 21:08 UTC (permalink / raw)
  To: tientzu, will, pasic, linux-kernel, Linus Torvalds

Hey Linus,

Please git pull the following branch:

git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git stable/for-linus-5.15

which has a new feature called restricted DMA pools. It allows SWIOTLB to utilize
per-device (or per-platform) allocated memory pools instead of using the global one.

The first big user of this is ARM Confidential Computing where the memory for DMA
operations can be set per platform.

Please pull:

 .../bindings/reserved-memory/reserved-memory.txt   |  36 ++-
 arch/powerpc/platforms/pseries/svm.c               |   6 +
 arch/s390/mm/init.c                                |   2 +-
 drivers/base/core.c                                |   4 +
 drivers/gpu/drm/i915/gem/i915_gem_internal.c       |   2 +-
 drivers/gpu/drm/nouveau/nouveau_ttm.c              |   2 +-
 drivers/iommu/dma-iommu.c                          |  12 +-
 drivers/of/device.c                                |  40 +++
 drivers/pci/xen-pcifront.c                         |   2 +-
 drivers/xen/swiotlb-xen.c                          |   8 +-
 include/linux/device.h                             |   4 +
 include/linux/swiotlb.h                            |  57 +++-
 kernel/dma/Kconfig                                 |  13 +
 kernel/dma/direct.c                                |  59 +++-
 kernel/dma/direct.h                                |   8 +-
 kernel/dma/swiotlb.c                               | 352 +++++++++++++++------
 16 files changed, 469 insertions(+), 138 deletions(-)

Claire Chang (14):
      swiotlb: Refactor swiotlb init functions
      swiotlb: Refactor swiotlb_create_debugfs
      swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used
      swiotlb: Update is_swiotlb_buffer to add a struct device argument
      swiotlb: Update is_swiotlb_active to add a struct device argument
      swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
      swiotlb: Move alloc_size to swiotlb_find_slots
      swiotlb: Refactor swiotlb_tbl_unmap_single
      swiotlb: Add restricted DMA alloc/free support
      swiotlb: Add restricted DMA pool initialization
      dt-bindings: of: Add restricted DMA pool
      of: Add plumbing for restricted DMA pool
      swiotlb: fix implicit debugfs declarations
      swiotlb: use depends on for DMA_RESTRICTED_POOL

Dominique Martinet (1):
      swiotlb: add overflow checks to swiotlb_bounce

Halil Pasic (1):
      s390/pv: fix the forcing of the swiotlb

Will Deacon (7):
      of: Return success from of_dma_set_restricted_buffer() when !OF_ADDRESS
      swiotlb: Convert io_default_tlb_mem to static allocation
      swiotlb: Emit diagnostic in swiotlb_exit()
      swiotlb: Free tbl memory in swiotlb_exit()
      powerpc/svm: Don't issue ultracalls if !mem_encrypt_active()
      of: Move of_dma_set_restricted_buffer() into device.c
      of: restricted dma: Don't fail device probe on rmem init failure


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GIT PULL] (swiotlb) stable/for-linus-5.15
  2021-09-01 21:08 [GIT PULL] (swiotlb) stable/for-linus-5.15 Konrad Rzeszutek Wilk
@ 2021-09-02 18:54 ` Linus Torvalds
  2021-09-02 21:55   ` Robin Murphy
  2021-09-03 18:45 ` pr-tracker-bot
  1 sibling, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2021-09-02 18:54 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Christoph Hellwig, Marek Szyprowski,
	Robin Murphy, Claire Chang, Stefano Stabellini
  Cc: Will Deacon, pasic, Linux Kernel Mailing List

On Wed, Sep 1, 2021 at 2:09 PM Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
>
> which has a new feature called restricted DMA pools. It allows SWIOTLB to utilize
> per-device (or per-platform) allocated memory pools instead of using the global one.
>
> The first big user of this is ARM Confidential Computing where the memory for DMA
> operations can be set per platform.

Hmm.

I was going to merge this, but I had already merged the dma-mapping
updates from Christoph Hellwig first.

And it turns out that the conflicts are trivial to do the
straightforward way, that's not a problem.

But I think that straightforward way is actually buggy, because it
makes the is_swiotlb_for_alloc() type allocations then use that new
global pool for allocations.

Just to check, I looked into what linux-next did, in case this had
caused any discussion there. Nope. Linux-next does the obvious
straightforward merge.

So I'm going to hold off merging until I can clarify what the rules
for these restricted DMA pools are, because I get the feeling that
people didn't actually think about the interaction with the new global
pool very much.

Or if they did, it didn't get communicated to me. Christoph mentioned
the merge conflict, but he implied that I should just merge it the
obvious way. And the more I look at that code, the less I think that's
the right thing to do.

So just to make this very concrete, look at dma_direct_alloc(). The
straightforward merge is to do this:

        if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
            !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
            !IS_ENABLED(CONFIG_DMA_GLOBAL_POOL) &&
            !dev_is_dma_coherent(dev) &&
            !is_swiotlb_for_alloc(dev))
                return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);

        if (IS_ENABLED(CONFIG_DMA_GLOBAL_POOL) &&
            !dev_is_dma_coherent(dev))
                return dma_alloc_from_global_coherent(dev, size, dma_handle);

and that seems to be what linux-next did too.

But that means that a is_swiotlb_for_alloc(dev) allocation can use
that new global coherent pool (see that second case).

Maybe that's ok. But I get the feeling that it is not. It sure as heck
isn't obvious. If it's ok to use the global pool, then fine - but I
want to know why (and I want the code to say why).

I think that the second conditional should probably get an all-new added

            && !is_swiotlb_for_alloc(dev)

in it, but that's not what happened in linux-next. And all this code
is messy and illogical and crazy.

Btw, what's the actual logic with the

            !dev_is_dma_coherent(dev)

tests anyway? Those are pre-existing, but seem odd and senseless. The
function is called "dma_alloc_from_global_coherent()", but it is *not*
called when "dev_is_dma_coherent()".

Is it just me, or does anybody else also get the feeling that too many
recreational drugs may have been involved in the creation of this code
over the years? People possibly self-medicating to take away the pain?

Anyway - I'm going to punt on this pull request, because I want
clarification that didn't happen despite the conflict in linux-next.

I also have to say, looking at that code, you guys write collectively
some disgusting code.

I think individually each change has been probably fairly small, but
taken together it's just too nasty for words. That code should be
taken out behind a shed and shot in the head, because it's entirely
impossible to read all those nasty conditionals over multiple lines.

So regardless of what the merge resolution should be, I think this
code needs reorganizing. Because those entirely random conditionals
with several different config tests that DO NOT MAKE SENSE TOGETHER
should not exist.

The merge conflict is just a symptom of the disease in this code, not
the problem itself.

That code needs to be split up, and the logic needs to be made
understandable. Preferably *obvious*. Which it isn't now.

In particular, if the rule is that "is_swiotlb_for_alloc()" means that
you *have* to use swiotlb_alloc() and absolutely nothing else", then
that should be the logic, and it should be at the top, and that should
be all it is. No m ixing with the other tests that are about entirely
different issues.

Because right now, that code is literally set up to use completely
random different allocation choices depending on completely random and
illogical combinations of config rules mixed with random device rules.

Make each conditional be about *one* thing, and one thing only. Not
about three unrelated things that may or may have the same end result
rule.

See what I'm saying?

The solution may be something like

        if (is_swiotlb_for_alloc(dev))
                goto swiotlb_alloc_only;

at the top of the allocation routines, or whatever. Don't make the
other conditionals - that have nothing to do with this thing - get
mixed up in the fundamental rule.

Also, that's an odd name. "is_swiotlb_for_alloc()"? Those four words
do not make sense together in that order. Shouldn't it be
"use_swiotlb_for_alloc()"?

So make the actual rules individual, and make them make sense. It's ok
to have five (or more) of those kinds of things, but they need to be
individually sensible, and not randomly tied to each other as one big
broken non-sensible thing.

Please no more of this completely crazy "let's mix it all together in
a blender, and write code that makes no sense".

In the immediate short term, please

 (a) explain what the rules actually are

 (b) so that I can do a merge that actually does the right thing (ie I
_think_ I should add that extra test above, but I want this thought
about and confirmed or be told why that isn't the case).

and then the rewrite should be a future step.  But that rewrite of
those disgusting "rules" absolutely NEEDS to happen, because looking
at that spaghetti of random config options and other random
conditionals, that's simply not acceptable.

It's particularly not acceptable since I suspect that it all probably
*works* even if you allocate from the wrong pool, but it then
presumably violates some security rule that it was set up so that
allocations would *not* preferably get mixed up in some global pool.

Pls advise. I'm not asking anybody to do the merge for me - I'm
literally asking for that code and the rules to be clarified.

             Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GIT PULL] (swiotlb) stable/for-linus-5.15
  2021-09-02 18:54 ` Linus Torvalds
@ 2021-09-02 21:55   ` Robin Murphy
  2021-09-03  0:42     ` Linus Torvalds
  0 siblings, 1 reply; 8+ messages in thread
From: Robin Murphy @ 2021-09-02 21:55 UTC (permalink / raw)
  To: Linus Torvalds, Konrad Rzeszutek Wilk, Christoph Hellwig,
	Marek Szyprowski, Claire Chang, Stefano Stabellini
  Cc: Will Deacon, pasic, Linux Kernel Mailing List

Hi Linus,

On 2021-09-02 19:54, Linus Torvalds wrote:
> On Wed, Sep 1, 2021 at 2:09 PM Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
>>
>> which has a new feature called restricted DMA pools. It allows SWIOTLB to utilize
>> per-device (or per-platform) allocated memory pools instead of using the global one.
>>
>> The first big user of this is ARM Confidential Computing where the memory for DMA
>> operations can be set per platform.
> 
> Hmm.
> 
> I was going to merge this, but I had already merged the dma-mapping
> updates from Christoph Hellwig first.
> 
> And it turns out that the conflicts are trivial to do the
> straightforward way, that's not a problem.
> 
> But I think that straightforward way is actually buggy, because it
> makes the is_swiotlb_for_alloc() type allocations then use that new
> global pool for allocations.
> 
> Just to check, I looked into what linux-next did, in case this had
> caused any discussion there. Nope. Linux-next does the obvious
> straightforward merge.
> 
> So I'm going to hold off merging until I can clarify what the rules
> for these restricted DMA pools are, because I get the feeling that
> people didn't actually think about the interaction with the new global
> pool very much.

Not to defend the state things have got into - frankly I've found it 
increasingly hard to follow for some time now too - but I believe the 
unwritten story is of the one true dma-direct trying to be all things to 
all users, and there are several aspects at play here which are mutually 
exclusive in practice.

> Or if they did, it didn't get communicated to me. Christoph mentioned
> the merge conflict, but he implied that I should just merge it the
> obvious way. And the more I look at that code, the less I think that's
> the right thing to do.
> 
> So just to make this very concrete, look at dma_direct_alloc(). The
> straightforward merge is to do this:
> 
>          if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
>              !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
>              !IS_ENABLED(CONFIG_DMA_GLOBAL_POOL) &&
>              !dev_is_dma_coherent(dev) &&
>              !is_swiotlb_for_alloc(dev))
>                  return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
> 
>          if (IS_ENABLED(CONFIG_DMA_GLOBAL_POOL) &&
>              !dev_is_dma_coherent(dev))
>                  return dma_alloc_from_global_coherent(dev, size, dma_handle);
> 
> and that seems to be what linux-next did too.
> 
> But that means that a is_swiotlb_for_alloc(dev) allocation can use
> that new global coherent pool (see that second case).
 >
> Maybe that's ok. But I get the feeling that it is not. It sure as heck
> isn't obvious. If it's ok to use the global pool, then fine - but I
> want to know why (and I want the code to say why).

The global pool is essentially for noMMU, so should not be expected to 
overlap with SWIOTLB at all, much less systems using restricted DMA.

> I think that the second conditional should probably get an all-new added
> 
>              && !is_swiotlb_for_alloc(dev)
> 
> in it, but that's not what happened in linux-next. And all this code
> is messy and illogical and crazy.
> 
> Btw, what's the actual logic with the
> 
>              !dev_is_dma_coherent(dev)
> 
> tests anyway? Those are pre-existing, but seem odd and senseless. The
> function is called "dma_alloc_from_global_coherent()", but it is *not*
> called when "dev_is_dma_coherent()".

Heh, good terminology for this is hard to find. One way to make sense of 
it is to consider that either the device itself is coherent - i.e. can 
snoop caches such that we don't have to much in the way of special 
treatment - or otherwise it's the memory which has to be made coherent, 
i.e. remapped non-cacheable, such that CPUs and non-snooping devices can 
still maintain a consistent view of it (note: I'm focusing on caches 
here for simplicity, but in practice there's other stuff like memory 
encryption in the mix too). Hence a "coherent pool" is actually a pool 
of non-cacheably-mapped memory from which to allocate for 
non-hardware-coherent devices in contexts where we can't remap regular 
page allocations on-the-fly (also called an "atomic pool" in various 
places for that reason). For noMMU with caches, that's effectively all 
the time, which is where the global pool came from - unlike the atomic 
pool(s) for MMU systems which can just be allocated and remapped at 
boot, the global pool is often subject to platform-specific restrictions 
like being at a special physical alias, so starts life in a very 
different way.

[ And the reason it's called the global coherent pool is from the 
implementation growing out of per-device coherent pools, which are yet 
another similar-but-slightly-different thing. For added fun, those may 
also intersect with restricted DMA, but that all forks off down the 
dma_alloc_from_dev_coherent() path before we even get to dma-direct. ]

After the initial setup, though, I guess it might be possible with a bit 
more refactoring to streamline the global pool bits into the 
dma_alloc_from_pool() path, since by that point it's functionally 
equivalent to the atomic pool(s), just under different conditions.

> Is it just me, or does anybody else also get the feeling that too many
> recreational drugs may have been involved in the creation of this code
> over the years? People possibly self-medicating to take away the pain?
> 
> Anyway - I'm going to punt on this pull request, because I want
> clarification that didn't happen despite the conflict in linux-next.
> 
> I also have to say, looking at that code, you guys write collectively
> some disgusting code.
> 
> I think individually each change has been probably fairly small, but
> taken together it's just too nasty for words. That code should be
> taken out behind a shed and shot in the head, because it's entirely
> impossible to read all those nasty conditionals over multiple lines.
> 
> So regardless of what the merge resolution should be, I think this
> code needs reorganizing. Because those entirely random conditionals
> with several different config tests that DO NOT MAKE SENSE TOGETHER
> should not exist.
> 
> The merge conflict is just a symptom of the disease in this code, not
> the problem itself.
> 
> That code needs to be split up, and the logic needs to be made
> understandable. Preferably *obvious*. Which it isn't now.

I suppose one logical option would be to split out dedicated versions of 
dma-direct for at least the major differences - noMMU, 
CONFIG_DMA_DIRECT_REMAP, etc. - but that does run the risk of ending up 
as a lot of very similar looking code and being a rich source of bugs 
where things get added/fixed in one place but not the others.

> In particular, if the rule is that "is_swiotlb_for_alloc()" means that
> you *have* to use swiotlb_alloc() and absolutely nothing else", then
> that should be the logic, and it should be at the top, and that should
> be all it is. No m ixing with the other tests that are about entirely
> different issues.

Sadly it doesn't entirely mean that. A device using restricted DMA 
*should* usually end up allocating from its own dedicated SWIOTLB pool, 
but there may be cases where falling back to 
dma_direct_alloc_from_pool() is still OK, or maybe even necessary.

> Because right now, that code is literally set up to use completely
> random different allocation choices depending on completely random and
> illogical combinations of config rules mixed with random device rules.
> 
> Make each conditional be about *one* thing, and one thing only. Not
> about three unrelated things that may or may have the same end result
> rule.
> 
> See what I'm saying?
> 
> The solution may be something like
> 
>          if (is_swiotlb_for_alloc(dev))
>                  goto swiotlb_alloc_only;
> 
> at the top of the allocation routines, or whatever. Don't make the
> other conditionals - that have nothing to do with this thing - get
> mixed up in the fundamental rule.
> 
> Also, that's an odd name. "is_swiotlb_for_alloc()"? Those four words
> do not make sense together in that order. Shouldn't it be
> "use_swiotlb_for_alloc()"?
> 
> So make the actual rules individual, and make them make sense. It's ok
> to have five (or more) of those kinds of things, but they need to be
> individually sensible, and not randomly tied to each other as one big
> broken non-sensible thing.
> 
> Please no more of this completely crazy "let's mix it all together in
> a blender, and write code that makes no sense".
> 
> In the immediate short term, please
> 
>   (a) explain what the rules actually are

I hope some of the background above has helped a bit.

>   (b) so that I can do a merge that actually does the right thing (ie I
> _think_ I should add that extra test above, but I want this thought
> about and confirmed or be told why that isn't the case).
> 
> and then the rewrite should be a future step.  But that rewrite of
> those disgusting "rules" absolutely NEEDS to happen, because looking
> at that spaghetti of random config options and other random
> conditionals, that's simply not acceptable.

FWIW, no argument from me there - I believe it's actually ChromeOS and 
Android pKVM having the initial use-cases for getting this new SWIOTLB 
functionality landed, but in terms of Arm CCA we certainly still have 
some interest in it at the moment, so I might be able to wrangle some 
time to take on a bit of cleanup from that angle.

Cheers,
Robin.

> It's particularly not acceptable since I suspect that it all probably
> *works* even if you allocate from the wrong pool, but it then
> presumably violates some security rule that it was set up so that
> allocations would *not* preferably get mixed up in some global pool.
> 
> Pls advise. I'm not asking anybody to do the merge for me - I'm
> literally asking for that code and the rules to be clarified.
> 
>               Linus
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GIT PULL] (swiotlb) stable/for-linus-5.15
  2021-09-02 21:55   ` Robin Murphy
@ 2021-09-03  0:42     ` Linus Torvalds
  2021-09-03  5:09       ` Christoph Hellwig
  2021-09-03 13:51       ` Robin Murphy
  0 siblings, 2 replies; 8+ messages in thread
From: Linus Torvalds @ 2021-09-03  0:42 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Konrad Rzeszutek Wilk, Christoph Hellwig, Marek Szyprowski,
	Claire Chang, Stefano Stabellini, Will Deacon, pasic,
	Linux Kernel Mailing List

On Thu, Sep 2, 2021 at 2:55 PM Robin Murphy <robin.murphy@arm.com> wrote:
>
> Not to defend the state things have got into - frankly I've found it
> increasingly hard to follow for some time now too - but I believe the
> unwritten story is of the one true dma-direct trying to be all things to
> all users, and there are several aspects at play here which are mutually
> exclusive in practice.

Ugh.

Some of that "mutual exclusion" seems to be encoded in peoples heads,
not in the code itself.

> The global pool is essentially for noMMU, so should not be expected to
> overlap with SWIOTLB at all, much less systems using restricted DMA.

If I read that right, you're saying that in this case, the possible
mis-merge doesn't matter?

> > Btw, what's the actual logic with the
> >
> >              !dev_is_dma_coherent(dev)
> >
> > tests anyway? Those are pre-existing, but seem odd and senseless. The
> > function is called "dma_alloc_from_global_coherent()", but it is *not*
> > called when "dev_is_dma_coherent()".
>
> Heh, good terminology for this is hard to find. One way to make sense of
> it is to consider that either the device itself is coherent - i.e. can
> snoop caches such that we don't have to much in the way of special
> treatment - or otherwise it's the memory which has to be made coherent,
> i.e. remapped non-cacheable, such that CPUs and non-snooping devices can
> still maintain a consistent view of it (note: I'm focusing on caches
> here for simplicity, but in practice there's other stuff like memory
> encryption in the mix too). Hence a "coherent pool" is actually a pool
> of non-cacheably-mapped memory from which to allocate for
> non-hardware-coherent devices

Ok. Reading that code as an innocent (well, somewhat!) outsider sure
makes that confusing, though.

> > That code needs to be split up, and the logic needs to be made
> > understandable. Preferably *obvious*. Which it isn't now.
>
> I suppose one logical option would be to split out dedicated versions of
> dma-direct for at least the major differences - noMMU,
> CONFIG_DMA_DIRECT_REMAP, etc. - but that does run the risk of ending up
> as a lot of very similar looking code and being a rich source of bugs
> where things get added/fixed in one place but not the others.

I would hope that all the actual *code* is in these helper functions
that do not have the "in this situation, do X" AT ALL.

That does actually seem to be what the intent of some of those
functions are - so there are functions like that
"dma_direct_alloc_from_pool()", which does one thing and one thing
only, and doesn't have any odd conditionals.

And then dma_direct_alloc() _kind_ of tries to be the wrapper function
that dispatches to the different versions. I think.

Except the dispatch logic ends up being all those different cases that
have nothing to do with each other, because people have very clearly
tried to avoid code duplication by just saying "ok, in these two
entirely unrelated situations, we do the same thing".

So instead of having a

     if (case_X) do_Y();

kind of logic, it ends up doing

     if (case_X && !case_Y) do_A();

even if case_X and case_Y don't seem to really be related in any real way.

And any time somebody comes up with a new case, it gets added to the pile.

Which doesn't help readability.

It doesn't really help when the code also has comments like

        /* we always manually zero the memory once we are done */

that aren't actually true at all - the "always" here is about that one
single case, and the comment comes from the fact that in *that* case -
but not other cases - we're now masking off the __GFP_ZERO from the
gfp flags, and then zeroing manually at the end.

I had to go back into the history of where that comment came from to
really make sense of it. The "always" used to be true within the
context of __dma_direct_alloc_pages(), then it got moved to be one
sub-case in dma_direct_alloc() and now it's really more of a "in this
case" than "always", because other cases depend on the zeroing being
done by the other helper cases.

It's certainly possible that the code makes more sense to somebody who
has stared at that more than I have.

> > In the immediate short term, please
> >
> >   (a) explain what the rules actually are
>
> I hope some of the background above has helped a bit.

Well, I'd really like a very explicit "the is_swiotlb_for_alloc()
situation cannot happen for that global pool case because XYZ, so the
condition in that conflict doesn't matter".

That's what I _think_ you implied at the top. But it wasn't very
explicit. You say "essentially noMMU, so it should be expected to
overlap".

"essentially"? "shouldn't be expected"? I'm not getting a lot of warm
and fuzzies from that. I'd like to have something a bit more concrete.

Then I can do that merge that I think looks nonsensical, but in the hope that

 (a) it works correctly and does what people expect

and

 (b) that hopefully somebody will look at that logic and make it
actually make sense.

which would just make me feel a lot better about things.

         Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GIT PULL] (swiotlb) stable/for-linus-5.15
  2021-09-03  0:42     ` Linus Torvalds
@ 2021-09-03  5:09       ` Christoph Hellwig
  2021-09-03 13:51       ` Robin Murphy
  1 sibling, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2021-09-03  5:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Robin Murphy, Konrad Rzeszutek Wilk, Christoph Hellwig,
	Marek Szyprowski, Claire Chang, Stefano Stabellini, Will Deacon,
	pasic, Linux Kernel Mailing List

On Thu, Sep 02, 2021 at 05:42:11PM -0700, Linus Torvalds wrote:
> On Thu, Sep 2, 2021 at 2:55 PM Robin Murphy <robin.murphy@arm.com> wrote:
> >
> > Not to defend the state things have got into - frankly I've found it
> > increasingly hard to follow for some time now too - but I believe the
> > unwritten story is of the one true dma-direct trying to be all things to
> > all users, and there are several aspects at play here which are mutually
> > exclusive in practice.
> 
> Ugh.
> 
> Some of that "mutual exclusion" seems to be encoded in peoples heads,
> not in the code itself.

Some of it is.  This is mostly because we're still most in stage 1
of the grand plan:

  "consolidate 100% identical concepts various architectures reimplement"

which will slowly give way to actually clean up various mostly the same
concepts.  And the big items here is the various pools.  This simple
coherent pool and the one we use for GFP_ATOMIC alloctions for
not coherent devices would be a very obvious first step.  Swiotlb would
be the next one, but it is much more complicated as swiotlb is messy
and has a rather "special" allocator algorithm.

> 
> > The global pool is essentially for noMMU, so should not be expected to
> > overlap with SWIOTLB at all, much less systems using restricted DMA.
> 
> If I read that right, you're saying that in this case, the possible
> mis-merge doesn't matter?

Currently (and for the foreseeable future) it does not matter.


> Except the dispatch logic ends up being all those different cases that
> have nothing to do with each other, because people have very clearly
> tried to avoid code duplication by just saying "ok, in these two
> entirely unrelated situations, we do the same thing".

Not just that.  There are plenty of cases where architectures need
multiple of these cases depending on runtime checks.

But yes, this funtion turned into a mess, and we can improve both
the overall structure and the omments explaining it a fair bit.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GIT PULL] (swiotlb) stable/for-linus-5.15
  2021-09-03  0:42     ` Linus Torvalds
  2021-09-03  5:09       ` Christoph Hellwig
@ 2021-09-03 13:51       ` Robin Murphy
  2021-09-03 17:37         ` Linus Torvalds
  1 sibling, 1 reply; 8+ messages in thread
From: Robin Murphy @ 2021-09-03 13:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Konrad Rzeszutek Wilk, Christoph Hellwig, Marek Szyprowski,
	Claire Chang, Stefano Stabellini, Will Deacon, pasic,
	Linux Kernel Mailing List

On 2021-09-03 01:42, Linus Torvalds wrote:
> On Thu, Sep 2, 2021 at 2:55 PM Robin Murphy <robin.murphy@arm.com> wrote:
>>
>> Not to defend the state things have got into - frankly I've found it
>> increasingly hard to follow for some time now too - but I believe the
>> unwritten story is of the one true dma-direct trying to be all things to
>> all users, and there are several aspects at play here which are mutually
>> exclusive in practice.
> 
> Ugh.
> 
> Some of that "mutual exclusion" seems to be encoded in peoples heads,
> not in the code itself.
> 
>> The global pool is essentially for noMMU, so should not be expected to
>> overlap with SWIOTLB at all, much less systems using restricted DMA.
> 
> If I read that right, you're saying that in this case, the possible
> mis-merge doesn't matter?

Yes, as things currently stand, DMA_GLOBAL_POOL is only enabled by Arm 
noMMU and Hexagon, neither of which enable SWIOTLB.

>>> Btw, what's the actual logic with the
>>>
>>>               !dev_is_dma_coherent(dev)
>>>
>>> tests anyway? Those are pre-existing, but seem odd and senseless. The
>>> function is called "dma_alloc_from_global_coherent()", but it is *not*
>>> called when "dev_is_dma_coherent()".
>>
>> Heh, good terminology for this is hard to find. One way to make sense of
>> it is to consider that either the device itself is coherent - i.e. can
>> snoop caches such that we don't have to much in the way of special
>> treatment - or otherwise it's the memory which has to be made coherent,
>> i.e. remapped non-cacheable, such that CPUs and non-snooping devices can
>> still maintain a consistent view of it (note: I'm focusing on caches
>> here for simplicity, but in practice there's other stuff like memory
>> encryption in the mix too). Hence a "coherent pool" is actually a pool
>> of non-cacheably-mapped memory from which to allocate for
>> non-hardware-coherent devices
> 
> Ok. Reading that code as an innocent (well, somewhat!) outsider sure
> makes that confusing, though.
> 
>>> That code needs to be split up, and the logic needs to be made
>>> understandable. Preferably *obvious*. Which it isn't now.
>>
>> I suppose one logical option would be to split out dedicated versions of
>> dma-direct for at least the major differences - noMMU,
>> CONFIG_DMA_DIRECT_REMAP, etc. - but that does run the risk of ending up
>> as a lot of very similar looking code and being a rich source of bugs
>> where things get added/fixed in one place but not the others.
> 
> I would hope that all the actual *code* is in these helper functions
> that do not have the "in this situation, do X" AT ALL.
> 
> That does actually seem to be what the intent of some of those
> functions are - so there are functions like that
> "dma_direct_alloc_from_pool()", which does one thing and one thing
> only, and doesn't have any odd conditionals.
> 
> And then dma_direct_alloc() _kind_ of tries to be the wrapper function
> that dispatches to the different versions. I think.
> 
> Except the dispatch logic ends up being all those different cases that
> have nothing to do with each other, because people have very clearly
> tried to avoid code duplication by just saying "ok, in these two
> entirely unrelated situations, we do the same thing".

Indeed, although one could argue "entirely unrelated" - the fact that 
they end up wanting the same handling *makes* them related by purpose. 
By way of analogy, when I go to the supermarket I'll usually find the 
tomatoes closer to other salad vegetables than to the potatoes ;)

> So instead of having a
> 
>       if (case_X) do_Y();
> 
> kind of logic, it ends up doing
> 
>       if (case_X && !case_Y) do_A();
> 
> even if case_X and case_Y don't seem to really be related in any real way.
> 
> And any time somebody comes up with a new case, it gets added to the pile.
> 
> Which doesn't help readability.

Yup, however the inherently overlapping nature of some of the things 
involved means that flipping things around arguably just brings a 
*different* set of readability/maintainability issues:

	if (X) //do A
	else if (Y) //do A and C
	else if (Z) //do B and C

Basically if you consolidate the conditions then you're doomed to 
duplicate non-trivial amounts of the actual handling. In fact now that 
I'm looking closely I'm wondering why we already have the 
set_memory_decrypted() stanza in there twice, and starting to 
second-guess myself as to whether there might be some really subtle 
significance to it that I can't see :/

> It doesn't really help when the code also has comments like
> 
>          /* we always manually zero the memory once we are done */
> 
> that aren't actually true at all - the "always" here is about that one
> single case, and the comment comes from the fact that in *that* case -
> but not other cases - we're now masking off the __GFP_ZERO from the
> gfp flags, and then zeroing manually at the end.

Hmm, I guess I'd read "always" in the middle of a function as inherently 
only qualifying execution which actually reaches that point in the 
function, but either way it's indeed a long way from "We mask out 
__GFP_ZERO here because if we're remapping then we need to memset() 
through the new address anyway, so it's simplest if we do that 
unconditionally and skip any other possibly-redundant zeroing."

> I had to go back into the history of where that comment came from to
> really make sense of it. The "always" used to be true within the
> context of __dma_direct_alloc_pages(), then it got moved to be one
> sub-case in dma_direct_alloc() and now it's really more of a "in this
> case" than "always", because other cases depend on the zeroing being
> done by the other helper cases.
> 
> It's certainly possible that the code makes more sense to somebody who
> has stared at that more than I have.
> 
>>> In the immediate short term, please
>>>
>>>    (a) explain what the rules actually are
>>
>> I hope some of the background above has helped a bit.
> 
> Well, I'd really like a very explicit "the is_swiotlb_for_alloc()
> situation cannot happen for that global pool case because XYZ, so the
> condition in that conflict doesn't matter".
> 
> That's what I _think_ you implied at the top. But it wasn't very
> explicit. You say "essentially noMMU, so it should be expected to
> overlap".
> 
> "essentially"? "shouldn't be expected"? I'm not getting a lot of warm
> and fuzzies from that. I'd like to have something a bit more concrete.

As above, upstream Kconfigs do prevent any possible overlap at the 
moment (with "swiotlb: use depends on for DMA_RESTRICTED_POOL" fixing 
the bug which inadvertently made SWIOTLB user-selectable). I was a bit 
handwavy since I'm familiar with the global pool as used by Arm noMMU, 
but I'm not entirely sure what Hexagon's deal is (it looks like it does 
have an MMU, yet still doesn't do remapping) so I didn't want to speak 
in absolutes which might prove to be inaccurate. I also can't 
unequivocally rule out someone coming along in future with some wacky 
use-case for enabling both configs at once (TBH I can imagine some of 
the protected VM schemes maybe going in that direction), but that 
couldn't happen without further patches anyway.

Thanks,
Robin.

> Then I can do that merge that I think looks nonsensical, but in the hope that
> 
>   (a) it works correctly and does what people expect
> 
> and
> 
>   (b) that hopefully somebody will look at that logic and make it
> actually make sense.
> 
> which would just make me feel a lot better about things.
> 
>           Linus
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GIT PULL] (swiotlb) stable/for-linus-5.15
  2021-09-03 13:51       ` Robin Murphy
@ 2021-09-03 17:37         ` Linus Torvalds
  0 siblings, 0 replies; 8+ messages in thread
From: Linus Torvalds @ 2021-09-03 17:37 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Konrad Rzeszutek Wilk, Christoph Hellwig, Marek Szyprowski,
	Claire Chang, Stefano Stabellini, Will Deacon, pasic,
	Linux Kernel Mailing List

On Fri, Sep 3, 2021 at 6:51 AM Robin Murphy <robin.murphy@arm.com> wrote:
>
> > "essentially"? "shouldn't be expected"? I'm not getting a lot of warm
> > and fuzzies from that. I'd like to have something a bit more concrete.
>
> As above, upstream Kconfigs do prevent any possible overlap at the
> moment (with "swiotlb: use depends on for DMA_RESTRICTED_POOL" fixing
> the bug which inadvertently made SWIOTLB user-selectable).

Ok. I've done the merge, I do hope that this code can be clarified at
some point in the future.

             Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GIT PULL] (swiotlb) stable/for-linus-5.15
  2021-09-01 21:08 [GIT PULL] (swiotlb) stable/for-linus-5.15 Konrad Rzeszutek Wilk
  2021-09-02 18:54 ` Linus Torvalds
@ 2021-09-03 18:45 ` pr-tracker-bot
  1 sibling, 0 replies; 8+ messages in thread
From: pr-tracker-bot @ 2021-09-03 18:45 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: tientzu, will, pasic, linux-kernel, Linus Torvalds

The pull request you sent on Wed, 1 Sep 2021 17:08:53 -0400:

> git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git stable/for-linus-5.15

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/3de18c865f504ab59ed2588b1e11acd4bcb9ea09

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-09-03 18:45 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-01 21:08 [GIT PULL] (swiotlb) stable/for-linus-5.15 Konrad Rzeszutek Wilk
2021-09-02 18:54 ` Linus Torvalds
2021-09-02 21:55   ` Robin Murphy
2021-09-03  0:42     ` Linus Torvalds
2021-09-03  5:09       ` Christoph Hellwig
2021-09-03 13:51       ` Robin Murphy
2021-09-03 17:37         ` Linus Torvalds
2021-09-03 18:45 ` pr-tracker-bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).