All of lore.kernel.org
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: David Laight <David.Laight@aculab.com>
Cc: "Halil Pasic" <pasic@linux.ibm.com>,
	"Toke Høiland-Jørgensen" <toke@toke.dk>,
	"Robin Murphy" <robin.murphy@arm.com>,
	"Christoph Hellwig" <hch@lst.de>,
	"Maxime Bizon" <mbizon@freebox.fr>,
	"Oleksandr Natalenko" <oleksandr@natalenko.name>,
	"Marek Szyprowski" <m.szyprowski@samsung.com>,
	"Kalle Valo" <kvalo@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Olha Cherevyk" <olha.cherevyk@gmail.com>,
	iommu <iommu@lists.linux-foundation.org>,
	linux-wireless <linux-wireless@vger.kernel.org>,
	Netdev <netdev@vger.kernel.org>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	stable <stable@vger.kernel.org>
Subject: Re: [REGRESSION] Recent swiotlb DMA_FROM_DEVICE fixes break ath9k-based AP
Date: Sun, 27 Mar 2022 12:23:34 -0700	[thread overview]
Message-ID: <CAHk-=wgLyqNJx=bb8=o0Nk5U8gMnf0-=qx53ShLEb3V=Yrt8fw@mail.gmail.com> (raw)
In-Reply-To: <0745b44456d44d1e9fc364e5a3780d9a@AcuMS.aculab.com>

On Sun, Mar 27, 2022 at 8:24 AM David Laight <David.Laight@aculab.com> wrote:
>
> Aren't bounce buffers just a more extreme case on non-coherent
> memory accesses?

No.

In fact, this whoe change came about exactly because bounce buffers
are different.

The difference is that bounce buffers have that (wait for it) bounce
buffer, which can have stale contents.

> They just need explicit memory copies rather than just cache
> writeback and invalidate operations.

That's the thing - the memory copies introduce entirely new issues.

I really think that instead of making up abstract rules ("ownership"
or "bounce buffers are just extreme cases of non-coherency") we should
make the rules very much practical and down to earth, and write out
exactly what they *do*.

The whole "sync DMA" is odd and abstract enough as a concept on its
own, we shouldn't then make the rules for it odd and abstract. We
should make them very very practical.

So I will propose that we really make it very much about practical
concerns, and we document things as

 (a) the "sync" operation has by definition a "whose _future_ access
do we sync for" notion.

     So "dma_sync_single_for_cpu()" says that the future accesses to
this dma area is for the CPU.

     Note how it does *NOT* say that the "CPU owns the are". That's
bullsh*t, and we now know it's BS.

 (b) but the sync operation also has a "who wrote the data we're syncing"

     Note that this is *not* "who accessed or owned it last", because
that's nonsensical: if we're syncing for the CPU, then the only reason
to do so is because we expect that the last access was by the device,
so specifying that separately would just be redundant and stupid.

     But specifying who *wrote* to the area is meaningful and matters.
It matters for the non-coherent cache case (because of writeback
issues), but it also matters for the bounce buffer case (becasue it
determines which way we should copy).

Note how this makes sense: a "sync" operation is clearly about taking
some past state, and making it palatable for a future use. The past
state is pretty much defined by who wrote the data, and then we can
use that and the "the next thing to access it" to determine what we
need to do about the sync.

It is *NOT* about "ownership".

So let's go through the cases, and I'm going to ignore the "CPU caches
are coherent with device DMA" case because that's always going to be a
no-op wrt data movement (but it will still generally need a memory
barrier, which I will mention here and then ignore going forward).

Syncing for *CPU* accesses (ie dma_sync_single_for_cpu()) has four
choices I can see:

 - nobody wrote the data at all (aka DMA_NONE).

   This is nonsensical and should warn. If nobody wrote to it, why
would the CPU ever validly access it?

   Maybe you should have written "memset(buffer, 0, size)" instead?

 - the CPU wrote the data in the first place (aka DMA_TO_DEVICE)

   This is a no-op (possibly a memory barrier), because even stale CPU
caches are fine, and even if it was in a bounce buffer, the original
CPU-side data is fine.

 - the device wrote the data (aka DMA_FROM_DEVICE)

   This is just the regular case of a device having written something,
and the CPU wants to see it.

   It obviously needs real work, but it's simple and straightforward.

   For non-coherent caches, it needs a cache invalidate. For a bounce
buffer, it needs a copy from the bounce buffer to the "real" buffer.

 - it's not clear who write the data (aka DMA_BIDIRECTIONAL)

   This is not really doable for a bounce buffer - we just don't know
which buffer contents are valid.

   I think it's very very questionable for non-coherent caches too,
but "writeback and invalidate" probably can't hurt.

   So probably warn about it, and do whatever we used to do historically.

Syncing for device accesses (ie dma_sync_single_for_device()) also has
the same four choices I can see, but obviously does different things:

 - nobody wrote the data at all (aka DMA_NONE)

   This sounds as nonsensical as the CPU case, but maybe isn't.

   We may not have any previous explicit writes, but we *do* have that
"insecure and possibly stale buffer contents" bounce buffer thing on
the device side.

   So with a bounce buffer, it's actually somewhat sane to say
"initialize the bounce buffer to a known state".

   Because bounce buffers *are* special. Unlike even the "noncoherent
caches" issue, they have that entirely *other* hidden state in the
form of the bounce buffer itself.

   Discuss.

 - the CPU wrote the data in the first place (aka DMA_TO_DEVICE).

   This is the regular and common case of "we have data on the CPU
side that is written to the device".

   Again, needs work, but is simple and straightforward.

   For non-coherent caches, we need a writeback on the CPU. For a
bounce buffer, we need to copy from the regular buffer to the bounce
buffer.

 - the device wrote the data in the first place (aka DMA_FROM_DEVICE)

   This is the case that we hit on ath9k. It's *not* obvious, but when
we write this out this way, I really think the semantics are pretty
clear.

   For non-coherent caches, we may need an "invalidate". For a bounce
buffer, it's a no-op (because the bounce buffer already contains the
data)

 - it's not clear who write the data (aka DMA_BIDIRECTIONAL)

   This is again not really doable for a bounce buffer. We don't know
which buffer contains the right data, we should warn about it and do
whatever we used to do historically.

   Again, it's very questionable for non-coherent caches too, but
"writeback and invalidate" probably at least can't hurt.

So hey, that's my thinking. The whole "ownership"  model is, I think,
obviously untenable.

But just going through and listing the different cases and making them
explicit I think explains exactly what the different situations are,
and that then makes it fairly clear what the different combinations
should do.

No?

           Linus

WARNING: multiple messages have this Message-ID (diff)
From: Linus Torvalds <torvalds@linux-foundation.org>
To: David Laight <David.Laight@aculab.com>
Cc: "Toke Høiland-Jørgensen" <toke@toke.dk>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	Netdev <netdev@vger.kernel.org>, "Kalle Valo" <kvalo@kernel.org>,
	linux-wireless <linux-wireless@vger.kernel.org>,
	"Oleksandr Natalenko" <oleksandr@natalenko.name>,
	stable <stable@vger.kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	"Halil Pasic" <pasic@linux.ibm.com>,
	iommu <iommu@lists.linux-foundation.org>,
	"Olha Cherevyk" <olha.cherevyk@gmail.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Maxime Bizon" <mbizon@freebox.fr>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Robin Murphy" <robin.murphy@arm.com>,
	"Christoph Hellwig" <hch@lst.de>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>
Subject: Re: [REGRESSION] Recent swiotlb DMA_FROM_DEVICE fixes break ath9k-based AP
Date: Sun, 27 Mar 2022 12:23:34 -0700	[thread overview]
Message-ID: <CAHk-=wgLyqNJx=bb8=o0Nk5U8gMnf0-=qx53ShLEb3V=Yrt8fw@mail.gmail.com> (raw)
In-Reply-To: <0745b44456d44d1e9fc364e5a3780d9a@AcuMS.aculab.com>

On Sun, Mar 27, 2022 at 8:24 AM David Laight <David.Laight@aculab.com> wrote:
>
> Aren't bounce buffers just a more extreme case on non-coherent
> memory accesses?

No.

In fact, this whoe change came about exactly because bounce buffers
are different.

The difference is that bounce buffers have that (wait for it) bounce
buffer, which can have stale contents.

> They just need explicit memory copies rather than just cache
> writeback and invalidate operations.

That's the thing - the memory copies introduce entirely new issues.

I really think that instead of making up abstract rules ("ownership"
or "bounce buffers are just extreme cases of non-coherency") we should
make the rules very much practical and down to earth, and write out
exactly what they *do*.

The whole "sync DMA" is odd and abstract enough as a concept on its
own, we shouldn't then make the rules for it odd and abstract. We
should make them very very practical.

So I will propose that we really make it very much about practical
concerns, and we document things as

 (a) the "sync" operation has by definition a "whose _future_ access
do we sync for" notion.

     So "dma_sync_single_for_cpu()" says that the future accesses to
this dma area is for the CPU.

     Note how it does *NOT* say that the "CPU owns the are". That's
bullsh*t, and we now know it's BS.

 (b) but the sync operation also has a "who wrote the data we're syncing"

     Note that this is *not* "who accessed or owned it last", because
that's nonsensical: if we're syncing for the CPU, then the only reason
to do so is because we expect that the last access was by the device,
so specifying that separately would just be redundant and stupid.

     But specifying who *wrote* to the area is meaningful and matters.
It matters for the non-coherent cache case (because of writeback
issues), but it also matters for the bounce buffer case (becasue it
determines which way we should copy).

Note how this makes sense: a "sync" operation is clearly about taking
some past state, and making it palatable for a future use. The past
state is pretty much defined by who wrote the data, and then we can
use that and the "the next thing to access it" to determine what we
need to do about the sync.

It is *NOT* about "ownership".

So let's go through the cases, and I'm going to ignore the "CPU caches
are coherent with device DMA" case because that's always going to be a
no-op wrt data movement (but it will still generally need a memory
barrier, which I will mention here and then ignore going forward).

Syncing for *CPU* accesses (ie dma_sync_single_for_cpu()) has four
choices I can see:

 - nobody wrote the data at all (aka DMA_NONE).

   This is nonsensical and should warn. If nobody wrote to it, why
would the CPU ever validly access it?

   Maybe you should have written "memset(buffer, 0, size)" instead?

 - the CPU wrote the data in the first place (aka DMA_TO_DEVICE)

   This is a no-op (possibly a memory barrier), because even stale CPU
caches are fine, and even if it was in a bounce buffer, the original
CPU-side data is fine.

 - the device wrote the data (aka DMA_FROM_DEVICE)

   This is just the regular case of a device having written something,
and the CPU wants to see it.

   It obviously needs real work, but it's simple and straightforward.

   For non-coherent caches, it needs a cache invalidate. For a bounce
buffer, it needs a copy from the bounce buffer to the "real" buffer.

 - it's not clear who write the data (aka DMA_BIDIRECTIONAL)

   This is not really doable for a bounce buffer - we just don't know
which buffer contents are valid.

   I think it's very very questionable for non-coherent caches too,
but "writeback and invalidate" probably can't hurt.

   So probably warn about it, and do whatever we used to do historically.

Syncing for device accesses (ie dma_sync_single_for_device()) also has
the same four choices I can see, but obviously does different things:

 - nobody wrote the data at all (aka DMA_NONE)

   This sounds as nonsensical as the CPU case, but maybe isn't.

   We may not have any previous explicit writes, but we *do* have that
"insecure and possibly stale buffer contents" bounce buffer thing on
the device side.

   So with a bounce buffer, it's actually somewhat sane to say
"initialize the bounce buffer to a known state".

   Because bounce buffers *are* special. Unlike even the "noncoherent
caches" issue, they have that entirely *other* hidden state in the
form of the bounce buffer itself.

   Discuss.

 - the CPU wrote the data in the first place (aka DMA_TO_DEVICE).

   This is the regular and common case of "we have data on the CPU
side that is written to the device".

   Again, needs work, but is simple and straightforward.

   For non-coherent caches, we need a writeback on the CPU. For a
bounce buffer, we need to copy from the regular buffer to the bounce
buffer.

 - the device wrote the data in the first place (aka DMA_FROM_DEVICE)

   This is the case that we hit on ath9k. It's *not* obvious, but when
we write this out this way, I really think the semantics are pretty
clear.

   For non-coherent caches, we may need an "invalidate". For a bounce
buffer, it's a no-op (because the bounce buffer already contains the
data)

 - it's not clear who write the data (aka DMA_BIDIRECTIONAL)

   This is again not really doable for a bounce buffer. We don't know
which buffer contains the right data, we should warn about it and do
whatever we used to do historically.

   Again, it's very questionable for non-coherent caches too, but
"writeback and invalidate" probably at least can't hurt.

So hey, that's my thinking. The whole "ownership"  model is, I think,
obviously untenable.

But just going through and listing the different cases and making them
explicit I think explains exactly what the different situations are,
and that then makes it fairly clear what the different combinations
should do.

No?

           Linus
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

  reply	other threads:[~2022-03-27 19:24 UTC|newest]

Thread overview: 139+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-23  7:19 [REGRESSION] Recent swiotlb DMA_FROM_DEVICE fixes break ath9k-based AP Oleksandr Natalenko
2022-03-23  7:19 ` Oleksandr Natalenko via iommu
2022-03-23  7:28 ` Kalle Valo
2022-03-23  7:28   ` Kalle Valo
2022-03-23 17:27 ` Linus Torvalds
2022-03-23 17:27   ` Linus Torvalds
2022-03-23 19:06   ` Robin Murphy
2022-03-23 19:06     ` Robin Murphy
2022-03-23 19:16     ` Linus Torvalds
2022-03-23 19:16       ` Linus Torvalds
2022-03-23 20:54       ` Robin Murphy
2022-03-23 20:54         ` Robin Murphy
2022-03-24  5:57         ` Christoph Hellwig
2022-03-24  5:57           ` Christoph Hellwig
2022-03-24 10:25           ` Oleksandr Natalenko
2022-03-24 10:25             ` Oleksandr Natalenko via iommu
2022-03-24 11:05             ` Robin Murphy
2022-03-24 11:05               ` Robin Murphy
2022-03-24 14:27               ` Toke Høiland-Jørgensen
2022-03-24 14:27                 ` Toke Høiland-Jørgensen via iommu
2022-03-24 16:29                 ` Maxime Bizon
2022-03-24 16:29                   ` Maxime Bizon
2022-03-24 16:31                   ` Christoph Hellwig
2022-03-24 16:31                     ` Christoph Hellwig
2022-03-24 16:52                     ` Robin Murphy
2022-03-24 16:52                       ` Robin Murphy
2022-03-24 17:07                       ` Toke Høiland-Jørgensen
2022-03-24 17:07                         ` Toke Høiland-Jørgensen via iommu
2022-03-24 19:26                         ` Linus Torvalds
2022-03-24 19:26                           ` Linus Torvalds
2022-03-24 21:14                           ` Toke Høiland-Jørgensen
2022-03-24 21:14                             ` Toke Høiland-Jørgensen via iommu
2022-03-25 10:25                           ` Maxime Bizon
2022-03-25 10:25                             ` Maxime Bizon
2022-03-25 11:27                             ` Robin Murphy
2022-03-25 11:27                               ` Robin Murphy
2022-03-25 23:38                               ` Halil Pasic
2022-03-25 23:38                                 ` Halil Pasic
2022-03-26 16:05                                 ` Toke Høiland-Jørgensen
2022-03-26 16:05                                   ` Toke Høiland-Jørgensen via iommu
2022-03-26 18:38                                   ` Linus Torvalds
2022-03-26 18:38                                     ` Linus Torvalds
2022-03-26 22:38                                     ` David Laight
2022-03-26 22:38                                       ` David Laight
2022-03-26 22:41                                       ` Linus Torvalds
2022-03-26 22:41                                         ` Linus Torvalds
2022-03-25 16:25                             ` Toke Høiland-Jørgensen
2022-03-25 16:25                               ` Toke Høiland-Jørgensen via iommu
2022-03-25 16:45                               ` Robin Murphy
2022-03-25 16:45                                 ` Robin Murphy
2022-03-25 18:13                                 ` Toke Høiland-Jørgensen via iommu
2022-03-25 18:13                                   ` Toke Høiland-Jørgensen
2022-03-25 18:30                             ` Linus Torvalds
2022-03-25 18:30                               ` Linus Torvalds
2022-03-25 19:14                               ` Robin Murphy
2022-03-25 19:14                                 ` Robin Murphy
2022-03-25 19:21                                 ` Linus Torvalds
2022-03-25 19:21                                   ` Linus Torvalds
2022-03-25 19:26                               ` Oleksandr Natalenko via iommu
2022-03-25 19:26                                 ` Oleksandr Natalenko
2022-03-25 19:27                                 ` Linus Torvalds
2022-03-25 19:27                                   ` Linus Torvalds
2022-03-25 19:35                                   ` Oleksandr Natalenko via iommu
2022-03-25 19:35                                     ` Oleksandr Natalenko
2022-03-25 20:37                               ` Johannes Berg
2022-03-25 20:37                                 ` Johannes Berg
2022-03-25 20:47                                 ` Linus Torvalds
2022-03-25 20:47                                   ` Linus Torvalds
2022-03-25 21:13                                   ` Johannes Berg
2022-03-25 21:13                                     ` Johannes Berg
2022-03-25 21:40                                     ` David Laight
2022-03-25 21:40                                       ` David Laight
2022-03-25 21:56                                     ` Linus Torvalds
2022-03-25 21:56                                       ` Linus Torvalds
2022-03-25 22:41                                       ` David Laight
2022-03-25 22:41                                         ` David Laight
2022-03-27  3:15                                     ` Halil Pasic
2022-03-27  3:15                                       ` Halil Pasic
2022-03-28  9:48                                       ` Johannes Berg
2022-03-28  9:48                                         ` Johannes Berg
2022-03-28  9:50                                         ` Johannes Berg
2022-03-28  9:50                                           ` Johannes Berg
2022-03-28  9:57                                           ` Johannes Berg
2022-03-28  9:57                                             ` Johannes Berg
2022-03-27  3:48                           ` Halil Pasic
2022-03-27  3:48                             ` Halil Pasic
2022-03-27  5:06                             ` Linus Torvalds
2022-03-27  5:06                               ` Linus Torvalds
2022-03-27  5:21                               ` Linus Torvalds
2022-03-27  5:21                                 ` Linus Torvalds
2022-03-27 15:24                                 ` David Laight
2022-03-27 15:24                                   ` David Laight
2022-03-27 19:23                                   ` Linus Torvalds [this message]
2022-03-27 19:23                                     ` Linus Torvalds
2022-03-27 20:04                                     ` Linus Torvalds
2022-03-27 20:04                                       ` Linus Torvalds
2022-03-27 23:52                                 ` Halil Pasic
2022-03-27 23:52                                   ` Halil Pasic
2022-03-28  0:30                                   ` Linus Torvalds
2022-03-28  0:30                                     ` Linus Torvalds
2022-03-28 12:02                                     ` Halil Pasic
2022-03-28 12:02                                       ` Halil Pasic
2022-03-27 23:37                               ` Halil Pasic
2022-03-27 23:37                                 ` Halil Pasic
2022-03-28  0:37                                 ` Linus Torvalds
2022-03-28  0:37                                   ` Linus Torvalds
2022-03-25  7:12                         ` Oleksandr Natalenko
2022-03-25  7:12                           ` Oleksandr Natalenko via iommu
2022-03-25  9:21                           ` Thorsten Leemhuis
2022-03-25  9:21                             ` Thorsten Leemhuis
2022-03-24 18:31                       ` Halil Pasic
2022-03-24 18:31                         ` Halil Pasic
2022-03-25 16:31                         ` Christoph Hellwig
2022-03-25 16:31                           ` Christoph Hellwig
2022-03-24 18:02         ` Halil Pasic
2022-03-24 18:02           ` Halil Pasic
2022-03-25 15:25           ` Halil Pasic
2022-03-25 15:25             ` Halil Pasic
2022-03-25 16:23             ` Robin Murphy
2022-03-25 16:23               ` Robin Murphy
2022-03-25 16:32           ` Christoph Hellwig
2022-03-25 16:32             ` Christoph Hellwig
2022-03-25 18:15             ` Toke Høiland-Jørgensen via iommu
2022-03-25 18:15               ` Toke Høiland-Jørgensen
2022-03-25 18:42               ` Robin Murphy
2022-03-25 18:42                 ` Robin Murphy
2022-03-25 18:46                 ` Linus Torvalds
2022-03-25 18:46                   ` Linus Torvalds
2022-03-28  6:37                   ` Christoph Hellwig
2022-03-28  6:37                     ` Christoph Hellwig
2022-03-28  8:15                     ` David Laight
2022-03-28  8:15                       ` David Laight
2022-03-30 12:11                     ` Halil Pasic
2022-03-30 12:11                       ` Halil Pasic
2022-03-24  8:55   ` Oleksandr Natalenko
2022-03-24  8:55     ` Oleksandr Natalenko via iommu
2022-03-24 12:32 ` [REGRESSION] Recent swiotlb DMA_FROM_DEVICE fixes break ath9k-based AP #forregzbot Thorsten Leemhuis
2022-03-25  9:24   ` Thorsten Leemhuis
2022-03-27  9:00     ` Thorsten Leemhuis

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wgLyqNJx=bb8=o0Nk5U8gMnf0-=qx53ShLEb3V=Yrt8fw@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=David.Laight@aculab.com \
    --cc=davem@davemloft.net \
    --cc=gregkh@linuxfoundation.org \
    --cc=hch@lst.de \
    --cc=iommu@lists.linux-foundation.org \
    --cc=kuba@kernel.org \
    --cc=kvalo@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-wireless@vger.kernel.org \
    --cc=m.szyprowski@samsung.com \
    --cc=mbizon@freebox.fr \
    --cc=netdev@vger.kernel.org \
    --cc=oleksandr@natalenko.name \
    --cc=olha.cherevyk@gmail.com \
    --cc=pabeni@redhat.com \
    --cc=pasic@linux.ibm.com \
    --cc=robin.murphy@arm.com \
    --cc=stable@vger.kernel.org \
    --cc=toke@toke.dk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.