iommu.lists.linux-foundation.org archive mirror
* a saner API for allocating DMA addressable pages
@ 2020-08-19  6:55 ` Christoph Hellwig
  2020-08-19  6:55   ` [PATCH 01/28] mm: turn alloc_pages into an inline function Christoph Hellwig
                     ` (29 more replies)
  0 siblings, 30 replies; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

Hi all,

this series replaces the DMA_ATTR_NON_CONSISTENT flag to dma_alloc_attrs
with a separate new dma_alloc_pages API, which is available on all
platforms.  In addition to cleaning up the convoluted code path, this
ensures that other drivers that have asked for better support for
non-coherent DMA to pages without incurring bounce buffering overhead can
finally be properly supported.

I'm still a little unsure about the API naming, as alloc_pages sort of
implies a struct page return value, but we return a kernel virtual
address.  The other alternative would be to name the API
dma_alloc_noncoherent, but the whole non-coherent naming seems to put
people off.  As a follow-up I plan to move the implementation of the
DMA_ATTR_NO_KERNEL_MAPPING flag over to this framework as well, given
that it is also a fundamentally non-coherent allocation.  The replacement
for that flag would then return a struct page, as it is allowed to
actually return pages without a kernel mapping as the name suggests
(although most of the time they will actually have a kernel mapping).

In addition to the conversions of the existing non-coherent DMA users
the last three patches also convert the DMA coherent allocations in
the NVMe driver to use this new framework through a dmapool addition.
This was done both to give me a good testing vehicle and because it
should speed up the NVMe driver on platforms with non-coherent DMA
nicely, without a downside on platforms with cache coherent DMA.


A git tree is available here:

    git://git.infradead.org/users/hch/misc.git dma_alloc_pages

Gitweb:

    http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma_alloc_pages


Diffstat:
 Documentation/core-api/dma-api.rst                       |   92 ++----
 Documentation/core-api/dma-attributes.rst                |    8 
 Documentation/userspace-api/media/v4l/buffer.rst         |   17 -
 Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst |    1 
 arch/alpha/kernel/pci_iommu.c                            |    2 
 arch/arm/include/asm/dma-direct.h                        |    4 
 arch/arm/mm/dma-mapping-nommu.c                          |    2 
 arch/arm/mm/dma-mapping.c                                |    4 
 arch/ia64/Kconfig                                        |    3 
 arch/ia64/hp/common/sba_iommu.c                          |    2 
 arch/ia64/kernel/dma-mapping.c                           |   14 
 arch/ia64/mm/init.c                                      |    3 
 arch/mips/Kconfig                                        |    1 
 arch/mips/bmips/dma.c                                    |    4 
 arch/mips/cavium-octeon/dma-octeon.c                     |    4 
 arch/mips/include/asm/dma-direct.h                       |    4 
 arch/mips/include/asm/jazzdma.h                          |    2 
 arch/mips/jazz/jazzdma.c                                 |  102 +------
 arch/mips/loongson2ef/fuloong-2e/dma.c                   |    4 
 arch/mips/loongson2ef/lemote-2f/dma.c                    |    4 
 arch/mips/loongson64/dma.c                               |    4 
 arch/mips/mm/dma-noncoherent.c                           |   48 +--
 arch/mips/pci/pci-ar2315.c                               |    4 
 arch/mips/pci/pci-xtalk-bridge.c                         |    4 
 arch/mips/sgi-ip32/ip32-dma.c                            |    4 
 arch/parisc/Kconfig                                      |    1 
 arch/parisc/kernel/pci-dma.c                             |    6 
 arch/powerpc/include/asm/dma-direct.h                    |    4 
 arch/powerpc/kernel/dma-iommu.c                          |    2 
 arch/powerpc/platforms/ps3/system-bus.c                  |    4 
 arch/powerpc/platforms/pseries/vio.c                     |    2 
 arch/s390/pci/pci_dma.c                                  |    2 
 arch/x86/kernel/amd_gart_64.c                            |    8 
 drivers/gpu/drm/exynos/exynos_drm_gem.c                  |    2 
 drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c      |    3 
 drivers/iommu/dma-iommu.c                                |    2 
 drivers/iommu/intel/iommu.c                              |    6 
 drivers/media/common/videobuf2/videobuf2-core.c          |   36 --
 drivers/media/common/videobuf2/videobuf2-dma-contig.c    |   19 -
 drivers/media/common/videobuf2/videobuf2-dma-sg.c        |    3 
 drivers/media/common/videobuf2/videobuf2-v4l2.c          |   12 
 drivers/net/ethernet/amd/au1000_eth.c                    |   15 -
 drivers/net/ethernet/i825xx/lasi_82596.c                 |   36 +-
 drivers/net/ethernet/i825xx/lib82596.c                   |  148 +++++-----
 drivers/net/ethernet/i825xx/sni_82596.c                  |   23 -
 drivers/net/ethernet/seeq/sgiseeq.c                      |   24 -
 drivers/nvme/host/pci.c                                  |   79 ++---
 drivers/parisc/ccio-dma.c                                |    2 
 drivers/parisc/sba_iommu.c                               |    2 
 drivers/scsi/53c700.c                                    |  120 ++++----
 drivers/scsi/53c700.h                                    |    9 
 drivers/scsi/sgiwd93.c                                   |   14 
 drivers/xen/swiotlb-xen.c                                |    2 
 include/linux/dma-direct.h                               |   55 ++-
 include/linux/dma-mapping.h                              |   32 +-
 include/linux/dma-noncoherent.h                          |   21 -
 include/linux/dmapool.h                                  |   23 +
 include/linux/gfp.h                                      |    6 
 include/media/videobuf2-core.h                           |    3 
 include/uapi/linux/videodev2.h                           |    2 
 kernel/dma/Kconfig                                       |    9 
 kernel/dma/Makefile                                      |    1 
 kernel/dma/coherent.c                                    |   17 +
 kernel/dma/direct.c                                      |  112 +++++--
 kernel/dma/mapping.c                                     |  104 ++-----
 kernel/dma/ops_helpers.c                                 |   86 ++++++
 kernel/dma/pool.c                                        |    2 
 kernel/dma/swiotlb.c                                     |    4 
 kernel/dma/virt.c                                        |    2 
 mm/dmapool.c                                             |  211 +++++++++------
 sound/mips/hal2.c                                        |   58 +---
 71 files changed, 872 insertions(+), 803 deletions(-)
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH 01/28] mm: turn alloc_pages into an inline function
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-08-19  6:55   ` [PATCH 02/28] drm/exynos: stop setting DMA_ATTR_NON_CONSISTENT Christoph Hellwig
                     ` (28 subsequent siblings)
  29 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

To prevent a compiler error when a method called alloc_pages is
added (which I plan to do for the dma_map_ops).
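
A minimal, self-contained sketch of the clash this patch avoids.  It uses
stand-in types rather than the real kernel definitions: struct page, gfp_t,
struct demo_dma_ops and every function body below are assumptions made for
illustration only.  While alloc_pages is a two-argument function-like macro,
a call through a struct member of the same name is treated as a macro
invocation and breaks the build; as an inline function the two names can
coexist:

```c
#include <stddef.h>

/* Stand-ins for kernel types; illustrative only, not the real definitions. */
typedef unsigned int gfp_t;
struct page { unsigned int order; };

static struct page demo_page;

/* Hypothetical stand-in for the node-aware allocator. */
static struct page *alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
{
	(void)nid;
	(void)gfp_mask;
	demo_page.order = order;
	return &demo_page;
}

static int numa_node_id(void)
{
	return 0;
}

/*
 * As an inline function, alloc_pages is an ordinary identifier.  Had it
 * stayed a two-argument function-like macro, the three-argument call
 * demo_ops.alloc_pages(...) below would be parsed as a macro invocation
 * and fail to compile.
 */
static inline struct page *alloc_pages(gfp_t gfp_mask, unsigned int order)
{
	return alloc_pages_node(numa_node_id(), gfp_mask, order);
}

/* Hypothetical ops table with a method that shares the name. */
struct demo_dma_ops {
	struct page *(*alloc_pages)(void *dev, size_t size, gfp_t gfp);
};

static struct page *demo_alloc(void *dev, size_t size, gfp_t gfp)
{
	(void)dev;
	(void)size;
	return alloc_pages(gfp, 1);
}

static const struct demo_dma_ops demo_ops = { .alloc_pages = demo_alloc };

/* Calls the method through the ops table and reports the order it used. */
static unsigned int demo_order_via_ops(void)
{
	return demo_ops.alloc_pages(NULL, 8192, 0)->order;
}
```

Only the call site is affected by the macro form: the member declaration
and the designated initializer are not followed by an opening parenthesis,
so the preprocessor leaves them alone.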

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/gfp.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 67a0774e080b98..dd2577c5407112 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -550,8 +550,10 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
 #define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
 	alloc_pages_vma(gfp_mask, order, vma, addr, numa_node_id(), true)
 #else
-#define alloc_pages(gfp_mask, order) \
-		alloc_pages_node(numa_node_id(), gfp_mask, order)
+static inline struct page *alloc_pages(gfp_t gfp_mask, unsigned int order)
+{
+	return alloc_pages_node(numa_node_id(), gfp_mask, order);
+}
 #define alloc_pages_vma(gfp_mask, order, vma, addr, node, false)\
 	alloc_pages(gfp_mask, order)
 #define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 02/28] drm/exynos: stop setting DMA_ATTR_NON_CONSISTENT
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
  2020-08-19  6:55   ` [PATCH 01/28] mm: turn alloc_pages into an inline function Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-08-19  6:55   ` [PATCH 03/28] drm/nouveau/gk20a: " Christoph Hellwig
                     ` (27 subsequent siblings)
  29 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

DMA_ATTR_NON_CONSISTENT is a no-op except on PARISC and some MIPS
configs, so don't set it in this ARM specific driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/gpu/drm/exynos/exynos_drm_gem.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c b/drivers/gpu/drm/exynos/exynos_drm_gem.c
index efa476858db54b..07073222b8f691 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_gem.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c
@@ -42,8 +42,6 @@ static int exynos_drm_alloc_buf(struct exynos_drm_gem *exynos_gem, bool kvmap)
 	if (exynos_gem->flags & EXYNOS_BO_WC ||
 			!(exynos_gem->flags & EXYNOS_BO_CACHABLE))
 		attr |= DMA_ATTR_WRITE_COMBINE;
-	else
-		attr |= DMA_ATTR_NON_CONSISTENT;
 
 	/* FBDev emulation requires kernel mapping */
 	if (!kvmap)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 03/28] drm/nouveau/gk20a: stop setting DMA_ATTR_NON_CONSISTENT
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
  2020-08-19  6:55   ` [PATCH 01/28] mm: turn alloc_pages into an inline function Christoph Hellwig
  2020-08-19  6:55   ` [PATCH 02/28] drm/exynos: stop setting DMA_ATTR_NON_CONSISTENT Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-08-19  6:55   ` [PATCH 04/28] net/au1000-eth: stop using DMA_ATTR_NON_CONSISTENT Christoph Hellwig
                     ` (26 subsequent siblings)
  29 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

DMA_ATTR_NON_CONSISTENT is a no-op except on PARISC and some MIPS
configs, so don't set it in this ARM specific driver part.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
index 985f2990ab0dda..13d4d7ac0697b4 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
@@ -594,8 +594,7 @@ gk20a_instmem_new(struct nvkm_device *device, int index,
 
 		nvkm_info(&imem->base.subdev, "using IOMMU\n");
 	} else {
-		imem->attrs = DMA_ATTR_NON_CONSISTENT |
-			      DMA_ATTR_WEAK_ORDERING |
+		imem->attrs = DMA_ATTR_WEAK_ORDERING |
 			      DMA_ATTR_WRITE_COMBINE;
 
 		nvkm_info(&imem->base.subdev, "using DMA API\n");
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 04/28] net/au1000-eth: stop using DMA_ATTR_NON_CONSISTENT
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
                     ` (2 preceding siblings ...)
  2020-08-19  6:55   ` [PATCH 03/28] drm/nouveau/gk20a: " Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-08-19  6:55   ` [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT Christoph Hellwig
                     ` (25 subsequent siblings)
  29 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

The au1000-eth driver contains none of the manual cache synchronization
required for using DMA_ATTR_NON_CONSISTENT.  From what I can tell it
can be used on both DMA coherent and non-coherent platforms, but
I suspect it has been buggy on the non-coherent platforms all along.
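
To illustrate why a missing cache synchronization matters on a non-coherent
platform, here is a toy userspace model.  It is not kernel code and does not
use the DMA API: struct toy_buf and all toy_* helpers are hypothetical names,
and the "cache" is just a second array.  The CPU writes into its cache, the
device reads RAM directly, and only an explicit write-back makes the data
visible to the device:

```c
#include <string.h>

#define BUF_SIZE 8

/* Toy model: "ram" is what the device sees, "cache" is what the CPU sees. */
struct toy_buf {
	unsigned char ram[BUF_SIZE];
	unsigned char cache[BUF_SIZE];
};

/* CPU store: lands in the cache only, as with a write-back cache. */
static void cpu_write(struct toy_buf *b, size_t off, unsigned char val)
{
	b->cache[off] = val;
}

/* Device read: bypasses the CPU cache and reads RAM directly. */
static unsigned char device_read(const struct toy_buf *b, size_t off)
{
	return b->ram[off];
}

/* Hypothetical write-back step, the job a sync-for-device call performs. */
static void toy_sync_for_device(struct toy_buf *b)
{
	memcpy(b->ram, b->cache, BUF_SIZE);
}

/* Returns what the device sees after the CPU writes 0xab at offset 0. */
static unsigned char toy_transfer(int do_sync)
{
	struct toy_buf b;

	memset(&b, 0, sizeof(b));
	cpu_write(&b, 0, 0xab);
	if (do_sync)
		toy_sync_for_device(&b);
	return device_read(&b, 0);
}
```

In this model toy_transfer(0) hands the device stale zeroes, while
toy_transfer(1) delivers the written byte.  A coherent allocation removes the
need for the explicit step, which is what the switch to dma_alloc_coherent
relies on.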

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/net/ethernet/amd/au1000_eth.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/amd/au1000_eth.c b/drivers/net/ethernet/amd/au1000_eth.c
index 75dbd221dc594b..19e195420e2434 100644
--- a/drivers/net/ethernet/amd/au1000_eth.c
+++ b/drivers/net/ethernet/amd/au1000_eth.c
@@ -1131,10 +1131,9 @@ static int au1000_probe(struct platform_device *pdev)
 	/* Allocate the data buffers
 	 * Snooping works fine with eth on all au1xxx
 	 */
-	aup->vaddr = (u32)dma_alloc_attrs(&pdev->dev, MAX_BUF_SIZE *
+	aup->vaddr = (u32)dma_alloc_coherent(&pdev->dev, MAX_BUF_SIZE *
 					  (NUM_TX_BUFFS + NUM_RX_BUFFS),
-					  &aup->dma_addr, 0,
-					  DMA_ATTR_NON_CONSISTENT);
+					  &aup->dma_addr, 0);
 	if (!aup->vaddr) {
 		dev_err(&pdev->dev, "failed to allocate data buffers\n");
 		err = -ENOMEM;
@@ -1310,9 +1309,8 @@ static int au1000_probe(struct platform_device *pdev)
 err_remap2:
 	iounmap(aup->mac);
 err_remap1:
-	dma_free_attrs(&pdev->dev, MAX_BUF_SIZE * (NUM_TX_BUFFS + NUM_RX_BUFFS),
-			(void *)aup->vaddr, aup->dma_addr,
-			DMA_ATTR_NON_CONSISTENT);
+	dma_free_coherent(&pdev->dev, MAX_BUF_SIZE * (NUM_TX_BUFFS + NUM_RX_BUFFS),
+			(void *)aup->vaddr, aup->dma_addr);
 err_vaddr:
 	free_netdev(dev);
 err_alloc:
@@ -1344,9 +1342,8 @@ static int au1000_remove(struct platform_device *pdev)
 		if (aup->tx_db_inuse[i])
 			au1000_ReleaseDB(aup, aup->tx_db_inuse[i]);
 
-	dma_free_attrs(&pdev->dev, MAX_BUF_SIZE * (NUM_TX_BUFFS + NUM_RX_BUFFS),
-			(void *)aup->vaddr, aup->dma_addr,
-			DMA_ATTR_NON_CONSISTENT);
+	dma_free_coherent(&pdev->dev, MAX_BUF_SIZE * (NUM_TX_BUFFS + NUM_RX_BUFFS),
+			(void *)aup->vaddr, aup->dma_addr);
 
 	iounmap(aup->macdma);
 	iounmap(aup->mac);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
                     ` (3 preceding siblings ...)
  2020-08-19  6:55   ` [PATCH 04/28] net/au1000-eth: stop using DMA_ATTR_NON_CONSISTENT Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-08-19 11:16     ` Tomasz Figa
  2020-08-19  6:55   ` [PATCH 06/28] lib82596: move DMA allocation into the callers of i82596_probe Christoph Hellwig
                     ` (24 subsequent siblings)
  29 siblings, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused, and causes
weird gymnastics with the DMA_ATTR_NON_CONSISTENT flag, which is
unimplemented except on PARISC and some MIPS configs, and about to be
removed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 .../userspace-api/media/v4l/buffer.rst        | 17 ---------
 .../media/v4l/vidioc-reqbufs.rst              |  1 -
 .../media/common/videobuf2/videobuf2-core.c   | 36 +------------------
 .../common/videobuf2/videobuf2-dma-contig.c   | 19 ----------
 .../media/common/videobuf2/videobuf2-dma-sg.c |  3 +-
 .../media/common/videobuf2/videobuf2-v4l2.c   | 12 -------
 include/media/videobuf2-core.h                |  3 +-
 include/uapi/linux/videodev2.h                |  2 --
 8 files changed, 3 insertions(+), 90 deletions(-)

diff --git a/Documentation/userspace-api/media/v4l/buffer.rst b/Documentation/userspace-api/media/v4l/buffer.rst
index 57e752aaf414a7..2044ed13cd9d7d 100644
--- a/Documentation/userspace-api/media/v4l/buffer.rst
+++ b/Documentation/userspace-api/media/v4l/buffer.rst
@@ -701,23 +701,6 @@ Memory Consistency Flags
     :stub-columns: 0
     :widths:       3 1 4
 
-    * .. _`V4L2-FLAG-MEMORY-NON-CONSISTENT`:
-
-      - ``V4L2_FLAG_MEMORY_NON_CONSISTENT``
-      - 0x00000001
-      - A buffer is allocated either in consistent (it will be automatically
-	coherent between the CPU and the bus) or non-consistent memory. The
-	latter can provide performance gains, for instance the CPU cache
-	sync/flush operations can be avoided if the buffer is accessed by the
-	corresponding device only and the CPU does not read/write to/from that
-	buffer. However, this requires extra care from the driver -- it must
-	guarantee memory consistency by issuing a cache flush/sync when
-	consistency is needed. If this flag is set V4L2 will attempt to
-	allocate the buffer in non-consistent memory. The flag takes effect
-	only if the buffer is used for :ref:`memory mapping <mmap>` I/O and the
-	queue reports the :ref:`V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS
-	<V4L2-BUF-CAP-SUPPORTS-MMAP-CACHE-HINTS>` capability.
-
 .. c:type:: v4l2_memory
 
 enum v4l2_memory
diff --git a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
index 75d894d9c36c42..3180c111d368ee 100644
--- a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
+++ b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
@@ -169,7 +169,6 @@ aborting or finishing any DMA in progress, an implicit
       - This capability is set by the driver to indicate that the queue supports
         cache and memory management hints. However, it's only valid when the
         queue is used for :ref:`memory mapping <mmap>` streaming I/O. See
-        :ref:`V4L2_FLAG_MEMORY_NON_CONSISTENT <V4L2-FLAG-MEMORY-NON-CONSISTENT>`,
         :ref:`V4L2_BUF_FLAG_NO_CACHE_INVALIDATE <V4L2-BUF-FLAG-NO-CACHE-INVALIDATE>` and
         :ref:`V4L2_BUF_FLAG_NO_CACHE_CLEAN <V4L2-BUF-FLAG-NO-CACHE-CLEAN>`.
 
diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
index f544d3393e9d6b..66a41cef33c1b1 100644
--- a/drivers/media/common/videobuf2/videobuf2-core.c
+++ b/drivers/media/common/videobuf2/videobuf2-core.c
@@ -721,39 +721,14 @@ int vb2_verify_memory_type(struct vb2_queue *q,
 }
 EXPORT_SYMBOL(vb2_verify_memory_type);
 
-static void set_queue_consistency(struct vb2_queue *q, bool consistent_mem)
-{
-	q->dma_attrs &= ~DMA_ATTR_NON_CONSISTENT;
-
-	if (!vb2_queue_allows_cache_hints(q))
-		return;
-	if (!consistent_mem)
-		q->dma_attrs |= DMA_ATTR_NON_CONSISTENT;
-}
-
-static bool verify_consistency_attr(struct vb2_queue *q, bool consistent_mem)
-{
-	bool queue_is_consistent = !(q->dma_attrs & DMA_ATTR_NON_CONSISTENT);
-
-	if (consistent_mem != queue_is_consistent) {
-		dprintk(q, 1, "memory consistency model mismatch\n");
-		return false;
-	}
-	return true;
-}
-
 int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
 		     unsigned int flags, unsigned int *count)
 {
 	unsigned int num_buffers, allocated_buffers, num_planes = 0;
 	unsigned plane_sizes[VB2_MAX_PLANES] = { };
-	bool consistent_mem = true;
 	unsigned int i;
 	int ret;
 
-	if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT)
-		consistent_mem = false;
-
 	if (q->streaming) {
 		dprintk(q, 1, "streaming active\n");
 		return -EBUSY;
@@ -765,8 +740,7 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
 	}
 
 	if (*count == 0 || q->num_buffers != 0 ||
-	    (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory) ||
-	    !verify_consistency_attr(q, consistent_mem)) {
+	    (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory)) {
 		/*
 		 * We already have buffers allocated, so first check if they
 		 * are not in use and can be freed.
@@ -803,7 +777,6 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
 	num_buffers = min_t(unsigned int, num_buffers, VB2_MAX_FRAME);
 	memset(q->alloc_devs, 0, sizeof(q->alloc_devs));
 	q->memory = memory;
-	set_queue_consistency(q, consistent_mem);
 
 	/*
 	 * Ask the driver how many buffers and planes per buffer it requires.
@@ -894,12 +867,8 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory,
 {
 	unsigned int num_planes = 0, num_buffers, allocated_buffers;
 	unsigned plane_sizes[VB2_MAX_PLANES] = { };
-	bool consistent_mem = true;
 	int ret;
 
-	if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT)
-		consistent_mem = false;
-
 	if (q->num_buffers == VB2_MAX_FRAME) {
 		dprintk(q, 1, "maximum number of buffers already allocated\n");
 		return -ENOBUFS;
@@ -912,15 +881,12 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory,
 		}
 		memset(q->alloc_devs, 0, sizeof(q->alloc_devs));
 		q->memory = memory;
-		set_queue_consistency(q, consistent_mem);
 		q->waiting_for_buffers = !q->is_output;
 	} else {
 		if (q->memory != memory) {
 			dprintk(q, 1, "memory model mismatch\n");
 			return -EINVAL;
 		}
-		if (!verify_consistency_attr(q, consistent_mem))
-			return -EINVAL;
 	}
 
 	num_buffers = min(*count, VB2_MAX_FRAME - q->num_buffers);
diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
index ec3446cc45b8da..7b1b86ec942d7d 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
@@ -42,11 +42,6 @@ struct vb2_dc_buf {
 	struct dma_buf_attachment	*db_attach;
 };
 
-static inline bool vb2_dc_buffer_consistent(unsigned long attr)
-{
-	return !(attr & DMA_ATTR_NON_CONSISTENT);
-}
-
 /*********************************************/
 /*        scatterlist table functions        */
 /*********************************************/
@@ -341,13 +336,6 @@ static int
 vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf,
 				   enum dma_data_direction direction)
 {
-	struct vb2_dc_buf *buf = dbuf->priv;
-	struct sg_table *sgt = buf->dma_sgt;
-
-	if (vb2_dc_buffer_consistent(buf->attrs))
-		return 0;
-
-	dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
 	return 0;
 }
 
@@ -355,13 +343,6 @@ static int
 vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf,
 				 enum dma_data_direction direction)
 {
-	struct vb2_dc_buf *buf = dbuf->priv;
-	struct sg_table *sgt = buf->dma_sgt;
-
-	if (vb2_dc_buffer_consistent(buf->attrs))
-		return 0;
-
-	dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
 	return 0;
 }
 
diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
index 0a40e00f0d7e5c..a86fce5d8ea8bf 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
@@ -123,8 +123,7 @@ static void *vb2_dma_sg_alloc(struct device *dev, unsigned long dma_attrs,
 	/*
 	 * NOTE: dma-sg allocates memory using the page allocator directly, so
 	 * there is no memory consistency guarantee, hence dma-sg ignores DMA
-	 * attributes passed from the upper layer. That means that
-	 * V4L2_FLAG_MEMORY_NON_CONSISTENT has no effect on dma-sg buffers.
+	 * attributes passed from the upper layer.
 	 */
 	buf->pages = kvmalloc_array(buf->num_pages, sizeof(struct page *),
 				    GFP_KERNEL | __GFP_ZERO);
diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c
index 30caad27281e1a..de83ad48783821 100644
--- a/drivers/media/common/videobuf2/videobuf2-v4l2.c
+++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c
@@ -722,20 +722,11 @@ static void fill_buf_caps(struct vb2_queue *q, u32 *caps)
 #endif
 }
 
-static void clear_consistency_attr(struct vb2_queue *q,
-				   int memory,
-				   unsigned int *flags)
-{
-	if (!q->allow_cache_hints || memory != V4L2_MEMORY_MMAP)
-		*flags &= ~V4L2_FLAG_MEMORY_NON_CONSISTENT;
-}
-
 int vb2_reqbufs(struct vb2_queue *q, struct v4l2_requestbuffers *req)
 {
 	int ret = vb2_verify_memory_type(q, req->memory, req->type);
 
 	fill_buf_caps(q, &req->capabilities);
-	clear_consistency_attr(q, req->memory, &req->flags);
 	return ret ? ret : vb2_core_reqbufs(q, req->memory,
 					    req->flags, &req->count);
 }
@@ -769,7 +760,6 @@ int vb2_create_bufs(struct vb2_queue *q, struct v4l2_create_buffers *create)
 	unsigned i;
 
 	fill_buf_caps(q, &create->capabilities);
-	clear_consistency_attr(q, create->memory, &create->flags);
 	create->index = q->num_buffers;
 	if (create->count == 0)
 		return ret != -EBUSY ? ret : 0;
@@ -998,7 +988,6 @@ int vb2_ioctl_reqbufs(struct file *file, void *priv,
 	int res = vb2_verify_memory_type(vdev->queue, p->memory, p->type);
 
 	fill_buf_caps(vdev->queue, &p->capabilities);
-	clear_consistency_attr(vdev->queue, p->memory, &p->flags);
 	if (res)
 		return res;
 	if (vb2_queue_is_busy(vdev, file))
@@ -1021,7 +1010,6 @@ int vb2_ioctl_create_bufs(struct file *file, void *priv,
 
 	p->index = vdev->queue->num_buffers;
 	fill_buf_caps(vdev->queue, &p->capabilities);
-	clear_consistency_attr(vdev->queue, p->memory, &p->flags);
 	/*
 	 * If count == 0, then just check if memory and type are valid.
 	 * Any -EBUSY result from vb2_verify_memory_type can be mapped to 0.
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
index 52ef92049073e3..4c7f25b07e9375 100644
--- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h
@@ -744,8 +744,7 @@ void vb2_core_querybuf(struct vb2_queue *q, unsigned int index, void *pb);
  * vb2_core_reqbufs() - Initiate streaming.
  * @q:		pointer to &struct vb2_queue with videobuf2 queue.
  * @memory:	memory type, as defined by &enum vb2_memory.
- * @flags:	auxiliary queue/buffer management flags. Currently, the only
- *		used flag is %V4L2_FLAG_MEMORY_NON_CONSISTENT.
+ * @flags:	auxiliary queue/buffer management flags.
  * @count:	requested buffer count.
  *
  * Videobuf2 core helper to implement VIDIOC_REQBUF() operation. It is called
diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
index c7b70ff53bc1dd..5c00f63d9c1b58 100644
--- a/include/uapi/linux/videodev2.h
+++ b/include/uapi/linux/videodev2.h
@@ -191,8 +191,6 @@ enum v4l2_memory {
 	V4L2_MEMORY_DMABUF           = 4,
 };
 
-#define V4L2_FLAG_MEMORY_NON_CONSISTENT		(1 << 0)
-
 /* see also http://vektor.theorem.ca/graphics/ycbcr/ */
 enum v4l2_colorspace {
 	/*
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 06/28] lib82596: move DMA allocation into the callers of i82596_probe
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
                     ` (4 preceding siblings ...)
  2020-08-19  6:55   ` [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-09-01 13:29     ` Thomas Bogendoerfer
  2020-08-19  6:55   ` [PATCH 07/28] 53c700: improve non-coherent DMA handling Christoph Hellwig
                     ` (23 subsequent siblings)
  29 siblings, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

This allows us to get rid of the LIB82596_DMA_ATTR define and prepares
for untangling the coherent vs non-coherent DMA allocation API.
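
The shape of the refactoring can be sketched in plain userspace C.  This is
an illustration under stated assumptions, not the driver code: struct
demo_dma, struct demo_priv, demo_probe and demo_init_chip are hypothetical
names, and calloc/free stand in for the DMA allocation calls.  The
bus-specific caller owns the allocation and unwinds it with goto labels,
while the shared probe helper only uses the buffer it is handed:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's shared block and private data. */
struct demo_dma { int scb_command; };
struct demo_priv { struct demo_dma *dma; };

/* Shared probe helper: expects priv->dma to be set up by the caller. */
static int demo_probe(struct demo_priv *priv, int have_irq)
{
	if (!have_irq)
		return -ENODEV;
	priv->dma->scb_command = 0;
	return 0;
}

/* Bus-specific caller: owns the allocation and unwinds it on failure. */
static int demo_init_chip(struct demo_priv *priv, int have_irq)
{
	int retval = -ENOMEM;

	priv->dma = calloc(1, sizeof(*priv->dma));
	if (!priv->dma)
		goto out;

	retval = demo_probe(priv, have_irq);
	if (retval)
		goto out_free_dma;
	return 0;

out_free_dma:
	free(priv->dma);
	priv->dma = NULL;
out:
	return retval;
}
```

Because each caller now picks its own allocator, the shared library code no
longer needs a per-bus attribute macro, and the error code from the probe
helper is passed through instead of being replaced.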

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/net/ethernet/i825xx/lasi_82596.c | 24 ++++++++++------
 drivers/net/ethernet/i825xx/lib82596.c   | 36 ++++++++----------------
 drivers/net/ethernet/i825xx/sni_82596.c  | 19 +++++++++----
 3 files changed, 40 insertions(+), 39 deletions(-)

diff --git a/drivers/net/ethernet/i825xx/lasi_82596.c b/drivers/net/ethernet/i825xx/lasi_82596.c
index aec7e98bcc853a..8c5ab9b7553e75 100644
--- a/drivers/net/ethernet/i825xx/lasi_82596.c
+++ b/drivers/net/ethernet/i825xx/lasi_82596.c
@@ -96,8 +96,6 @@
 
 #define OPT_SWAP_PORT	0x0001	/* Need to wordswp on the MPU port */
 
-#define LIB82596_DMA_ATTR	DMA_ATTR_NON_CONSISTENT
-
 #define DMA_WBACK(ndev, addr, len) \
 	do { dma_cache_sync((ndev)->dev.parent, (void *)addr, len, DMA_TO_DEVICE); } while (0)
 
@@ -155,7 +153,7 @@ lan_init_chip(struct parisc_device *dev)
 {
 	struct	net_device *netdevice;
 	struct i596_private *lp;
-	int	retval;
+	int retval = -ENOMEM;
 	int i;
 
 	if (!dev->irq) {
@@ -186,12 +184,22 @@ lan_init_chip(struct parisc_device *dev)
 
 	lp = netdev_priv(netdevice);
 	lp->options = dev->id.sversion == 0x72 ? OPT_SWAP_PORT : 0;
+	lp->dma = dma_alloc_attrs(dev->dev.parent, sizeof(struct i596_dma),
+			      &lp->dma_addr, GFP_KERNEL,
+			      DMA_ATTR_NON_CONSISTENT);
+	if (!lp->dma)
+		goto out_free_netdev;
 
 	retval = i82596_probe(netdevice);
-	if (retval) {
-		free_netdev(netdevice);
-		return -ENODEV;
-	}
+	if (retval)
+		goto out_free_dma;
+	return 0;
+
+out_free_dma:
+	dma_free_attrs(dev->dev.parent, sizeof(struct i596_dma),
+		       lp->dma, lp->dma_addr, DMA_ATTR_NON_CONSISTENT);
+out_free_netdev:
+	free_netdev(netdevice);
 	return retval;
 }
 
@@ -202,7 +210,7 @@ static int __exit lan_remove_chip(struct parisc_device *pdev)
 
 	unregister_netdev (dev);
 	dma_free_attrs(&pdev->dev, sizeof(struct i596_private), lp->dma,
-		       lp->dma_addr, LIB82596_DMA_ATTR);
+		       lp->dma_addr, DMA_ATTR_NON_CONSISTENT);
 	free_netdev (dev);
 	return 0;
 }
diff --git a/drivers/net/ethernet/i825xx/lib82596.c b/drivers/net/ethernet/i825xx/lib82596.c
index b03757e169e475..b4e4b3eb5758b5 100644
--- a/drivers/net/ethernet/i825xx/lib82596.c
+++ b/drivers/net/ethernet/i825xx/lib82596.c
@@ -1047,9 +1047,8 @@ static const struct net_device_ops i596_netdev_ops = {
 
 static int i82596_probe(struct net_device *dev)
 {
-	int i;
 	struct i596_private *lp = netdev_priv(dev);
-	struct i596_dma *dma;
+	int ret;
 
 	/* This lot is ensure things have been cache line aligned. */
 	BUILD_BUG_ON(sizeof(struct i596_rfd) != 32);
@@ -1063,41 +1062,28 @@ static int i82596_probe(struct net_device *dev)
 	if (!dev->base_addr || !dev->irq)
 		return -ENODEV;
 
-	dma = dma_alloc_attrs(dev->dev.parent, sizeof(struct i596_dma),
-			      &lp->dma_addr, GFP_KERNEL,
-			      LIB82596_DMA_ATTR);
-	if (!dma) {
-		printk(KERN_ERR "%s: Couldn't get shared memory\n", __FILE__);
-		return -ENOMEM;
-	}
-
 	dev->netdev_ops = &i596_netdev_ops;
 	dev->watchdog_timeo = TX_TIMEOUT;
 
-	memset(dma, 0, sizeof(struct i596_dma));
-	lp->dma = dma;
-
-	dma->scb.command = 0;
-	dma->scb.cmd = I596_NULL;
-	dma->scb.rfd = I596_NULL;
+	memset(lp->dma, 0, sizeof(struct i596_dma));
+	lp->dma->scb.command = 0;
+	lp->dma->scb.cmd = I596_NULL;
+	lp->dma->scb.rfd = I596_NULL;
 	spin_lock_init(&lp->lock);
 
-	DMA_WBACK_INV(dev, dma, sizeof(struct i596_dma));
+	DMA_WBACK_INV(dev, lp->dma, sizeof(struct i596_dma));
 
-	i = register_netdev(dev);
-	if (i) {
-		dma_free_attrs(dev->dev.parent, sizeof(struct i596_dma),
-			       dma, lp->dma_addr, LIB82596_DMA_ATTR);
-		return i;
-	}
+	ret = register_netdev(dev);
+	if (ret)
+		return ret;
 
 	DEB(DEB_PROBE, printk(KERN_INFO "%s: 82596 at %#3lx, %pM IRQ %d.\n",
 			      dev->name, dev->base_addr, dev->dev_addr,
 			      dev->irq));
 	DEB(DEB_INIT, printk(KERN_INFO
 			     "%s: dma at 0x%p (%d bytes), lp->scb at 0x%p\n",
-			     dev->name, dma, (int)sizeof(struct i596_dma),
-			     &dma->scb));
+			     dev->name, lp->dma, (int)sizeof(struct i596_dma),
+			     &lp->dma->scb));
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/i825xx/sni_82596.c b/drivers/net/ethernet/i825xx/sni_82596.c
index 22f5887578b2bd..e80e790ffbd4d4 100644
--- a/drivers/net/ethernet/i825xx/sni_82596.c
+++ b/drivers/net/ethernet/i825xx/sni_82596.c
@@ -24,8 +24,6 @@
 
 static const char sni_82596_string[] = "snirm_82596";
 
-#define LIB82596_DMA_ATTR	0
-
 #define DMA_WBACK(priv, addr, len)     do { } while (0)
 #define DMA_INV(priv, addr, len)       do { } while (0)
 #define DMA_WBACK_INV(priv, addr, len) do { } while (0)
@@ -134,10 +132,19 @@ static int sni_82596_probe(struct platform_device *dev)
 	lp->ca = ca_addr;
 	lp->mpu_port = mpu_addr;
 
+	lp->dma = dma_alloc_coherent(dev->dev.parent, sizeof(struct i596_dma),
+				     &lp->dma_addr, GFP_KERNEL);
+	if (!lp->dma)
+		goto probe_failed;
+
 	retval = i82596_probe(netdevice);
-	if (retval == 0)
-		return 0;
+	if (retval)
+		goto probe_failed_free_dma;
+	return 0;
 
+probe_failed_free_dma:
+	dma_free_coherent(dev->dev.parent, sizeof(struct i596_dma), lp->dma,
+			  lp->dma_addr);
 probe_failed:
 	free_netdev(netdevice);
 probe_failed_free_ca:
@@ -153,8 +160,8 @@ static int sni_82596_driver_remove(struct platform_device *pdev)
 	struct i596_private *lp = netdev_priv(dev);
 
 	unregister_netdev(dev);
-	dma_free_attrs(dev->dev.parent, sizeof(struct i596_private), lp->dma,
-		       lp->dma_addr, LIB82596_DMA_ATTR);
+	dma_free_coherent(dev->dev.parent, sizeof(struct i596_private), lp->dma,
+			  lp->dma_addr);
 	iounmap(lp->ca);
 	iounmap(lp->mpu_port);
 	free_netdev (dev);
-- 
2.28.0

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

* [PATCH 07/28] 53c700: improve non-coherent DMA handling
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
                     ` (5 preceding siblings ...)
  2020-08-19  6:55   ` [PATCH 06/28] lib82596: move DMA allocation into the callers of i82596_probe Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-09-01 14:52     ` James Bottomley
  2020-08-19  6:55   ` [PATCH 08/28] MIPS: make dma_sync_*_for_cpu a little less overzealous Christoph Hellwig
                     ` (22 subsequent siblings)
  29 siblings, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

Switch the 53c700 driver to only use non-coherent descriptor memory if it
really has to because dma_alloc_coherent fails.  This doesn't matter for
any of the platforms it runs on currently, but that will change soon.

To help with this, two new helpers to transfer ownership to and from the
device are added that abstract the syncing of the non-coherent memory.
The two current bidirectional cases are mapped to transfers to the
device, as that appears to be what they are used for.  Note that for parisc,
which is the only architecture this driver needs to use non-coherent
memory on, the direction argument of dma_cache_sync is ignored, so this
will not change behavior in any way.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/53c700.c | 113 +++++++++++++++++++++++-------------------
 drivers/scsi/53c700.h |   9 ++--
 2 files changed, 68 insertions(+), 54 deletions(-)
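
For illustration only, a small userspace model of the scheme described
above: try the coherent allocation first, fall back to a non-coherent
one and remember that in a flag, and make the sync helper a no-op
unless the flag is set.  The fake_* names and the syncs counter are
hypothetical, and calloc() stands in for both allocators.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct NCR_700_Host_Parameters. */
struct fake_host {
	bool noncoherent;	/* models hostdata->noncoherent */
	void *mem;		/* models the script/descriptor memory */
	int syncs;		/* counts modelled cache maintenance calls */
};

/* Models dma_alloc_coherent(); may be unavailable on some platforms. */
static void *fake_alloc_coherent(size_t size, bool available)
{
	return available ? calloc(1, size) : NULL;
}

/* Models the DMA_ATTR_NON_CONSISTENT fallback allocation. */
static void *fake_alloc_noncoherent(size_t size)
{
	return calloc(1, size);
}

static int fake_host_alloc(struct fake_host *h, size_t size,
		bool coherent_available)
{
	h->noncoherent = false;
	h->mem = fake_alloc_coherent(size, coherent_available);
	if (!h->mem) {
		h->noncoherent = true;
		h->mem = fake_alloc_noncoherent(size);
	}
	return h->mem ? 0 : -1;
}

/* Models dma_sync_to_dev(): only touches the cache when required. */
static void fake_sync_to_dev(struct fake_host *h, void *addr, size_t size)
{
	(void)addr;
	(void)size;
	assert(h->mem != NULL);
	if (h->noncoherent)
		h->syncs++;	/* stands in for dma_cache_sync() */
}

/* The real release path picks the matching free routine via the flag. */
static void fake_host_free(struct fake_host *h)
{
	free(h->mem);
	h->mem = NULL;
}
```

With coherent memory the helper does nothing, so the call sites stay
identical on both kinds of platform.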

diff --git a/drivers/scsi/53c700.c b/drivers/scsi/53c700.c
index 461b3babb601ef..b197ed9399e2e0 100644
--- a/drivers/scsi/53c700.c
+++ b/drivers/scsi/53c700.c
@@ -269,6 +269,20 @@ NCR_700_get_SXFER(struct scsi_device *SDp)
 					      spi_period(SDp->sdev_target));
 }
 
+static inline void dma_sync_to_dev(struct NCR_700_Host_Parameters *h,
+		void *addr, size_t size)
+{
+	if (h->noncoherent)
+		dma_cache_sync(h->dev, addr, size, DMA_TO_DEVICE);
+}
+
+static inline void dma_sync_from_dev(struct NCR_700_Host_Parameters *h,
+		void *addr, size_t size)
+{
+	if (h->noncoherent)
+		dma_cache_sync(h->dev, addr, size, DMA_FROM_DEVICE);
+}
+
 struct Scsi_Host *
 NCR_700_detect(struct scsi_host_template *tpnt,
 	       struct NCR_700_Host_Parameters *hostdata, struct device *dev)
@@ -283,9 +297,13 @@ NCR_700_detect(struct scsi_host_template *tpnt,
 	if(tpnt->sdev_attrs == NULL)
 		tpnt->sdev_attrs = NCR_700_dev_attrs;
 
-	memory = dma_alloc_attrs(dev, TOTAL_MEM_SIZE, &pScript,
-				 GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
-	if(memory == NULL) {
+	memory = dma_alloc_coherent(dev, TOTAL_MEM_SIZE, &pScript, GFP_KERNEL);
+	if (!memory) {
+		hostdata->noncoherent = 1;
+		memory = dma_alloc_attrs(dev, TOTAL_MEM_SIZE, &pScript,
+					 GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
+	}
+	if (!memory) {
 		printk(KERN_ERR "53c700: Failed to allocate memory for driver, detaching\n");
 		return NULL;
 	}
@@ -339,11 +357,11 @@ NCR_700_detect(struct scsi_host_template *tpnt,
 	for (j = 0; j < PATCHES; j++)
 		script[LABELPATCHES[j]] = bS_to_host(pScript + SCRIPT[LABELPATCHES[j]]);
 	/* now patch up fixed addresses. */
-	script_patch_32(hostdata->dev, script, MessageLocation,
+	script_patch_32(hostdata, script, MessageLocation,
 			pScript + MSGOUT_OFFSET);
-	script_patch_32(hostdata->dev, script, StatusAddress,
+	script_patch_32(hostdata, script, StatusAddress,
 			pScript + STATUS_OFFSET);
-	script_patch_32(hostdata->dev, script, ReceiveMsgAddress,
+	script_patch_32(hostdata, script, ReceiveMsgAddress,
 			pScript + MSGIN_OFFSET);
 
 	hostdata->script = script;
@@ -395,8 +413,12 @@ NCR_700_release(struct Scsi_Host *host)
 	struct NCR_700_Host_Parameters *hostdata = 
 		(struct NCR_700_Host_Parameters *)host->hostdata[0];
 
-	dma_free_attrs(hostdata->dev, TOTAL_MEM_SIZE, hostdata->script,
-		       hostdata->pScript, DMA_ATTR_NON_CONSISTENT);
+	if (hostdata->noncoherent)
+		dma_free_attrs(hostdata->dev, TOTAL_MEM_SIZE, hostdata->script,
+			       hostdata->pScript, DMA_ATTR_NON_CONSISTENT);
+	else
+		dma_free_coherent(hostdata->dev, TOTAL_MEM_SIZE,
+				  hostdata->script, hostdata->pScript);
 	return 1;
 }
 
@@ -804,8 +826,8 @@ process_extended_message(struct Scsi_Host *host,
 			shost_printk(KERN_WARNING, host,
 				"Unexpected SDTR msg\n");
 			hostdata->msgout[0] = A_REJECT_MSG;
-			dma_cache_sync(hostdata->dev, hostdata->msgout, 1, DMA_TO_DEVICE);
-			script_patch_16(hostdata->dev, hostdata->script,
+			dma_sync_to_dev(hostdata, hostdata->msgout, 1);
+			script_patch_16(hostdata, hostdata->script,
 			                MessageCount, 1);
 			/* SendMsgOut returns, so set up the return
 			 * address */
@@ -817,9 +839,8 @@ process_extended_message(struct Scsi_Host *host,
 		printk(KERN_INFO "scsi%d: (%d:%d), Unsolicited WDTR after CMD, Rejecting\n",
 		       host->host_no, pun, lun);
 		hostdata->msgout[0] = A_REJECT_MSG;
-		dma_cache_sync(hostdata->dev, hostdata->msgout, 1, DMA_TO_DEVICE);
-		script_patch_16(hostdata->dev, hostdata->script, MessageCount,
-		                1);
+		dma_sync_to_dev(hostdata, hostdata->msgout, 1);
+		script_patch_16(hostdata, hostdata->script, MessageCount, 1);
 		resume_offset = hostdata->pScript + Ent_SendMessageWithATN;
 
 		break;
@@ -832,9 +853,8 @@ process_extended_message(struct Scsi_Host *host,
 		printk("\n");
 		/* just reject it */
 		hostdata->msgout[0] = A_REJECT_MSG;
-		dma_cache_sync(hostdata->dev, hostdata->msgout, 1, DMA_TO_DEVICE);
-		script_patch_16(hostdata->dev, hostdata->script, MessageCount,
-		                1);
+		dma_sync_to_dev(hostdata, hostdata->msgout, 1);
+		script_patch_16(hostdata, hostdata->script, MessageCount, 1);
 		/* SendMsgOut returns, so set up the return
 		 * address */
 		resume_offset = hostdata->pScript + Ent_SendMessageWithATN;
@@ -917,9 +937,8 @@ process_message(struct Scsi_Host *host,	struct NCR_700_Host_Parameters *hostdata
 		printk("\n");
 		/* just reject it */
 		hostdata->msgout[0] = A_REJECT_MSG;
-		dma_cache_sync(hostdata->dev, hostdata->msgout, 1, DMA_TO_DEVICE);
-		script_patch_16(hostdata->dev, hostdata->script, MessageCount,
-		                1);
+		dma_sync_to_dev(hostdata, hostdata->msgout, 1);
+		script_patch_16(hostdata, hostdata->script, MessageCount, 1);
 		/* SendMsgOut returns, so set up the return
 		 * address */
 		resume_offset = hostdata->pScript + Ent_SendMessageWithATN;
@@ -928,7 +947,7 @@ process_message(struct Scsi_Host *host,	struct NCR_700_Host_Parameters *hostdata
 	}
 	NCR_700_writel(temp, host, TEMP_REG);
 	/* set us up to receive another message */
-	dma_cache_sync(hostdata->dev, hostdata->msgin, MSG_ARRAY_SIZE, DMA_FROM_DEVICE);
+	dma_sync_from_dev(hostdata, hostdata->msgin, MSG_ARRAY_SIZE);
 	return resume_offset;
 }
 
@@ -1008,8 +1027,8 @@ process_script_interrupt(__u32 dsps, __u32 dsp, struct scsi_cmnd *SCp,
 				slot->SG[1].ins = bS_to_host(SCRIPT_RETURN);
 				slot->SG[1].pAddr = 0;
 				slot->resume_offset = hostdata->pScript;
-				dma_cache_sync(hostdata->dev, slot->SG, sizeof(slot->SG[0])*2, DMA_TO_DEVICE);
-				dma_cache_sync(hostdata->dev, SCp->sense_buffer, SCSI_SENSE_BUFFERSIZE, DMA_FROM_DEVICE);
+				dma_sync_to_dev(hostdata, slot->SG, sizeof(slot->SG[0])*2);
+				dma_sync_from_dev(hostdata, SCp->sense_buffer, SCSI_SENSE_BUFFERSIZE);
 
 				/* queue the command for reissue */
 				slot->state = NCR_700_SLOT_QUEUED;
@@ -1129,11 +1148,11 @@ process_script_interrupt(__u32 dsps, __u32 dsp, struct scsi_cmnd *SCp,
 			hostdata->cmd = slot->cmnd;
 
 			/* re-patch for this command */
-			script_patch_32_abs(hostdata->dev, hostdata->script,
+			script_patch_32_abs(hostdata, hostdata->script,
 			                    CommandAddress, slot->pCmd);
-			script_patch_16(hostdata->dev, hostdata->script,
+			script_patch_16(hostdata, hostdata->script,
 					CommandCount, slot->cmnd->cmd_len);
-			script_patch_32_abs(hostdata->dev, hostdata->script,
+			script_patch_32_abs(hostdata, hostdata->script,
 			                    SGScriptStartAddress,
 					    to32bit(&slot->pSG[0].ins));
 
@@ -1144,14 +1163,14 @@ process_script_interrupt(__u32 dsps, __u32 dsp, struct scsi_cmnd *SCp,
 			 * should therefore always clear ACK */
 			NCR_700_writeb(NCR_700_get_SXFER(hostdata->cmd->device),
 				       host, SXFER_REG);
-			dma_cache_sync(hostdata->dev, hostdata->msgin,
-				       MSG_ARRAY_SIZE, DMA_FROM_DEVICE);
-			dma_cache_sync(hostdata->dev, hostdata->msgout,
-				       MSG_ARRAY_SIZE, DMA_TO_DEVICE);
+			dma_sync_from_dev(hostdata, hostdata->msgin,
+				       MSG_ARRAY_SIZE);
+			dma_sync_to_dev(hostdata, hostdata->msgout,
+				       MSG_ARRAY_SIZE);
 			/* I'm just being paranoid here, the command should
 			 * already have been flushed from the cache */
-			dma_cache_sync(hostdata->dev, slot->cmnd->cmnd,
-				       slot->cmnd->cmd_len, DMA_TO_DEVICE);
+			dma_sync_to_dev(hostdata, slot->cmnd->cmnd,
+				       slot->cmnd->cmd_len);
 
 
 			
@@ -1214,8 +1233,7 @@ process_script_interrupt(__u32 dsps, __u32 dsp, struct scsi_cmnd *SCp,
 		hostdata->reselection_id = reselection_id;
 		/* just in case we have a stale simple tag message, clear it */
 		hostdata->msgin[1] = 0;
-		dma_cache_sync(hostdata->dev, hostdata->msgin,
-			       MSG_ARRAY_SIZE, DMA_BIDIRECTIONAL);
+		dma_sync_to_dev(hostdata, hostdata->msgin, MSG_ARRAY_SIZE);
 		if(hostdata->tag_negotiated & (1<<reselection_id)) {
 			resume_offset = hostdata->pScript + Ent_GetReselectionWithTag;
 		} else {
@@ -1329,8 +1347,7 @@ process_selection(struct Scsi_Host *host, __u32 dsp)
 	hostdata->cmd = NULL;
 	/* clear any stale simple tag message */
 	hostdata->msgin[1] = 0;
-	dma_cache_sync(hostdata->dev, hostdata->msgin, MSG_ARRAY_SIZE,
-		       DMA_BIDIRECTIONAL);
+	dma_sync_to_dev(hostdata, hostdata->msgin, MSG_ARRAY_SIZE);
 
 	if(id == 0xff) {
 		/* Selected as target, Ignore */
@@ -1427,30 +1444,26 @@ NCR_700_start_command(struct scsi_cmnd *SCp)
 		NCR_700_set_flag(SCp->device, NCR_700_DEV_BEGIN_SYNC_NEGOTIATION);
 	}
 
-	script_patch_16(hostdata->dev, hostdata->script, MessageCount, count);
-
+	script_patch_16(hostdata, hostdata->script, MessageCount, count);
 
-	script_patch_ID(hostdata->dev, hostdata->script,
-			Device_ID, 1<<scmd_id(SCp));
+	script_patch_ID(hostdata, hostdata->script, Device_ID, 1<<scmd_id(SCp));
 
-	script_patch_32_abs(hostdata->dev, hostdata->script, CommandAddress,
+	script_patch_32_abs(hostdata, hostdata->script, CommandAddress,
 			    slot->pCmd);
-	script_patch_16(hostdata->dev, hostdata->script, CommandCount,
-	                SCp->cmd_len);
+	script_patch_16(hostdata, hostdata->script, CommandCount, SCp->cmd_len);
 	/* finally plumb the beginning of the SG list into the script
 	 * */
-	script_patch_32_abs(hostdata->dev, hostdata->script,
+	script_patch_32_abs(hostdata, hostdata->script,
 	                    SGScriptStartAddress, to32bit(&slot->pSG[0].ins));
 	NCR_700_clear_fifo(SCp->device->host);
 
 	if(slot->resume_offset == 0)
 		slot->resume_offset = hostdata->pScript;
 	/* now perform all the writebacks and invalidates */
-	dma_cache_sync(hostdata->dev, hostdata->msgout, count, DMA_TO_DEVICE);
-	dma_cache_sync(hostdata->dev, hostdata->msgin, MSG_ARRAY_SIZE,
-		       DMA_FROM_DEVICE);
-	dma_cache_sync(hostdata->dev, SCp->cmnd, SCp->cmd_len, DMA_TO_DEVICE);
-	dma_cache_sync(hostdata->dev, hostdata->status, 1, DMA_FROM_DEVICE);
+	dma_sync_to_dev(hostdata, hostdata->msgout, count);
+	dma_sync_from_dev(hostdata, hostdata->msgin, MSG_ARRAY_SIZE);
+	dma_sync_to_dev(hostdata, SCp->cmnd, SCp->cmd_len);
+	dma_sync_from_dev(hostdata, hostdata->status, 1);
 
 	/* set the synchronous period/offset */
 	NCR_700_writeb(NCR_700_get_SXFER(SCp->device),
@@ -1626,7 +1639,7 @@ NCR_700_intr(int irq, void *dev_id)
 					slot->SG[i].ins = bS_to_host(SCRIPT_NOP);
 					slot->SG[i].pAddr = 0;
 				}
-				dma_cache_sync(hostdata->dev, slot->SG, sizeof(slot->SG), DMA_TO_DEVICE);
+				dma_sync_to_dev(hostdata, slot->SG, sizeof(slot->SG));
 				/* and pretend we disconnected after
 				 * the command phase */
 				resume_offset = hostdata->pScript + Ent_MsgInDuringData;
@@ -1878,7 +1891,7 @@ NCR_700_queuecommand_lck(struct scsi_cmnd *SCp, void (*done)(struct scsi_cmnd *)
 		}
 		slot->SG[i].ins = bS_to_host(SCRIPT_RETURN);
 		slot->SG[i].pAddr = 0;
-		dma_cache_sync(hostdata->dev, slot->SG, sizeof(slot->SG), DMA_TO_DEVICE);
+		dma_sync_to_dev(hostdata, slot->SG, sizeof(slot->SG));
 		DEBUG((" SETTING %p to %x\n",
 		       (&slot->pSG[i].ins),
 		       slot->SG[i].ins));
diff --git a/drivers/scsi/53c700.h b/drivers/scsi/53c700.h
index 05fe439b66afe5..0f545b05fe611d 100644
--- a/drivers/scsi/53c700.h
+++ b/drivers/scsi/53c700.h
@@ -209,6 +209,7 @@ struct NCR_700_Host_Parameters {
 #endif
 	__u32	chip710:1;	/* set if really a 710 not 700 */
 	__u32	burst_length:4;	/* set to 0 to disable 710 bursting */
+	__u32	noncoherent:1;	/* needs to use non-coherent DMA */
 
 	/* NOTHING BELOW HERE NEEDS ALTERING */
 	__u32	fast:1;		/* if we can alter the SCSI bus clock
@@ -429,7 +430,7 @@ struct NCR_700_Host_Parameters {
 	for(i=0; i< (sizeof(A_##symbol##_used) / sizeof(__u32)); i++) { \
 		__u32 val = bS_to_cpu((script)[A_##symbol##_used[i]]) + da; \
 		(script)[A_##symbol##_used[i]] = bS_to_host(val); \
-		dma_cache_sync((dev), &(script)[A_##symbol##_used[i]], 4, DMA_TO_DEVICE); \
+		dma_sync_to_dev((dev), &(script)[A_##symbol##_used[i]], 4); \
 		DEBUG((" script, patching %s at %d to %pad\n", \
 		       #symbol, A_##symbol##_used[i], &da)); \
 	} \
@@ -441,7 +442,7 @@ struct NCR_700_Host_Parameters {
 	dma_addr_t da = value; \
 	for(i=0; i< (sizeof(A_##symbol##_used) / sizeof(__u32)); i++) { \
 		(script)[A_##symbol##_used[i]] = bS_to_host(da); \
-		dma_cache_sync((dev), &(script)[A_##symbol##_used[i]], 4, DMA_TO_DEVICE); \
+		dma_sync_to_dev((dev), &(script)[A_##symbol##_used[i]], 4); \
 		DEBUG((" script, patching %s at %d to %pad\n", \
 		       #symbol, A_##symbol##_used[i], &da)); \
 	} \
@@ -456,7 +457,7 @@ struct NCR_700_Host_Parameters {
 		val &= 0xff00ffff; \
 		val |= ((value) & 0xff) << 16; \
 		(script)[A_##symbol##_used[i]] = bS_to_host(val); \
-		dma_cache_sync((dev), &(script)[A_##symbol##_used[i]], 4, DMA_TO_DEVICE); \
+		dma_sync_to_dev((dev), &(script)[A_##symbol##_used[i]], 4); \
 		DEBUG((" script, patching ID field %s at %d to 0x%x\n", \
 		       #symbol, A_##symbol##_used[i], val)); \
 	} \
@@ -470,7 +471,7 @@ struct NCR_700_Host_Parameters {
 		val &= 0xffff0000; \
 		val |= ((value) & 0xffff); \
 		(script)[A_##symbol##_used[i]] = bS_to_host(val); \
-		dma_cache_sync((dev), &(script)[A_##symbol##_used[i]], 4, DMA_TO_DEVICE); \
+		dma_sync_to_dev((dev), &(script)[A_##symbol##_used[i]], 4); \
 		DEBUG((" script, patching short field %s at %d to 0x%x\n", \
 		       #symbol, A_##symbol##_used[i], val)); \
 	} \
-- 
2.28.0


* [PATCH 08/28] MIPS: make dma_sync_*_for_cpu a little less overzealous
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
                     ` (6 preceding siblings ...)
  2020-08-19  6:55   ` [PATCH 07/28] 53c700: improve non-coherent DMA handling Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-09-01 13:53     ` Thomas Bogendoerfer
  2020-08-19  6:55   ` [PATCH 09/28] MIPS/jazzdma: remove the unused vdma_remap function Christoph Hellwig
                     ` (21 subsequent siblings)
  29 siblings, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

When transferring DMA ownership back to the CPU there should never
be any writeback from the cache, as the buffer was owned by the
device until now.  Instead it should just be invalidated for the
mapping directions where the device could have written data.
Note that the changes rely on the fact that kmap_atomic is stubbed
out for the !HIGHMEM case to simplify the code a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/mips/mm/dma-noncoherent.c | 44 +++++++++++++++++++++-------------
 1 file changed, 28 insertions(+), 16 deletions(-)
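
As a reading aid, the direction handling after this patch can be
summarised as a pair of lookup functions.  This is a userspace sketch
with made-up enum names (FAKE_*, OP_*), not the MIPS code; it only
records which cache operation each transfer of ownership maps to.

```c
#include <assert.h>

/* Hypothetical mirrors of enum dma_data_direction values. */
enum fake_dir { FAKE_BIDIRECTIONAL, FAKE_TO_DEVICE, FAKE_FROM_DEVICE };

/* The cache maintenance operation that would be issued. */
enum fake_op { OP_NONE, OP_WBACK, OP_INV, OP_WBACK_INV };

/* CPU -> device: dirty lines must reach memory before the device reads. */
static enum fake_op fake_op_for_device(enum fake_dir dir)
{
	switch (dir) {
	case FAKE_TO_DEVICE:
		return OP_WBACK;
	case FAKE_FROM_DEVICE:
		return OP_INV;
	case FAKE_BIDIRECTIONAL:
		return OP_WBACK_INV;
	default:
		assert(!"invalid DMA direction");	/* models BUG() */
		return OP_NONE;
	}
}

/* Device -> CPU: never write back, only drop possibly stale lines. */
static enum fake_op fake_op_for_cpu(enum fake_dir dir)
{
	switch (dir) {
	case FAKE_TO_DEVICE:
		return OP_NONE;
	case FAKE_FROM_DEVICE:
	case FAKE_BIDIRECTIONAL:
		return OP_INV;
	default:
		assert(!"invalid DMA direction");	/* models BUG() */
		return OP_NONE;
	}
}
```

The for_cpu side never returns a writeback variant, which is the
behavioural change described in the commit message.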

diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
index 563c2c0d0c8193..97a14adbafc99c 100644
--- a/arch/mips/mm/dma-noncoherent.c
+++ b/arch/mips/mm/dma-noncoherent.c
@@ -55,22 +55,34 @@ void *arch_dma_set_uncached(void *addr, size_t size)
 	return (void *)(__pa(addr) + UNCAC_BASE);
 }
 
-static inline void dma_sync_virt(void *addr, size_t size,
+static inline void dma_sync_virt_for_device(void *addr, size_t size,
 		enum dma_data_direction dir)
 {
 	switch (dir) {
 	case DMA_TO_DEVICE:
 		dma_cache_wback((unsigned long)addr, size);
 		break;
-
 	case DMA_FROM_DEVICE:
 		dma_cache_inv((unsigned long)addr, size);
 		break;
-
 	case DMA_BIDIRECTIONAL:
 		dma_cache_wback_inv((unsigned long)addr, size);
 		break;
+	default:
+		BUG();
+	}
+}
 
+static inline void dma_sync_virt_for_cpu(void *addr, size_t size,
+		enum dma_data_direction dir)
+{
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		break;
+	case DMA_FROM_DEVICE:
+	case DMA_BIDIRECTIONAL:
+		dma_cache_inv((unsigned long)addr, size);
+		break;
 	default:
 		BUG();
 	}
@@ -82,7 +94,7 @@ static inline void dma_sync_virt(void *addr, size_t size,
  * configured then the bulk of this loop gets optimized out.
  */
 static inline void dma_sync_phys(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+		enum dma_data_direction dir, bool for_device)
 {
 	struct page *page = pfn_to_page(paddr >> PAGE_SHIFT);
 	unsigned long offset = paddr & ~PAGE_MASK;
@@ -90,18 +102,20 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size,
 
 	do {
 		size_t len = left;
+		void *addr;
 
 		if (PageHighMem(page)) {
-			void *addr;
-
 			if (offset + len > PAGE_SIZE)
 				len = PAGE_SIZE - offset;
+		}
+
+		addr = kmap_atomic(page);
+		if (for_device)
+			dma_sync_virt_for_device(addr + offset, len, dir);
+		else
+			dma_sync_virt_for_cpu(addr + offset, len, dir);
+		kunmap_atomic(addr);
 
-			addr = kmap_atomic(page);
-			dma_sync_virt(addr + offset, len, dir);
-			kunmap_atomic(addr);
-		} else
-			dma_sync_virt(page_address(page) + offset, size, dir);
 		offset = 0;
 		page++;
 		left -= len;
@@ -111,7 +125,7 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t size,
 void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
-	dma_sync_phys(paddr, size, dir);
+	dma_sync_phys(paddr, size, dir, true);
 }
 
 #ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
@@ -119,16 +133,14 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
 	if (cpu_needs_post_dma_flush())
-		dma_sync_phys(paddr, size, dir);
+		dma_sync_phys(paddr, size, dir, false);
 }
 #endif
 
 void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
 		enum dma_data_direction direction)
 {
-	BUG_ON(direction == DMA_NONE);
-
-	dma_sync_virt(vaddr, size, direction);
+	dma_sync_virt_for_device(vaddr, size, direction);
 }
 
 #ifdef CONFIG_DMA_PERDEV_COHERENT
-- 
2.28.0


* [PATCH 09/28] MIPS/jazzdma: remove the unused vdma_remap function
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
                     ` (7 preceding siblings ...)
  2020-08-19  6:55   ` [PATCH 08/28] MIPS: make dma_sync_*_for_cpu a little less overzealous Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-09-01 13:49     ` Thomas Bogendoerfer
  2020-08-19  6:55   ` [PATCH 10/28] MIPS/jazzdma: decouple from dma-direct Christoph Hellwig
                     ` (20 subsequent siblings)
  29 siblings, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/mips/include/asm/jazzdma.h |  2 -
 arch/mips/jazz/jazzdma.c        | 70 ---------------------------------
 2 files changed, 72 deletions(-)

diff --git a/arch/mips/include/asm/jazzdma.h b/arch/mips/include/asm/jazzdma.h
index d13f940022d5f9..c831da7fa89803 100644
--- a/arch/mips/include/asm/jazzdma.h
+++ b/arch/mips/include/asm/jazzdma.h
@@ -10,8 +10,6 @@
  */
 extern unsigned long vdma_alloc(unsigned long paddr, unsigned long size);
 extern int vdma_free(unsigned long laddr);
-extern int vdma_remap(unsigned long laddr, unsigned long paddr,
-		      unsigned long size);
 extern unsigned long vdma_phys2log(unsigned long paddr);
 extern unsigned long vdma_log2phys(unsigned long laddr);
 extern void vdma_stats(void);		/* for debugging only */
diff --git a/arch/mips/jazz/jazzdma.c b/arch/mips/jazz/jazzdma.c
index 014773f0bfcd74..fe40dbed04c1d6 100644
--- a/arch/mips/jazz/jazzdma.c
+++ b/arch/mips/jazz/jazzdma.c
@@ -209,76 +209,6 @@ int vdma_free(unsigned long laddr)
 
 EXPORT_SYMBOL(vdma_free);
 
-/*
- * Map certain page(s) to another physical address.
- * Caller must have allocated the page(s) before.
- */
-int vdma_remap(unsigned long laddr, unsigned long paddr, unsigned long size)
-{
-	int first, pages;
-
-	if (laddr > 0xffffff) {
-		if (vdma_debug)
-			printk
-			    ("vdma_map: Invalid logical address: %08lx\n",
-			     laddr);
-		return -EINVAL; /* invalid logical address */
-	}
-	if (paddr > 0x1fffffff) {
-		if (vdma_debug)
-			printk
-			    ("vdma_map: Invalid physical address: %08lx\n",
-			     paddr);
-		return -EINVAL; /* invalid physical address */
-	}
-
-	pages = (((paddr & (VDMA_PAGESIZE - 1)) + size) >> 12) + 1;
-	first = laddr >> 12;
-	if (vdma_debug)
-		printk("vdma_remap: first=%x, pages=%x\n", first, pages);
-	if (first + pages > VDMA_PGTBL_ENTRIES) {
-		if (vdma_debug)
-			printk("vdma_alloc: Invalid size: %08lx\n", size);
-		return -EINVAL;
-	}
-
-	paddr &= ~(VDMA_PAGESIZE - 1);
-	while (pages > 0 && first < VDMA_PGTBL_ENTRIES) {
-		if (pgtbl[first].owner != laddr) {
-			if (vdma_debug)
-				printk("Trying to remap other's pages.\n");
-			return -EPERM;	/* not owner */
-		}
-		pgtbl[first].frame = paddr;
-		paddr += VDMA_PAGESIZE;
-		first++;
-		pages--;
-	}
-
-	/*
-	 * Update translation table
-	 */
-	r4030_write_reg32(JAZZ_R4030_TRSTBL_INV, 0);
-
-	if (vdma_debug > 2) {
-		int i;
-		pages = (((paddr & (VDMA_PAGESIZE - 1)) + size) >> 12) + 1;
-		first = laddr >> 12;
-		printk("LADDR: ");
-		for (i = first; i < first + pages; i++)
-			printk("%08x ", i << 12);
-		printk("\nPADDR: ");
-		for (i = first; i < first + pages; i++)
-			printk("%08x ", pgtbl[i].frame);
-		printk("\nOWNER: ");
-		for (i = first; i < first + pages; i++)
-			printk("%08x ", pgtbl[i].owner);
-		printk("\n");
-	}
-
-	return 0;
-}
-
 /*
  * Translate a physical address to a logical address.
  * This will return the logical address of the first
-- 
2.28.0


* [PATCH 10/28] MIPS/jazzdma: decouple from dma-direct
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
                     ` (8 preceding siblings ...)
  2020-08-19  6:55   ` [PATCH 09/28] MIPS/jazzdma: remove the unused vdma_remap function Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-09-01 13:49     ` Thomas Bogendoerfer
  2020-08-19  6:55   ` [PATCH 11/28] dma-mapping: add (back) arch_dma_mark_clean for ia64 Christoph Hellwig
                     ` (19 subsequent siblings)
  29 siblings, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

The jazzdma ops implement support for a very basic IOMMU.  Thus we really
should not use the dma-direct code that takes physical address limits
into account.  This survived through the great MIPS DMA ops cleanup mostly
because I was lazy, but now it is time to fully split the implementations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/mips/jazz/jazzdma.c | 32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)
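
A sketch of the address arithmetic used for the coherent case, in case
the UNCAC_BASE lines below look opaque: the allocation returns an
uncached alias of the page, and the free path converts that alias back
before handing the page to the page allocator.  The window bases below
are assumptions for illustration (a 32-bit layout with the cached
window at 0x80000000 and the uncached one at 0xa0000000), and the
fake_* helpers are hypothetical stand-ins for __pa() and __va().

```c
#include <assert.h>
#include <stdint.h>

#define FAKE_CAC_BASE	0x80000000u	/* assumed cached window */
#define FAKE_UNCAC_BASE	0xa0000000u	/* assumed uncached window */

/* Models __pa(): cached kernel address to physical address. */
static uint32_t fake_pa(uint32_t cached_va)
{
	return cached_va - FAKE_CAC_BASE;
}

/* Models __va(): physical address to cached kernel address. */
static uint32_t fake_va(uint32_t pa)
{
	return pa + FAKE_CAC_BASE;
}

/* Alloc side: models (void *)(UNCAC_BASE + __pa(ret)). */
static uint32_t fake_to_uncached(uint32_t cached_va)
{
	return FAKE_UNCAC_BASE + fake_pa(cached_va);
}

/* Free side: models __va(vaddr - UNCAC_BASE). */
static uint32_t fake_to_cached(uint32_t uncached_va)
{
	assert(uncached_va >= FAKE_UNCAC_BASE);
	return fake_va(uncached_va - FAKE_UNCAC_BASE);
}
```

The two conversions have to be exact inverses, which is why the free
path must skip the conversion when DMA_ATTR_NON_CONSISTENT was used
and the cached address was returned directly.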

diff --git a/arch/mips/jazz/jazzdma.c b/arch/mips/jazz/jazzdma.c
index fe40dbed04c1d6..d0b5a2ba2b1a8a 100644
--- a/arch/mips/jazz/jazzdma.c
+++ b/arch/mips/jazz/jazzdma.c
@@ -16,7 +16,6 @@
 #include <linux/memblock.h>
 #include <linux/spinlock.h>
 #include <linux/gfp.h>
-#include <linux/dma-direct.h>
 #include <linux/dma-noncoherent.h>
 #include <asm/mipsregs.h>
 #include <asm/jazz.h>
@@ -492,26 +491,38 @@ int vdma_get_enable(int channel)
 static void *jazz_dma_alloc(struct device *dev, size_t size,
 		dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
 {
+	struct page *page;
 	void *ret;
 
-	ret = dma_direct_alloc_pages(dev, size, dma_handle, gfp, attrs);
-	if (!ret)
-		return NULL;
+	if (attrs & DMA_ATTR_NO_WARN)
+		gfp |= __GFP_NOWARN;
 
-	*dma_handle = vdma_alloc(virt_to_phys(ret), size);
-	if (*dma_handle == DMA_MAPPING_ERROR) {
-		dma_direct_free_pages(dev, size, ret, *dma_handle, attrs);
+	size = PAGE_ALIGN(size);
+	page = alloc_pages(gfp, get_order(size));
+	if (!page)
 		return NULL;
-	}
+	ret = page_address(page);
+	*dma_handle = vdma_alloc(virt_to_phys(ret), size);
+	if (*dma_handle == DMA_MAPPING_ERROR)
+		goto out_free_pages;
+
+	if (attrs & DMA_ATTR_NON_CONSISTENT)
+		return ret;
+	arch_dma_prep_coherent(page, size);
+	return (void *)(UNCAC_BASE + __pa(ret));
 
-	return ret;
+out_free_pages:
+	__free_pages(page, get_order(size));
+	return NULL;
 }
 
 static void jazz_dma_free(struct device *dev, size_t size, void *vaddr,
 		dma_addr_t dma_handle, unsigned long attrs)
 {
 	vdma_free(dma_handle);
-	dma_direct_free_pages(dev, size, vaddr, dma_handle, attrs);
+	if (!(attrs & DMA_ATTR_NON_CONSISTENT))
+		vaddr = __va(vaddr - UNCAC_BASE);
+	__free_pages(virt_to_page(vaddr), get_order(size));
 }
 
 static dma_addr_t jazz_dma_map_page(struct device *dev, struct page *page,
@@ -608,7 +619,6 @@ const struct dma_map_ops jazz_dma_ops = {
 	.sync_single_for_device	= jazz_dma_sync_single_for_device,
 	.sync_sg_for_cpu	= jazz_dma_sync_sg_for_cpu,
 	.sync_sg_for_device	= jazz_dma_sync_sg_for_device,
-	.dma_supported		= dma_direct_supported,
 	.cache_sync		= arch_dma_cache_sync,
 	.mmap			= dma_common_mmap,
 	.get_sgtable		= dma_common_get_sgtable,
-- 
2.28.0

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 11/28] dma-mapping: add (back) arch_dma_mark_clean for ia64
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
                     ` (9 preceding siblings ...)
  2020-08-19  6:55   ` [PATCH 10/28] MIPS/jazzdma: decouple from dma-direct Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-08-19  6:55   ` [PATCH 12/28] dma-direct: remove dma_direct_{alloc,free}_pages Christoph Hellwig
                     ` (18 subsequent siblings)
  29 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

Add back a hook to optimize dcache flushing after reading executable
code using DMA.  This gets ia64 out of the business of pretending to
be DMA incoherent just for this optimization.
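
A minimal sketch of the hook pattern, assuming a compile-time switch in
place of CONFIG_ARCH_HAS_DMA_MARK_CLEAN: the architecture either provides
the hook or gets an empty inline stub, and the generic sync-for-cpu path
calls it only for device-to-memory transfers.  All sketch_* names and the
byte counter are hypothetical and exist only to make the behaviour
observable.

```c
#include <stddef.h>

enum sketch_dir { SKETCH_TO_DEVICE, SKETCH_FROM_DEVICE };

/* Stand-in for CONFIG_ARCH_HAS_DMA_MARK_CLEAN being selected. */
#define SKETCH_ARCH_HAS_DMA_MARK_CLEAN 1

static size_t sketch_clean_bytes;	/* bytes marked clean so far */

#if SKETCH_ARCH_HAS_DMA_MARK_CLEAN
/* Architecture-provided hook: here it only counts the bytes it saw. */
static void sketch_mark_clean(unsigned long paddr, size_t size)
{
	(void)paddr;
	sketch_clean_bytes += size;
}
#else
/* Architectures without the hook get a stub that compiles away. */
static inline void sketch_mark_clean(unsigned long paddr, size_t size)
{
	(void)paddr;
	(void)size;
}
#endif

/* Generic sync path: only data written by the device is marked clean. */
static void sketch_sync_for_cpu(unsigned long paddr, size_t size,
		enum sketch_dir dir)
{
	if (dir == SKETCH_FROM_DEVICE)
		sketch_mark_clean(paddr, size);
}
```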

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/ia64/Kconfig               |  3 +--
 arch/ia64/kernel/dma-mapping.c  | 14 +-------------
 arch/ia64/mm/init.c             |  3 +--
 include/linux/dma-direct.h      |  3 +++
 include/linux/dma-noncoherent.h |  8 ++++++++
 kernel/dma/Kconfig              |  6 ++++++
 kernel/dma/direct.c             |  3 +++
 7 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 5b4ec80bf5863a..513ba0c5d33610 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -8,6 +8,7 @@ menu "Processor type and features"
 
 config IA64
 	bool
+	select ARCH_HAS_DMA_MARK_CLEAN
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select ARCH_MIGHT_HAVE_PC_SERIO
 	select ACPI
@@ -32,8 +33,6 @@ config IA64
 	select TTY
 	select HAVE_ARCH_TRACEHOOK
 	select HAVE_VIRT_CPU_ACCOUNTING
-	select DMA_NONCOHERENT_MMAP
-	select ARCH_HAS_SYNC_DMA_FOR_CPU
 	select VIRT_TO_BUS
 	select GENERIC_IRQ_PROBE
 	select GENERIC_PENDING_IRQ if SMP
diff --git a/arch/ia64/kernel/dma-mapping.c b/arch/ia64/kernel/dma-mapping.c
index 09ef9ce9988d1f..f640ed6fe1d576 100644
--- a/arch/ia64/kernel/dma-mapping.c
+++ b/arch/ia64/kernel/dma-mapping.c
@@ -1,5 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
-#include <linux/dma-direct.h>
+#include <linux/dma-mapping.h>
 #include <linux/export.h>
 
 /* Set this to 1 if there is a HW IOMMU in the system */
@@ -7,15 +7,3 @@ int iommu_detected __read_mostly;
 
 const struct dma_map_ops *dma_ops;
 EXPORT_SYMBOL(dma_ops);
-
-void *arch_dma_alloc(struct device *dev, size_t size,
-		dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
-{
-	return dma_direct_alloc_pages(dev, size, dma_handle, gfp, attrs);
-}
-
-void arch_dma_free(struct device *dev, size_t size, void *cpu_addr,
-		dma_addr_t dma_addr, unsigned long attrs)
-{
-	dma_direct_free_pages(dev, size, cpu_addr, dma_addr, attrs);
-}
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 0b3fb4c7af2920..02e5aa08294ee0 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -73,8 +73,7 @@ __ia64_sync_icache_dcache (pte_t pte)
  * DMA can be marked as "clean" so that lazy_mmu_prot_update() doesn't have to
  * flush them when they get mapped into an executable vm-area.
  */
-void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
-		enum dma_data_direction dir)
+void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
 {
 	unsigned long pfn = PHYS_PFN(paddr);
 
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 5a3ce2a2479437..738485b3578062 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -153,6 +153,9 @@ static inline void dma_direct_sync_single_for_cpu(struct device *dev,
 
 	if (unlikely(is_swiotlb_buffer(paddr)))
 		swiotlb_tbl_sync_single(dev, paddr, size, dir, SYNC_FOR_CPU);
+
+	if (dir == DMA_FROM_DEVICE)
+		arch_dma_mark_clean(paddr, size);
 }
 
 static inline dma_addr_t dma_direct_map_page(struct device *dev,
diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h
index ca09a4e07d2d3d..b9bc6c557ea46f 100644
--- a/include/linux/dma-noncoherent.h
+++ b/include/linux/dma-noncoherent.h
@@ -108,6 +108,14 @@ static inline void arch_dma_prep_coherent(struct page *page, size_t size)
 }
 #endif /* CONFIG_ARCH_HAS_DMA_PREP_COHERENT */
 
+#ifdef CONFIG_ARCH_HAS_DMA_MARK_CLEAN
+void arch_dma_mark_clean(phys_addr_t paddr, size_t size);
+#else
+static inline void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
+{
+}
+#endif /* CONFIG_ARCH_HAS_DMA_MARK_CLEAN */
+
 void *arch_dma_set_uncached(void *addr, size_t size);
 void arch_dma_clear_uncached(void *addr, size_t size);
 
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 847a9d1fa6343d..6cf7f7947ae797 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -43,6 +43,12 @@ config ARCH_HAS_DMA_SET_MASK
 config ARCH_HAS_DMA_WRITE_COMBINE
 	bool
 
+#
+# Select if the architecture provides the arch_dma_mark_clean hook
+#
+config ARCH_HAS_DMA_MARK_CLEAN
+	bool
+
 config DMA_DECLARE_COHERENT
 	bool
 
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index bb0041e9965975..1123e767f4315f 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -340,6 +340,9 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
 		if (unlikely(is_swiotlb_buffer(paddr)))
 			swiotlb_tbl_sync_single(dev, paddr, sg->length, dir,
 					SYNC_FOR_CPU);
+
+		if (dir == DMA_FROM_DEVICE)
+			arch_dma_mark_clean(paddr, sg->length);
 	}
 
 	if (!dev_is_dma_coherent(dev))
-- 
2.28.0

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 12/28] dma-direct: remove dma_direct_{alloc,free}_pages
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
                     ` (10 preceding siblings ...)
  2020-08-19  6:55   ` [PATCH 11/28] dma-mapping: add (back) arch_dma_mark_clean for ia64 Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-08-19  6:55   ` [PATCH 13/28] dma-direct: lift gfp_t manipulation out of __dma_direct_alloc_pages Christoph Hellwig
                     ` (17 subsequent siblings)
  29 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

Just merge these helpers into the main dma_direct_{alloc,free} routines,
as the additional checks are always false for the two callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/x86/kernel/amd_gart_64.c |  6 +++---
 include/linux/dma-direct.h    |  4 ----
 kernel/dma/direct.c           | 39 ++++++++++++++---------------------
 kernel/dma/pool.c             |  2 +-
 4 files changed, 19 insertions(+), 32 deletions(-)

diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c
index e89031e9c84761..adbf616d35d15d 100644
--- a/arch/x86/kernel/amd_gart_64.c
+++ b/arch/x86/kernel/amd_gart_64.c
@@ -468,7 +468,7 @@ gart_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_addr,
 {
 	void *vaddr;
 
-	vaddr = dma_direct_alloc_pages(dev, size, dma_addr, flag, attrs);
+	vaddr = dma_direct_alloc(dev, size, dma_addr, flag, attrs);
 	if (!vaddr ||
 	    !force_iommu || dev->coherent_dma_mask <= DMA_BIT_MASK(24))
 		return vaddr;
@@ -480,7 +480,7 @@ gart_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_addr,
 		goto out_free;
 	return vaddr;
 out_free:
-	dma_direct_free_pages(dev, size, vaddr, *dma_addr, attrs);
+	dma_direct_free(dev, size, vaddr, *dma_addr, attrs);
 	return NULL;
 }
 
@@ -490,7 +490,7 @@ gart_free_coherent(struct device *dev, size_t size, void *vaddr,
 		   dma_addr_t dma_addr, unsigned long attrs)
 {
 	gart_unmap_page(dev, dma_addr, size, DMA_BIDIRECTIONAL, 0);
-	dma_direct_free_pages(dev, size, vaddr, dma_addr, attrs);
+	dma_direct_free(dev, size, vaddr, dma_addr, attrs);
 }
 
 static int no_agp;
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 738485b3578062..6a96a8ecac7cbc 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -80,10 +80,6 @@ void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
 		gfp_t gfp, unsigned long attrs);
 void dma_direct_free(struct device *dev, size_t size, void *cpu_addr,
 		dma_addr_t dma_addr, unsigned long attrs);
-void *dma_direct_alloc_pages(struct device *dev, size_t size,
-		dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs);
-void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr,
-		dma_addr_t dma_addr, unsigned long attrs);
 int dma_direct_get_sgtable(struct device *dev, struct sg_table *sgt,
 		void *cpu_addr, dma_addr_t dma_addr, size_t size,
 		unsigned long attrs);
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 1123e767f4315f..8da9a62dd9a72c 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -151,13 +151,18 @@ static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
 	return page;
 }
 
-void *dma_direct_alloc_pages(struct device *dev, size_t size,
+void *dma_direct_alloc(struct device *dev, size_t size,
 		dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
 {
 	struct page *page;
 	void *ret;
 	int err;
 
+	if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
+	    !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
+	    dma_alloc_need_uncached(dev, attrs))
+		return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
+
 	size = PAGE_ALIGN(size);
 
 	if (dma_should_alloc_from_pool(dev, gfp, attrs)) {
@@ -251,11 +256,18 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size,
 	return NULL;
 }
 
-void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr,
-		dma_addr_t dma_addr, unsigned long attrs)
+void dma_direct_free(struct device *dev, size_t size,
+		void *cpu_addr, dma_addr_t dma_addr, unsigned long attrs)
 {
 	unsigned int page_order = get_order(size);
 
+	if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
+	    !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
+	    dma_alloc_need_uncached(dev, attrs)) {
+		arch_dma_free(dev, size, cpu_addr, dma_addr, attrs);
+		return;
+	}
+
 	/* If cpu_addr is not from an atomic pool, dma_free_from_pool() fails */
 	if (dma_should_free_from_pool(dev, attrs) &&
 	    dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size)))
@@ -279,27 +291,6 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr,
 	dma_free_contiguous(dev, dma_direct_to_page(dev, dma_addr), size);
 }
 
-void *dma_direct_alloc(struct device *dev, size_t size,
-		dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
-{
-	if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
-	    !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-	    dma_alloc_need_uncached(dev, attrs))
-		return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
-	return dma_direct_alloc_pages(dev, size, dma_handle, gfp, attrs);
-}
-
-void dma_direct_free(struct device *dev, size_t size,
-		void *cpu_addr, dma_addr_t dma_addr, unsigned long attrs)
-{
-	if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
-	    !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-	    dma_alloc_need_uncached(dev, attrs))
-		arch_dma_free(dev, size, cpu_addr, dma_addr, attrs);
-	else
-		dma_direct_free_pages(dev, size, cpu_addr, dma_addr, attrs);
-}
-
 #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
     defined(CONFIG_SWIOTLB)
 void dma_direct_sync_sg_for_device(struct device *dev,
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 6bc74a2d51273e..222cebf1f10548 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -86,7 +86,7 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
 #endif
 	/*
 	 * Memory in the atomic DMA pools must be unencrypted, the pools do not
-	 * shrink so no re-encryption occurs in dma_direct_free_pages().
+	 * shrink so no re-encryption occurs in dma_direct_free().
 	 */
 	ret = set_memory_decrypted((unsigned long)page_to_virt(page),
 				   1 << order);
-- 
2.28.0

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 13/28] dma-direct: lift gfp_t manipulation out of __dma_direct_alloc_pages
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
                     ` (11 preceding siblings ...)
  2020-08-19  6:55   ` [PATCH 12/28] dma-direct: remove dma_direct_{alloc,free}_pages Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-08-19  6:55   ` [PATCH 14/28] dma-direct: use phys_to_dma_direct in dma_direct_alloc Christoph Hellwig
                     ` (16 subsequent siblings)
  29 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

Move the detailed gfp_t setup from __dma_direct_alloc_pages into the
caller to clean things up a little.
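
The resulting split can be sketched as follows, with hypothetical flag
values and function names: the caller translates the no-warn attribute
into a gfp flag and strips the zeroing flag, so the page allocation
helper no longer needs to see the attributes at all.

```c
/* Hypothetical flag values, chosen only for this sketch. */
#define SKETCH_GFP_ZERO		0x1u
#define SKETCH_GFP_NOWARN	0x2u
#define SKETCH_ATTR_NO_WARN	0x1ul

static unsigned int sketch_last_gfp;	/* gfp mask the helper was given */

/* Page allocation helper: takes a finished gfp mask, no attrs. */
static int sketch_alloc_pages(unsigned int gfp)
{
	sketch_last_gfp = gfp;
	return 1;
}

/* Caller: owns all gfp manipulation. */
static int sketch_alloc(unsigned int gfp, unsigned long attrs)
{
	if (attrs & SKETCH_ATTR_NO_WARN)
		gfp |= SKETCH_GFP_NOWARN;

	/* the memory is zeroed manually later, so drop the zeroing flag */
	return sketch_alloc_pages(gfp & ~SKETCH_GFP_ZERO);
}
```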

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/dma/direct.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 8da9a62dd9a72c..01120510968fa1 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -108,7 +108,7 @@ static inline bool dma_should_free_from_pool(struct device *dev,
 }
 
 static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
-		gfp_t gfp, unsigned long attrs)
+		gfp_t gfp)
 {
 	int node = dev_to_node(dev);
 	struct page *page = NULL;
@@ -116,11 +116,6 @@ static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
 
 	WARN_ON_ONCE(!PAGE_ALIGNED(size));
 
-	if (attrs & DMA_ATTR_NO_WARN)
-		gfp |= __GFP_NOWARN;
-
-	/* we always manually zero the memory once we are done: */
-	gfp &= ~__GFP_ZERO;
 	gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
 					   &phys_limit);
 	page = dma_alloc_contiguous(dev, size, gfp);
@@ -164,6 +159,8 @@ void *dma_direct_alloc(struct device *dev, size_t size,
 		return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
 
 	size = PAGE_ALIGN(size);
+	if (attrs & DMA_ATTR_NO_WARN)
+		gfp |= __GFP_NOWARN;
 
 	if (dma_should_alloc_from_pool(dev, gfp, attrs)) {
 		ret = dma_alloc_from_pool(dev, size, &page, gfp);
@@ -172,7 +169,8 @@ void *dma_direct_alloc(struct device *dev, size_t size,
 		goto done;
 	}
 
-	page = __dma_direct_alloc_pages(dev, size, gfp, attrs);
+	/* we always manually zero the memory once we are done */
+	page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
 	if (!page)
 		return NULL;
 
-- 
2.28.0

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 14/28] dma-direct: use phys_to_dma_direct in dma_direct_alloc
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
                     ` (12 preceding siblings ...)
  2020-08-19  6:55   ` [PATCH 13/28] dma-direct: lift gfp_t manipulation out of __dma_direct_alloc_pages Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-08-19  6:55   ` [PATCH 15/28] dma-direct: remove __dma_to_phys Christoph Hellwig
                     ` (15 subsequent siblings)
  29 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

Replace the currently open-coded copy.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/dma/direct.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 01120510968fa1..2e280b9c063449 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -235,10 +235,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
 			goto out_encrypt_pages;
 	}
 done:
-	if (force_dma_unencrypted(dev))
-		*dma_handle = __phys_to_dma(dev, page_to_phys(page));
-	else
-		*dma_handle = phys_to_dma(dev, page_to_phys(page));
+	*dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
 	return ret;
 
 out_encrypt_pages:
-- 
2.28.0

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 15/28] dma-direct: remove __dma_to_phys
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
                     ` (13 preceding siblings ...)
  2020-08-19  6:55   ` [PATCH 14/28] dma-direct: use phys_to_dma_direct in dma_direct_alloc Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-08-19  6:55   ` [PATCH 16/28] dma-direct: rename and cleanup __phys_to_dma Christoph Hellwig
                     ` (14 subsequent siblings)
  29 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

There is no harm in always clearing the SME encryption bit, and doing so
significantly simplifies the interface.
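
To illustrate why the unconditional clear is harmless, here is a small
sketch with a hypothetical encryption mask and offset handling: clearing
the bit on an address that never had it set is a no-op, so one helper
can serve both encrypted and unencrypted DMA addresses.

```c
#include <stdint.h>

/* Hypothetical position of the encryption bit, for illustration only. */
#define SKETCH_SME_MASK	(UINT64_C(1) << 47)

/* Stand-in for __sme_clr(): drop the encryption bit if present. */
static uint64_t sketch_sme_clr(uint64_t addr)
{
	return addr & ~SKETCH_SME_MASK;
}

/* Single translation helper: add the offset, then always clear the bit. */
static uint64_t sketch_dma_to_phys(uint64_t dma_addr, uint64_t dma_offset)
{
	return sketch_sme_clr(dma_addr + dma_offset);
}
```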

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm/include/asm/dma-direct.h      |  2 +-
 arch/mips/bmips/dma.c                  |  2 +-
 arch/mips/cavium-octeon/dma-octeon.c   |  2 +-
 arch/mips/include/asm/dma-direct.h     |  2 +-
 arch/mips/loongson2ef/fuloong-2e/dma.c |  2 +-
 arch/mips/loongson2ef/lemote-2f/dma.c  |  2 +-
 arch/mips/loongson64/dma.c             |  2 +-
 arch/mips/pci/pci-ar2315.c             |  2 +-
 arch/mips/pci/pci-xtalk-bridge.c       |  2 +-
 arch/mips/sgi-ip32/ip32-dma.c          |  2 +-
 arch/powerpc/include/asm/dma-direct.h  |  2 +-
 include/linux/dma-direct.h             | 21 +++++++++++----------
 kernel/dma/direct.c                    |  6 +-----
 13 files changed, 23 insertions(+), 26 deletions(-)

diff --git a/arch/arm/include/asm/dma-direct.h b/arch/arm/include/asm/dma-direct.h
index 7c3001a6a775bf..a8cee87a93e8ab 100644
--- a/arch/arm/include/asm/dma-direct.h
+++ b/arch/arm/include/asm/dma-direct.h
@@ -8,7 +8,7 @@ static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
 	return pfn_to_dma(dev, __phys_to_pfn(paddr)) + offset;
 }
 
-static inline phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t dev_addr)
+static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dev_addr)
 {
 	unsigned int offset = dev_addr & ~PAGE_MASK;
 	return __pfn_to_phys(dma_to_pfn(dev, dev_addr)) + offset;
diff --git a/arch/mips/bmips/dma.c b/arch/mips/bmips/dma.c
index df56bf4179e347..ba2a5d33dfd3fa 100644
--- a/arch/mips/bmips/dma.c
+++ b/arch/mips/bmips/dma.c
@@ -52,7 +52,7 @@ dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t pa)
 	return pa;
 }
 
-phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t dma_addr)
+phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dma_addr)
 {
 	struct bmips_dma_range *r;
 
diff --git a/arch/mips/cavium-octeon/dma-octeon.c b/arch/mips/cavium-octeon/dma-octeon.c
index 14ea680d180e07..388b13ba2558c2 100644
--- a/arch/mips/cavium-octeon/dma-octeon.c
+++ b/arch/mips/cavium-octeon/dma-octeon.c
@@ -177,7 +177,7 @@ dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
 	return paddr;
 }
 
-phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t daddr)
+phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
 {
 #ifdef CONFIG_PCI
 	if (dev && dev_is_pci(dev))
diff --git a/arch/mips/include/asm/dma-direct.h b/arch/mips/include/asm/dma-direct.h
index 14e352651ce946..8e178651c638c2 100644
--- a/arch/mips/include/asm/dma-direct.h
+++ b/arch/mips/include/asm/dma-direct.h
@@ -3,6 +3,6 @@
 #define _MIPS_DMA_DIRECT_H 1
 
 dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr);
-phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t daddr);
+phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr);
 
 #endif /* _MIPS_DMA_DIRECT_H */
diff --git a/arch/mips/loongson2ef/fuloong-2e/dma.c b/arch/mips/loongson2ef/fuloong-2e/dma.c
index e122292bf6660a..83fadeb3fd7d56 100644
--- a/arch/mips/loongson2ef/fuloong-2e/dma.c
+++ b/arch/mips/loongson2ef/fuloong-2e/dma.c
@@ -6,7 +6,7 @@ dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
 	return paddr | 0x80000000;
 }
 
-phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t dma_addr)
+phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dma_addr)
 {
 	return dma_addr & 0x7fffffff;
 }
diff --git a/arch/mips/loongson2ef/lemote-2f/dma.c b/arch/mips/loongson2ef/lemote-2f/dma.c
index abf0e39d7e4696..302b43a14eee74 100644
--- a/arch/mips/loongson2ef/lemote-2f/dma.c
+++ b/arch/mips/loongson2ef/lemote-2f/dma.c
@@ -6,7 +6,7 @@ dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
 	return paddr | 0x80000000;
 }
 
-phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t dma_addr)
+phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dma_addr)
 {
 	if (dma_addr > 0x8fffffff)
 		return dma_addr;
diff --git a/arch/mips/loongson64/dma.c b/arch/mips/loongson64/dma.c
index dbfe6e82fddd1c..b3dc5d0bd2b113 100644
--- a/arch/mips/loongson64/dma.c
+++ b/arch/mips/loongson64/dma.c
@@ -13,7 +13,7 @@ dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
 	return ((nid << 44) ^ paddr) | (nid << node_id_offset);
 }
 
-phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t daddr)
+phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
 {
 	/* We extract 2bit node id (bit 44~47, only bit 44~45 used now) from
 	 * Loongson-3's 48bit address space and embed it into 40bit */
diff --git a/arch/mips/pci/pci-ar2315.c b/arch/mips/pci/pci-ar2315.c
index 490953f515282a..d88395684f487d 100644
--- a/arch/mips/pci/pci-ar2315.c
+++ b/arch/mips/pci/pci-ar2315.c
@@ -175,7 +175,7 @@ dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
 	return paddr + ar2315_dev_offset(dev);
 }
 
-phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t dma_addr)
+phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dma_addr)
 {
 	return dma_addr - ar2315_dev_offset(dev);
 }
diff --git a/arch/mips/pci/pci-xtalk-bridge.c b/arch/mips/pci/pci-xtalk-bridge.c
index 9b3cc775c55e05..f1b37f32b55395 100644
--- a/arch/mips/pci/pci-xtalk-bridge.c
+++ b/arch/mips/pci/pci-xtalk-bridge.c
@@ -33,7 +33,7 @@ dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
 	return bc->baddr + paddr;
 }
 
-phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t dma_addr)
+phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dma_addr)
 {
 	return dma_addr & ~(0xffUL << 56);
 }
diff --git a/arch/mips/sgi-ip32/ip32-dma.c b/arch/mips/sgi-ip32/ip32-dma.c
index fa7b17cb53853e..160317294d97a9 100644
--- a/arch/mips/sgi-ip32/ip32-dma.c
+++ b/arch/mips/sgi-ip32/ip32-dma.c
@@ -27,7 +27,7 @@ dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
 	return dma_addr;
 }
 
-phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t dma_addr)
+phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dma_addr)
 {
 	phys_addr_t paddr = dma_addr & RAM_OFFSET_MASK;
 
diff --git a/arch/powerpc/include/asm/dma-direct.h b/arch/powerpc/include/asm/dma-direct.h
index abc154d784b078..95b09313d2a4cf 100644
--- a/arch/powerpc/include/asm/dma-direct.h
+++ b/arch/powerpc/include/asm/dma-direct.h
@@ -7,7 +7,7 @@ static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
 	return paddr + dev->archdata.dma_offset;
 }
 
-static inline phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t daddr)
+static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
 {
 	return daddr - dev->archdata.dma_offset;
 }
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 6a96a8ecac7cbc..811582a39e291f 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -24,11 +24,17 @@ static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
 	return dev_addr - ((dma_addr_t)dev->dma_pfn_offset << PAGE_SHIFT);
 }
 
-static inline phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t dev_addr)
+static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dev_addr)
 {
-	phys_addr_t paddr = (phys_addr_t)dev_addr;
-
-	return paddr + ((phys_addr_t)dev->dma_pfn_offset << PAGE_SHIFT);
+	phys_addr_t paddr = (phys_addr_t)dev_addr +
+		((phys_addr_t)dev->dma_pfn_offset << PAGE_SHIFT);
+
+	/*
+	 * Clear the memory encryption mask if supported by the architecture.
+	 * We do this unconditionally so that we don't have to track if someone
+	 * fed us an encrypted or unencrypted DMA address.
+	 */
+	return __sme_clr(paddr);
 }
 #endif /* !CONFIG_ARCH_HAS_PHYS_TO_DMA */
 
@@ -44,7 +50,7 @@ static inline bool force_dma_unencrypted(struct device *dev)
 /*
  * If memory encryption is supported, phys_to_dma will set the memory encryption
  * bit in the DMA address, and dma_to_phys will clear it.  The raw __phys_to_dma
- * and __dma_to_phys versions should only be used on non-encrypted memory for
+ * version should only be used on non-encrypted memory for
  * special occasions like DMA coherent buffers.
  */
 static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
@@ -52,11 +58,6 @@ static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 	return __sme_set(__phys_to_dma(dev, paddr));
 }
 
-static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
-{
-	return __sme_clr(__dma_to_phys(dev, daddr));
-}
-
 static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size,
 		bool is_ram)
 {
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 2e280b9c063449..a97835983a34f7 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -48,11 +48,6 @@ gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
 {
 	u64 dma_limit = min_not_zero(dma_mask, dev->bus_dma_limit);
 
-	if (force_dma_unencrypted(dev))
-		*phys_limit = __dma_to_phys(dev, dma_limit);
-	else
-		*phys_limit = dma_to_phys(dev, dma_limit);
-
 	/*
 	 * Optimistically try the zone that the physical address mask falls
 	 * into first.  If that returns memory that isn't actually addressable
@@ -61,6 +56,7 @@ gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
 	 * Note that GFP_DMA32 and GFP_DMA are no ops without the corresponding
 	 * zones.
 	 */
+	*phys_limit = dma_to_phys(dev, dma_limit);
 	if (*phys_limit <= DMA_BIT_MASK(zone_dma_bits))
 		return GFP_DMA;
 	if (*phys_limit <= DMA_BIT_MASK(32))
-- 
2.28.0

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 16/28] dma-direct: rename and cleanup __phys_to_dma
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
                     ` (14 preceding siblings ...)
  2020-08-19  6:55   ` [PATCH 15/28] dma-direct: remove __dma_to_phys Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-08-19  6:55   ` [PATCH 17/28] dma-mapping: move dma_common_{mmap, get_sgtable} out of mapping.c Christoph Hellwig
                     ` (13 subsequent siblings)
  29 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

The __phys_to_dma vs phys_to_dma distinction isn't exactly obvious.  Try
to improve the situation by renaming __phys_to_dma to
phys_to_dma_unencrypted, and not forcing architectures that want to
override phys_to_dma to actually provide __phys_to_dma.
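
The intended layering can be sketched roughly as below.  This is an
assumption about the shape of the generic helpers rather than a copy of
them: the encryption bit position, the offset parameter and all sketch_*
names are hypothetical.  The "unencrypted" variant does the plain
translation, and the default variant adds the encryption bit on top.

```c
#include <stdint.h>

/* Hypothetical position of the encryption bit, for illustration only. */
#define SKETCH_ENC_BIT	(UINT64_C(1) << 47)

/* Stand-in for __sme_set(): mark the address as encrypted. */
static uint64_t sketch_enc_set(uint64_t addr)
{
	return addr | SKETCH_ENC_BIT;
}

/* Plain translation without the encryption bit. */
static uint64_t sketch_phys_to_dma_unencrypted(uint64_t paddr,
		uint64_t dma_offset)
{
	return paddr - dma_offset;
}

/* Default translation: the plain one plus the encryption bit. */
static uint64_t sketch_phys_to_dma(uint64_t paddr, uint64_t dma_offset)
{
	return sketch_enc_set(sketch_phys_to_dma_unencrypted(paddr,
							     dma_offset));
}
```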

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm/include/asm/dma-direct.h      |  2 +-
 arch/mips/bmips/dma.c                  |  2 +-
 arch/mips/cavium-octeon/dma-octeon.c   |  2 +-
 arch/mips/include/asm/dma-direct.h     |  2 +-
 arch/mips/loongson2ef/fuloong-2e/dma.c |  2 +-
 arch/mips/loongson2ef/lemote-2f/dma.c  |  2 +-
 arch/mips/loongson64/dma.c             |  2 +-
 arch/mips/pci/pci-ar2315.c             |  2 +-
 arch/mips/pci/pci-xtalk-bridge.c       |  2 +-
 arch/mips/sgi-ip32/ip32-dma.c          |  2 +-
 arch/powerpc/include/asm/dma-direct.h  |  2 +-
 drivers/iommu/intel/iommu.c            |  2 +-
 include/linux/dma-direct.h             | 28 +++++++++++++++-----------
 kernel/dma/direct.c                    |  8 ++++----
 kernel/dma/swiotlb.c                   |  4 ++--
 15 files changed, 34 insertions(+), 30 deletions(-)

diff --git a/arch/arm/include/asm/dma-direct.h b/arch/arm/include/asm/dma-direct.h
index a8cee87a93e8ab..bca0de56753439 100644
--- a/arch/arm/include/asm/dma-direct.h
+++ b/arch/arm/include/asm/dma-direct.h
@@ -2,7 +2,7 @@
 #ifndef ASM_ARM_DMA_DIRECT_H
 #define ASM_ARM_DMA_DIRECT_H 1
 
-static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
+static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
 	unsigned int offset = paddr & ~PAGE_MASK;
 	return pfn_to_dma(dev, __phys_to_pfn(paddr)) + offset;
diff --git a/arch/mips/bmips/dma.c b/arch/mips/bmips/dma.c
index ba2a5d33dfd3fa..49061b870680b9 100644
--- a/arch/mips/bmips/dma.c
+++ b/arch/mips/bmips/dma.c
@@ -40,7 +40,7 @@ static struct bmips_dma_range *bmips_dma_ranges;
 
 #define FLUSH_RAC		0x100
 
-dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t pa)
+dma_addr_t phys_to_dma(struct device *dev, phys_addr_t pa)
 {
 	struct bmips_dma_range *r;
 
diff --git a/arch/mips/cavium-octeon/dma-octeon.c b/arch/mips/cavium-octeon/dma-octeon.c
index 388b13ba2558c2..232fa1017b1ec9 100644
--- a/arch/mips/cavium-octeon/dma-octeon.c
+++ b/arch/mips/cavium-octeon/dma-octeon.c
@@ -168,7 +168,7 @@ void __init octeon_pci_dma_init(void)
 }
 #endif /* CONFIG_PCI */
 
-dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
+dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
 #ifdef CONFIG_PCI
 	if (dev && dev_is_pci(dev))
diff --git a/arch/mips/include/asm/dma-direct.h b/arch/mips/include/asm/dma-direct.h
index 8e178651c638c2..9a640118316c9d 100644
--- a/arch/mips/include/asm/dma-direct.h
+++ b/arch/mips/include/asm/dma-direct.h
@@ -2,7 +2,7 @@
 #ifndef _MIPS_DMA_DIRECT_H
 #define _MIPS_DMA_DIRECT_H 1
 
-dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr);
+dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr);
 phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr);
 
 #endif /* _MIPS_DMA_DIRECT_H */
diff --git a/arch/mips/loongson2ef/fuloong-2e/dma.c b/arch/mips/loongson2ef/fuloong-2e/dma.c
index 83fadeb3fd7d56..cea167d8aba8db 100644
--- a/arch/mips/loongson2ef/fuloong-2e/dma.c
+++ b/arch/mips/loongson2ef/fuloong-2e/dma.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <linux/dma-direct.h>
 
-dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
+dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
 	return paddr | 0x80000000;
 }
diff --git a/arch/mips/loongson2ef/lemote-2f/dma.c b/arch/mips/loongson2ef/lemote-2f/dma.c
index 302b43a14eee74..3c9e994563578c 100644
--- a/arch/mips/loongson2ef/lemote-2f/dma.c
+++ b/arch/mips/loongson2ef/lemote-2f/dma.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <linux/dma-direct.h>
 
-dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
+dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
 	return paddr | 0x80000000;
 }
diff --git a/arch/mips/loongson64/dma.c b/arch/mips/loongson64/dma.c
index b3dc5d0bd2b113..364f2f27c8723f 100644
--- a/arch/mips/loongson64/dma.c
+++ b/arch/mips/loongson64/dma.c
@@ -4,7 +4,7 @@
 #include <linux/swiotlb.h>
 #include <boot_param.h>
 
-dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
+dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
 	/* We extract 2bit node id (bit 44~47, only bit 44~45 used now) from
 	 * Loongson-3's 48bit address space and embed it into 40bit */
diff --git a/arch/mips/pci/pci-ar2315.c b/arch/mips/pci/pci-ar2315.c
index d88395684f487d..cef4a47ab06311 100644
--- a/arch/mips/pci/pci-ar2315.c
+++ b/arch/mips/pci/pci-ar2315.c
@@ -170,7 +170,7 @@ static inline dma_addr_t ar2315_dev_offset(struct device *dev)
 	return 0;
 }
 
-dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
+dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
 	return paddr + ar2315_dev_offset(dev);
 }
diff --git a/arch/mips/pci/pci-xtalk-bridge.c b/arch/mips/pci/pci-xtalk-bridge.c
index f1b37f32b55395..50f7d42cca5a78 100644
--- a/arch/mips/pci/pci-xtalk-bridge.c
+++ b/arch/mips/pci/pci-xtalk-bridge.c
@@ -25,7 +25,7 @@
 /*
  * Common phys<->dma mapping for platforms using pci xtalk bridge
  */
-dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
+dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct bridge_controller *bc = BRIDGE_CONTROLLER(pdev->bus);
diff --git a/arch/mips/sgi-ip32/ip32-dma.c b/arch/mips/sgi-ip32/ip32-dma.c
index 160317294d97a9..20c6da9d76bc5e 100644
--- a/arch/mips/sgi-ip32/ip32-dma.c
+++ b/arch/mips/sgi-ip32/ip32-dma.c
@@ -18,7 +18,7 @@
 
 #define RAM_OFFSET_MASK 0x3fffffffUL
 
-dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
+dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
 	dma_addr_t dma_addr = paddr & RAM_OFFSET_MASK;
 
diff --git a/arch/powerpc/include/asm/dma-direct.h b/arch/powerpc/include/asm/dma-direct.h
index 95b09313d2a4cf..128304cbee1d87 100644
--- a/arch/powerpc/include/asm/dma-direct.h
+++ b/arch/powerpc/include/asm/dma-direct.h
@@ -2,7 +2,7 @@
 #ifndef ASM_POWERPC_DMA_DIRECT_H
 #define ASM_POWERPC_DMA_DIRECT_H 1
 
-static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
+static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
 	return paddr + dev->archdata.dma_offset;
 }
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index e9864e52b0e96a..99aa80456b7145 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3736,7 +3736,7 @@ bounce_map_single(struct device *dev, phys_addr_t paddr, size_t size,
 	 */
 	if (!IS_ALIGNED(paddr | size, VTD_PAGE_SIZE)) {
 		tlb_addr = swiotlb_tbl_map_single(dev,
-				__phys_to_dma(dev, io_tlb_start),
+				phys_to_dma_unencrypted(dev, io_tlb_start),
 				paddr, size, aligned_size, dir, attrs);
 		if (tlb_addr == DMA_MAPPING_ERROR) {
 			goto swiotlb_error;
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 811582a39e291f..3797ecccc15466 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -16,14 +16,29 @@ extern unsigned int zone_dma_bits;
 
 #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA
 #include <asm/dma-direct.h>
+#ifndef phys_to_dma_unencrypted
+#define phys_to_dma_unencrypted		phys_to_dma
+#endif
 #else
-static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
+static inline dma_addr_t phys_to_dma_unencrypted(struct device *dev,
+		phys_addr_t paddr)
 {
 	dma_addr_t dev_addr = (dma_addr_t)paddr;
 
 	return dev_addr - ((dma_addr_t)dev->dma_pfn_offset << PAGE_SHIFT);
 }
 
+/*
+ * If memory encryption is supported, phys_to_dma will set the memory encryption
+ * bit in the DMA address, and dma_to_phys will clear it.
+ * phys_to_dma_unencrypted is for use on special unencrypted memory like swiotlb
+ * buffers.
+ */
+static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
+{
+	return __sme_set(phys_to_dma_unencrypted(dev, paddr));
+}
+
 static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dev_addr)
 {
 	phys_addr_t paddr = (phys_addr_t)dev_addr +
@@ -47,17 +62,6 @@ static inline bool force_dma_unencrypted(struct device *dev)
 }
 #endif /* CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED */
 
-/*
- * If memory encryption is supported, phys_to_dma will set the memory encryption
- * bit in the DMA address, and dma_to_phys will clear it.  The raw __phys_to_dma
- * version should only be used on non-encrypted memory for
- * special occasions like DMA coherent buffers.
- */
-static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
-{
-	return __sme_set(__phys_to_dma(dev, paddr));
-}
-
 static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size,
 		bool is_ram)
 {
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index a97835983a34f7..e7963e51660792 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -25,7 +25,7 @@ static inline dma_addr_t phys_to_dma_direct(struct device *dev,
 		phys_addr_t phys)
 {
 	if (force_dma_unencrypted(dev))
-		return __phys_to_dma(dev, phys);
+		return phys_to_dma_unencrypted(dev, phys);
 	return phys_to_dma(dev, phys);
 }
 
@@ -433,13 +433,13 @@ int dma_direct_supported(struct device *dev, u64 mask)
 		return 1;
 
 	/*
-	 * This check needs to be against the actual bit mask value, so
-	 * use __phys_to_dma() here so that the SME encryption mask isn't
+	 * This check needs to be against the actual bit mask value, so use
+	 * phys_to_dma_unencrypted() here so that the SME encryption mask isn't
 	 * part of the check.
 	 */
 	if (IS_ENABLED(CONFIG_ZONE_DMA))
 		min_mask = min_t(u64, min_mask, DMA_BIT_MASK(zone_dma_bits));
-	return mask >= __phys_to_dma(dev, min_mask);
+	return mask >= phys_to_dma_unencrypted(dev, min_mask);
 }
 
 size_t dma_direct_max_mapping_size(struct device *dev)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index c19379fabd200e..4ea72d145cd27d 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -670,13 +670,13 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t paddr, size_t size,
 			      swiotlb_force);
 
 	swiotlb_addr = swiotlb_tbl_map_single(dev,
-			__phys_to_dma(dev, io_tlb_start),
+			phys_to_dma_unencrypted(dev, io_tlb_start),
 			paddr, size, size, dir, attrs);
 	if (swiotlb_addr == (phys_addr_t)DMA_MAPPING_ERROR)
 		return DMA_MAPPING_ERROR;
 
 	/* Ensure that the address returned is DMA'ble */
-	dma_addr = __phys_to_dma(dev, swiotlb_addr);
+	dma_addr = phys_to_dma_unencrypted(dev, swiotlb_addr);
 	if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
 		swiotlb_tbl_unmap_single(dev, swiotlb_addr, size, size, dir,
 			attrs | DMA_ATTR_SKIP_CPU_SYNC);
-- 
2.28.0


* [PATCH 17/28] dma-mapping: move dma_common_{mmap, get_sgtable} out of mapping.c
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
                     ` (15 preceding siblings ...)
  2020-08-19  6:55   ` [PATCH 16/28] dma-direct: rename and cleanup __phys_to_dma Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-08-19  6:55   ` [PATCH 18/28] dma-mapping: move the dma_declare_coherent_memory documentation Christoph Hellwig
                     ` (12 subsequent siblings)
  29 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

Add a new file that contains helpers for misc DMA ops, which is only
built when CONFIG_DMA_OPS is set.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/dma/Makefile      |  1 +
 kernel/dma/mapping.c     | 47 +-----------------------------------
 kernel/dma/ops_helpers.c | 51 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 53 insertions(+), 46 deletions(-)
 create mode 100644 kernel/dma/ops_helpers.c
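
Note for readers: the range check that dma_common_mmap performs before calling
remap_pfn_range can be looked at in isolation.  The sketch below is a userspace
model; mmap_range_ok is a hypothetical helper name that does not exist in the
kernel, and PAGE_SHIFT is assumed to be 12.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PAGE_ALIGN(x)	(((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

/*
 * Hypothetical helper mirroring the check in dma_common_mmap:
 *   size       - size of the DMA buffer in bytes
 *   user_count - number of pages covered by the vma
 *   off        - page offset into the buffer (vma->vm_pgoff)
 */
static bool mmap_range_ok(size_t size, unsigned long user_count,
		unsigned long off)
{
	unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;

	/* Reject offsets past the buffer and mappings that run off its end. */
	if (off >= count || user_count > count - off)
		return false;
	return true;
}
```

Writing the second comparison as count - off, after off >= count has been
ruled out, avoids the overflow that off + user_count could hit.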

diff --git a/kernel/dma/Makefile b/kernel/dma/Makefile
index 32c7c1942bbd6c..dc755ab68aabf9 100644
--- a/kernel/dma/Makefile
+++ b/kernel/dma/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 
 obj-$(CONFIG_HAS_DMA)			+= mapping.o direct.o
+obj-$(CONFIG_DMA_OPS)			+= ops_helpers.o
 obj-$(CONFIG_DMA_OPS)			+= dummy.o
 obj-$(CONFIG_DMA_CMA)			+= contiguous.o
 obj-$(CONFIG_DMA_DECLARE_COHERENT)	+= coherent.o
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 0d129421e75fc8..848c95c27d79ff 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -8,7 +8,7 @@
 #include <linux/memblock.h> /* for max_pfn */
 #include <linux/acpi.h>
 #include <linux/dma-direct.h>
-#include <linux/dma-noncoherent.h>
+#include <linux/dma-mapping.h>
 #include <linux/export.h>
 #include <linux/gfp.h>
 #include <linux/of_device.h>
@@ -295,22 +295,6 @@ void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 }
 EXPORT_SYMBOL(dma_sync_sg_for_device);
 
-/*
- * Create scatter-list for the already allocated DMA buffer.
- */
-int dma_common_get_sgtable(struct device *dev, struct sg_table *sgt,
-		 void *cpu_addr, dma_addr_t dma_addr, size_t size,
-		 unsigned long attrs)
-{
-	struct page *page = virt_to_page(cpu_addr);
-	int ret;
-
-	ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
-	if (!ret)
-		sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0);
-	return ret;
-}
-
 /*
  * The whole dma_get_sgtable() idea is fundamentally unsafe - it seems
  * that the intention is to allow exporting memory allocated via the
@@ -358,35 +342,6 @@ pgprot_t dma_pgprot(struct device *dev, pgprot_t prot, unsigned long attrs)
 }
 #endif /* CONFIG_MMU */
 
-/*
- * Create userspace mapping for the DMA-coherent memory.
- */
-int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
-		void *cpu_addr, dma_addr_t dma_addr, size_t size,
-		unsigned long attrs)
-{
-#ifdef CONFIG_MMU
-	unsigned long user_count = vma_pages(vma);
-	unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-	unsigned long off = vma->vm_pgoff;
-	int ret = -ENXIO;
-
-	vma->vm_page_prot = dma_pgprot(dev, vma->vm_page_prot, attrs);
-
-	if (dma_mmap_from_dev_coherent(dev, vma, cpu_addr, size, &ret))
-		return ret;
-
-	if (off >= count || user_count > count - off)
-		return -ENXIO;
-
-	return remap_pfn_range(vma, vma->vm_start,
-			page_to_pfn(virt_to_page(cpu_addr)) + vma->vm_pgoff,
-			user_count << PAGE_SHIFT, vma->vm_page_prot);
-#else
-	return -ENXIO;
-#endif /* CONFIG_MMU */
-}
-
 /**
  * dma_can_mmap - check if a given device supports dma_mmap_*
  * @dev: device to check
diff --git a/kernel/dma/ops_helpers.c b/kernel/dma/ops_helpers.c
new file mode 100644
index 00000000000000..e443c69be4299f
--- /dev/null
+++ b/kernel/dma/ops_helpers.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Helpers for DMA ops implementations.  These generally rely on the fact that
+ * the allocated memory contains normal pages in the direct kernel mapping.
+ */
+#include <linux/dma-noncoherent.h>
+
+/*
+ * Create scatter-list for the already allocated DMA buffer.
+ */
+int dma_common_get_sgtable(struct device *dev, struct sg_table *sgt,
+		 void *cpu_addr, dma_addr_t dma_addr, size_t size,
+		 unsigned long attrs)
+{
+	struct page *page = virt_to_page(cpu_addr);
+	int ret;
+
+	ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
+	if (!ret)
+		sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0);
+	return ret;
+}
+
+/*
+ * Create userspace mapping for the DMA-coherent memory.
+ */
+int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
+		void *cpu_addr, dma_addr_t dma_addr, size_t size,
+		unsigned long attrs)
+{
+#ifdef CONFIG_MMU
+	unsigned long user_count = vma_pages(vma);
+	unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+	unsigned long off = vma->vm_pgoff;
+	int ret = -ENXIO;
+
+	vma->vm_page_prot = dma_pgprot(dev, vma->vm_page_prot, attrs);
+
+	if (dma_mmap_from_dev_coherent(dev, vma, cpu_addr, size, &ret))
+		return ret;
+
+	if (off >= count || user_count > count - off)
+		return -ENXIO;
+
+	return remap_pfn_range(vma, vma->vm_start,
+			page_to_pfn(virt_to_page(cpu_addr)) + vma->vm_pgoff,
+			user_count << PAGE_SHIFT, vma->vm_page_prot);
+#else
+	return -ENXIO;
+#endif /* CONFIG_MMU */
+}
-- 
2.28.0


* [PATCH 18/28] dma-mapping: move the dma_declare_coherent_memory documentation
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
                     ` (16 preceding siblings ...)
  2020-08-19  6:55   ` [PATCH 17/28] dma-mapping: move dma_common_{mmap, get_sgtable} out of mapping.c Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-08-19  6:55   ` [PATCH 19/28] dma-mapping: replace DMA_ATTR_NON_CONSISTENT with dma_{alloc, free}_pages Christoph Hellwig
                     ` (11 subsequent siblings)
  29 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

dma_declare_coherent_memory should not be in a DMA API guide aimed
at driver writers (that is, consumers of the API).  Move it to a comment
near the function instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 Documentation/core-api/dma-api.rst | 24 ------------------------
 kernel/dma/coherent.c              | 17 +++++++++++++++++
 2 files changed, 17 insertions(+), 24 deletions(-)

diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dma-api.rst
index 3b3abbbb4b9a6f..90239348b30f6f 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -586,30 +586,6 @@ the DMA_ATTR_NON_CONSISTENT flag starting at virtual address vaddr and
 continuing on for size.  Again, you *must* observe the cache line
 boundaries when doing this.
 
-::
-
-	int
-	dma_declare_coherent_memory(struct device *dev, phys_addr_t phys_addr,
-				    dma_addr_t device_addr, size_t size);
-
-Declare region of memory to be handed out by dma_alloc_coherent() when
-it's asked for coherent memory for this device.
-
-phys_addr is the CPU physical address to which the memory is currently
-assigned (this will be ioremapped so the CPU can access the region).
-
-device_addr is the DMA address the device needs to be programmed
-with to actually address this memory (this will be handed out as the
-dma_addr_t in dma_alloc_coherent()).
-
-size is the size of the area (must be multiples of PAGE_SIZE).
-
-As a simplification for the platforms, only *one* such region of
-memory may be declared per device.
-
-For reasons of efficiency, most platforms choose to track the declared
-region only at the granularity of a page.  For smaller allocations,
-you should use the dma_pool() API.
 
 Part III - Debug drivers use of the DMA-API
 -------------------------------------------
diff --git a/kernel/dma/coherent.c b/kernel/dma/coherent.c
index 2a0c4985f38e41..f85d14bbfcbe03 100644
--- a/kernel/dma/coherent.c
+++ b/kernel/dma/coherent.c
@@ -107,6 +107,23 @@ static int dma_assign_coherent_memory(struct device *dev,
 	return 0;
 }
 
+/*
+ * Declare a region of memory to be handed out by dma_alloc_coherent() when it
+ * is asked for coherent memory for this device.  This shall only be used
+ * from platform code, usually based on the device tree description.
+ * 
+ * phys_addr is the CPU physical address to which the memory is currently
+ * assigned (this will be ioremapped so the CPU can access the region).
+ *
+ * device_addr is the DMA address the device needs to be programmed with to
+ * actually address this memory (this will be handed out as the dma_addr_t in
+ * dma_alloc_coherent()).
+ *
+ * size is the size of the area (must be a multiple of PAGE_SIZE).
+ *
+ * As a simplification for the platforms, only *one* such region of memory may
+ * be declared per device.
+ */
 int dma_declare_coherent_memory(struct device *dev, phys_addr_t phys_addr,
 				dma_addr_t device_addr, size_t size)
 {
-- 
2.28.0


* [PATCH 19/28] dma-mapping: replace DMA_ATTR_NON_CONSISTENT with dma_{alloc, free}_pages
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
                     ` (17 preceding siblings ...)
  2020-08-19  6:55   ` [PATCH 18/28] dma-mapping: move the dma_declare_coherent_memory documentation Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-08-19 15:03     ` Tomasz Figa
  2020-08-19  6:55   ` [PATCH 20/28] sgiwd93: convert from dma_cache_sync to dma_sync_single_for_device Christoph Hellwig
                     ` (10 subsequent siblings)
  29 siblings, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

Add a new API to allocate and free pages that are guaranteed to be
addressable by a device, but otherwise behave like pages allocated by
alloc_pages.  The intended APIs to sync them for use with the device
and cpu are dma_sync_single_for_{device,cpu} that are also used for
streaming mappings.

Switch all drivers over to this new API, but keep the usage of the
crufty dma_cache_sync API for now, which will be cleaned up on a driver
by driver basis.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 Documentation/core-api/dma-api.rst        | 68 +++++++++++------------
 Documentation/core-api/dma-attributes.rst |  8 ---
 arch/alpha/kernel/pci_iommu.c             |  2 +
 arch/arm/mm/dma-mapping-nommu.c           |  2 +
 arch/arm/mm/dma-mapping.c                 |  4 ++
 arch/ia64/hp/common/sba_iommu.c           |  2 +
 arch/mips/jazz/jazzdma.c                  |  7 +--
 arch/powerpc/kernel/dma-iommu.c           |  2 +
 arch/powerpc/platforms/ps3/system-bus.c   |  4 ++
 arch/powerpc/platforms/pseries/vio.c      |  2 +
 arch/s390/pci/pci_dma.c                   |  2 +
 arch/x86/kernel/amd_gart_64.c             |  2 +
 drivers/iommu/dma-iommu.c                 |  2 +
 drivers/iommu/intel/iommu.c               |  4 ++
 drivers/net/ethernet/i825xx/lasi_82596.c  | 13 ++---
 drivers/net/ethernet/seeq/sgiseeq.c       | 12 ++--
 drivers/parisc/ccio-dma.c                 |  2 +
 drivers/parisc/sba_iommu.c                |  2 +
 drivers/scsi/53c700.c                     |  8 +--
 drivers/scsi/sgiwd93.c                    | 12 ++--
 drivers/xen/swiotlb-xen.c                 |  2 +
 include/linux/dma-direct.h                |  5 ++
 include/linux/dma-mapping.h               | 29 ++++++++--
 include/linux/dma-noncoherent.h           |  3 -
 kernel/dma/direct.c                       | 51 ++++++++++++++++-
 kernel/dma/mapping.c                      | 43 +++++++++++++-
 kernel/dma/ops_helpers.c                  | 35 ++++++++++++
 kernel/dma/virt.c                         |  2 +
 sound/mips/hal2.c                         | 20 +++----
 29 files changed, 254 insertions(+), 96 deletions(-)
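
Note for readers: the intended calling sequence (allocate, sync for the
device, sync for the CPU, free) can be sketched as a userspace model.
Everything below is a stand-in, not the kernel implementation: the stubs only
reuse the prototypes from this patch, struct device is reduced to two counters
for illustration, GFP_KERNEL is defined as a dummy value, and demo_roundtrip is
a hypothetical driver routine.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint64_t dma_addr_t;
typedef unsigned int gfp_t;
#define GFP_KERNEL 0u	/* dummy value for this model only */

enum dma_data_direction { DMA_BIDIRECTIONAL, DMA_TO_DEVICE, DMA_FROM_DEVICE };

/* Stand-in device that merely counts the sync calls. */
struct device {
	int syncs_for_device;
	int syncs_for_cpu;
};

/* Userspace stubs with the prototypes used in this series. */
static void *dma_alloc_pages(struct device *dev, size_t size,
		dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
{
	void *vaddr = calloc(1, size);

	(void)dev; (void)dir; (void)gfp;
	if (vaddr)
		*dma_handle = (dma_addr_t)(uintptr_t)vaddr; /* fake bus address */
	return vaddr;
}

static void dma_free_pages(struct device *dev, size_t size, void *cpu_addr,
		dma_addr_t dma_handle, enum dma_data_direction dir)
{
	(void)dev; (void)size; (void)dma_handle; (void)dir;
	free(cpu_addr);
}

static void dma_sync_single_for_device(struct device *dev, dma_addr_t addr,
		size_t size, enum dma_data_direction dir)
{
	(void)addr; (void)size; (void)dir;
	dev->syncs_for_device++;
}

static void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
		size_t size, enum dma_data_direction dir)
{
	(void)addr; (void)size; (void)dir;
	dev->syncs_for_cpu++;
}

/* Hypothetical driver routine showing the ownership hand-offs. */
static int demo_roundtrip(struct device *dev, size_t size)
{
	dma_addr_t dma;
	unsigned char *buf;
	int ok;

	buf = dma_alloc_pages(dev, size, &dma, DMA_BIDIRECTIONAL, GFP_KERNEL);
	if (!buf)
		return -1;

	memset(buf, 0xa5, size);	/* CPU owns the buffer and fills it */
	dma_sync_single_for_device(dev, dma, size, DMA_BIDIRECTIONAL);
	/* ... a real device would perform its DMA here ... */
	dma_sync_single_for_cpu(dev, dma, size, DMA_BIDIRECTIONAL);
	ok = (buf[0] == 0xa5);		/* CPU may read again after the sync */

	dma_free_pages(dev, size, buf, dma, DMA_BIDIRECTIONAL);
	return ok ? 0 : -1;
}
```

The same dir value is passed to allocation, both syncs and the free, matching
the requirement in the documentation hunk below that dma_free_pages receives
the arguments given to dma_alloc_pages.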

diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dma-api.rst
index 90239348b30f6f..047fcfffa0e5cf 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -516,48 +516,53 @@ routines, e.g.:::
 	}
 
 
-Part II - Advanced dma usage
-----------------------------
+Part II - Non-coherent DMA allocations
+--------------------------------------
 
-Warning: These pieces of the DMA API should not be used in the
-majority of cases, since they cater for unlikely corner cases that
-don't belong in usual drivers.
+These APIs allow allocating pages that can be used like normal pages
+in the kernel direct mapping, but are guaranteed to be DMA addressable.
 
 If you don't understand how cache line coherency works between a
 processor and an I/O device, you should not be using this part of the
-API at all.
+API.
 
 ::
 
 	void *
-	dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
-			gfp_t flag, unsigned long attrs)
+	dma_alloc_pages(struct device *dev, size_t size, dma_addr_t *dma_handle,
+			enum dma_data_direction dir, gfp_t gfp)
+
+This routine allocates a region of <size> bytes of non-coherent memory.  It
+returns a pointer to the allocated region (in the processor's virtual address
+space) or NULL if the allocation failed. The returned memory is guaranteed to
+behave like memory allocated using alloc_pages.
+
+It also returns a <dma_handle> which may be cast to an unsigned integer the
+same width as the bus and given to the device as the DMA address base of
+the region.
 
-Identical to dma_alloc_coherent() except that when the
-DMA_ATTR_NON_CONSISTENT flags is passed in the attrs argument, the
-platform will choose to return either consistent or non-consistent memory
-as it sees fit.  By using this API, you are guaranteeing to the platform
-that you have all the correct and necessary sync points for this memory
-in the driver should it choose to return non-consistent memory.
+The dir parameter specifies whether data is read and/or written by the device,
+see dma_map_single() for details.
 
-Note: where the platform can return consistent memory, it will
-guarantee that the sync points become nops.
+The gfp parameter allows the caller to specify the ``GFP_`` flags (see
+kmalloc()) for the allocation, but rejects flags used to specify a memory
+zone such as GFP_DMA or GFP_HIGHMEM.
 
-Warning:  Handling non-consistent memory is a real pain.  You should
-only use this API if you positively know your driver will be
-required to work on one of the rare (usually non-PCI) architectures
-that simply cannot make consistent memory.
+Before giving the memory to the device, dma_sync_single_for_device() needs
+to be called, and before reading memory written by the device,
+dma_sync_single_for_cpu(), just like for streaming DMA mappings that are
+reused.
 
 ::
 
 	void
-	dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
-		       dma_addr_t dma_handle, unsigned long attrs)
+	dma_free_pages(struct device *dev, size_t size, void *cpu_addr,
+			dma_addr_t dma_handle, enum dma_data_direction dir)
 
-Free memory allocated by the dma_alloc_attrs().  All common
-parameters must be identical to those otherwise passed to dma_free_coherent,
-and the attrs argument must be identical to the attrs passed to
-dma_alloc_attrs().
+Free a region of memory previously allocated using dma_alloc_pages().  dev,
+size, dma_handle and dir must all be the same as those passed into
+dma_alloc_pages().  cpu_addr must be the virtual address returned by
+dma_alloc_pages().
 
 ::
 
@@ -575,17 +580,6 @@ memory or doing partial flushes.
 	into the width returned by this call.  It will also always be a power
 	of two for easy alignment.
 
-::
-
-	void
-	dma_cache_sync(struct device *dev, void *vaddr, size_t size,
-		       enum dma_data_direction direction)
-
-Do a partial sync of memory that was allocated by dma_alloc_attrs() with
-the DMA_ATTR_NON_CONSISTENT flag starting at virtual address vaddr and
-continuing on for size.  Again, you *must* observe the cache line
-boundaries when doing this.
-
 
 Part III - Debug drivers use of the DMA-API
 -------------------------------------------
diff --git a/Documentation/core-api/dma-attributes.rst b/Documentation/core-api/dma-attributes.rst
index 29dcbe8826e85e..1887d92e8e9269 100644
--- a/Documentation/core-api/dma-attributes.rst
+++ b/Documentation/core-api/dma-attributes.rst
@@ -25,14 +25,6 @@ Since it is optional for platforms to implement DMA_ATTR_WRITE_COMBINE,
 those that do not will simply ignore the attribute and exhibit default
 behavior.
 
-DMA_ATTR_NON_CONSISTENT
------------------------
-
-DMA_ATTR_NON_CONSISTENT lets the platform to choose to return either
-consistent or non-consistent memory as it sees fit.  By using this API,
-you are guaranteeing to the platform that you have all the correct and
-necessary sync points for this memory in the driver.
-
 DMA_ATTR_NO_KERNEL_MAPPING
 --------------------------
 
diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
index 81037907268d5c..291121e3b5a583 100644
--- a/arch/alpha/kernel/pci_iommu.c
+++ b/arch/alpha/kernel/pci_iommu.c
@@ -957,5 +957,7 @@ const struct dma_map_ops alpha_pci_ops = {
 	.dma_supported		= alpha_pci_supported,
 	.mmap			= dma_common_mmap,
 	.get_sgtable		= dma_common_get_sgtable,
+	.alloc_pages		= dma_common_alloc_pages,
+	.free_pages		= dma_common_free_pages,
 };
 EXPORT_SYMBOL(alpha_pci_ops);
diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index 287ef898a55e11..43c6d66b6e733a 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -176,6 +176,8 @@ static void arm_nommu_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist
 const struct dma_map_ops arm_nommu_dma_ops = {
 	.alloc			= arm_nommu_dma_alloc,
 	.free			= arm_nommu_dma_free,
+	.alloc_pages		= dma_direct_alloc_pages,
+	.free_pages		= dma_direct_free_pages,
 	.mmap			= arm_nommu_dma_mmap,
 	.map_page		= arm_nommu_dma_map_page,
 	.unmap_page		= arm_nommu_dma_unmap_page,
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 8a8949174b1c06..7738b4d23f692f 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -199,6 +199,8 @@ static int arm_dma_supported(struct device *dev, u64 mask)
 const struct dma_map_ops arm_dma_ops = {
 	.alloc			= arm_dma_alloc,
 	.free			= arm_dma_free,
+	.alloc_pages		= dma_direct_alloc_pages,
+	.free_pages		= dma_direct_free_pages,
 	.mmap			= arm_dma_mmap,
 	.get_sgtable		= arm_dma_get_sgtable,
 	.map_page		= arm_dma_map_page,
@@ -226,6 +228,8 @@ static int arm_coherent_dma_mmap(struct device *dev, struct vm_area_struct *vma,
 const struct dma_map_ops arm_coherent_dma_ops = {
 	.alloc			= arm_coherent_dma_alloc,
 	.free			= arm_coherent_dma_free,
+	.alloc_pages		= dma_direct_alloc_pages,
+	.free_pages		= dma_direct_free_pages,
 	.mmap			= arm_coherent_dma_mmap,
 	.get_sgtable		= arm_dma_get_sgtable,
 	.map_page		= arm_coherent_dma_map_page,
diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index 656a4888c300b5..a51adc11a0b54b 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -2071,6 +2071,8 @@ static const struct dma_map_ops sba_dma_ops = {
 	.dma_supported		= sba_dma_supported,
 	.mmap			= dma_common_mmap,
 	.get_sgtable		= dma_common_get_sgtable,
+	.alloc_pages		= dma_common_alloc_pages,
+	.free_pages		= dma_common_free_pages,
 };
 
 static int __init
diff --git a/arch/mips/jazz/jazzdma.c b/arch/mips/jazz/jazzdma.c
index d0b5a2ba2b1a8a..0f9a9cb7fe7a95 100644
--- a/arch/mips/jazz/jazzdma.c
+++ b/arch/mips/jazz/jazzdma.c
@@ -505,9 +505,6 @@ static void *jazz_dma_alloc(struct device *dev, size_t size,
 	*dma_handle = vdma_alloc(virt_to_phys(ret), size);
 	if (*dma_handle == DMA_MAPPING_ERROR)
 		goto out_free_pages;
-
-	if (attrs & DMA_ATTR_NON_CONSISTENT)
-		return ret;
 	arch_dma_prep_coherent(page, size);
 	return (void *)(UNCAC_BASE + __pa(ret));
 
@@ -520,8 +517,6 @@ static void jazz_dma_free(struct device *dev, size_t size, void *vaddr,
 		dma_addr_t dma_handle, unsigned long attrs)
 {
 	vdma_free(dma_handle);
-	if (!(attrs & DMA_ATTR_NON_CONSISTENT))
-		vaddr = __va(vaddr - UNCAC_BASE);
 	__free_pages(virt_to_page(vaddr), get_order(size));
 }
 
@@ -622,5 +617,7 @@ const struct dma_map_ops jazz_dma_ops = {
 	.cache_sync		= arch_dma_cache_sync,
 	.mmap			= dma_common_mmap,
 	.get_sgtable		= dma_common_get_sgtable,
+	.alloc_pages		= dma_common_alloc_pages,
+	.free_pages		= dma_common_free_pages,
 };
 EXPORT_SYMBOL(jazz_dma_ops);
diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index 569fecd7b5b234..d4e702d74b3393 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -137,4 +137,6 @@ const struct dma_map_ops dma_iommu_ops = {
 	.get_required_mask	= dma_iommu_get_required_mask,
 	.mmap			= dma_common_mmap,
 	.get_sgtable		= dma_common_get_sgtable,
+	.alloc_pages		= dma_common_alloc_pages,
+	.free_pages		= dma_common_free_pages,
 };
diff --git a/arch/powerpc/platforms/ps3/system-bus.c b/arch/powerpc/platforms/ps3/system-bus.c
index 3542b7bd6a4689..7bc5f9be3e12d8 100644
--- a/arch/powerpc/platforms/ps3/system-bus.c
+++ b/arch/powerpc/platforms/ps3/system-bus.c
@@ -696,6 +696,8 @@ static const struct dma_map_ops ps3_sb_dma_ops = {
 	.unmap_page = ps3_unmap_page,
 	.mmap = dma_common_mmap,
 	.get_sgtable = dma_common_get_sgtable,
+	.alloc_pages = dma_common_alloc_pages,
+	.free_pages = dma_common_free_pages,
 };
 
 static const struct dma_map_ops ps3_ioc0_dma_ops = {
@@ -708,6 +710,8 @@ static const struct dma_map_ops ps3_ioc0_dma_ops = {
 	.unmap_page = ps3_unmap_page,
 	.mmap = dma_common_mmap,
 	.get_sgtable = dma_common_get_sgtable,
+	.alloc_pages = dma_common_alloc_pages,
+	.free_pages = dma_common_free_pages,
 };
 
 /**
diff --git a/arch/powerpc/platforms/pseries/vio.c b/arch/powerpc/platforms/pseries/vio.c
index 0487b26f6f1af3..98ed7b09b3fe50 100644
--- a/arch/powerpc/platforms/pseries/vio.c
+++ b/arch/powerpc/platforms/pseries/vio.c
@@ -608,6 +608,8 @@ static const struct dma_map_ops vio_dma_mapping_ops = {
 	.get_required_mask = dma_iommu_get_required_mask,
 	.mmap		   = dma_common_mmap,
 	.get_sgtable	   = dma_common_get_sgtable,
+	.alloc_pages	   = dma_common_alloc_pages,
+	.free_pages	   = dma_common_free_pages,
 };
 
 /**
diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
index 64b1399a73f04d..44004f790bdc44 100644
--- a/arch/s390/pci/pci_dma.c
+++ b/arch/s390/pci/pci_dma.c
@@ -670,6 +670,8 @@ const struct dma_map_ops s390_pci_dma_ops = {
 	.unmap_page	= s390_dma_unmap_pages,
 	.mmap		= dma_common_mmap,
 	.get_sgtable	= dma_common_get_sgtable,
+	.alloc_pages	= dma_common_alloc_pages,
+	.free_pages	= dma_common_free_pages,
 	/* dma_supported is unconditionally true without a callback */
 };
 EXPORT_SYMBOL_GPL(s390_pci_dma_ops);
diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c
index adbf616d35d15d..0310e6569350da 100644
--- a/arch/x86/kernel/amd_gart_64.c
+++ b/arch/x86/kernel/amd_gart_64.c
@@ -678,6 +678,8 @@ static const struct dma_map_ops gart_dma_ops = {
 	.get_sgtable			= dma_common_get_sgtable,
 	.dma_supported			= dma_direct_supported,
 	.get_required_mask		= dma_direct_get_required_mask,
+	.alloc_pages			= dma_direct_alloc_pages,
+	.free_pages			= dma_direct_free_pages,
 };
 
 static void gart_iommu_shutdown(void)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4959f5df21bd07..3da06df0f327c2 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1120,6 +1120,8 @@ static unsigned long iommu_dma_get_merge_boundary(struct device *dev)
 static const struct dma_map_ops iommu_dma_ops = {
 	.alloc			= iommu_dma_alloc,
 	.free			= iommu_dma_free,
+	.alloc_pages		= dma_common_alloc_pages,
+	.free_pages		= dma_common_free_pages,
 	.mmap			= iommu_dma_mmap,
 	.get_sgtable		= iommu_dma_get_sgtable,
 	.map_page		= iommu_dma_map_page,
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 99aa80456b7145..41fb349c1fd76b 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3669,6 +3669,8 @@ static const struct dma_map_ops intel_dma_ops = {
 	.dma_supported = dma_direct_supported,
 	.mmap = dma_common_mmap,
 	.get_sgtable = dma_common_get_sgtable,
+	.alloc_pages = dma_common_alloc_pages,
+	.free_pages = dma_common_free_pages,
 	.get_required_mask = intel_get_required_mask,
 };
 
@@ -3922,6 +3924,8 @@ static const struct dma_map_ops bounce_dma_ops = {
 	.sync_sg_for_device	= bounce_sync_sg_for_device,
 	.map_resource		= bounce_map_resource,
 	.unmap_resource		= bounce_unmap_resource,
+	.alloc_pages		= dma_common_alloc_pages,
+	.free_pages		= dma_common_free_pages,
 	.dma_supported		= dma_direct_supported,
 };
 
diff --git a/drivers/net/ethernet/i825xx/lasi_82596.c b/drivers/net/ethernet/i825xx/lasi_82596.c
index 8c5ab9b7553e75..0c493b7237a910 100644
--- a/drivers/net/ethernet/i825xx/lasi_82596.c
+++ b/drivers/net/ethernet/i825xx/lasi_82596.c
@@ -184,9 +184,8 @@ lan_init_chip(struct parisc_device *dev)
 
 	lp = netdev_priv(netdevice);
 	lp->options = dev->id.sversion == 0x72 ? OPT_SWAP_PORT : 0;
-	lp->dma = dma_alloc_attrs(dev->dev.parent, sizeof(struct i596_dma),
-			      &lp->dma_addr, GFP_KERNEL,
-			      DMA_ATTR_NON_CONSISTENT);
+	lp->dma = dma_alloc_pages(dev->dev.parent, sizeof(struct i596_dma),
+				  &lp->dma_addr, DMA_BIDIRECTIONAL, GFP_KERNEL);
 	if (!lp->dma)
 		goto out_free_netdev;
 
@@ -196,8 +195,8 @@ lan_init_chip(struct parisc_device *dev)
 	return 0;
 
 out_free_dma:
-	dma_free_attrs(dev->dev.parent, sizeof(struct i596_dma),
-		       lp->dma, lp->dma_addr, DMA_ATTR_NON_CONSISTENT);
+	dma_free_pages(dev->dev.parent, sizeof(struct i596_dma),
+		       lp->dma, lp->dma_addr, DMA_BIDIRECTIONAL);
 out_free_netdev:
 	free_netdev(netdevice);
 	return retval;
@@ -209,8 +208,8 @@ static int __exit lan_remove_chip(struct parisc_device *pdev)
 	struct i596_private *lp = netdev_priv(dev);
 
 	unregister_netdev (dev);
-	dma_free_attrs(&pdev->dev, sizeof(struct i596_private), lp->dma,
-		       lp->dma_addr, DMA_ATTR_NON_CONSISTENT);
+	dma_free_pages(&pdev->dev, sizeof(struct i596_private), lp->dma,
+		       lp->dma_addr, DMA_BIDIRECTIONAL);
 	free_netdev (dev);
 	return 0;
 }
diff --git a/drivers/net/ethernet/seeq/sgiseeq.c b/drivers/net/ethernet/seeq/sgiseeq.c
index 8507ff2420143a..39599bbb5d45b6 100644
--- a/drivers/net/ethernet/seeq/sgiseeq.c
+++ b/drivers/net/ethernet/seeq/sgiseeq.c
@@ -740,8 +740,8 @@ static int sgiseeq_probe(struct platform_device *pdev)
 	sp = netdev_priv(dev);
 
 	/* Make private data page aligned */
-	sr = dma_alloc_attrs(&pdev->dev, sizeof(*sp->srings), &sp->srings_dma,
-			     GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
+	sr = dma_alloc_pages(&pdev->dev, sizeof(*sp->srings), &sp->srings_dma,
+			     DMA_BIDIRECTIONAL, GFP_KERNEL);
 	if (!sr) {
 		printk(KERN_ERR "Sgiseeq: Page alloc failed, aborting.\n");
 		err = -ENOMEM;
@@ -802,8 +802,8 @@ static int sgiseeq_probe(struct platform_device *pdev)
 	return 0;
 
 err_out_free_attrs:
-	dma_free_attrs(&pdev->dev, sizeof(*sp->srings), sp->srings,
-		       sp->srings_dma, DMA_ATTR_NON_CONSISTENT);
+	dma_free_pages(&pdev->dev, sizeof(*sp->srings), sp->srings,
+		       sp->srings_dma, DMA_BIDIRECTIONAL);
 err_out_free_dev:
 	free_netdev(dev);
 
@@ -817,8 +817,8 @@ static int sgiseeq_remove(struct platform_device *pdev)
 	struct sgiseeq_private *sp = netdev_priv(dev);
 
 	unregister_netdev(dev);
-	dma_free_attrs(&pdev->dev, sizeof(*sp->srings), sp->srings,
-		       sp->srings_dma, DMA_ATTR_NON_CONSISTENT);
+	dma_free_pages(&pdev->dev, sizeof(*sp->srings), sp->srings,
+		       sp->srings_dma, DMA_BIDIRECTIONAL);
 	free_netdev(dev);
 
 	return 0;
diff --git a/drivers/parisc/ccio-dma.c b/drivers/parisc/ccio-dma.c
index a5507f75b524c4..ae557e740781e3 100644
--- a/drivers/parisc/ccio-dma.c
+++ b/drivers/parisc/ccio-dma.c
@@ -1025,6 +1025,8 @@ static const struct dma_map_ops ccio_ops = {
 	.map_sg = 		ccio_map_sg,
 	.unmap_sg = 		ccio_unmap_sg,
 	.get_sgtable =		dma_common_get_sgtable,
+	.alloc_pages =		dma_common_alloc_pages,
+	.free_pages =		dma_common_free_pages,
 };
 
 #ifdef CONFIG_PROC_FS
diff --git a/drivers/parisc/sba_iommu.c b/drivers/parisc/sba_iommu.c
index d4314fba026914..d3514c5761a944 100644
--- a/drivers/parisc/sba_iommu.c
+++ b/drivers/parisc/sba_iommu.c
@@ -1077,6 +1077,8 @@ static const struct dma_map_ops sba_ops = {
 	.map_sg =		sba_map_sg,
 	.unmap_sg =		sba_unmap_sg,
 	.get_sgtable =		dma_common_get_sgtable,
+	.alloc_pages =		dma_common_alloc_pages,
+	.free_pages =		dma_common_free_pages,
 };
 
 
diff --git a/drivers/scsi/53c700.c b/drivers/scsi/53c700.c
index b197ed9399e2e0..521950d0731e4a 100644
--- a/drivers/scsi/53c700.c
+++ b/drivers/scsi/53c700.c
@@ -300,8 +300,8 @@ NCR_700_detect(struct scsi_host_template *tpnt,
 	memory = dma_alloc_coherent(dev, TOTAL_MEM_SIZE, &pScript, GFP_KERNEL);
 	if (!memory) {
 		hostdata->noncoherent = 1;
-		memory = dma_alloc_attrs(dev, TOTAL_MEM_SIZE, &pScript,
-					 GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
+		memory = dma_alloc_pages(dev, TOTAL_MEM_SIZE, &pScript,
+					 DMA_BIDIRECTIONAL, GFP_KERNEL);
 	}
 	if (!memory) {
 		printk(KERN_ERR "53c700: Failed to allocate memory for driver, detaching\n");
@@ -414,8 +414,8 @@ NCR_700_release(struct Scsi_Host *host)
 		(struct NCR_700_Host_Parameters *)host->hostdata[0];
 
 	if (hostdata->noncoherent)
-		dma_free_attrs(hostdata->dev, TOTAL_MEM_SIZE, hostdata->script,
-			       hostdata->pScript, DMA_ATTR_NON_CONSISTENT);
+		dma_free_pages(hostdata->dev, TOTAL_MEM_SIZE, hostdata->script,
+			       hostdata->pScript, DMA_BIDIRECTIONAL);
 	else
 		dma_free_coherent(hostdata->dev, TOTAL_MEM_SIZE,
 				  hostdata->script, hostdata->pScript);
diff --git a/drivers/scsi/sgiwd93.c b/drivers/scsi/sgiwd93.c
index 3bdf0deb8f1529..133adcf99e9340 100644
--- a/drivers/scsi/sgiwd93.c
+++ b/drivers/scsi/sgiwd93.c
@@ -234,8 +234,8 @@ static int sgiwd93_probe(struct platform_device *pdev)
 
 	hdata = host_to_hostdata(host);
 	hdata->dev = &pdev->dev;
-	hdata->cpu = dma_alloc_attrs(&pdev->dev, HPC_DMA_SIZE, &hdata->dma,
-				     GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
+	hdata->cpu = dma_alloc_pages(&pdev->dev, HPC_DMA_SIZE, &hdata->dma,
+				     DMA_BIDIRECTIONAL, GFP_KERNEL);
 	if (!hdata->cpu) {
 		printk(KERN_WARNING "sgiwd93: Could not allocate memory for "
 		       "host %d buffer.\n", unit);
@@ -274,8 +274,8 @@ static int sgiwd93_probe(struct platform_device *pdev)
 out_irq:
 	free_irq(irq, host);
 out_free:
-	dma_free_attrs(&pdev->dev, HPC_DMA_SIZE, hdata->cpu, hdata->dma,
-		       DMA_ATTR_NON_CONSISTENT);
+	dma_free_pages(&pdev->dev, HPC_DMA_SIZE, hdata->cpu, hdata->dma,
+			DMA_BIDIRECTIONAL);
 out_put:
 	scsi_host_put(host);
 out:
@@ -291,8 +291,8 @@ static int sgiwd93_remove(struct platform_device *pdev)
 
 	scsi_remove_host(host);
 	free_irq(pd->irq, host);
-	dma_free_attrs(&pdev->dev, HPC_DMA_SIZE, hdata->cpu, hdata->dma,
-		       DMA_ATTR_NON_CONSISTENT);
+	dma_free_pages(&pdev->dev, HPC_DMA_SIZE, hdata->cpu, hdata->dma,
+			DMA_BIDIRECTIONAL);
 	scsi_host_put(host);
 	return 0;
 }
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 39a0f2e0847c95..030a225624b060 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -578,4 +578,6 @@ const struct dma_map_ops xen_swiotlb_dma_ops = {
 	.dma_supported = xen_swiotlb_dma_supported,
 	.mmap = dma_common_mmap,
 	.get_sgtable = dma_common_get_sgtable,
+	.alloc_pages = dma_common_alloc_pages,
+	.free_pages = dma_common_free_pages,
 };
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 3797ecccc15466..7871ce3887cf85 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -85,6 +85,11 @@ void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
 		gfp_t gfp, unsigned long attrs);
 void dma_direct_free(struct device *dev, size_t size, void *cpu_addr,
 		dma_addr_t dma_addr, unsigned long attrs);
+void *dma_direct_alloc_pages(struct device *dev, size_t size,
+		dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp);
+void dma_direct_free_pages(struct device *dev, size_t size,
+		void *cpu_addr, dma_addr_t dma_addr,
+		enum dma_data_direction dir);
 int dma_direct_get_sgtable(struct device *dev, struct sg_table *sgt,
 		void *cpu_addr, dma_addr_t dma_addr, size_t size,
 		unsigned long attrs);
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 016b96b384bdda..73fa6e10c5c8b5 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -27,11 +27,6 @@
  * buffered to improve performance.
  */
 #define DMA_ATTR_WRITE_COMBINE		(1UL << 2)
-/*
- * DMA_ATTR_NON_CONSISTENT: Lets the platform to choose to return either
- * consistent or non-consistent memory as it sees fit.
- */
-#define DMA_ATTR_NON_CONSISTENT		(1UL << 3)
 /*
  * DMA_ATTR_NO_KERNEL_MAPPING: Lets the platform to avoid creating a kernel
  * virtual mapping for the allocated buffer.
@@ -80,6 +75,11 @@ struct dma_map_ops {
 	void (*free)(struct device *dev, size_t size,
 			      void *vaddr, dma_addr_t dma_handle,
 			      unsigned long attrs);
+	void *(*alloc_pages)(struct device *dev, size_t size,
+			dma_addr_t *dma_handle, enum dma_data_direction dir,
+			gfp_t gfp);
+	void (*free_pages)(struct device *dev, size_t size, void *vaddr,
+			dma_addr_t dma_handle, enum dma_data_direction dir);
 	int (*mmap)(struct device *, struct vm_area_struct *,
 			  void *, dma_addr_t, size_t,
 			  unsigned long attrs);
@@ -254,6 +254,11 @@ void *dmam_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
 		gfp_t gfp, unsigned long attrs);
 void dmam_free_coherent(struct device *dev, size_t size, void *vaddr,
 		dma_addr_t dma_handle);
+void *dma_alloc_pages(struct device *dev, size_t size, dma_addr_t *dma_handle,
+		enum dma_data_direction dir, gfp_t gfp);
+void dma_free_pages(struct device *dev, size_t size, void *vaddr,
+		dma_addr_t dma_handle, enum dma_data_direction dir);
+/* dma_cache_sync is deprecated: don't use in new code */
 void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
 		enum dma_data_direction dir);
 int dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt,
@@ -339,6 +344,15 @@ static inline void dmam_free_coherent(struct device *dev, size_t size,
 		void *vaddr, dma_addr_t dma_handle)
 {
 }
+static inline void *dma_alloc_pages(struct device *dev, size_t size,
+		dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
+{
+	return NULL;
+}
+static inline void dma_free_pages(struct device *dev, size_t size, void *vaddr,
+		dma_addr_t dma_handle, enum dma_data_direction dir)
+{
+}
 static inline void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
 		enum dma_data_direction dir)
 {
@@ -513,7 +527,10 @@ static inline void dma_sync_sgtable_for_device(struct device *dev,
 extern int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
 		void *cpu_addr, dma_addr_t dma_addr, size_t size,
 		unsigned long attrs);
-
+void *dma_common_alloc_pages(struct device *dev, size_t size,
+		dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp);
+void dma_common_free_pages(struct device *dev, size_t size, void *vaddr,
+		dma_addr_t dma_handle, enum dma_data_direction dir);
 struct page **dma_common_find_pages(void *cpu_addr);
 void *dma_common_contiguous_remap(struct page *page, size_t size,
 			pgprot_t prot, const void *caller);
diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h
index b9bc6c557ea46f..1eecfd24d434f8 100644
--- a/include/linux/dma-noncoherent.h
+++ b/include/linux/dma-noncoherent.h
@@ -31,9 +31,6 @@ static __always_inline bool dma_alloc_need_uncached(struct device *dev,
 		return false;
 	if (attrs & DMA_ATTR_NO_KERNEL_MAPPING)
 		return false;
-	if (IS_ENABLED(CONFIG_DMA_NONCOHERENT_CACHE_SYNC) &&
-	    (attrs & DMA_ATTR_NON_CONSISTENT))
-		return false;
 	return true;
 }
 
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index e7963e51660792..a0c4be0953b2b5 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
- * Copyright (C) 2018 Christoph Hellwig.
+ * Copyright (C) 2018-2020 Christoph Hellwig.
  *
  * DMA operations that map physical memory directly without using an IOMMU.
  */
@@ -282,6 +282,55 @@ void dma_direct_free(struct device *dev, size_t size,
 	dma_free_contiguous(dev, dma_direct_to_page(dev, dma_addr), size);
 }
 
+void *dma_direct_alloc_pages(struct device *dev, size_t size,
+		dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
+{
+	struct page *page;
+	void *ret;
+
+	if (dma_should_alloc_from_pool(dev, gfp, 0)) {
+		page = dma_alloc_from_pool(dev, size, &ret, gfp,
+				dma_coherent_ok);
+		if (!page)
+			return NULL;
+		goto done;
+	}
+
+	page = __dma_direct_alloc_pages(dev, size, gfp);
+	if (!page)
+		return NULL;
+	ret = page_address(page);
+	if (force_dma_unencrypted(dev)) {
+		if (set_memory_decrypted((unsigned long)ret,
+				1 << get_order(size)))
+			goto out_free_pages;
+	}
+	memset(ret, 0, size);
+done:
+	*dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
+	return ret;
+out_free_pages:
+	dma_free_contiguous(dev, page, size);
+	return NULL;
+}
+
+void dma_direct_free_pages(struct device *dev, size_t size,
+		void *cpu_addr, dma_addr_t dma_addr,
+		enum dma_data_direction dir)
+{
+	unsigned int page_order = get_order(size);
+
+	/* If cpu_addr is not from an atomic pool, dma_free_from_pool() fails */
+	if (dma_should_free_from_pool(dev, 0) &&
+	    dma_free_from_pool(dev, cpu_addr, size))
+		return;
+
+	if (force_dma_unencrypted(dev))
+		set_memory_encrypted((unsigned long)cpu_addr, 1 << page_order);
+
+	dma_free_contiguous(dev, dma_direct_to_page(dev, dma_addr), size);
+}
+
 #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
     defined(CONFIG_SWIOTLB)
 void dma_direct_sync_sg_for_device(struct device *dev,
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 848c95c27d79ff..dacdb7226caacd 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -330,9 +330,7 @@ pgprot_t dma_pgprot(struct device *dev, pgprot_t prot, unsigned long attrs)
 {
 	if (force_dma_unencrypted(dev))
 		prot = pgprot_decrypted(prot);
-	if (dev_is_dma_coherent(dev) ||
-	    (IS_ENABLED(CONFIG_DMA_NONCOHERENT_CACHE_SYNC) &&
-             (attrs & DMA_ATTR_NON_CONSISTENT)))
+	if (dev_is_dma_coherent(dev))
 		return prot;
 #ifdef CONFIG_ARCH_HAS_DMA_WRITE_COMBINE
 	if (attrs & DMA_ATTR_WRITE_COMBINE)
@@ -461,6 +459,45 @@ void dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
 }
 EXPORT_SYMBOL(dma_free_attrs);
 
+void *dma_alloc_pages(struct device *dev, size_t size, dma_addr_t *dma_handle,
+		enum dma_data_direction dir, gfp_t gfp)
+{
+	const struct dma_map_ops *ops = get_dma_ops(dev);
+	void *vaddr;
+
+	if (WARN_ON_ONCE(!dev->coherent_dma_mask))
+		return NULL;
+	if (WARN_ON_ONCE(gfp & (__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM)))
+		return NULL;
+
+	size = PAGE_ALIGN(size);
+	if (dma_alloc_direct(dev, ops))
+		vaddr = dma_direct_alloc_pages(dev, size, dma_handle, dir, gfp);
+	else if (ops->alloc_pages)
+		vaddr = ops->alloc_pages(dev, size, dma_handle, dir, gfp);
+	else
+		return NULL;
+	if (vaddr)
+		debug_dma_map_page(dev, virt_to_page(vaddr), 0, size, dir, *dma_handle);
+	return vaddr;
+}
+EXPORT_SYMBOL_GPL(dma_alloc_pages);
+
+void dma_free_pages(struct device *dev, size_t size, void *vaddr,
+		dma_addr_t dma_handle, enum dma_data_direction dir)
+{
+	const struct dma_map_ops *ops = get_dma_ops(dev);
+
+	size = PAGE_ALIGN(size);
+	debug_dma_unmap_page(dev, dma_handle, size, dir);
+
+	if (dma_alloc_direct(dev, ops))
+		dma_direct_free_pages(dev, size, vaddr, dma_handle, dir);
+	else if (ops->free_pages)
+		ops->free_pages(dev, size, vaddr, dma_handle, dir);
+}
+EXPORT_SYMBOL_GPL(dma_free_pages);
+
 int dma_supported(struct device *dev, u64 mask)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
diff --git a/kernel/dma/ops_helpers.c b/kernel/dma/ops_helpers.c
index e443c69be4299f..17e66a41dc9c78 100644
--- a/kernel/dma/ops_helpers.c
+++ b/kernel/dma/ops_helpers.c
@@ -3,6 +3,7 @@
  * Helpers for DMA ops implementations.  These generally rely on the fact that
  * the allocated memory contains normal pages in the direct kernel mapping.
  */
+#include <linux/dma-contiguous.h>
 #include <linux/dma-noncoherent.h>
 
 /*
@@ -49,3 +50,37 @@ int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
 	return -ENXIO;
 #endif /* CONFIG_MMU */
 }
+
+void *dma_common_alloc_pages(struct device *dev, size_t size,
+		dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
+{
+	const struct dma_map_ops *ops = get_dma_ops(dev);
+	struct page *page;
+
+	page = dma_alloc_contiguous(dev, size, gfp);
+	if (!page)
+		page = alloc_pages_node(dev_to_node(dev), gfp, get_order(size));
+	if (!page)
+		return NULL;
+
+	*dma_handle = ops->map_page(dev, page, 0, size, dir,
+				    DMA_ATTR_SKIP_CPU_SYNC);
+	if (*dma_handle == DMA_MAPPING_ERROR) {
+		dma_free_contiguous(dev, page, size);
+		return NULL;
+	}
+
+	memset(page_address(page), 0, size);
+	return page_address(page);
+}
+
+void dma_common_free_pages(struct device *dev, size_t size, void *vaddr,
+		dma_addr_t dma_handle, enum dma_data_direction dir)
+{
+	const struct dma_map_ops *ops = get_dma_ops(dev);
+
+	if (ops->unmap_page)
+		ops->unmap_page(dev, dma_handle, size, dir,
+				DMA_ATTR_SKIP_CPU_SYNC);
+	dma_free_contiguous(dev, virt_to_page(vaddr), size);
+}
diff --git a/kernel/dma/virt.c b/kernel/dma/virt.c
index ebe128833af7b5..6986bf1fd6689c 100644
--- a/kernel/dma/virt.c
+++ b/kernel/dma/virt.c
@@ -55,5 +55,7 @@ const struct dma_map_ops dma_virt_ops = {
 	.free			= dma_virt_free,
 	.map_page		= dma_virt_map_page,
 	.map_sg			= dma_virt_map_sg,
+	.alloc_pages		= dma_common_alloc_pages,
+	.free_pages		= dma_common_free_pages,
 };
 EXPORT_SYMBOL(dma_virt_ops);
diff --git a/sound/mips/hal2.c b/sound/mips/hal2.c
index ec84bc4c3a6e77..746c410bd9bf11 100644
--- a/sound/mips/hal2.c
+++ b/sound/mips/hal2.c
@@ -449,15 +449,15 @@ static int hal2_alloc_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec)
 	int count = H2_BUF_SIZE / H2_BLOCK_SIZE;
 	int i;
 
-	codec->buffer = dma_alloc_attrs(dev, H2_BUF_SIZE, &buffer_dma,
-					GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
+	codec->buffer = dma_alloc_pages(dev, H2_BUF_SIZE, &buffer_dma,
+					DMA_BIDIRECTIONAL, GFP_KERNEL);
 	if (!codec->buffer)
 		return -ENOMEM;
-	desc = dma_alloc_attrs(dev, count * sizeof(struct hal2_desc),
-			       &desc_dma, GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
+	desc = dma_alloc_pages(dev, count * sizeof(struct hal2_desc), &desc_dma,
+			       DMA_BIDIRECTIONAL, GFP_KERNEL);
 	if (!desc) {
-		dma_free_attrs(dev, H2_BUF_SIZE, codec->buffer, buffer_dma,
-			       DMA_ATTR_NON_CONSISTENT);
+		dma_free_pages(dev, H2_BUF_SIZE, codec->buffer, buffer_dma,
+				DMA_BIDIRECTIONAL);
 		return -ENOMEM;
 	}
 	codec->buffer_dma = buffer_dma;
@@ -480,10 +480,10 @@ static void hal2_free_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec)
 {
 	struct device *dev = hal2->card->dev;
 
-	dma_free_attrs(dev, codec->desc_count * sizeof(struct hal2_desc),
-		       codec->desc, codec->desc_dma, DMA_ATTR_NON_CONSISTENT);
-	dma_free_attrs(dev, H2_BUF_SIZE, codec->buffer, codec->buffer_dma,
-		       DMA_ATTR_NON_CONSISTENT);
+	dma_free_pages(dev, codec->desc_count * sizeof(struct hal2_desc),
+		       codec->desc, codec->desc_dma, DMA_BIDIRECTIONAL);
+	dma_free_pages(dev, H2_BUF_SIZE, codec->buffer, codec->buffer_dma,
+			DMA_BIDIRECTIONAL);
 }
 
 static const struct snd_pcm_hardware hal2_pcm_hw = {
-- 
2.28.0

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 20/28] sgiwd93: convert from dma_cache_sync to dma_sync_single_for_device
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
                     ` (18 preceding siblings ...)
  2020-08-19  6:55   ` [PATCH 19/28] dma-mapping: replace DMA_ATTR_NON_CONSISTENT with dma_{alloc,free}_pages Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-08-19  6:55   ` [PATCH 21/28] hal2: " Christoph Hellwig
                     ` (9 subsequent siblings)
  29 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

Use the proper modern API to transfer cache ownership for incoherent DMA.
This also means we can allocate the memory as DMA_TO_DEVICE instead
of bidirectional.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/sgiwd93.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
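
The ownership transfer in fill_hpc_entries() can be pictured with a small
userspace model. This is only a sketch under stated assumptions: struct
model_buf, model_sync_for_device() and model_len_through() are hypothetical
names invented for illustration and are not kernel interfaces. The model
treats the CPU cache and device-visible memory as two separate arrays. A
sync for the device is keyed by the bus address rather than the CPU pointer
and writes back only the requested range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_BUF_SIZE 64

/* Hypothetical model of one non-coherent DMA buffer (not a kernel type). */
struct model_buf {
	uint8_t cache[MODEL_BUF_SIZE];	/* what the CPU reads and writes */
	uint8_t ram[MODEL_BUF_SIZE];	/* what the device reads and writes */
	uint64_t dma_base;		/* bus address of ram[0] */
};

/* Bytes from the buffer start through the end of the descriptor at 'last'. */
static size_t model_len_through(const void *base, const void *last,
				size_t desc_size)
{
	return (size_t)((const uint8_t *)last + desc_size -
			(const uint8_t *)base);
}

/* Write back [dma, dma + len) so the device sees the CPU's stores. */
static void model_sync_for_device(struct model_buf *b, uint64_t dma,
				  size_t len)
{
	size_t off = (size_t)(dma - b->dma_base);

	assert(dma >= b->dma_base);
	assert(off <= MODEL_BUF_SIZE && len <= MODEL_BUF_SIZE - off);
	memcpy(b->ram + off, b->cache + off, len);
}
```

In this model, stores to cache[] stay invisible in ram[] until
model_sync_for_device() is called with the bus address of the range, and
bytes past the synced length remain stale.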

diff --git a/drivers/scsi/sgiwd93.c b/drivers/scsi/sgiwd93.c
index 133adcf99e9340..1538f65307f22f 100644
--- a/drivers/scsi/sgiwd93.c
+++ b/drivers/scsi/sgiwd93.c
@@ -95,7 +95,7 @@ void fill_hpc_entries(struct ip22_hostdata *hd, struct scsi_cmnd *cmd, int din)
 	 */
 	hcp->desc.pbuf = 0;
 	hcp->desc.cntinfo = HPCDMA_EOX;
-	dma_cache_sync(hd->dev, hd->cpu,
+	dma_sync_single_for_device(hd->dev, hd->dma,
 		       (unsigned long)(hcp + 1) - (unsigned long)hd->cpu,
 		       DMA_TO_DEVICE);
 }
@@ -235,7 +235,7 @@ static int sgiwd93_probe(struct platform_device *pdev)
 	hdata = host_to_hostdata(host);
 	hdata->dev = &pdev->dev;
 	hdata->cpu = dma_alloc_pages(&pdev->dev, HPC_DMA_SIZE, &hdata->dma,
-				     DMA_BIDIRECTIONAL, GFP_KERNEL);
+				     DMA_TO_DEVICE, GFP_KERNEL);
 	if (!hdata->cpu) {
 		printk(KERN_WARNING "sgiwd93: Could not allocate memory for "
 		       "host %d buffer.\n", unit);
@@ -275,7 +275,7 @@ static int sgiwd93_probe(struct platform_device *pdev)
 	free_irq(irq, host);
 out_free:
 	dma_free_pages(&pdev->dev, HPC_DMA_SIZE, hdata->cpu, hdata->dma,
-			DMA_BIDIRECTIONAL);
+			DMA_TO_DEVICE);
 out_put:
 	scsi_host_put(host);
 out:
@@ -292,7 +292,7 @@ static int sgiwd93_remove(struct platform_device *pdev)
 	scsi_remove_host(host);
 	free_irq(pd->irq, host);
 	dma_free_pages(&pdev->dev, HPC_DMA_SIZE, hdata->cpu, hdata->dma,
-			DMA_BIDIRECTIONAL);
+			DMA_TO_DEVICE);
 	scsi_host_put(host);
 	return 0;
 }
-- 
2.28.0

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 21/28] hal2: convert from dma_cache_sync to dma_sync_single_for_device
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
                     ` (19 preceding siblings ...)
  2020-08-19  6:55   ` [PATCH 20/28] sgiwd93: convert from dma_cache_sync to dma_sync_single_for_device Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-08-19  6:55   ` [PATCH 22/28] sgiseeq: " Christoph Hellwig
                     ` (8 subsequent siblings)
  29 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

Use the proper modern API to transfer cache ownership for incoherent DMA.
This also means we can allocate the buffer memory with the proper
direction instead of bidirectional.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 sound/mips/hal2.c | 44 ++++++++++++++++++++------------------------
 1 file changed, 20 insertions(+), 24 deletions(-)
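
The ordering difference between the playback and capture paths can be
sketched in userspace as follows. The names struct ring_model,
ring_playback() and ring_capture() are hypothetical and exist only for this
illustration. The two memcpy() calls marked "models ..." stand in for the
cache maintenance that the real sync calls would perform on a non-coherent
platform. Playback copies data in and then syncs for the device, while
capture syncs for the CPU and then copies data out, both at an offset into
the ring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 32

/* Hypothetical stand-in for a codec ring buffer; not the hal2 structures. */
struct ring_model {
	uint8_t cpu_view[RING_SIZE];	/* possibly stale cached copy */
	uint8_t dev_view[RING_SIZE];	/* memory as the device sees it */
};

/* Playback: the CPU fills the chunk first, then hands it to the device. */
static void ring_playback(struct ring_model *r, size_t hw_off,
			  const uint8_t *src, size_t bytes)
{
	assert(hw_off <= RING_SIZE && bytes <= RING_SIZE - hw_off);
	memcpy(r->cpu_view + hw_off, src, bytes);
	/* models dma_sync_single_for_device(..., DMA_TO_DEVICE) */
	memcpy(r->dev_view + hw_off, r->cpu_view + hw_off, bytes);
}

/* Capture: the CPU takes the chunk back first, then copies it out. */
static void ring_capture(struct ring_model *r, size_t hw_off,
			 uint8_t *dst, size_t bytes)
{
	assert(hw_off <= RING_SIZE && bytes <= RING_SIZE - hw_off);
	/* models dma_sync_single_for_cpu(..., DMA_FROM_DEVICE) */
	memcpy(r->cpu_view + hw_off, r->dev_view + hw_off, bytes);
	memcpy(dst, r->cpu_view + hw_off, bytes);
}
```

Reversing either order in this model would leave the reader of the chunk
with stale bytes, which is why the sync sits after the memcpy() on playback
and before it on capture.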

diff --git a/sound/mips/hal2.c b/sound/mips/hal2.c
index 746c410bd9bf11..c8e429a5f48f85 100644
--- a/sound/mips/hal2.c
+++ b/sound/mips/hal2.c
@@ -441,7 +441,8 @@ static inline void hal2_stop_adc(struct snd_hal2 *hal2)
 	hal2->adc.pbus.pbus->pbdma_ctrl = HPC3_PDMACTRL_LD;
 }
 
-static int hal2_alloc_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec)
+static int hal2_alloc_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec,
+		enum dma_data_direction buffer_dir)
 {
 	struct device *dev = hal2->card->dev;
 	struct hal2_desc *desc;
@@ -450,14 +451,14 @@ static int hal2_alloc_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec)
 	int i;
 
 	codec->buffer = dma_alloc_pages(dev, H2_BUF_SIZE, &buffer_dma,
-					DMA_BIDIRECTIONAL, GFP_KERNEL);
+					buffer_dir, GFP_KERNEL);
 	if (!codec->buffer)
 		return -ENOMEM;
 	desc = dma_alloc_pages(dev, count * sizeof(struct hal2_desc), &desc_dma,
 			       DMA_BIDIRECTIONAL, GFP_KERNEL);
 	if (!desc) {
 		dma_free_pages(dev, H2_BUF_SIZE, codec->buffer, buffer_dma,
-				DMA_BIDIRECTIONAL);
+				buffer_dir);
 		return -ENOMEM;
 	}
 	codec->buffer_dma = buffer_dma;
@@ -470,20 +471,22 @@ static int hal2_alloc_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec)
 		      desc_dma : desc_dma + (i + 1) * sizeof(struct hal2_desc);
 		desc++;
 	}
-	dma_cache_sync(dev, codec->desc, count * sizeof(struct hal2_desc),
-		       DMA_TO_DEVICE);
+	dma_sync_single_for_device(dev, codec->desc_dma,
+				   count * sizeof(struct hal2_desc),
+				   DMA_BIDIRECTIONAL);
 	codec->desc_count = count;
 	return 0;
 }
 
-static void hal2_free_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec)
+static void hal2_free_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec,
+		enum dma_data_direction buffer_dir)
 {
 	struct device *dev = hal2->card->dev;
 
 	dma_free_pages(dev, codec->desc_count * sizeof(struct hal2_desc),
 		       codec->desc, codec->desc_dma, DMA_BIDIRECTIONAL);
 	dma_free_pages(dev, H2_BUF_SIZE, codec->buffer, codec->buffer_dma,
-			DMA_BIDIRECTIONAL);
+			buffer_dir);
 }
 
 static const struct snd_pcm_hardware hal2_pcm_hw = {
@@ -509,21 +512,16 @@ static int hal2_playback_open(struct snd_pcm_substream *substream)
 {
 	struct snd_pcm_runtime *runtime = substream->runtime;
 	struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream);
-	int err;
 
 	runtime->hw = hal2_pcm_hw;
-
-	err = hal2_alloc_dmabuf(hal2, &hal2->dac);
-	if (err)
-		return err;
-	return 0;
+	return hal2_alloc_dmabuf(hal2, &hal2->dac, DMA_TO_DEVICE);
 }
 
 static int hal2_playback_close(struct snd_pcm_substream *substream)
 {
 	struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream);
 
-	hal2_free_dmabuf(hal2, &hal2->dac);
+	hal2_free_dmabuf(hal2, &hal2->dac, DMA_TO_DEVICE);
 	return 0;
 }
 
@@ -579,7 +577,9 @@ static void hal2_playback_transfer(struct snd_pcm_substream *substream,
 	unsigned char *buf = hal2->dac.buffer + rec->hw_data;
 
 	memcpy(buf, substream->runtime->dma_area + rec->sw_data, bytes);
-	dma_cache_sync(hal2->card->dev, buf, bytes, DMA_TO_DEVICE);
+	dma_sync_single_for_device(hal2->card->dev,
+			hal2->dac.buffer_dma + rec->hw_data, bytes,
+			DMA_TO_DEVICE);
 
 }
 
@@ -597,22 +597,16 @@ static int hal2_capture_open(struct snd_pcm_substream *substream)
 {
 	struct snd_pcm_runtime *runtime = substream->runtime;
 	struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream);
-	struct hal2_codec *adc = &hal2->adc;
-	int err;
 
 	runtime->hw = hal2_pcm_hw;
-
-	err = hal2_alloc_dmabuf(hal2, adc);
-	if (err)
-		return err;
-	return 0;
+	return hal2_alloc_dmabuf(hal2, &hal2->adc, DMA_FROM_DEVICE);
 }
 
 static int hal2_capture_close(struct snd_pcm_substream *substream)
 {
 	struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream);
 
-	hal2_free_dmabuf(hal2, &hal2->adc);
+	hal2_free_dmabuf(hal2, &hal2->adc, DMA_FROM_DEVICE);
 	return 0;
 }
 
@@ -667,7 +661,9 @@ static void hal2_capture_transfer(struct snd_pcm_substream *substream,
 	struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream);
 	unsigned char *buf = hal2->adc.buffer + rec->hw_data;
 
-	dma_cache_sync(hal2->card->dev, buf, bytes, DMA_FROM_DEVICE);
+	dma_sync_single_for_cpu(hal2->card->dev,
+			hal2->adc.buffer_dma + rec->hw_data, bytes,
+			DMA_FROM_DEVICE);
 	memcpy(substream->runtime->dma_area + rec->sw_data, buf, bytes);
 }
 
-- 
2.28.0

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 22/28] sgiseeq: convert from dma_cache_sync to dma_sync_single_for_device
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
                     ` (20 preceding siblings ...)
  2020-08-19  6:55   ` [PATCH 21/28] hal2: " Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-09-01 15:22     ` Thomas Bogendoerfer
  2020-08-19  6:55   ` [PATCH 23/28] lib82596: " Christoph Hellwig
                     ` (7 subsequent siblings)
  29 siblings, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

Use the proper modern API to transfer cache ownership for incoherent DMA.
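
As a rough illustration of what "transferring cache ownership" means, here is a userspace toy, not kernel code. It assumes a hypothetical `toy_buf` that holds a CPU-side cached copy and the device-visible memory separately; `toy_sync_for_device` and `toy_sync_for_cpu` are invented names that only approximate the write-back and invalidate steps which dma_sync_single_for_device and dma_sync_single_for_cpu typically perform on a non-coherent platform.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BUF_SIZE 16

/* Hypothetical model: one buffer seen through a CPU cache and by a device. */
struct toy_buf {
	unsigned char cache[TOY_BUF_SIZE];	/* what the CPU reads and writes */
	unsigned char memory[TOY_BUF_SIZE];	/* what the device reads and writes */
};

/* Hand the range to the device: write CPU-side changes back to memory. */
static void toy_sync_for_device(struct toy_buf *b, size_t off, size_t len)
{
	assert(off <= TOY_BUF_SIZE && len <= TOY_BUF_SIZE - off);
	memcpy(b->memory + off, b->cache + off, len);
}

/* Hand the range back to the CPU: drop stale cached data and refetch. */
static void toy_sync_for_cpu(struct toy_buf *b, size_t off, size_t len)
{
	assert(off <= TOY_BUF_SIZE && len <= TOY_BUF_SIZE - off);
	memcpy(b->cache + off, b->memory + off, len);
}
```

In this model a CPU write is invisible to the device until `toy_sync_for_device` runs, and a device write is invisible to the CPU until `toy_sync_for_cpu` runs, which is why each descriptor access in the driver is bracketed by the matching sync call.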

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/net/ethernet/seeq/sgiseeq.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/seeq/sgiseeq.c b/drivers/net/ethernet/seeq/sgiseeq.c
index 39599bbb5d45b6..f91dae16d69a19 100644
--- a/drivers/net/ethernet/seeq/sgiseeq.c
+++ b/drivers/net/ethernet/seeq/sgiseeq.c
@@ -112,14 +112,18 @@ struct sgiseeq_private {
 
 static inline void dma_sync_desc_cpu(struct net_device *dev, void *addr)
 {
-	dma_cache_sync(dev->dev.parent, addr, sizeof(struct sgiseeq_rx_desc),
-		       DMA_FROM_DEVICE);
+	struct sgiseeq_private *sp = netdev_priv(dev);
+
+	dma_sync_single_for_cpu(dev->dev.parent, VIRT_TO_DMA(sp, addr),
+			sizeof(struct sgiseeq_rx_desc), DMA_BIDIRECTIONAL);
 }
 
 static inline void dma_sync_desc_dev(struct net_device *dev, void *addr)
 {
-	dma_cache_sync(dev->dev.parent, addr, sizeof(struct sgiseeq_rx_desc),
-		       DMA_TO_DEVICE);
+	struct sgiseeq_private *sp = netdev_priv(dev);
+
+	dma_sync_single_for_device(dev->dev.parent, VIRT_TO_DMA(sp, addr),
+			sizeof(struct sgiseeq_rx_desc), DMA_BIDIRECTIONAL);
 }
 
 static inline void hpc3_eth_reset(struct hpc3_ethregs *hregs)
-- 
2.28.0


* [PATCH 23/28] lib82596: convert from dma_cache_sync to dma_sync_single_for_device
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
                     ` (21 preceding siblings ...)
  2020-08-19  6:55   ` [PATCH 22/28] sgiseeq: " Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-08-19  6:55   ` [PATCH 24/28] 53c700: " Christoph Hellwig
                     ` (6 subsequent siblings)
  29 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

Use the proper modern API to transfer cache ownership for incoherent DMA.
Note that this moves the DMA helpers to the main lib82596.c file, so
that they can use virt_to_dma.
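
The address translation that the sync helpers depend on can be sketched in userspace as below. This is only a model under stated assumptions: `toy_dma_addr_t`, `struct toy_private` and `toy_virt_to_dma` are hypothetical stand-ins, and the `dma_size` bounds check is an addition for illustration that the driver helper does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_dma_addr_t;	/* stand-in for the kernel's dma_addr_t */

/* Hypothetical private data: one contiguous DMA block and its bus address. */
struct toy_private {
	void *dma;			/* CPU virtual address of the block */
	toy_dma_addr_t dma_addr;	/* device address of the same block */
	size_t dma_size;		/* assumed field, used only for the check */
};

/*
 * Translate a CPU pointer inside the block into the device address by
 * adding its byte offset from the start of the block to the base address.
 */
static toy_dma_addr_t toy_virt_to_dma(const struct toy_private *lp,
		const volatile void *v)
{
	uintptr_t off = (uintptr_t)v - (uintptr_t)lp->dma;

	assert(off < lp->dma_size);
	return lp->dma_addr + off;
}
```

Because the dma_sync_single_* calls take a DMA address rather than a kernel virtual address, every sync helper needs this offset calculation, which is presumably why the helpers have to live next to it in lib82596.c.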

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/net/ethernet/i825xx/lasi_82596.c |  11 +--
 drivers/net/ethernet/i825xx/lib82596.c   | 114 ++++++++++++++---------
 drivers/net/ethernet/i825xx/sni_82596.c  |   4 -
 3 files changed, 73 insertions(+), 56 deletions(-)

diff --git a/drivers/net/ethernet/i825xx/lasi_82596.c b/drivers/net/ethernet/i825xx/lasi_82596.c
index 0c493b7237a910..d13b610935bcf3 100644
--- a/drivers/net/ethernet/i825xx/lasi_82596.c
+++ b/drivers/net/ethernet/i825xx/lasi_82596.c
@@ -96,21 +96,14 @@
 
 #define OPT_SWAP_PORT	0x0001	/* Need to wordswp on the MPU port */
 
-#define DMA_WBACK(ndev, addr, len) \
-	do { dma_cache_sync((ndev)->dev.parent, (void *)addr, len, DMA_TO_DEVICE); } while (0)
-
-#define DMA_INV(ndev, addr, len) \
-	do { dma_cache_sync((ndev)->dev.parent, (void *)addr, len, DMA_FROM_DEVICE); } while (0)
-
-#define DMA_WBACK_INV(ndev, addr, len) \
-	do { dma_cache_sync((ndev)->dev.parent, (void *)addr, len, DMA_BIDIRECTIONAL); } while (0)
-
 #define SYSBUS      0x0000006c
 
 /* big endian CPU, 82596 "big" endian mode */
 #define SWAP32(x)   (((u32)(x)<<16) | ((((u32)(x)))>>16))
 #define SWAP16(x)   (x)
 
+#define NONCOHERENT_DMA 1
+
 #include "lib82596.c"
 
 MODULE_AUTHOR("Richard Hirst");
diff --git a/drivers/net/ethernet/i825xx/lib82596.c b/drivers/net/ethernet/i825xx/lib82596.c
index b4e4b3eb5758b5..ca2fb303fcc6f6 100644
--- a/drivers/net/ethernet/i825xx/lib82596.c
+++ b/drivers/net/ethernet/i825xx/lib82596.c
@@ -365,13 +365,44 @@ static int max_cmd_backlog = TX_RING_SIZE-1;
 static void i596_poll_controller(struct net_device *dev);
 #endif
 
+static inline dma_addr_t virt_to_dma(struct i596_private *lp, volatile void *v)
+{
+	return lp->dma_addr + ((unsigned long)v - (unsigned long)lp->dma);
+}
+
+#ifdef NONCOHERENT_DMA
+static inline void dma_sync_dev(struct net_device *ndev, volatile void *addr,
+		size_t len)
+{
+	dma_sync_single_for_device(ndev->dev.parent,
+			virt_to_dma(netdev_priv(ndev), addr), len,
+			DMA_BIDIRECTIONAL);
+}
+
+static inline void dma_sync_cpu(struct net_device *ndev, volatile void *addr,
+		size_t len)
+{
+	dma_sync_single_for_cpu(ndev->dev.parent,
+			virt_to_dma(netdev_priv(ndev), addr), len,
+			DMA_BIDIRECTIONAL);
+}
+#else
+static inline void dma_sync_dev(struct net_device *ndev, volatile void *addr,
+		size_t len)
+{
+}
+static inline void dma_sync_cpu(struct net_device *ndev, volatile void *addr,
+		size_t len)
+{
+}
+#endif /* NONCOHERENT_DMA */
 
 static inline int wait_istat(struct net_device *dev, struct i596_dma *dma, int delcnt, char *str)
 {
-	DMA_INV(dev, &(dma->iscp), sizeof(struct i596_iscp));
+	dma_sync_cpu(dev, &(dma->iscp), sizeof(struct i596_iscp));
 	while (--delcnt && dma->iscp.stat) {
 		udelay(10);
-		DMA_INV(dev, &(dma->iscp), sizeof(struct i596_iscp));
+		dma_sync_cpu(dev, &(dma->iscp), sizeof(struct i596_iscp));
 	}
 	if (!delcnt) {
 		printk(KERN_ERR "%s: %s, iscp.stat %04x, didn't clear\n",
@@ -384,10 +415,10 @@ static inline int wait_istat(struct net_device *dev, struct i596_dma *dma, int d
 
 static inline int wait_cmd(struct net_device *dev, struct i596_dma *dma, int delcnt, char *str)
 {
-	DMA_INV(dev, &(dma->scb), sizeof(struct i596_scb));
+	dma_sync_cpu(dev, &(dma->scb), sizeof(struct i596_scb));
 	while (--delcnt && dma->scb.command) {
 		udelay(10);
-		DMA_INV(dev, &(dma->scb), sizeof(struct i596_scb));
+		dma_sync_cpu(dev, &(dma->scb), sizeof(struct i596_scb));
 	}
 	if (!delcnt) {
 		printk(KERN_ERR "%s: %s, status %4.4x, cmd %4.4x.\n",
@@ -451,12 +482,9 @@ static void i596_display_data(struct net_device *dev)
 		       SWAP32(rbd->b_data), SWAP16(rbd->size));
 		rbd = rbd->v_next;
 	} while (rbd != lp->rbd_head);
-	DMA_INV(dev, dma, sizeof(struct i596_dma));
+	dma_sync_cpu(dev, dma, sizeof(struct i596_dma));
 }
 
-
-#define virt_to_dma(lp, v) ((lp)->dma_addr + (dma_addr_t)((unsigned long)(v)-(unsigned long)((lp)->dma)))
-
 static inline int init_rx_bufs(struct net_device *dev)
 {
 	struct i596_private *lp = netdev_priv(dev);
@@ -508,7 +536,7 @@ static inline int init_rx_bufs(struct net_device *dev)
 	rfd->b_next = SWAP32(virt_to_dma(lp, dma->rfds));
 	rfd->cmd = SWAP16(CMD_EOL|CMD_FLEX);
 
-	DMA_WBACK_INV(dev, dma, sizeof(struct i596_dma));
+	dma_sync_dev(dev, dma, sizeof(struct i596_dma));
 	return 0;
 }
 
@@ -547,7 +575,7 @@ static void rebuild_rx_bufs(struct net_device *dev)
 	lp->rbd_head = dma->rbds;
 	dma->rfds[0].rbd = SWAP32(virt_to_dma(lp, dma->rbds));
 
-	DMA_WBACK_INV(dev, dma, sizeof(struct i596_dma));
+	dma_sync_dev(dev, dma, sizeof(struct i596_dma));
 }
 
 
@@ -575,9 +603,9 @@ static int init_i596_mem(struct net_device *dev)
 
 	DEB(DEB_INIT, printk(KERN_DEBUG "%s: starting i82596.\n", dev->name));
 
-	DMA_WBACK(dev, &(dma->scp), sizeof(struct i596_scp));
-	DMA_WBACK(dev, &(dma->iscp), sizeof(struct i596_iscp));
-	DMA_WBACK(dev, &(dma->scb), sizeof(struct i596_scb));
+	dma_sync_dev(dev, &(dma->scp), sizeof(struct i596_scp));
+	dma_sync_dev(dev, &(dma->iscp), sizeof(struct i596_iscp));
+	dma_sync_dev(dev, &(dma->scb), sizeof(struct i596_scb));
 
 	mpu_port(dev, PORT_ALTSCP, virt_to_dma(lp, &dma->scp));
 	ca(dev);
@@ -596,24 +624,24 @@ static int init_i596_mem(struct net_device *dev)
 	rebuild_rx_bufs(dev);
 
 	dma->scb.command = 0;
-	DMA_WBACK(dev, &(dma->scb), sizeof(struct i596_scb));
+	dma_sync_dev(dev, &(dma->scb), sizeof(struct i596_scb));
 
 	DEB(DEB_INIT, printk(KERN_DEBUG
 			     "%s: queuing CmdConfigure\n", dev->name));
 	memcpy(dma->cf_cmd.i596_config, init_setup, 14);
 	dma->cf_cmd.cmd.command = SWAP16(CmdConfigure);
-	DMA_WBACK(dev, &(dma->cf_cmd), sizeof(struct cf_cmd));
+	dma_sync_dev(dev, &(dma->cf_cmd), sizeof(struct cf_cmd));
 	i596_add_cmd(dev, &dma->cf_cmd.cmd);
 
 	DEB(DEB_INIT, printk(KERN_DEBUG "%s: queuing CmdSASetup\n", dev->name));
 	memcpy(dma->sa_cmd.eth_addr, dev->dev_addr, ETH_ALEN);
 	dma->sa_cmd.cmd.command = SWAP16(CmdSASetup);
-	DMA_WBACK(dev, &(dma->sa_cmd), sizeof(struct sa_cmd));
+	dma_sync_dev(dev, &(dma->sa_cmd), sizeof(struct sa_cmd));
 	i596_add_cmd(dev, &dma->sa_cmd.cmd);
 
 	DEB(DEB_INIT, printk(KERN_DEBUG "%s: queuing CmdTDR\n", dev->name));
 	dma->tdr_cmd.cmd.command = SWAP16(CmdTDR);
-	DMA_WBACK(dev, &(dma->tdr_cmd), sizeof(struct tdr_cmd));
+	dma_sync_dev(dev, &(dma->tdr_cmd), sizeof(struct tdr_cmd));
 	i596_add_cmd(dev, &dma->tdr_cmd.cmd);
 
 	spin_lock_irqsave (&lp->lock, flags);
@@ -625,7 +653,7 @@ static int init_i596_mem(struct net_device *dev)
 	DEB(DEB_INIT, printk(KERN_DEBUG "%s: Issuing RX_START\n", dev->name));
 	dma->scb.command = SWAP16(RX_START);
 	dma->scb.rfd = SWAP32(virt_to_dma(lp, dma->rfds));
-	DMA_WBACK(dev, &(dma->scb), sizeof(struct i596_scb));
+	dma_sync_dev(dev, &(dma->scb), sizeof(struct i596_scb));
 
 	ca(dev);
 
@@ -659,13 +687,13 @@ static inline int i596_rx(struct net_device *dev)
 
 	rfd = lp->rfd_head;		/* Ref next frame to check */
 
-	DMA_INV(dev, rfd, sizeof(struct i596_rfd));
+	dma_sync_cpu(dev, rfd, sizeof(struct i596_rfd));
 	while (rfd->stat & SWAP16(STAT_C)) {	/* Loop while complete frames */
 		if (rfd->rbd == I596_NULL)
 			rbd = NULL;
 		else if (rfd->rbd == lp->rbd_head->b_addr) {
 			rbd = lp->rbd_head;
-			DMA_INV(dev, rbd, sizeof(struct i596_rbd));
+			dma_sync_cpu(dev, rbd, sizeof(struct i596_rbd));
 		} else {
 			printk(KERN_ERR "%s: rbd chain broken!\n", dev->name);
 			/* XXX Now what? */
@@ -713,7 +741,7 @@ static inline int i596_rx(struct net_device *dev)
 							  DMA_FROM_DEVICE);
 				rbd->v_data = newskb->data;
 				rbd->b_data = SWAP32(dma_addr);
-				DMA_WBACK_INV(dev, rbd, sizeof(struct i596_rbd));
+				dma_sync_dev(dev, rbd, sizeof(struct i596_rbd));
 			} else {
 				skb = netdev_alloc_skb_ip_align(dev, pkt_len);
 			}
@@ -765,7 +793,7 @@ static inline int i596_rx(struct net_device *dev)
 		if (rbd != NULL && (rbd->count & SWAP16(0x4000))) {
 			rbd->count = 0;
 			lp->rbd_head = rbd->v_next;
-			DMA_WBACK_INV(dev, rbd, sizeof(struct i596_rbd));
+			dma_sync_dev(dev, rbd, sizeof(struct i596_rbd));
 		}
 
 		/* Tidy the frame descriptor, marking it as end of list */
@@ -779,14 +807,14 @@ static inline int i596_rx(struct net_device *dev)
 
 		lp->dma->scb.rfd = rfd->b_next;
 		lp->rfd_head = rfd->v_next;
-		DMA_WBACK_INV(dev, rfd, sizeof(struct i596_rfd));
+		dma_sync_dev(dev, rfd, sizeof(struct i596_rfd));
 
 		/* Remove end-of-list from old end descriptor */
 
 		rfd->v_prev->cmd = SWAP16(CMD_FLEX);
-		DMA_WBACK_INV(dev, rfd->v_prev, sizeof(struct i596_rfd));
+		dma_sync_dev(dev, rfd->v_prev, sizeof(struct i596_rfd));
 		rfd = lp->rfd_head;
-		DMA_INV(dev, rfd, sizeof(struct i596_rfd));
+		dma_sync_cpu(dev, rfd, sizeof(struct i596_rfd));
 	}
 
 	DEB(DEB_RXFRAME, printk(KERN_DEBUG "frames %d\n", frames));
@@ -827,12 +855,12 @@ static inline void i596_cleanup_cmd(struct net_device *dev, struct i596_private
 			ptr->v_next = NULL;
 			ptr->b_next = I596_NULL;
 		}
-		DMA_WBACK_INV(dev, ptr, sizeof(struct i596_cmd));
+		dma_sync_dev(dev, ptr, sizeof(struct i596_cmd));
 	}
 
 	wait_cmd(dev, lp->dma, 100, "i596_cleanup_cmd timed out");
 	lp->dma->scb.cmd = I596_NULL;
-	DMA_WBACK(dev, &(lp->dma->scb), sizeof(struct i596_scb));
+	dma_sync_dev(dev, &(lp->dma->scb), sizeof(struct i596_scb));
 }
 
 
@@ -850,7 +878,7 @@ static inline void i596_reset(struct net_device *dev, struct i596_private *lp)
 
 	/* FIXME: this command might cause an lpmc */
 	lp->dma->scb.command = SWAP16(CUC_ABORT | RX_ABORT);
-	DMA_WBACK(dev, &(lp->dma->scb), sizeof(struct i596_scb));
+	dma_sync_dev(dev, &(lp->dma->scb), sizeof(struct i596_scb));
 	ca(dev);
 
 	/* wait for shutdown */
@@ -878,20 +906,20 @@ static void i596_add_cmd(struct net_device *dev, struct i596_cmd *cmd)
 	cmd->command |= SWAP16(CMD_EOL | CMD_INTR);
 	cmd->v_next = NULL;
 	cmd->b_next = I596_NULL;
-	DMA_WBACK(dev, cmd, sizeof(struct i596_cmd));
+	dma_sync_dev(dev, cmd, sizeof(struct i596_cmd));
 
 	spin_lock_irqsave (&lp->lock, flags);
 
 	if (lp->cmd_head != NULL) {
 		lp->cmd_tail->v_next = cmd;
 		lp->cmd_tail->b_next = SWAP32(virt_to_dma(lp, &cmd->status));
-		DMA_WBACK(dev, lp->cmd_tail, sizeof(struct i596_cmd));
+		dma_sync_dev(dev, lp->cmd_tail, sizeof(struct i596_cmd));
 	} else {
 		lp->cmd_head = cmd;
 		wait_cmd(dev, dma, 100, "i596_add_cmd timed out");
 		dma->scb.cmd = SWAP32(virt_to_dma(lp, &cmd->status));
 		dma->scb.command = SWAP16(CUC_START);
-		DMA_WBACK(dev, &(dma->scb), sizeof(struct i596_scb));
+		dma_sync_dev(dev, &(dma->scb), sizeof(struct i596_scb));
 		ca(dev);
 	}
 	lp->cmd_tail = cmd;
@@ -956,7 +984,7 @@ static void i596_tx_timeout (struct net_device *dev, unsigned int txqueue)
 		/* Issue a channel attention signal */
 		DEB(DEB_ERRORS, printk(KERN_DEBUG "Kicking board.\n"));
 		lp->dma->scb.command = SWAP16(CUC_START | RX_START);
-		DMA_WBACK_INV(dev, &(lp->dma->scb), sizeof(struct i596_scb));
+		dma_sync_dev(dev, &(lp->dma->scb), sizeof(struct i596_scb));
 		ca (dev);
 		lp->last_restart = dev->stats.tx_packets;
 	}
@@ -1014,8 +1042,8 @@ static netdev_tx_t i596_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		tbd->data = SWAP32(tx_cmd->dma_addr);
 
 		DEB(DEB_TXADDR, print_eth(skb->data, "tx-queued"));
-		DMA_WBACK_INV(dev, tx_cmd, sizeof(struct tx_cmd));
-		DMA_WBACK_INV(dev, tbd, sizeof(struct i596_tbd));
+		dma_sync_dev(dev, tx_cmd, sizeof(struct tx_cmd));
+		dma_sync_dev(dev, tbd, sizeof(struct i596_tbd));
 		i596_add_cmd(dev, &tx_cmd->cmd);
 
 		dev->stats.tx_packets++;
@@ -1071,7 +1099,7 @@ static int i82596_probe(struct net_device *dev)
 	lp->dma->scb.rfd = I596_NULL;
 	spin_lock_init(&lp->lock);
 
-	DMA_WBACK_INV(dev, lp->dma, sizeof(struct i596_dma));
+	dma_sync_dev(dev, lp->dma, sizeof(struct i596_dma));
 
 	ret = register_netdev(dev);
 	if (ret)
@@ -1141,7 +1169,7 @@ static irqreturn_t i596_interrupt(int irq, void *dev_id)
 				   dev->name, status & 0x0700));
 
 		while (lp->cmd_head != NULL) {
-			DMA_INV(dev, lp->cmd_head, sizeof(struct i596_cmd));
+			dma_sync_cpu(dev, lp->cmd_head, sizeof(struct i596_cmd));
 			if (!(lp->cmd_head->status & SWAP16(STAT_C)))
 				break;
 
@@ -1223,7 +1251,7 @@ static irqreturn_t i596_interrupt(int irq, void *dev_id)
 			}
 			ptr->v_next = NULL;
 			ptr->b_next = I596_NULL;
-			DMA_WBACK(dev, ptr, sizeof(struct i596_cmd));
+			dma_sync_dev(dev, ptr, sizeof(struct i596_cmd));
 			lp->last_cmd = jiffies;
 		}
 
@@ -1237,13 +1265,13 @@ static irqreturn_t i596_interrupt(int irq, void *dev_id)
 
 			ptr->command &= SWAP16(0x1fff);
 			ptr = ptr->v_next;
-			DMA_WBACK_INV(dev, prev, sizeof(struct i596_cmd));
+			dma_sync_dev(dev, prev, sizeof(struct i596_cmd));
 		}
 
 		if (lp->cmd_head != NULL)
 			ack_cmd |= CUC_START;
 		dma->scb.cmd = SWAP32(virt_to_dma(lp, &lp->cmd_head->status));
-		DMA_WBACK_INV(dev, &dma->scb, sizeof(struct i596_scb));
+		dma_sync_dev(dev, &dma->scb, sizeof(struct i596_scb));
 	}
 	if ((status & 0x1000) || (status & 0x4000)) {
 		if ((status & 0x4000))
@@ -1268,7 +1296,7 @@ static irqreturn_t i596_interrupt(int irq, void *dev_id)
 	}
 	wait_cmd(dev, dma, 100, "i596 interrupt, timeout");
 	dma->scb.command = SWAP16(ack_cmd);
-	DMA_WBACK(dev, &dma->scb, sizeof(struct i596_scb));
+	dma_sync_dev(dev, &dma->scb, sizeof(struct i596_scb));
 
 	/* DANGER: I suspect that some kind of interrupt
 	 acknowledgement aside from acking the 82596 might be needed
@@ -1299,7 +1327,7 @@ static int i596_close(struct net_device *dev)
 
 	wait_cmd(dev, lp->dma, 100, "close1 timed out");
 	lp->dma->scb.command = SWAP16(CUC_ABORT | RX_ABORT);
-	DMA_WBACK(dev, &lp->dma->scb, sizeof(struct i596_scb));
+	dma_sync_dev(dev, &lp->dma->scb, sizeof(struct i596_scb));
 
 	ca(dev);
 
@@ -1358,7 +1386,7 @@ static void set_multicast_list(struct net_device *dev)
 			       dev->name);
 		else {
 			dma->cf_cmd.cmd.command = SWAP16(CmdConfigure);
-			DMA_WBACK_INV(dev, &dma->cf_cmd, sizeof(struct cf_cmd));
+			dma_sync_dev(dev, &dma->cf_cmd, sizeof(struct cf_cmd));
 			i596_add_cmd(dev, &dma->cf_cmd.cmd);
 		}
 	}
@@ -1390,7 +1418,7 @@ static void set_multicast_list(struct net_device *dev)
 					   dev->name, cp));
 			cp += ETH_ALEN;
 		}
-		DMA_WBACK_INV(dev, &dma->mc_cmd, sizeof(struct mc_cmd));
+		dma_sync_dev(dev, &dma->mc_cmd, sizeof(struct mc_cmd));
 		i596_add_cmd(dev, &cmd->cmd);
 	}
 }
diff --git a/drivers/net/ethernet/i825xx/sni_82596.c b/drivers/net/ethernet/i825xx/sni_82596.c
index e80e790ffbd4d4..507d60cd6f9b33 100644
--- a/drivers/net/ethernet/i825xx/sni_82596.c
+++ b/drivers/net/ethernet/i825xx/sni_82596.c
@@ -24,10 +24,6 @@
 
 static const char sni_82596_string[] = "snirm_82596";
 
-#define DMA_WBACK(priv, addr, len)     do { } while (0)
-#define DMA_INV(priv, addr, len)       do { } while (0)
-#define DMA_WBACK_INV(priv, addr, len) do { } while (0)
-
 #define SYSBUS      0x00004400
 
 /* big endian CPU, 82596 little endian */
-- 
2.28.0


* [PATCH 24/28] 53c700: convert from dma_cache_sync to dma_sync_single_for_device
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
                     ` (22 preceding siblings ...)
  2020-08-19  6:55   ` [PATCH 23/28] lib82596: " Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-08-19  6:55   ` [PATCH 25/28] dma-mapping: remove dma_cache_sync Christoph Hellwig
                     ` (5 subsequent siblings)
  29 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

Use the proper modern API to transfer cache ownership for incoherent DMA.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/53c700.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/53c700.c b/drivers/scsi/53c700.c
index 521950d0731e4a..57a08c42d00325 100644
--- a/drivers/scsi/53c700.c
+++ b/drivers/scsi/53c700.c
@@ -269,18 +269,25 @@ NCR_700_get_SXFER(struct scsi_device *SDp)
 					      spi_period(SDp->sdev_target));
 }
 
+static inline dma_addr_t virt_to_dma(struct NCR_700_Host_Parameters *h, void *p)
+{
+	return h->pScript + ((uintptr_t)p - (uintptr_t)h->script);
+}
+
 static inline void dma_sync_to_dev(struct NCR_700_Host_Parameters *h,
 		void *addr, size_t size)
 {
 	if (h->noncoherent)
-		dma_cache_sync(h->dev, addr, size, DMA_TO_DEVICE);
+		dma_sync_single_for_device(h->dev, virt_to_dma(h, addr),
+					   size, DMA_BIDIRECTIONAL);
 }
 
 static inline void dma_sync_from_dev(struct NCR_700_Host_Parameters *h,
 		void *addr, size_t size)
 {
 	if (h->noncoherent)
-		dma_cache_sync(h->dev, addr, size, DMA_FROM_DEVICE);
+		dma_sync_single_for_device(h->dev, virt_to_dma(h, addr), size,
+					   DMA_BIDIRECTIONAL);
 }
 
 struct Scsi_Host *
-- 
2.28.0


* [PATCH 25/28] dma-mapping: remove dma_cache_sync
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
                     ` (23 preceding siblings ...)
  2020-08-19  6:55   ` [PATCH 24/28] 53c700: " Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-08-19  6:55   ` [PATCH 26/28] dmapool: add dma_alloc_pages support Christoph Hellwig
                     ` (4 subsequent siblings)
  29 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

All users are gone now, remove the API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/mips/Kconfig               |  1 -
 arch/mips/jazz/jazzdma.c        |  1 -
 arch/mips/mm/dma-noncoherent.c  |  6 ------
 arch/parisc/Kconfig             |  1 -
 arch/parisc/kernel/pci-dma.c    |  6 ------
 include/linux/dma-mapping.h     |  9 ---------
 include/linux/dma-noncoherent.h | 10 ----------
 kernel/dma/Kconfig              |  3 ---
 kernel/dma/mapping.c            | 14 --------------
 9 files changed, 51 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index c95fa3a2484cf0..1be91c5d666e61 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -1134,7 +1134,6 @@ config DMA_NONCOHERENT
 	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
 	select ARCH_HAS_DMA_SET_UNCACHED
 	select DMA_NONCOHERENT_MMAP
-	select DMA_NONCOHERENT_CACHE_SYNC
 	select NEED_DMA_MAP_STATE
 
 config SYS_HAS_EARLY_PRINTK
diff --git a/arch/mips/jazz/jazzdma.c b/arch/mips/jazz/jazzdma.c
index 0f9a9cb7fe7a95..e2efe43f5f9cc3 100644
--- a/arch/mips/jazz/jazzdma.c
+++ b/arch/mips/jazz/jazzdma.c
@@ -614,7 +614,6 @@ const struct dma_map_ops jazz_dma_ops = {
 	.sync_single_for_device	= jazz_dma_sync_single_for_device,
 	.sync_sg_for_cpu	= jazz_dma_sync_sg_for_cpu,
 	.sync_sg_for_device	= jazz_dma_sync_sg_for_device,
-	.cache_sync		= arch_dma_cache_sync,
 	.mmap			= dma_common_mmap,
 	.get_sgtable		= dma_common_get_sgtable,
 	.alloc_pages		= dma_common_alloc_pages,
diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
index 97a14adbafc99c..f34ad1f09799f1 100644
--- a/arch/mips/mm/dma-noncoherent.c
+++ b/arch/mips/mm/dma-noncoherent.c
@@ -137,12 +137,6 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 }
 #endif
 
-void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
-		enum dma_data_direction direction)
-{
-	dma_sync_virt_for_device(vaddr, size, direction);
-}
-
 #ifdef CONFIG_DMA_PERDEV_COHERENT
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 		const struct iommu_ops *iommu, bool coherent)
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 3b0f53dd70bc9b..ed15da1da174e0 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -195,7 +195,6 @@ config PA11
 	depends on PA7000 || PA7100LC || PA7200 || PA7300LC
 	select ARCH_HAS_SYNC_DMA_FOR_CPU
 	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
-	select DMA_NONCOHERENT_CACHE_SYNC
 
 config PREFETCH
 	def_bool y
diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
index 38c68e131bbe2a..ce38c0b9158125 100644
--- a/arch/parisc/kernel/pci-dma.c
+++ b/arch/parisc/kernel/pci-dma.c
@@ -454,9 +454,3 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 {
 	flush_kernel_dcache_range((unsigned long)phys_to_virt(paddr), size);
 }
-
-void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
-	       enum dma_data_direction direction)
-{
-	flush_kernel_dcache_range((unsigned long)vaddr, size);
-}
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 73fa6e10c5c8b5..7321df0b9ffc83 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -123,8 +123,6 @@ struct dma_map_ops {
 	void (*sync_sg_for_device)(struct device *dev,
 				   struct scatterlist *sg, int nents,
 				   enum dma_data_direction dir);
-	void (*cache_sync)(struct device *dev, void *vaddr, size_t size,
-			enum dma_data_direction direction);
 	int (*dma_supported)(struct device *dev, u64 mask);
 	u64 (*get_required_mask)(struct device *dev);
 	size_t (*max_mapping_size)(struct device *dev);
@@ -258,9 +256,6 @@ void *dma_alloc_pages(struct device *dev, size_t size, dma_addr_t *dma_handle,
 		enum dma_data_direction dir, gfp_t gfp);
 void dma_free_pages(struct device *dev, size_t size, void *vaddr,
 		dma_addr_t dma_handle, enum dma_data_direction dir);
-/* dma_cache_sync is deprecated: don't use in new code */
-void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
-		enum dma_data_direction dir);
 int dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt,
 		void *cpu_addr, dma_addr_t dma_addr, size_t size,
 		unsigned long attrs);
@@ -353,10 +348,6 @@ static inline void dma_free_pages(struct device *dev, size_t size, void *vaddr,
 		dma_addr_t dma_handle, enum dma_data_direction dir)
 {
 }
-static inline void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
-		enum dma_data_direction dir)
-{
-}
 static inline int dma_get_sgtable_attrs(struct device *dev,
 		struct sg_table *sgt, void *cpu_addr, dma_addr_t dma_addr,
 		size_t size, unsigned long attrs)
diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h
index 1eecfd24d434f8..e61283e06576a8 100644
--- a/include/linux/dma-noncoherent.h
+++ b/include/linux/dma-noncoherent.h
@@ -59,16 +59,6 @@ static inline pgprot_t dma_pgprot(struct device *dev, pgprot_t prot,
 }
 #endif /* CONFIG_MMU */
 
-#ifdef CONFIG_DMA_NONCOHERENT_CACHE_SYNC
-void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
-		enum dma_data_direction direction);
-#else
-static inline void arch_dma_cache_sync(struct device *dev, void *vaddr,
-		size_t size, enum dma_data_direction direction)
-{
-}
-#endif /* CONFIG_DMA_NONCOHERENT_CACHE_SYNC */
-
 #ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE
 void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir);
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 6cf7f7947ae797..98417f8eef92b2 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -74,9 +74,6 @@ config ARCH_HAS_DMA_PREP_COHERENT
 config ARCH_HAS_FORCE_DMA_UNENCRYPTED
 	bool
 
-config DMA_NONCOHERENT_CACHE_SYNC
-	bool
-
 config DMA_VIRT_OPS
 	bool
 	depends on HAS_DMA
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index dacdb7226caacd..81b0492332d4c8 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -555,20 +555,6 @@ int dma_set_coherent_mask(struct device *dev, u64 mask)
 EXPORT_SYMBOL(dma_set_coherent_mask);
 #endif
 
-void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
-		enum dma_data_direction dir)
-{
-	const struct dma_map_ops *ops = get_dma_ops(dev);
-
-	BUG_ON(!valid_dma_direction(dir));
-
-	if (dma_alloc_direct(dev, ops))
-		arch_dma_cache_sync(dev, vaddr, size, dir);
-	else if (ops->cache_sync)
-		ops->cache_sync(dev, vaddr, size, dir);
-}
-EXPORT_SYMBOL(dma_cache_sync);
-
 size_t dma_max_mapping_size(struct device *dev)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
-- 
2.28.0


* [PATCH 26/28] dmapool: add dma_alloc_pages support
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
                     ` (24 preceding siblings ...)
  2020-08-19  6:55   ` [PATCH 25/28] dma-mapping: remove dma_cache_sync Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-08-19  6:55   ` [PATCH 27/28] nvme-pci: fix PRP pool size Christoph Hellwig
                     ` (3 subsequent siblings)
  29 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

Add a new variant of a dmapool that uses non-coherent memory from
dma_alloc_pages.  Unlike the existing dma_pool_create, this one
initializes a pool allocated by the caller to avoid a pointless extra
allocation.  At some point it might be worth also switching the coherent
allocation over to a similar dma_pool_init_coherent helper, but that is
better done as a separate series including a few conversions.
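
The difference between the create style and the init style can be sketched with a userspace toy. Everything here is hypothetical (`toy_pool`, `toy_pool_init`, `toy_pool_create`, `toy_driver`); only the alignment check follows the shape of the one in the patch, and nothing in it touches real DMA memory.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pool header, much smaller than the real struct dma_pool. */
struct toy_pool {
	char name[32];
	size_t size;
	size_t align;
};

/* Init style: the caller owns the storage, so nothing is allocated here. */
static int toy_pool_init(struct toy_pool *pool, const char *name,
		size_t size, size_t align)
{
	if (align == 0)
		align = 1;
	if (align & (align - 1))
		return -EINVAL;	/* alignment must be a power of two */

	memset(pool, 0, sizeof(*pool));
	strncpy(pool->name, name, sizeof(pool->name) - 1);
	pool->size = size;
	pool->align = align;
	return 0;
}

/* Create style: one extra heap allocation just for the pool header. */
static struct toy_pool *toy_pool_create(const char *name, size_t size,
		size_t align)
{
	struct toy_pool *pool = malloc(sizeof(*pool));

	if (!pool)
		return NULL;
	if (toy_pool_init(pool, name, size, align)) {
		free(pool);
		return NULL;
	}
	return pool;
}

/* Hypothetical driver state that embeds the pool instead of pointing to it. */
struct toy_driver {
	struct toy_pool prp_pool;
};
```

With the init style a driver can embed the pool in its own state structure and call `toy_pool_init(&drv.prp_pool, ...)`, so the only failure mode left is invalid parameters, reported as an error code rather than a NULL pointer.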

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/dmapool.h |  23 ++++-
 mm/dmapool.c            | 211 +++++++++++++++++++++++++---------------
 2 files changed, 154 insertions(+), 80 deletions(-)

diff --git a/include/linux/dmapool.h b/include/linux/dmapool.h
index f632ecfb423840..1387525c4e52e8 100644
--- a/include/linux/dmapool.h
+++ b/include/linux/dmapool.h
@@ -11,6 +11,10 @@
 #ifndef LINUX_DMAPOOL_H
 #define	LINUX_DMAPOOL_H
 
+#include <linux/dma-direction.h>
+#include <linux/gfp.h>
+#include <linux/spinlock.h>
+#include <linux/list.h>
 #include <linux/scatterlist.h>
 #include <asm/io.h>
 
@@ -18,11 +22,28 @@ struct device;
 
 #ifdef CONFIG_HAS_DMA
 
+struct dma_pool {		/* the pool */
+	struct list_head page_list;
+	spinlock_t lock;
+	size_t size;
+	struct device *dev;
+	size_t allocation;
+	size_t boundary;
+	bool is_coherent;
+	enum dma_data_direction dir;
+	char name[32];
+	struct list_head pools;
+};
+
 struct dma_pool *dma_pool_create(const char *name, struct device *dev, 
 			size_t size, size_t align, size_t allocation);
-
 void dma_pool_destroy(struct dma_pool *pool);
 
+int dma_pool_init(struct device *dev, struct dma_pool *pool, const char *name,
+		size_t size, size_t align, size_t boundary,
+		enum dma_data_direction dir);
+void dma_pool_exit(struct dma_pool *pool);
+
 void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
 		     dma_addr_t *handle);
 void dma_pool_free(struct dma_pool *pool, void *vaddr, dma_addr_t addr);
diff --git a/mm/dmapool.c b/mm/dmapool.c
index f9fb9bbd733e0f..c60a48b22c8d6a 100644
--- a/mm/dmapool.c
+++ b/mm/dmapool.c
@@ -6,10 +6,10 @@
  * Copyright 2007 Intel Corporation
  *   Author: Matthew Wilcox <willy@linux.intel.com>
  *
- * This allocator returns small blocks of a given size which are DMA-able by
- * the given device.  It uses the dma_alloc_coherent page allocator to get
- * new pages, then splits them up into blocks of the required size.
- * Many older drivers still have their own code to do this.
+ * This allocator returns small blocks of a given size which are DMA-able by the
+ * given device.  It either uses the dma_alloc_coherent or the dma_alloc_pages
+ * allocator to get new pages, then splits them up into blocks of the required
+ * size.
  *
  * The current design of this allocator is fairly simple.  The pool is
  * represented by the 'struct dma_pool' which keeps a doubly-linked list of
@@ -39,17 +39,6 @@
 #define DMAPOOL_DEBUG 1
 #endif
 
-struct dma_pool {		/* the pool */
-	struct list_head page_list;
-	spinlock_t lock;
-	size_t size;
-	struct device *dev;
-	size_t allocation;
-	size_t boundary;
-	char name[32];
-	struct list_head pools;
-};
-
 struct dma_page {		/* cacheable header for 'allocation' bytes */
 	struct list_head page_list;
 	void *vaddr;
@@ -104,74 +93,40 @@ show_pools(struct device *dev, struct device_attribute *attr, char *buf)
 
 static DEVICE_ATTR(pools, 0444, show_pools, NULL);
 
-/**
- * dma_pool_create - Creates a pool of consistent memory blocks, for dma.
- * @name: name of pool, for diagnostics
- * @dev: device that will be doing the DMA
- * @size: size of the blocks in this pool.
- * @align: alignment requirement for blocks; must be a power of two
- * @boundary: returned blocks won't cross this power of two boundary
- * Context: not in_interrupt()
- *
- * Given one of these pools, dma_pool_alloc()
- * may be used to allocate memory.  Such memory will all have "consistent"
- * DMA mappings, accessible by the device and its driver without using
- * cache flushing primitives.  The actual size of blocks allocated may be
- * larger than requested because of alignment.
- *
- * If @boundary is nonzero, objects returned from dma_pool_alloc() won't
- * cross that size boundary.  This is useful for devices which have
- * addressing restrictions on individual DMA transfers, such as not crossing
- * boundaries of 4KBytes.
- *
- * Return: a dma allocation pool with the requested characteristics, or
- * %NULL if one can't be created.
- */
-struct dma_pool *dma_pool_create(const char *name, struct device *dev,
-				 size_t size, size_t align, size_t boundary)
+static int __dma_pool_init(struct device *dev, struct dma_pool *pool,
+		const char *name, size_t size, size_t align, size_t boundary)
 {
-	struct dma_pool *retval;
 	size_t allocation;
 	bool empty = false;
 
 	if (align == 0)
 		align = 1;
-	else if (align & (align - 1))
-		return NULL;
+	if (align & (align - 1))
+		return -EINVAL;
 
 	if (size == 0)
-		return NULL;
-	else if (size < 4)
-		size = 4;
-
-	size = ALIGN(size, align);
+		return -EINVAL;
+	size = ALIGN(max_t(size_t, size, 4), align);
 	allocation = max_t(size_t, size, PAGE_SIZE);
 
 	if (!boundary)
 		boundary = allocation;
-	else if ((boundary < size) || (boundary & (boundary - 1)))
-		return NULL;
-
-	retval = kmalloc_node(sizeof(*retval), GFP_KERNEL, dev_to_node(dev));
-	if (!retval)
-		return retval;
-
-	strlcpy(retval->name, name, sizeof(retval->name));
-
-	retval->dev = dev;
-
-	INIT_LIST_HEAD(&retval->page_list);
-	spin_lock_init(&retval->lock);
-	retval->size = size;
-	retval->boundary = boundary;
-	retval->allocation = allocation;
-
-	INIT_LIST_HEAD(&retval->pools);
+	if (boundary < size || (boundary & (boundary - 1)))
+		return -EINVAL;
+
+	strlcpy(pool->name, name, sizeof(pool->name));
+	pool->dev = dev;
+	INIT_LIST_HEAD(&pool->page_list);
+	spin_lock_init(&pool->lock);
+	pool->size = size;
+	pool->boundary = boundary;
+	pool->allocation = allocation;
+	INIT_LIST_HEAD(&pool->pools);
 
 	/*
 	 * pools_lock ensures that the ->dma_pools list does not get corrupted.
 	 * pools_reg_lock ensures that there is not a race between
-	 * dma_pool_create() and dma_pool_destroy() or within dma_pool_create()
+	 * __dma_pool_init() and dma_pool_exit() or within dma_pool_create()
 	 * when the first invocation of dma_pool_create() failed on
 	 * device_create_file() and the second assumes that it has been done (I
 	 * know it is a short window).
@@ -180,7 +135,7 @@ struct dma_pool *dma_pool_create(const char *name, struct device *dev,
 	mutex_lock(&pools_lock);
 	if (list_empty(&dev->dma_pools))
 		empty = true;
-	list_add(&retval->pools, &dev->dma_pools);
+	list_add(&pool->pools, &dev->dma_pools);
 	mutex_unlock(&pools_lock);
 	if (empty) {
 		int err;
@@ -188,18 +143,94 @@ struct dma_pool *dma_pool_create(const char *name, struct device *dev,
 		err = device_create_file(dev, &dev_attr_pools);
 		if (err) {
 			mutex_lock(&pools_lock);
-			list_del(&retval->pools);
+			list_del(&pool->pools);
 			mutex_unlock(&pools_lock);
 			mutex_unlock(&pools_reg_lock);
-			kfree(retval);
-			return NULL;
+			return err;
 		}
 	}
 	mutex_unlock(&pools_reg_lock);
-	return retval;
+	return 0;
+}
+
+/**
+ * dma_pool_create - Creates a pool of consistent memory blocks, for dma.
+ * @name: name of pool, for diagnostics
+ * @dev: device that will be doing the DMA
+ * @size: size of the blocks in this pool.
+ * @align: alignment requirement for blocks; must be a power of two
+ * @boundary: returned blocks won't cross this power of two boundary
+ * Context: not in_interrupt()
+ *
+ * Given one of these pools, dma_pool_alloc()
+ * may be used to allocate memory.  Such memory will all have "consistent"
+ * DMA mappings, accessible by the device and its driver without using
+ * cache flushing primitives.  The actual size of blocks allocated may be
+ * larger than requested because of alignment.
+ *
+ * If @boundary is nonzero, objects returned from dma_pool_alloc() won't
+ * cross that size boundary.  This is useful for devices which have
+ * addressing restrictions on individual DMA transfers, such as not crossing
+ * boundaries of 4KBytes.
+ * Return: a dma allocation pool with the requested characteristics, or
+ * %NULL if one can't be created.
+ */
+struct dma_pool *dma_pool_create(const char *name, struct device *dev,
+				 size_t size, size_t align, size_t boundary)
+{
+	struct dma_pool *pool;
+
+	pool = kmalloc_node(sizeof(*pool), GFP_KERNEL, dev_to_node(dev));
+	if (!pool)
+		return NULL;
+	if (__dma_pool_init(dev, pool, name, size, align, boundary))
+		goto out_free_pool;
+	pool->is_coherent = true;
+	return pool;
+out_free_pool:
+	kfree(pool);
+	return NULL;
 }
 EXPORT_SYMBOL(dma_pool_create);
 
+/**
+ * dma_pool_init - initialize a pool of DMA addressable memory
+ * @dev:	device that will be doing the DMA
+ * @pool:	pool to initialize
+ * @name:	name of pool, for diagnostics
+ * @size:	size of the blocks in this pool.
+ * @align:	alignment requirement for blocks; must be a power of two
+ * @boundary:	returned blocks won't cross this power of two boundary
+ * @dir:	DMA direction the allocations are going to be used for
+ *
+ * Context:	not in_interrupt()
+ *
+ * Given one of these pools, dma_pool_alloc() may be used to allocate memory.
+ * Such memory will have the same semantics as memory returned from
+ * dma_alloc_pages(), that is ownership needs to be transferred to and from the
+ * device.  The actual size of blocks allocated may be larger than requested
+ * because of alignment.
+ *
+ * If @boundary is nonzero, objects returned from dma_pool_alloc() won't
+ * cross that size boundary.  This is useful for devices which have
+ * addressing restrictions on individual DMA transfers, such as not crossing
+ * boundaries of 4KBytes.
+ */
+int dma_pool_init(struct device *dev, struct dma_pool *pool, const char *name,
+		size_t size, size_t align, size_t boundary,
+		enum dma_data_direction dir)
+{
+	int ret;
+
+	ret = __dma_pool_init(dev, pool, name, size, align, boundary);
+	if (ret)
+		return ret;
+	pool->is_coherent = false;
+	pool->dir = dir;
+	return 0;
+}
+EXPORT_SYMBOL(dma_pool_init);
+
 static void pool_initialise_page(struct dma_pool *pool, struct dma_page *page)
 {
 	unsigned int offset = 0;
@@ -223,8 +254,12 @@ static struct dma_page *pool_alloc_page(struct dma_pool *pool, gfp_t mem_flags)
 	page = kmalloc(sizeof(*page), mem_flags);
 	if (!page)
 		return NULL;
-	page->vaddr = dma_alloc_coherent(pool->dev, pool->allocation,
-					 &page->dma, mem_flags);
+	if (pool->is_coherent)
+		page->vaddr = dma_alloc_coherent(pool->dev, pool->allocation,
+						 &page->dma, mem_flags);
+	else
+		page->vaddr = dma_alloc_pages(pool->dev, pool->allocation,
+					      &page->dma, pool->dir, mem_flags);
 	if (page->vaddr) {
 #ifdef	DMAPOOL_DEBUG
 		memset(page->vaddr, POOL_POISON_FREED, pool->allocation);
@@ -251,20 +286,25 @@ static void pool_free_page(struct dma_pool *pool, struct dma_page *page)
 #ifdef	DMAPOOL_DEBUG
 	memset(page->vaddr, POOL_POISON_FREED, pool->allocation);
 #endif
-	dma_free_coherent(pool->dev, pool->allocation, page->vaddr, dma);
+	if (pool->is_coherent)
+		dma_free_coherent(pool->dev, pool->allocation, page->vaddr,
+				  dma);
+	else
+		dma_free_pages(pool->dev, pool->allocation, page->vaddr, dma,
+			       pool->dir);
 	list_del(&page->page_list);
 	kfree(page);
 }
 
 /**
- * dma_pool_destroy - destroys a pool of dma memory blocks.
+ * dma_pool_exit - destroys a pool of dma memory blocks.
  * @pool: dma pool that will be destroyed
  * Context: !in_interrupt()
  *
- * Caller guarantees that no more memory from the pool is in use,
- * and that nothing will try to use the pool after this call.
+ * Caller guarantees that no more memory from the pool is in use, and that
+ * nothing will try to use the pool after this call.
  */
-void dma_pool_destroy(struct dma_pool *pool)
+void dma_pool_exit(struct dma_pool *pool)
 {
 	bool empty = false;
 
@@ -299,7 +339,20 @@ void dma_pool_destroy(struct dma_pool *pool)
 		} else
 			pool_free_page(pool, page);
 	}
+}
+EXPORT_SYMBOL(dma_pool_exit);
 
+/**
+ * dma_pool_destroy - destroys a pool of dma memory blocks.
+ * @pool: dma pool that will be destroyed
+ * Context: !in_interrupt()
+ *
+ * Caller guarantees that no more memory from the pool is in use,
+ * and that nothing will try to use the pool after this call.
+ */
+void dma_pool_destroy(struct dma_pool *pool)
+{
+	dma_pool_exit(pool);
 	kfree(pool);
 }
 EXPORT_SYMBOL(dma_pool_destroy);
-- 
2.28.0
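
As a reading aid for the argument checks in __dma_pool_init(), here is a
minimal userspace sketch of the same geometry rules. It is not kernel code:
`pool_geometry`, `pool_geometry_init`, `align_up` and the 4 KiB
`SKETCH_PAGE_SIZE` are names and values assumed for illustration only. The
sketch follows the removed `if (size < 4) size = 4;` lines, i.e. it raises
small sizes to a 4-byte minimum.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB host pages */

/* Hypothetical container for the three values the pool derives. */
struct pool_geometry {
	size_t size;		/* block size after rounding */
	size_t allocation;	/* bytes requested per backing page */
	size_t boundary;	/* blocks must not cross this boundary */
};

/* Round x up to a multiple of a; a must be a power of two. */
static size_t align_up(size_t x, size_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Mirrors the validation order of the patch; returns 0 or -EINVAL. */
static int pool_geometry_init(struct pool_geometry *g, size_t size,
			      size_t align, size_t boundary)
{
	size_t allocation;

	if (align == 0)
		align = 1;
	if (align & (align - 1))
		return -EINVAL;		/* alignment not a power of two */

	if (size == 0)
		return -EINVAL;
	if (size < 4)
		size = 4;		/* minimum block size */
	size = align_up(size, align);
	allocation = size > SKETCH_PAGE_SIZE ? size : SKETCH_PAGE_SIZE;

	if (!boundary)
		boundary = allocation;
	if (boundary < size || (boundary & (boundary - 1)))
		return -EINVAL;		/* too small or not a power of two */

	g->size = size;
	g->allocation = allocation;
	g->boundary = boundary;
	return 0;
}
```

Calling pool_geometry_init(&g, 100, 64, 0) under these assumptions yields a
128-byte block size with allocation and boundary both equal to the assumed
page size.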

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 27/28] nvme-pci: fix PRP pool size
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
                     ` (25 preceding siblings ...)
  2020-08-19  6:55   ` [PATCH 26/28] dmapool: add dma_alloc_pages support Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-08-19  6:55   ` [PATCH 28/28] nvme-pci: use dma_alloc_pages backed dmapools Christoph Hellwig
                     ` (2 subsequent siblings)
  29 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

All operations are based on the controller page size, not the host page
size.  Switch the dma pool to use the controller page size as well to avoid
massive overallocations on systems with large host pages.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index ba725ae47305ef..a33adab62acbaf 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2460,7 +2460,8 @@ static int nvme_disable_prepare_reset(struct nvme_dev *dev, bool shutdown)
 static int nvme_setup_prp_pools(struct nvme_dev *dev)
 {
 	dev->prp_page_pool = dma_pool_create("prp list page", dev->dev,
-						PAGE_SIZE, PAGE_SIZE, 0);
+						NVME_CTRL_PAGE_SIZE,
+						NVME_CTRL_PAGE_SIZE, 0);
 	if (!dev->prp_page_pool)
 		return -ENOMEM;
 
-- 
2.28.0
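
To put a number on the overallocation, here is a small userspace sketch. It
assumes a 4 KiB controller page (`SKETCH_CTRL_PAGE_SIZE`) and uses helper
names invented for illustration. The driver starts a new PRP list once
NVME_CTRL_PAGE_SIZE >> 3 entries are filled, so only one controller page of
each block is ever written.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CTRL_PAGE_SIZE 4096u	/* assumption: 4 KiB controller page */

/* Bytes left unused per PRP list block when blocks are host-page sized. */
static size_t prp_block_waste(size_t host_page_size)
{
	assert(host_page_size >= SKETCH_CTRL_PAGE_SIZE);
	return host_page_size - SKETCH_CTRL_PAGE_SIZE;
}

/* Controller-page sized blocks that fit into one host page allocation. */
static size_t prp_blocks_per_host_page(size_t host_page_size)
{
	assert(host_page_size >= SKETCH_CTRL_PAGE_SIZE);
	return host_page_size / SKETCH_CTRL_PAGE_SIZE;
}
```

On a host with 64 KiB pages this gives 61440 wasted bytes per block with the
old sizing, and 16 blocks per backing page with the new one. With 4 KiB host
pages nothing changes.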


* [PATCH 28/28] nvme-pci: use dma_alloc_pages backed dmapools
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
                     ` (26 preceding siblings ...)
  2020-08-19  6:55   ` [PATCH 27/28] nvme-pci: fix PRP pool size Christoph Hellwig
@ 2020-08-19  6:55   ` Christoph Hellwig
  2020-08-25 11:30   ` a saner API for allocating DMA addressable pages Marek Szyprowski
  2020-08-29  9:46   ` Helge Deller
  29 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19  6:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Thomas Bogendoerfer, James E.J. Bottomley,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

Switch from coherent DMA pools to those backed by dma_alloc_pages.  This
helps devices with non-coherent DMA avoid host accesses to uncached memory
on every submission of an I/O that needs more than a single entry.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 80 ++++++++++++++++++++---------------------
 1 file changed, 40 insertions(+), 40 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a33adab62acbaf..fb34dbcb973673 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -114,8 +114,8 @@ struct nvme_dev {
 	struct blk_mq_tag_set admin_tagset;
 	u32 __iomem *dbs;
 	struct device *dev;
-	struct dma_pool *prp_page_pool;
-	struct dma_pool *prp_small_pool;
+	struct dma_pool prp_page_pool;
+	struct dma_pool prp_small_pool;
 	unsigned online_queues;
 	unsigned max_qid;
 	unsigned io_queues[HCTX_MAX_TYPES];
@@ -536,7 +536,7 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
 
 
 	if (iod->npages == 0)
-		dma_pool_free(dev->prp_small_pool, nvme_pci_iod_list(req)[0],
+		dma_pool_free(&dev->prp_small_pool, nvme_pci_iod_list(req)[0],
 			dma_addr);
 
 	for (i = 0; i < iod->npages; i++) {
@@ -553,7 +553,7 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
 			next_dma_addr = le64_to_cpu(prp_list[last_prp]);
 		}
 
-		dma_pool_free(dev->prp_page_pool, addr, dma_addr);
+		dma_pool_free(&dev->prp_page_pool, addr, dma_addr);
 		dma_addr = next_dma_addr;
 	}
 
@@ -611,10 +611,10 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev *dev,
 
 	nprps = DIV_ROUND_UP(length, NVME_CTRL_PAGE_SIZE);
 	if (nprps <= (256 / 8)) {
-		pool = dev->prp_small_pool;
+		pool = &dev->prp_small_pool;
 		iod->npages = 0;
 	} else {
-		pool = dev->prp_page_pool;
+		pool = &dev->prp_page_pool;
 		iod->npages = 1;
 	}
 
@@ -630,6 +630,11 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev *dev,
 	for (;;) {
 		if (i == NVME_CTRL_PAGE_SIZE >> 3) {
 			__le64 *old_prp_list = prp_list;
+
+			dma_sync_single_for_device(dev->dev, prp_dma,
+						   i * sizeof(*prp_list),
+						   DMA_TO_DEVICE);
+
 			prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
 			if (!prp_list)
 				return BLK_STS_RESOURCE;
@@ -653,6 +658,8 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev *dev,
 		dma_len = sg_dma_len(sg);
 	}
 
+	dma_sync_single_for_device(dev->dev, prp_dma, i * sizeof(*prp_list),
+				   DMA_TO_DEVICE);
 done:
 	cmnd->dptr.prp1 = cpu_to_le64(sg_dma_address(iod->sg));
 	cmnd->dptr.prp2 = cpu_to_le64(iod->first_dma);
@@ -706,10 +713,10 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_dev *dev,
 	}
 
 	if (entries <= (256 / sizeof(struct nvme_sgl_desc))) {
-		pool = dev->prp_small_pool;
+		pool = &dev->prp_small_pool;
 		iod->npages = 0;
 	} else {
-		pool = dev->prp_page_pool;
+		pool = &dev->prp_page_pool;
 		iod->npages = 1;
 	}
 
@@ -728,6 +735,10 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_dev *dev,
 		if (i == SGES_PER_PAGE) {
 			struct nvme_sgl_desc *old_sg_desc = sg_list;
 			struct nvme_sgl_desc *link = &old_sg_desc[i - 1];
+
+			dma_sync_single_for_device(dev->dev, sgl_dma,
+						   i * sizeof(*sg_list),
+						   DMA_TO_DEVICE);
 
 			sg_list = dma_pool_alloc(pool, GFP_ATOMIC, &sgl_dma);
 			if (!sg_list)
@@ -743,6 +754,8 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_dev *dev,
 		sg = sg_next(sg);
 	} while (--entries > 0);
 
+	dma_sync_single_for_device(dev->dev, sgl_dma, i * sizeof(*sg_list),
+				   DMA_TO_DEVICE);
 	return BLK_STS_OK;
 }
 
@@ -2457,30 +2470,6 @@ static int nvme_disable_prepare_reset(struct nvme_dev *dev, bool shutdown)
 	return 0;
 }
 
-static int nvme_setup_prp_pools(struct nvme_dev *dev)
-{
-	dev->prp_page_pool = dma_pool_create("prp list page", dev->dev,
-						NVME_CTRL_PAGE_SIZE,
-						NVME_CTRL_PAGE_SIZE, 0);
-	if (!dev->prp_page_pool)
-		return -ENOMEM;
-
-	/* Optimisation for I/Os between 4k and 128k */
-	dev->prp_small_pool = dma_pool_create("prp list 256", dev->dev,
-						256, 256, 0);
-	if (!dev->prp_small_pool) {
-		dma_pool_destroy(dev->prp_page_pool);
-		return -ENOMEM;
-	}
-	return 0;
-}
-
-static void nvme_release_prp_pools(struct nvme_dev *dev)
-{
-	dma_pool_destroy(dev->prp_page_pool);
-	dma_pool_destroy(dev->prp_small_pool);
-}
-
 static void nvme_free_tagset(struct nvme_dev *dev)
 {
 	if (dev->tagset.tags)
@@ -2851,10 +2840,6 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	INIT_WORK(&dev->remove_work, nvme_remove_dead_ctrl_work);
 	mutex_init(&dev->shutdown_lock);
 
-	result = nvme_setup_prp_pools(dev);
-	if (result)
-		goto unmap;
-
 	quirks |= check_vendor_combination_bug(pdev);
 
 	if (!noacpi && nvme_acpi_storage_d3(pdev)) {
@@ -2867,6 +2852,18 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		quirks |= NVME_QUIRK_SIMPLE_SUSPEND;
 	}
 
+	result = dma_pool_init(dev->dev, &dev->prp_page_pool, "prp list page",
+			NVME_CTRL_PAGE_SIZE, NVME_CTRL_PAGE_SIZE, 0,
+			DMA_TO_DEVICE);
+	if (result)
+		goto unmap;
+
+	/* Optimisation for I/Os between 4k and 128k */
+	result = dma_pool_init(dev->dev, &dev->prp_small_pool, "prp list 256",
+			256, 256, 0, DMA_TO_DEVICE);
+	if (result)
+		goto release_prp_page_pool;
+
 	/*
 	 * Double check that our mempool alloc size will cover the biggest
 	 * command we support.
@@ -2880,7 +2877,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 						GFP_KERNEL, node);
 	if (!dev->iod_mempool) {
 		result = -ENOMEM;
-		goto release_pools;
+		goto release_prp_small_pool;
 	}
 
 	result = nvme_init_ctrl(&dev->ctrl, &pdev->dev, &nvme_pci_ctrl_ops,
@@ -2897,8 +2894,10 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
  release_mempool:
 	mempool_destroy(dev->iod_mempool);
- release_pools:
-	nvme_release_prp_pools(dev);
+ release_prp_small_pool:
+	dma_pool_exit(&dev->prp_small_pool);
+ release_prp_page_pool:
+	dma_pool_exit(&dev->prp_page_pool);
  unmap:
 	nvme_dev_unmap(dev);
  put_pci:
@@ -2963,7 +2962,8 @@ static void nvme_remove(struct pci_dev *pdev)
 	nvme_free_host_mem(dev);
 	nvme_dev_remove_admin(dev);
 	nvme_free_queues(dev, 0);
-	nvme_release_prp_pools(dev);
+	dma_pool_exit(&dev->prp_small_pool);
+	dma_pool_exit(&dev->prp_page_pool);
 	nvme_dev_unmap(dev);
 	nvme_uninit_ctrl(&dev->ctrl);
 }
-- 
2.28.0


* Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
  2020-08-19  6:55   ` [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT Christoph Hellwig
@ 2020-08-19 11:16     ` Tomasz Figa
  2020-08-19 11:51       ` Robin Murphy
  2020-08-19 13:54       ` Christoph Hellwig
  0 siblings, 2 replies; 77+ messages in thread
From: Tomasz Figa @ 2020-08-19 11:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: alsa-devel, linux-ia64, Linux Doc Mailing List, nouveau,
	linux-nvme, Linux Kernel Mailing List, James E.J. Bottomley,
	linux-mm, linux-samsung-soc, Joonyoung Shim, linux-scsi,
	Kyungmin Park, Ben Skeggs, Matt Porter, Linux Media Mailing List,
	Tom Lendacky, Pawel Osciak, Mauro Carvalho Chehab,
	list@263.net:IOMMU DRIVERS <iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, Thomas Bogendoerfer, linux-parisc,
	netdev, Seung-Woo Kim, linux-mips

Hi Christoph,

On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig <hch@lst.de> wrote:
>
> The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused,

Could you explain what makes you think it's unused? It's a feature of
the UAPI generally supported by the videobuf2 framework and relied on
by Chromium OS to get any kind of reasonable performance when
accessing V4L2 buffers in userspace.

> and causes
> weird gymnastics with the DMA_ATTR_NON_CONSISTENT flag, which is
> unimplemented except on PARISC and some MIPS configs, and about to be
> removed.

It is implemented by the generic DMA mapping layer [1], which is used
by a number of architectures including ARM64 and supposed to be used
by new architectures going forward.

[1] https://elixir.bootlin.com/linux/v5.9-rc1/source/kernel/dma/mapping.c#L341

When removing features from generic kernel code, I'd suggest first
providing viable alternatives for its users, rather than killing the
users altogether.

Given the above, I'm afraid I have to NAK this.

Best regards,
Tomasz

>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  .../userspace-api/media/v4l/buffer.rst        | 17 ---------
>  .../media/v4l/vidioc-reqbufs.rst              |  1 -
>  .../media/common/videobuf2/videobuf2-core.c   | 36 +------------------
>  .../common/videobuf2/videobuf2-dma-contig.c   | 19 ----------
>  .../media/common/videobuf2/videobuf2-dma-sg.c |  3 +-
>  .../media/common/videobuf2/videobuf2-v4l2.c   | 12 -------
>  include/media/videobuf2-core.h                |  3 +-
>  include/uapi/linux/videodev2.h                |  2 --
>  8 files changed, 3 insertions(+), 90 deletions(-)
>
> diff --git a/Documentation/userspace-api/media/v4l/buffer.rst b/Documentation/userspace-api/media/v4l/buffer.rst
> index 57e752aaf414a7..2044ed13cd9d7d 100644
> --- a/Documentation/userspace-api/media/v4l/buffer.rst
> +++ b/Documentation/userspace-api/media/v4l/buffer.rst
> @@ -701,23 +701,6 @@ Memory Consistency Flags
>      :stub-columns: 0
>      :widths:       3 1 4
>
> -    * .. _`V4L2-FLAG-MEMORY-NON-CONSISTENT`:
> -
> -      - ``V4L2_FLAG_MEMORY_NON_CONSISTENT``
> -      - 0x00000001
> -      - A buffer is allocated either in consistent (it will be automatically
> -       coherent between the CPU and the bus) or non-consistent memory. The
> -       latter can provide performance gains, for instance the CPU cache
> -       sync/flush operations can be avoided if the buffer is accessed by the
> -       corresponding device only and the CPU does not read/write to/from that
> -       buffer. However, this requires extra care from the driver -- it must
> -       guarantee memory consistency by issuing a cache flush/sync when
> -       consistency is needed. If this flag is set V4L2 will attempt to
> -       allocate the buffer in non-consistent memory. The flag takes effect
> -       only if the buffer is used for :ref:`memory mapping <mmap>` I/O and the
> -       queue reports the :ref:`V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS
> -       <V4L2-BUF-CAP-SUPPORTS-MMAP-CACHE-HINTS>` capability.
> -
>  .. c:type:: v4l2_memory
>
>  enum v4l2_memory
> diff --git a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
> index 75d894d9c36c42..3180c111d368ee 100644
> --- a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
> +++ b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
> @@ -169,7 +169,6 @@ aborting or finishing any DMA in progress, an implicit
>        - This capability is set by the driver to indicate that the queue supports
>          cache and memory management hints. However, it's only valid when the
>          queue is used for :ref:`memory mapping <mmap>` streaming I/O. See
> -        :ref:`V4L2_FLAG_MEMORY_NON_CONSISTENT <V4L2-FLAG-MEMORY-NON-CONSISTENT>`,
>          :ref:`V4L2_BUF_FLAG_NO_CACHE_INVALIDATE <V4L2-BUF-FLAG-NO-CACHE-INVALIDATE>` and
>          :ref:`V4L2_BUF_FLAG_NO_CACHE_CLEAN <V4L2-BUF-FLAG-NO-CACHE-CLEAN>`.
>
> diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
> index f544d3393e9d6b..66a41cef33c1b1 100644
> --- a/drivers/media/common/videobuf2/videobuf2-core.c
> +++ b/drivers/media/common/videobuf2/videobuf2-core.c
> @@ -721,39 +721,14 @@ int vb2_verify_memory_type(struct vb2_queue *q,
>  }
>  EXPORT_SYMBOL(vb2_verify_memory_type);
>
> -static void set_queue_consistency(struct vb2_queue *q, bool consistent_mem)
> -{
> -       q->dma_attrs &= ~DMA_ATTR_NON_CONSISTENT;
> -
> -       if (!vb2_queue_allows_cache_hints(q))
> -               return;
> -       if (!consistent_mem)
> -               q->dma_attrs |= DMA_ATTR_NON_CONSISTENT;
> -}
> -
> -static bool verify_consistency_attr(struct vb2_queue *q, bool consistent_mem)
> -{
> -       bool queue_is_consistent = !(q->dma_attrs & DMA_ATTR_NON_CONSISTENT);
> -
> -       if (consistent_mem != queue_is_consistent) {
> -               dprintk(q, 1, "memory consistency model mismatch\n");
> -               return false;
> -       }
> -       return true;
> -}
> -
>  int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
>                      unsigned int flags, unsigned int *count)
>  {
>         unsigned int num_buffers, allocated_buffers, num_planes = 0;
>         unsigned plane_sizes[VB2_MAX_PLANES] = { };
> -       bool consistent_mem = true;
>         unsigned int i;
>         int ret;
>
> -       if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT)
> -               consistent_mem = false;
> -
>         if (q->streaming) {
>                 dprintk(q, 1, "streaming active\n");
>                 return -EBUSY;
> @@ -765,8 +740,7 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
>         }
>
>         if (*count == 0 || q->num_buffers != 0 ||
> -           (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory) ||
> -           !verify_consistency_attr(q, consistent_mem)) {
> +           (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory)) {
>                 /*
>                  * We already have buffers allocated, so first check if they
>                  * are not in use and can be freed.
> @@ -803,7 +777,6 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
>         num_buffers = min_t(unsigned int, num_buffers, VB2_MAX_FRAME);
>         memset(q->alloc_devs, 0, sizeof(q->alloc_devs));
>         q->memory = memory;
> -       set_queue_consistency(q, consistent_mem);
>
>         /*
>          * Ask the driver how many buffers and planes per buffer it requires.
> @@ -894,12 +867,8 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory,
>  {
>         unsigned int num_planes = 0, num_buffers, allocated_buffers;
>         unsigned plane_sizes[VB2_MAX_PLANES] = { };
> -       bool consistent_mem = true;
>         int ret;
>
> -       if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT)
> -               consistent_mem = false;
> -
>         if (q->num_buffers == VB2_MAX_FRAME) {
>                 dprintk(q, 1, "maximum number of buffers already allocated\n");
>                 return -ENOBUFS;
> @@ -912,15 +881,12 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory,
>                 }
>                 memset(q->alloc_devs, 0, sizeof(q->alloc_devs));
>                 q->memory = memory;
> -               set_queue_consistency(q, consistent_mem);
>                 q->waiting_for_buffers = !q->is_output;
>         } else {
>                 if (q->memory != memory) {
>                         dprintk(q, 1, "memory model mismatch\n");
>                         return -EINVAL;
>                 }
> -               if (!verify_consistency_attr(q, consistent_mem))
> -                       return -EINVAL;
>         }
>
>         num_buffers = min(*count, VB2_MAX_FRAME - q->num_buffers);
> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> index ec3446cc45b8da..7b1b86ec942d7d 100644
> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> @@ -42,11 +42,6 @@ struct vb2_dc_buf {
>         struct dma_buf_attachment       *db_attach;
>  };
>
> -static inline bool vb2_dc_buffer_consistent(unsigned long attr)
> -{
> -       return !(attr & DMA_ATTR_NON_CONSISTENT);
> -}
> -
>  /*********************************************/
>  /*        scatterlist table functions        */
>  /*********************************************/
> @@ -341,13 +336,6 @@ static int
>  vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf,
>                                    enum dma_data_direction direction)
>  {
> -       struct vb2_dc_buf *buf = dbuf->priv;
> -       struct sg_table *sgt = buf->dma_sgt;
> -
> -       if (vb2_dc_buffer_consistent(buf->attrs))
> -               return 0;
> -
> -       dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
>         return 0;
>  }
>
> @@ -355,13 +343,6 @@ static int
>  vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf,
>                                  enum dma_data_direction direction)
>  {
> -       struct vb2_dc_buf *buf = dbuf->priv;
> -       struct sg_table *sgt = buf->dma_sgt;
> -
> -       if (vb2_dc_buffer_consistent(buf->attrs))
> -               return 0;
> -
> -       dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
>         return 0;
>  }
>
> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> index 0a40e00f0d7e5c..a86fce5d8ea8bf 100644
> --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> @@ -123,8 +123,7 @@ static void *vb2_dma_sg_alloc(struct device *dev, unsigned long dma_attrs,
>         /*
>          * NOTE: dma-sg allocates memory using the page allocator directly, so
>          * there is no memory consistency guarantee, hence dma-sg ignores DMA
> -        * attributes passed from the upper layer. That means that
> -        * V4L2_FLAG_MEMORY_NON_CONSISTENT has no effect on dma-sg buffers.
> +        * attributes passed from the upper layer.
>          */
>         buf->pages = kvmalloc_array(buf->num_pages, sizeof(struct page *),
>                                     GFP_KERNEL | __GFP_ZERO);
> diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c
> index 30caad27281e1a..de83ad48783821 100644
> --- a/drivers/media/common/videobuf2/videobuf2-v4l2.c
> +++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c
> @@ -722,20 +722,11 @@ static void fill_buf_caps(struct vb2_queue *q, u32 *caps)
>  #endif
>  }
>
> -static void clear_consistency_attr(struct vb2_queue *q,
> -                                  int memory,
> -                                  unsigned int *flags)
> -{
> -       if (!q->allow_cache_hints || memory != V4L2_MEMORY_MMAP)
> -               *flags &= ~V4L2_FLAG_MEMORY_NON_CONSISTENT;
> -}
> -
>  int vb2_reqbufs(struct vb2_queue *q, struct v4l2_requestbuffers *req)
>  {
>         int ret = vb2_verify_memory_type(q, req->memory, req->type);
>
>         fill_buf_caps(q, &req->capabilities);
> -       clear_consistency_attr(q, req->memory, &req->flags);
>         return ret ? ret : vb2_core_reqbufs(q, req->memory,
>                                             req->flags, &req->count);
>  }
> @@ -769,7 +760,6 @@ int vb2_create_bufs(struct vb2_queue *q, struct v4l2_create_buffers *create)
>         unsigned i;
>
>         fill_buf_caps(q, &create->capabilities);
> -       clear_consistency_attr(q, create->memory, &create->flags);
>         create->index = q->num_buffers;
>         if (create->count == 0)
>                 return ret != -EBUSY ? ret : 0;
> @@ -998,7 +988,6 @@ int vb2_ioctl_reqbufs(struct file *file, void *priv,
>         int res = vb2_verify_memory_type(vdev->queue, p->memory, p->type);
>
>         fill_buf_caps(vdev->queue, &p->capabilities);
> -       clear_consistency_attr(vdev->queue, p->memory, &p->flags);
>         if (res)
>                 return res;
>         if (vb2_queue_is_busy(vdev, file))
> @@ -1021,7 +1010,6 @@ int vb2_ioctl_create_bufs(struct file *file, void *priv,
>
>         p->index = vdev->queue->num_buffers;
>         fill_buf_caps(vdev->queue, &p->capabilities);
> -       clear_consistency_attr(vdev->queue, p->memory, &p->flags);
>         /*
>          * If count == 0, then just check if memory and type are valid.
>          * Any -EBUSY result from vb2_verify_memory_type can be mapped to 0.
> diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
> index 52ef92049073e3..4c7f25b07e9375 100644
> --- a/include/media/videobuf2-core.h
> +++ b/include/media/videobuf2-core.h
> @@ -744,8 +744,7 @@ void vb2_core_querybuf(struct vb2_queue *q, unsigned int index, void *pb);
>   * vb2_core_reqbufs() - Initiate streaming.
>   * @q:         pointer to &struct vb2_queue with videobuf2 queue.
>   * @memory:    memory type, as defined by &enum vb2_memory.
> - * @flags:     auxiliary queue/buffer management flags. Currently, the only
> - *             used flag is %V4L2_FLAG_MEMORY_NON_CONSISTENT.
> + * @flags:     auxiliary queue/buffer management flags.
>   * @count:     requested buffer count.
>   *
>   * Videobuf2 core helper to implement VIDIOC_REQBUF() operation. It is called
> diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
> index c7b70ff53bc1dd..5c00f63d9c1b58 100644
> --- a/include/uapi/linux/videodev2.h
> +++ b/include/uapi/linux/videodev2.h
> @@ -191,8 +191,6 @@ enum v4l2_memory {
>         V4L2_MEMORY_DMABUF           = 4,
>  };
>
> -#define V4L2_FLAG_MEMORY_NON_CONSISTENT                (1 << 0)
> -
>  /* see also http://vektor.theorem.ca/graphics/ycbcr/ */
>  enum v4l2_colorspace {
>         /*
> --
> 2.28.0
>

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
  2020-08-19 11:16     ` Tomasz Figa
@ 2020-08-19 11:51       ` Robin Murphy
  2020-08-19 12:49         ` Tomasz Figa
  2020-08-19 13:54       ` Christoph Hellwig
  1 sibling, 1 reply; 77+ messages in thread
From: Robin Murphy @ 2020-08-19 11:51 UTC (permalink / raw)
  To: Tomasz Figa, Christoph Hellwig
  Cc: alsa-devel, linux-ia64, Linux Doc Mailing List, nouveau,
	linux-nvme, linux-mips, James E.J. Bottomley, linux-mm,
	linux-samsung-soc, Joonyoung Shim, linux-scsi, iommu, Ben Skeggs,
	Matt Porter, Linux Media Mailing List, Tom Lendacky,
	Pawel Osciak, Mauro Carvalho Chehab, linux-arm-kernel,
	Thomas Bogendoerfer, linux-parisc, netdev, Seung-Woo Kim,
	Linux Kernel Mailing List, Kyungmin Park

Hi Tomasz,

On 2020-08-19 12:16, Tomasz Figa wrote:
> Hi Christoph,
> 
> On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig <hch@lst.de> wrote:
>>
>> The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused,
> 
> Could you explain what makes you think it's unused? It's a feature of
> the UAPI generally supported by the videobuf2 framework and relied on
> by Chromium OS to get any kind of reasonable performance when
> accessing V4L2 buffers in the userspace.
> 
>> and causes
>> weird gymnastics with the DMA_ATTR_NON_CONSISTENT flag, which is
>> unimplemented except on PARISC and some MIPS configs, and about to be
>> removed.
> 
> It is implemented by the generic DMA mapping layer [1], which is used
> by a number of architectures including ARM64 and supposed to be used
> by new architectures going forward.

AFAICS all that V4L2_FLAG_MEMORY_NON_CONSISTENT does is end up 
controlling whether DMA_ATTR_NON_CONSISTENT is added to vb2_queue::dma_attrs.
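
The mapping described above can be modelled in plain userspace C. This
is only a sketch pieced together from the helpers the quoted patch
removes (clear_consistency_attr() and set_queue_consistency()); struct
model_queue, model_apply_flags() and the bit chosen for
MODEL_DMA_ATTR_NON_CONSISTENT are made up for illustration and are not
kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Value as defined in the uapi header hunk of the quoted patch. */
#define V4L2_FLAG_MEMORY_NON_CONSISTENT	(1U << 0)
/* Placeholder bit for this model only; not taken from the kernel. */
#define MODEL_DMA_ATTR_NON_CONSISTENT	(1UL << 3)

/* Hypothetical, stripped-down stand-in for struct vb2_queue. */
struct model_queue {
	unsigned long dma_attrs;
	bool allow_cache_hints;
	bool is_mmap;	/* stands in for memory == V4L2_MEMORY_MMAP */
};

/* Folds clear_consistency_attr() and set_queue_consistency() together. */
static void model_apply_flags(struct model_queue *q, unsigned int flags)
{
	assert(q);

	/* The hint is dropped unless the queue opted in and uses MMAP. */
	if (!q->allow_cache_hints || !q->is_mmap)
		flags &= ~V4L2_FLAG_MEMORY_NON_CONSISTENT;

	/* Start from a consistent queue, then set the attribute if asked. */
	q->dma_attrs &= ~MODEL_DMA_ATTR_NON_CONSISTENT;
	if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT)
		q->dma_attrs |= MODEL_DMA_ATTR_NON_CONSISTENT;
}
```

With allow_cache_hints set and MMAP memory, passing the flag sets the
attribute bit; in every other case the bit stays clear. In this model
that is the entire effect of the UAPI flag on the queue.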

Please can you point to where DMA_ATTR_NON_CONSISTENT does anything at 
all on arm64?

Also, I posit that videobuf2 is not actually relying on 
DMA_ATTR_NON_CONSISTENT anyway, since it's clearly not using it properly:

"By using this API, you are guaranteeing to the platform
that you have all the correct and necessary sync points for this memory
in the driver should it choose to return non-consistent memory."

$ git grep dma_cache_sync drivers/media
$

Robin.

> [1] https://elixir.bootlin.com/linux/v5.9-rc1/source/kernel/dma/mapping.c#L341
> 
> When removing features from generic kernel code, I'd suggest first
> providing viable alternatives for its users, rather than killing the
> users altogether.
> 
> Given the above, I'm afraid I have to NAK this.
> 
> Best regards,
> Tomasz
> 
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> ---
>>   .../userspace-api/media/v4l/buffer.rst        | 17 ---------
>>   .../media/v4l/vidioc-reqbufs.rst              |  1 -
>>   .../media/common/videobuf2/videobuf2-core.c   | 36 +------------------
>>   .../common/videobuf2/videobuf2-dma-contig.c   | 19 ----------
>>   .../media/common/videobuf2/videobuf2-dma-sg.c |  3 +-
>>   .../media/common/videobuf2/videobuf2-v4l2.c   | 12 -------
>>   include/media/videobuf2-core.h                |  3 +-
>>   include/uapi/linux/videodev2.h                |  2 --
>>   8 files changed, 3 insertions(+), 90 deletions(-)
>>
>> diff --git a/Documentation/userspace-api/media/v4l/buffer.rst b/Documentation/userspace-api/media/v4l/buffer.rst
>> index 57e752aaf414a7..2044ed13cd9d7d 100644
>> --- a/Documentation/userspace-api/media/v4l/buffer.rst
>> +++ b/Documentation/userspace-api/media/v4l/buffer.rst
>> @@ -701,23 +701,6 @@ Memory Consistency Flags
>>       :stub-columns: 0
>>       :widths:       3 1 4
>>
>> -    * .. _`V4L2-FLAG-MEMORY-NON-CONSISTENT`:
>> -
>> -      - ``V4L2_FLAG_MEMORY_NON_CONSISTENT``
>> -      - 0x00000001
>> -      - A buffer is allocated either in consistent (it will be automatically
>> -       coherent between the CPU and the bus) or non-consistent memory. The
>> -       latter can provide performance gains, for instance the CPU cache
>> -       sync/flush operations can be avoided if the buffer is accessed by the
>> -       corresponding device only and the CPU does not read/write to/from that
>> -       buffer. However, this requires extra care from the driver -- it must
>> -       guarantee memory consistency by issuing a cache flush/sync when
>> -       consistency is needed. If this flag is set V4L2 will attempt to
>> -       allocate the buffer in non-consistent memory. The flag takes effect
>> -       only if the buffer is used for :ref:`memory mapping <mmap>` I/O and the
>> -       queue reports the :ref:`V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS
>> -       <V4L2-BUF-CAP-SUPPORTS-MMAP-CACHE-HINTS>` capability.
>> -
>>   .. c:type:: v4l2_memory
>>
>>   enum v4l2_memory
>> diff --git a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
>> index 75d894d9c36c42..3180c111d368ee 100644
>> --- a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
>> +++ b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
>> @@ -169,7 +169,6 @@ aborting or finishing any DMA in progress, an implicit
>>         - This capability is set by the driver to indicate that the queue supports
>>           cache and memory management hints. However, it's only valid when the
>>           queue is used for :ref:`memory mapping <mmap>` streaming I/O. See
>> -        :ref:`V4L2_FLAG_MEMORY_NON_CONSISTENT <V4L2-FLAG-MEMORY-NON-CONSISTENT>`,
>>           :ref:`V4L2_BUF_FLAG_NO_CACHE_INVALIDATE <V4L2-BUF-FLAG-NO-CACHE-INVALIDATE>` and
>>           :ref:`V4L2_BUF_FLAG_NO_CACHE_CLEAN <V4L2-BUF-FLAG-NO-CACHE-CLEAN>`.
>>
>> diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
>> index f544d3393e9d6b..66a41cef33c1b1 100644
>> --- a/drivers/media/common/videobuf2/videobuf2-core.c
>> +++ b/drivers/media/common/videobuf2/videobuf2-core.c
>> @@ -721,39 +721,14 @@ int vb2_verify_memory_type(struct vb2_queue *q,
>>   }
>>   EXPORT_SYMBOL(vb2_verify_memory_type);
>>
>> -static void set_queue_consistency(struct vb2_queue *q, bool consistent_mem)
>> -{
>> -       q->dma_attrs &= ~DMA_ATTR_NON_CONSISTENT;
>> -
>> -       if (!vb2_queue_allows_cache_hints(q))
>> -               return;
>> -       if (!consistent_mem)
>> -               q->dma_attrs |= DMA_ATTR_NON_CONSISTENT;
>> -}
>> -
>> -static bool verify_consistency_attr(struct vb2_queue *q, bool consistent_mem)
>> -{
>> -       bool queue_is_consistent = !(q->dma_attrs & DMA_ATTR_NON_CONSISTENT);
>> -
>> -       if (consistent_mem != queue_is_consistent) {
>> -               dprintk(q, 1, "memory consistency model mismatch\n");
>> -               return false;
>> -       }
>> -       return true;
>> -}
>> -
>>   int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
>>                       unsigned int flags, unsigned int *count)
>>   {
>>          unsigned int num_buffers, allocated_buffers, num_planes = 0;
>>          unsigned plane_sizes[VB2_MAX_PLANES] = { };
>> -       bool consistent_mem = true;
>>          unsigned int i;
>>          int ret;
>>
>> -       if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT)
>> -               consistent_mem = false;
>> -
>>          if (q->streaming) {
>>                  dprintk(q, 1, "streaming active\n");
>>                  return -EBUSY;
>> @@ -765,8 +740,7 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
>>          }
>>
>>          if (*count == 0 || q->num_buffers != 0 ||
>> -           (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory) ||
>> -           !verify_consistency_attr(q, consistent_mem)) {
>> +           (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory)) {
>>                  /*
>>                   * We already have buffers allocated, so first check if they
>>                   * are not in use and can be freed.
>> @@ -803,7 +777,6 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
>>          num_buffers = min_t(unsigned int, num_buffers, VB2_MAX_FRAME);
>>          memset(q->alloc_devs, 0, sizeof(q->alloc_devs));
>>          q->memory = memory;
>> -       set_queue_consistency(q, consistent_mem);
>>
>>          /*
>>           * Ask the driver how many buffers and planes per buffer it requires.
>> @@ -894,12 +867,8 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory,
>>   {
>>          unsigned int num_planes = 0, num_buffers, allocated_buffers;
>>          unsigned plane_sizes[VB2_MAX_PLANES] = { };
>> -       bool consistent_mem = true;
>>          int ret;
>>
>> -       if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT)
>> -               consistent_mem = false;
>> -
>>          if (q->num_buffers == VB2_MAX_FRAME) {
>>                  dprintk(q, 1, "maximum number of buffers already allocated\n");
>>                  return -ENOBUFS;
>> @@ -912,15 +881,12 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory,
>>                  }
>>                  memset(q->alloc_devs, 0, sizeof(q->alloc_devs));
>>                  q->memory = memory;
>> -               set_queue_consistency(q, consistent_mem);
>>                  q->waiting_for_buffers = !q->is_output;
>>          } else {
>>                  if (q->memory != memory) {
>>                          dprintk(q, 1, "memory model mismatch\n");
>>                          return -EINVAL;
>>                  }
>> -               if (!verify_consistency_attr(q, consistent_mem))
>> -                       return -EINVAL;
>>          }
>>
>>          num_buffers = min(*count, VB2_MAX_FRAME - q->num_buffers);
>> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
>> index ec3446cc45b8da..7b1b86ec942d7d 100644
>> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
>> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
>> @@ -42,11 +42,6 @@ struct vb2_dc_buf {
>>          struct dma_buf_attachment       *db_attach;
>>   };
>>
>> -static inline bool vb2_dc_buffer_consistent(unsigned long attr)
>> -{
>> -       return !(attr & DMA_ATTR_NON_CONSISTENT);
>> -}
>> -
>>   /*********************************************/
>>   /*        scatterlist table functions        */
>>   /*********************************************/
>> @@ -341,13 +336,6 @@ static int
>>   vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf,
>>                                     enum dma_data_direction direction)
>>   {
>> -       struct vb2_dc_buf *buf = dbuf->priv;
>> -       struct sg_table *sgt = buf->dma_sgt;
>> -
>> -       if (vb2_dc_buffer_consistent(buf->attrs))
>> -               return 0;
>> -
>> -       dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
>>          return 0;
>>   }
>>
>> @@ -355,13 +343,6 @@ static int
>>   vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf,
>>                                   enum dma_data_direction direction)
>>   {
>> -       struct vb2_dc_buf *buf = dbuf->priv;
>> -       struct sg_table *sgt = buf->dma_sgt;
>> -
>> -       if (vb2_dc_buffer_consistent(buf->attrs))
>> -               return 0;
>> -
>> -       dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
>>          return 0;
>>   }
>>
>> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
>> index 0a40e00f0d7e5c..a86fce5d8ea8bf 100644
>> --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
>> +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
>> @@ -123,8 +123,7 @@ static void *vb2_dma_sg_alloc(struct device *dev, unsigned long dma_attrs,
>>          /*
>>           * NOTE: dma-sg allocates memory using the page allocator directly, so
>>           * there is no memory consistency guarantee, hence dma-sg ignores DMA
>> -        * attributes passed from the upper layer. That means that
>> -        * V4L2_FLAG_MEMORY_NON_CONSISTENT has no effect on dma-sg buffers.
>> +        * attributes passed from the upper layer.
>>           */
>>          buf->pages = kvmalloc_array(buf->num_pages, sizeof(struct page *),
>>                                      GFP_KERNEL | __GFP_ZERO);
>> diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c
>> index 30caad27281e1a..de83ad48783821 100644
>> --- a/drivers/media/common/videobuf2/videobuf2-v4l2.c
>> +++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c
>> @@ -722,20 +722,11 @@ static void fill_buf_caps(struct vb2_queue *q, u32 *caps)
>>   #endif
>>   }
>>
>> -static void clear_consistency_attr(struct vb2_queue *q,
>> -                                  int memory,
>> -                                  unsigned int *flags)
>> -{
>> -       if (!q->allow_cache_hints || memory != V4L2_MEMORY_MMAP)
>> -               *flags &= ~V4L2_FLAG_MEMORY_NON_CONSISTENT;
>> -}
>> -
>>   int vb2_reqbufs(struct vb2_queue *q, struct v4l2_requestbuffers *req)
>>   {
>>          int ret = vb2_verify_memory_type(q, req->memory, req->type);
>>
>>          fill_buf_caps(q, &req->capabilities);
>> -       clear_consistency_attr(q, req->memory, &req->flags);
>>          return ret ? ret : vb2_core_reqbufs(q, req->memory,
>>                                              req->flags, &req->count);
>>   }
>> @@ -769,7 +760,6 @@ int vb2_create_bufs(struct vb2_queue *q, struct v4l2_create_buffers *create)
>>          unsigned i;
>>
>>          fill_buf_caps(q, &create->capabilities);
>> -       clear_consistency_attr(q, create->memory, &create->flags);
>>          create->index = q->num_buffers;
>>          if (create->count == 0)
>>                  return ret != -EBUSY ? ret : 0;
>> @@ -998,7 +988,6 @@ int vb2_ioctl_reqbufs(struct file *file, void *priv,
>>          int res = vb2_verify_memory_type(vdev->queue, p->memory, p->type);
>>
>>          fill_buf_caps(vdev->queue, &p->capabilities);
>> -       clear_consistency_attr(vdev->queue, p->memory, &p->flags);
>>          if (res)
>>                  return res;
>>          if (vb2_queue_is_busy(vdev, file))
>> @@ -1021,7 +1010,6 @@ int vb2_ioctl_create_bufs(struct file *file, void *priv,
>>
>>          p->index = vdev->queue->num_buffers;
>>          fill_buf_caps(vdev->queue, &p->capabilities);
>> -       clear_consistency_attr(vdev->queue, p->memory, &p->flags);
>>          /*
>>           * If count == 0, then just check if memory and type are valid.
>>           * Any -EBUSY result from vb2_verify_memory_type can be mapped to 0.
>> diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
>> index 52ef92049073e3..4c7f25b07e9375 100644
>> --- a/include/media/videobuf2-core.h
>> +++ b/include/media/videobuf2-core.h
>> @@ -744,8 +744,7 @@ void vb2_core_querybuf(struct vb2_queue *q, unsigned int index, void *pb);
>>    * vb2_core_reqbufs() - Initiate streaming.
>>    * @q:         pointer to &struct vb2_queue with videobuf2 queue.
>>    * @memory:    memory type, as defined by &enum vb2_memory.
>> - * @flags:     auxiliary queue/buffer management flags. Currently, the only
>> - *             used flag is %V4L2_FLAG_MEMORY_NON_CONSISTENT.
>> + * @flags:     auxiliary queue/buffer management flags.
>>    * @count:     requested buffer count.
>>    *
>>    * Videobuf2 core helper to implement VIDIOC_REQBUF() operation. It is called
>> diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
>> index c7b70ff53bc1dd..5c00f63d9c1b58 100644
>> --- a/include/uapi/linux/videodev2.h
>> +++ b/include/uapi/linux/videodev2.h
>> @@ -191,8 +191,6 @@ enum v4l2_memory {
>>          V4L2_MEMORY_DMABUF           = 4,
>>   };
>>
>> -#define V4L2_FLAG_MEMORY_NON_CONSISTENT                (1 << 0)
>> -
>>   /* see also http://vektor.theorem.ca/graphics/ycbcr/ */
>>   enum v4l2_colorspace {
>>          /*
>> --
>> 2.28.0
>>
> 
> 

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
  2020-08-19 11:51       ` Robin Murphy
@ 2020-08-19 12:49         ` Tomasz Figa
  2020-08-19 13:57           ` Christoph Hellwig
  2020-08-19 14:07           ` Robin Murphy
  0 siblings, 2 replies; 77+ messages in thread
From: Tomasz Figa @ 2020-08-19 12:49 UTC (permalink / raw)
  To: Robin Murphy
  Cc: alsa-devel, linux-ia64, Linux Doc Mailing List, nouveau,
	linux-nvme, linux-mips, James E.J. Bottomley, linux-mm,
	Christoph Hellwig, linux-samsung-soc, Joonyoung Shim, linux-scsi,
	Ben Skeggs, Matt Porter, Linux Media Mailing List, Tom Lendacky,
	Pawel Osciak, Mauro Carvalho Chehab,
	IOMMU DRIVERS <iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,
	Thomas Bogendoerfer, linux-parisc, netdev, Seung-Woo Kim,
	Linux Kernel Mailing List, Kyungmin Park

On Wed, Aug 19, 2020 at 1:51 PM Robin Murphy <robin.murphy@arm.com> wrote:
>
> Hi Tomasz,
>
> On 2020-08-19 12:16, Tomasz Figa wrote:
> > Hi Christoph,
> >
> > On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig <hch@lst.de> wrote:
> >>
> >> The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused,
> >
> > Could you explain what makes you think it's unused? It's a feature of
> > the UAPI generally supported by the videobuf2 framework and relied on
> > by Chromium OS to get any kind of reasonable performance when
> > accessing V4L2 buffers in the userspace.
> >
> >> and causes
> >> weird gymnastics with the DMA_ATTR_NON_CONSISTENT flag, which is
> >> unimplemented except on PARISC and some MIPS configs, and about to be
> >> removed.
> >
> > It is implemented by the generic DMA mapping layer [1], which is used
> > by a number of architectures including ARM64 and supposed to be used
> > by new architectures going forward.
>
> AFAICS all that V4L2_FLAG_MEMORY_NON_CONSISTENT does is end up
> controlling whether DMA_ATTR_NON_CONSISTENT is added to vb2_queue::dma_attrs.
>
> Please can you point to where DMA_ATTR_NON_CONSISTENT does anything at
> all on arm64?
>

With the default config it doesn't, but with
CONFIG_DMA_NONCOHERENT_CACHE_SYNC enabled it makes dma_pgprot() keep
the pgprot value as is, without enforcing coherence attributes.
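
A rough userspace model of that decision, following the paragraph
above. It is a sketch, not the kernel's dma_pgprot(): the enum, struct
model_dev, model_dma_pgprot() and the attribute bit are invented for
illustration, the config option is passed as a bool, and encryption and
write-combine handling are left out. The early return for coherent
devices is an assumption about the surrounding logic and is not stated
in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit for this model only; not taken from the kernel. */
#define MODEL_DMA_ATTR_NON_CONSISTENT	(1UL << 3)

enum model_prot {
	MODEL_PROT_UNCHANGED,	/* pgprot kept as is */
	MODEL_PROT_DMACOHERENT,	/* coherence attributes enforced */
};

/* Hypothetical stand-in for the device's coherency property. */
struct model_dev {
	bool dma_coherent;
};

/*
 * cfg_noncoherent_cache_sync stands in for
 * CONFIG_DMA_NONCOHERENT_CACHE_SYNC being enabled at build time.
 */
static enum model_prot model_dma_pgprot(const struct model_dev *dev,
					bool cfg_noncoherent_cache_sync,
					unsigned long attrs)
{
	assert(dev);

	/* Assumed: a coherent device needs no special page protection. */
	if (dev->dma_coherent)
		return MODEL_PROT_UNCHANGED;

	/* The attribute only matters when the config option is enabled. */
	if (cfg_noncoherent_cache_sync &&
	    (attrs & MODEL_DMA_ATTR_NON_CONSISTENT))
		return MODEL_PROT_UNCHANGED;

	return MODEL_PROT_DMACOHERENT;
}
```

For a non-coherent device, the model returns MODEL_PROT_UNCHANGED only
when both the option and the attribute are set, which matches "with the
default config it doesn't".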


> Also, I posit that videobuf2 is not actually relying on
> DMA_ATTR_NON_CONSISTENT anyway, since it's clearly not using it properly:
>
> "By using this API, you are guaranteeing to the platform
> that you have all the correct and necessary sync points for this memory
> in the driver should it choose to return non-consistent memory."
>
> $ git grep dma_cache_sync drivers/media
> $

AFAIK dma_cache_sync() isn't the only way to perform the cache
synchronization. The earlier patch series that I reviewed relied on
dma_get_sgtable() and then dma_sync_sg_*() (which existed in the
vb2-dc since forever [1]). However, it looks like with the final code
the sgtable isn't acquired and the synchronization isn't happening, so
you have a point.
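
For reference, the dma_sync_sg_*() pattern is the one visible in the
vb2_dc_dmabuf_ops_begin_cpu_access()/end_cpu_access() hunks of the
quoted patch. A userspace sketch of its control flow follows; struct
model_buf, the model_* functions and the attribute bit are hypothetical,
and counters stand in for the real dma_sync_sg_for_cpu() and
dma_sync_sg_for_device() calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit for this model only; not taken from the kernel. */
#define MODEL_DMA_ATTR_NON_CONSISTENT	(1UL << 3)

/* Hypothetical stand-in for struct vb2_dc_buf. */
struct model_buf {
	unsigned long attrs;
	unsigned int syncs_for_cpu;	/* counts dma_sync_sg_for_cpu() */
	unsigned int syncs_for_device;	/* counts dma_sync_sg_for_device() */
};

/* Same test as the removed vb2_dc_buffer_consistent(). */
static bool model_buffer_consistent(unsigned long attrs)
{
	return !(attrs & MODEL_DMA_ATTR_NON_CONSISTENT);
}

/* Before the CPU touches the buffer: sync only non-consistent memory. */
static int model_begin_cpu_access(struct model_buf *buf)
{
	assert(buf);
	if (model_buffer_consistent(buf->attrs))
		return 0;
	buf->syncs_for_cpu++;
	return 0;
}

/* Before handing the buffer back to the device: same condition. */
static int model_end_cpu_access(struct model_buf *buf)
{
	assert(buf);
	if (model_buffer_consistent(buf->attrs))
		return 0;
	buf->syncs_for_device++;
	return 0;
}
```

A consistent buffer never triggers a sync in this model, while a
non-consistent one gets exactly one sync per begin/end call.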

FWIW, I asked a while ago what the plan was for non-coherent
allocations, and DMA_ATTR_NON_CONSISTENT plus dma_sync_*() seemed to
be the right thing to go with. [2] The same thread also explains why
dma_alloc_pages() isn't suitable for the users of dma_alloc_attrs()
and DMA_ATTR_NON_CONSISTENT.

I think we could make a deal here. We could revert the parts using
DMA_ATTR_NON_CONSISTENT while keeping the UAPI intact and simply making
the flag a no-op, since it's just a hint after all. Then you would
propose a proper, functionally equivalent replacement for
dma_alloc_attrs(..., DMA_ATTR_NON_CONSISTENT) that works on ARM64,
which we could then use to enable the functionality expected by this
UAPI. Does that sound like a workable way forward?

By the way, as a videobuf2 reviewer, I'd appreciate being CC'd on any
series related to the subsystem-facing DMA API changes, since
videobuf2 is one of the biggest users of it.

[1] https://elixir.bootlin.com/linux/v5.9-rc1/source/drivers/media/common/videobuf2/videobuf2-dma-contig.c#L98
[2] https://patchwork.kernel.org/comment/23312203/

Best regards,
Tomasz


>
> Robin.
>
> > [1] https://elixir.bootlin.com/linux/v5.9-rc1/source/kernel/dma/mapping.c#L341
> >
> > When removing features from generic kernel code, I'd suggest first
> > providing viable alternatives for its users, rather than killing the
> > users altogether.
> >
> > Given the above, I'm afraid I have to NAK this.
> >
> > Best regards,
> > Tomasz
> >
> >>
> >> Signed-off-by: Christoph Hellwig <hch@lst.de>
> >> ---
> >>   .../userspace-api/media/v4l/buffer.rst        | 17 ---------
> >>   .../media/v4l/vidioc-reqbufs.rst              |  1 -
> >>   .../media/common/videobuf2/videobuf2-core.c   | 36 +------------------
> >>   .../common/videobuf2/videobuf2-dma-contig.c   | 19 ----------
> >>   .../media/common/videobuf2/videobuf2-dma-sg.c |  3 +-
> >>   .../media/common/videobuf2/videobuf2-v4l2.c   | 12 -------
> >>   include/media/videobuf2-core.h                |  3 +-
> >>   include/uapi/linux/videodev2.h                |  2 --
> >>   8 files changed, 3 insertions(+), 90 deletions(-)
> >>
> >> diff --git a/Documentation/userspace-api/media/v4l/buffer.rst b/Documentation/userspace-api/media/v4l/buffer.rst
> >> index 57e752aaf414a7..2044ed13cd9d7d 100644
> >> --- a/Documentation/userspace-api/media/v4l/buffer.rst
> >> +++ b/Documentation/userspace-api/media/v4l/buffer.rst
> >> @@ -701,23 +701,6 @@ Memory Consistency Flags
> >>       :stub-columns: 0
> >>       :widths:       3 1 4
> >>
> >> -    * .. _`V4L2-FLAG-MEMORY-NON-CONSISTENT`:
> >> -
> >> -      - ``V4L2_FLAG_MEMORY_NON_CONSISTENT``
> >> -      - 0x00000001
> >> -      - A buffer is allocated either in consistent (it will be automatically
> >> -       coherent between the CPU and the bus) or non-consistent memory. The
> >> -       latter can provide performance gains, for instance the CPU cache
> >> -       sync/flush operations can be avoided if the buffer is accessed by the
> >> -       corresponding device only and the CPU does not read/write to/from that
> >> -       buffer. However, this requires extra care from the driver -- it must
> >> -       guarantee memory consistency by issuing a cache flush/sync when
> >> -       consistency is needed. If this flag is set V4L2 will attempt to
> >> -       allocate the buffer in non-consistent memory. The flag takes effect
> >> -       only if the buffer is used for :ref:`memory mapping <mmap>` I/O and the
> >> -       queue reports the :ref:`V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS
> >> -       <V4L2-BUF-CAP-SUPPORTS-MMAP-CACHE-HINTS>` capability.
> >> -
> >>   .. c:type:: v4l2_memory
> >>
> >>   enum v4l2_memory
> >> diff --git a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
> >> index 75d894d9c36c42..3180c111d368ee 100644
> >> --- a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
> >> +++ b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
> >> @@ -169,7 +169,6 @@ aborting or finishing any DMA in progress, an implicit
> >>         - This capability is set by the driver to indicate that the queue supports
> >>           cache and memory management hints. However, it's only valid when the
> >>           queue is used for :ref:`memory mapping <mmap>` streaming I/O. See
> >> -        :ref:`V4L2_FLAG_MEMORY_NON_CONSISTENT <V4L2-FLAG-MEMORY-NON-CONSISTENT>`,
> >>           :ref:`V4L2_BUF_FLAG_NO_CACHE_INVALIDATE <V4L2-BUF-FLAG-NO-CACHE-INVALIDATE>` and
> >>           :ref:`V4L2_BUF_FLAG_NO_CACHE_CLEAN <V4L2-BUF-FLAG-NO-CACHE-CLEAN>`.
> >>
> >> diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
> >> index f544d3393e9d6b..66a41cef33c1b1 100644
> >> --- a/drivers/media/common/videobuf2/videobuf2-core.c
> >> +++ b/drivers/media/common/videobuf2/videobuf2-core.c
> >> @@ -721,39 +721,14 @@ int vb2_verify_memory_type(struct vb2_queue *q,
> >>   }
> >>   EXPORT_SYMBOL(vb2_verify_memory_type);
> >>
> >> -static void set_queue_consistency(struct vb2_queue *q, bool consistent_mem)
> >> -{
> >> -       q->dma_attrs &= ~DMA_ATTR_NON_CONSISTENT;
> >> -
> >> -       if (!vb2_queue_allows_cache_hints(q))
> >> -               return;
> >> -       if (!consistent_mem)
> >> -               q->dma_attrs |= DMA_ATTR_NON_CONSISTENT;
> >> -}
> >> -
> >> -static bool verify_consistency_attr(struct vb2_queue *q, bool consistent_mem)
> >> -{
> >> -       bool queue_is_consistent = !(q->dma_attrs & DMA_ATTR_NON_CONSISTENT);
> >> -
> >> -       if (consistent_mem != queue_is_consistent) {
> >> -               dprintk(q, 1, "memory consistency model mismatch\n");
> >> -               return false;
> >> -       }
> >> -       return true;
> >> -}
> >> -
> >>   int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
> >>                       unsigned int flags, unsigned int *count)
> >>   {
> >>          unsigned int num_buffers, allocated_buffers, num_planes = 0;
> >>          unsigned plane_sizes[VB2_MAX_PLANES] = { };
> >> -       bool consistent_mem = true;
> >>          unsigned int i;
> >>          int ret;
> >>
> >> -       if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT)
> >> -               consistent_mem = false;
> >> -
> >>          if (q->streaming) {
> >>                  dprintk(q, 1, "streaming active\n");
> >>                  return -EBUSY;
> >> @@ -765,8 +740,7 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
> >>          }
> >>
> >>          if (*count == 0 || q->num_buffers != 0 ||
> >> -           (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory) ||
> >> -           !verify_consistency_attr(q, consistent_mem)) {
> >> +           (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory)) {
> >>                  /*
> >>                   * We already have buffers allocated, so first check if they
> >>                   * are not in use and can be freed.
> >> @@ -803,7 +777,6 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
> >>          num_buffers = min_t(unsigned int, num_buffers, VB2_MAX_FRAME);
> >>          memset(q->alloc_devs, 0, sizeof(q->alloc_devs));
> >>          q->memory = memory;
> >> -       set_queue_consistency(q, consistent_mem);
> >>
> >>          /*
> >>           * Ask the driver how many buffers and planes per buffer it requires.
> >> @@ -894,12 +867,8 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory,
> >>   {
> >>          unsigned int num_planes = 0, num_buffers, allocated_buffers;
> >>          unsigned plane_sizes[VB2_MAX_PLANES] = { };
> >> -       bool consistent_mem = true;
> >>          int ret;
> >>
> >> -       if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT)
> >> -               consistent_mem = false;
> >> -
> >>          if (q->num_buffers == VB2_MAX_FRAME) {
> >>                  dprintk(q, 1, "maximum number of buffers already allocated\n");
> >>                  return -ENOBUFS;
> >> @@ -912,15 +881,12 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory,
> >>                  }
> >>                  memset(q->alloc_devs, 0, sizeof(q->alloc_devs));
> >>                  q->memory = memory;
> >> -               set_queue_consistency(q, consistent_mem);
> >>                  q->waiting_for_buffers = !q->is_output;
> >>          } else {
> >>                  if (q->memory != memory) {
> >>                          dprintk(q, 1, "memory model mismatch\n");
> >>                          return -EINVAL;
> >>                  }
> >> -               if (!verify_consistency_attr(q, consistent_mem))
> >> -                       return -EINVAL;
> >>          }
> >>
> >>          num_buffers = min(*count, VB2_MAX_FRAME - q->num_buffers);
> >> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> >> index ec3446cc45b8da..7b1b86ec942d7d 100644
> >> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> >> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> >> @@ -42,11 +42,6 @@ struct vb2_dc_buf {
> >>          struct dma_buf_attachment       *db_attach;
> >>   };
> >>
> >> -static inline bool vb2_dc_buffer_consistent(unsigned long attr)
> >> -{
> >> -       return !(attr & DMA_ATTR_NON_CONSISTENT);
> >> -}
> >> -
> >>   /*********************************************/
> >>   /*        scatterlist table functions        */
> >>   /*********************************************/
> >> @@ -341,13 +336,6 @@ static int
> >>   vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf,
> >>                                     enum dma_data_direction direction)
> >>   {
> >> -       struct vb2_dc_buf *buf = dbuf->priv;
> >> -       struct sg_table *sgt = buf->dma_sgt;
> >> -
> >> -       if (vb2_dc_buffer_consistent(buf->attrs))
> >> -               return 0;
> >> -
> >> -       dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
> >>          return 0;
> >>   }
> >>
> >> @@ -355,13 +343,6 @@ static int
> >>   vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf,
> >>                                   enum dma_data_direction direction)
> >>   {
> >> -       struct vb2_dc_buf *buf = dbuf->priv;
> >> -       struct sg_table *sgt = buf->dma_sgt;
> >> -
> >> -       if (vb2_dc_buffer_consistent(buf->attrs))
> >> -               return 0;
> >> -
> >> -       dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
> >>          return 0;
> >>   }
> >>
> >> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> >> index 0a40e00f0d7e5c..a86fce5d8ea8bf 100644
> >> --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> >> +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> >> @@ -123,8 +123,7 @@ static void *vb2_dma_sg_alloc(struct device *dev, unsigned long dma_attrs,
> >>          /*
> >>           * NOTE: dma-sg allocates memory using the page allocator directly, so
> >>           * there is no memory consistency guarantee, hence dma-sg ignores DMA
> >> -        * attributes passed from the upper layer. That means that
> >> -        * V4L2_FLAG_MEMORY_NON_CONSISTENT has no effect on dma-sg buffers.
> >> +        * attributes passed from the upper layer.
> >>           */
> >>          buf->pages = kvmalloc_array(buf->num_pages, sizeof(struct page *),
> >>                                      GFP_KERNEL | __GFP_ZERO);
> >> diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c
> >> index 30caad27281e1a..de83ad48783821 100644
> >> --- a/drivers/media/common/videobuf2/videobuf2-v4l2.c
> >> +++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c
> >> @@ -722,20 +722,11 @@ static void fill_buf_caps(struct vb2_queue *q, u32 *caps)
> >>   #endif
> >>   }
> >>
> >> -static void clear_consistency_attr(struct vb2_queue *q,
> >> -                                  int memory,
> >> -                                  unsigned int *flags)
> >> -{
> >> -       if (!q->allow_cache_hints || memory != V4L2_MEMORY_MMAP)
> >> -               *flags &= ~V4L2_FLAG_MEMORY_NON_CONSISTENT;
> >> -}
> >> -
> >>   int vb2_reqbufs(struct vb2_queue *q, struct v4l2_requestbuffers *req)
> >>   {
> >>          int ret = vb2_verify_memory_type(q, req->memory, req->type);
> >>
> >>          fill_buf_caps(q, &req->capabilities);
> >> -       clear_consistency_attr(q, req->memory, &req->flags);
> >>          return ret ? ret : vb2_core_reqbufs(q, req->memory,
> >>                                              req->flags, &req->count);
> >>   }
> >> @@ -769,7 +760,6 @@ int vb2_create_bufs(struct vb2_queue *q, struct v4l2_create_buffers *create)
> >>          unsigned i;
> >>
> >>          fill_buf_caps(q, &create->capabilities);
> >> -       clear_consistency_attr(q, create->memory, &create->flags);
> >>          create->index = q->num_buffers;
> >>          if (create->count == 0)
> >>                  return ret != -EBUSY ? ret : 0;
> >> @@ -998,7 +988,6 @@ int vb2_ioctl_reqbufs(struct file *file, void *priv,
> >>          int res = vb2_verify_memory_type(vdev->queue, p->memory, p->type);
> >>
> >>          fill_buf_caps(vdev->queue, &p->capabilities);
> >> -       clear_consistency_attr(vdev->queue, p->memory, &p->flags);
> >>          if (res)
> >>                  return res;
> >>          if (vb2_queue_is_busy(vdev, file))
> >> @@ -1021,7 +1010,6 @@ int vb2_ioctl_create_bufs(struct file *file, void *priv,
> >>
> >>          p->index = vdev->queue->num_buffers;
> >>          fill_buf_caps(vdev->queue, &p->capabilities);
> >> -       clear_consistency_attr(vdev->queue, p->memory, &p->flags);
> >>          /*
> >>           * If count == 0, then just check if memory and type are valid.
> >>           * Any -EBUSY result from vb2_verify_memory_type can be mapped to 0.
> >> diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
> >> index 52ef92049073e3..4c7f25b07e9375 100644
> >> --- a/include/media/videobuf2-core.h
> >> +++ b/include/media/videobuf2-core.h
> >> @@ -744,8 +744,7 @@ void vb2_core_querybuf(struct vb2_queue *q, unsigned int index, void *pb);
> >>    * vb2_core_reqbufs() - Initiate streaming.
> >>    * @q:         pointer to &struct vb2_queue with videobuf2 queue.
> >>    * @memory:    memory type, as defined by &enum vb2_memory.
> >> - * @flags:     auxiliary queue/buffer management flags. Currently, the only
> >> - *             used flag is %V4L2_FLAG_MEMORY_NON_CONSISTENT.
> >> + * @flags:     auxiliary queue/buffer management flags.
> >>    * @count:     requested buffer count.
> >>    *
> >>    * Videobuf2 core helper to implement VIDIOC_REQBUF() operation. It is called
> >> diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
> >> index c7b70ff53bc1dd..5c00f63d9c1b58 100644
> >> --- a/include/uapi/linux/videodev2.h
> >> +++ b/include/uapi/linux/videodev2.h
> >> @@ -191,8 +191,6 @@ enum v4l2_memory {
> >>          V4L2_MEMORY_DMABUF           = 4,
> >>   };
> >>
> >> -#define V4L2_FLAG_MEMORY_NON_CONSISTENT                (1 << 0)
> >> -
> >>   /* see also http://vektor.theorem.ca/graphics/ycbcr/ */
> >>   enum v4l2_colorspace {
> >>          /*
> >> --
> >> 2.28.0
> >>
> >> _______________________________________________
> >> iommu mailing list
> >> iommu@lists.linux-foundation.org
> >> https://lists.linuxfoundation.org/mailman/listinfo/iommu
> >
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> >
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
  2020-08-19 11:16     ` Tomasz Figa
  2020-08-19 11:51       ` Robin Murphy
@ 2020-08-19 13:54       ` Christoph Hellwig
  2020-08-19 13:57         ` Tomasz Figa
  1 sibling, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19 13:54 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: alsa-devel, linux-ia64, Linux Doc Mailing List, nouveau,
	linux-nvme, linux-mips, James E.J. Bottomley, linux-mm,
	Christoph Hellwig, linux-samsung-soc, Joonyoung Shim, linux-scsi,
	Kyungmin Park, Ben Skeggs, Matt Porter, Linux Media Mailing List,
	Tom Lendacky, Pawel Osciak, Mauro Carvalho Chehab,
	list@263.net:IOMMU DRIVERS <iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,
	Thomas Bogendoerfer, linux-parisc, netdev, Seung-Woo Kim,
	Linux Kernel Mailing List

On Wed, Aug 19, 2020 at 01:16:51PM +0200, Tomasz Figa wrote:
> Hi Christoph,
> 
> On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig <hch@lst.de> wrote:
> >
> > The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused,
> 
> Could you explain what makes you think it's unused? It's a feature of
> the UAPI generally supported by the videobuf2 framework and relied on
> by Chromium OS to get any kind of reasonable performance when
> accessing V4L2 buffers in the userspace.

Because it doesn't do anything except on PARISC and non-coherent MIPS,
so by definition it isn't used by any of these media drivers.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
  2020-08-19 12:49         ` Tomasz Figa
@ 2020-08-19 13:57           ` Christoph Hellwig
  2020-08-19 14:11             ` Tomasz Figa
  2020-08-19 14:07           ` Robin Murphy
  1 sibling, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-19 13:57 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: alsa-devel, linux-ia64, Linux Doc Mailing List, nouveau,
	linux-nvme, linux-mips, James E.J. Bottomley, linux-mm,
	Christoph Hellwig, linux-samsung-soc, Joonyoung Shim, linux-scsi,
	list@263.net:IOMMU DRIVERS, Ben Skeggs, Matt Porter,
	Linux Media Mailing List, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab,
	list@263.net:IOMMU DRIVERS <iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,
	Thomas Bogendoerfer, linux-parisc, netdev, Seung-Woo Kim,
	Linux Kernel Mailing List, Kyungmin Park, Robin Murphy

On Wed, Aug 19, 2020 at 02:49:01PM +0200, Tomasz Figa wrote:
> With the default config it doesn't, but with
> CONFIG_DMA_NONCOHERENT_CACHE_SYNC enabled it makes dma_pgprot() keep
> the pgprot value as is, without enforcing coherence attributes.

Which isn't selected on arm64, and that is for a good reason.

> AFAIK dma_cache_sync() isn't the only way to perform the cache
> synchronization.

Yes, it is the only documented way to do it.  And if you read the whole
series instead of screaming you'd see that it provides a proper way
to deal with non-coherent memory which will also work with arm64.

> By the way, as a videobuf2 reviewer, I'd appreciate being CC'd on any
> series related to the subsystem-facing DMA API changes, since
> videobuf2 is one of the biggest users of it.

The cc list is too long - I cc lists and key maintainers.  As a reviewer
you should watch your subsystem's lists closely.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
  2020-08-19 13:54       ` Christoph Hellwig
@ 2020-08-19 13:57         ` Tomasz Figa
  2020-08-20  4:43           ` Christoph Hellwig
  0 siblings, 1 reply; 77+ messages in thread
From: Tomasz Figa @ 2020-08-19 13:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: alsa-devel, linux-ia64, Linux Doc Mailing List, nouveau,
	linux-nvme, Linux Kernel Mailing List, James E.J. Bottomley,
	linux-mm, linux-samsung-soc, Joonyoung Shim, linux-scsi,
	Kyungmin Park, Ben Skeggs, Matt Porter, Linux Media Mailing List,
	Tom Lendacky, Pawel Osciak, Mauro Carvalho Chehab,
	list@263.net:IOMMU DRIVERS <iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,
	Thomas Bogendoerfer, linux-parisc, netdev, Seung-Woo Kim,
	linux-mips

On Wed, Aug 19, 2020 at 3:55 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Wed, Aug 19, 2020 at 01:16:51PM +0200, Tomasz Figa wrote:
> > Hi Christoph,
> >
> > On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig <hch@lst.de> wrote:
> > >
> > > The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused,
> >
> > Could you explain what makes you think it's unused? It's a feature of
> > the UAPI generally supported by the videobuf2 framework and relied on
> > by Chromium OS to get any kind of reasonable performance when
> > accessing V4L2 buffers in the userspace.
>
> Because it doesn't do anything except on PARISC and non-coherent MIPS,
> so by definition it isn't used by any of these media drivers.

It's still a UAPI feature, so we can't simply remove the flag; it
must stay there as a no-op until the problem is resolved.

Also, it of course might be disputable as an out-of-tree usage, but
selecting CONFIG_DMA_NONCOHERENT_CACHE_SYNC makes the flag actually do
something on other platforms, including ARM64.
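
For readers following along, the effect described above can be modelled
in a few lines of userspace C. This is a simplified sketch, not the
kernel's dma_pgprot(): the enum, the function name and the boolean
parameters are hypothetical stand-ins (cache_sync_enabled for
CONFIG_DMA_NONCOHERENT_CACHE_SYNC, non_consistent for
DMA_ATTR_NON_CONSISTENT), and the early return for coherent devices is
an assumption that is not spelled out in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the outcome; the kernel works on pgprot_t. */
enum model_prot {
	MODEL_PROT_UNCHANGED,	/* caller's protection bits kept as is */
	MODEL_PROT_COHERENT,	/* coherence attributes enforced */
};

/*
 * Simplified model of the decision discussed in this thread.
 * Assumption: a coherent device never needs its mapping changed.
 */
enum model_prot model_dma_pgprot(bool dev_coherent,
				 bool cache_sync_enabled,
				 bool non_consistent)
{
	if (dev_coherent)
		return MODEL_PROT_UNCHANGED;
	/* The attribute only matters when the config option is selected. */
	if (cache_sync_enabled && non_consistent)
		return MODEL_PROT_UNCHANGED;
	return MODEL_PROT_COHERENT;
}
```

In this model a non-coherent device gets the same result with and
without the attribute as long as the option is disabled, which is the
default arm64 situation being argued about here.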

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
  2020-08-19 12:49         ` Tomasz Figa
  2020-08-19 13:57           ` Christoph Hellwig
@ 2020-08-19 14:07           ` Robin Murphy
  2020-08-19 14:22             ` Tomasz Figa
  2020-08-20  5:02             ` Christoph Hellwig
  1 sibling, 2 replies; 77+ messages in thread
From: Robin Murphy @ 2020-08-19 14:07 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: alsa-devel, linux-ia64, Linux Doc Mailing List, nouveau,
	linux-nvme, linux-mips, James E.J. Bottomley, linux-mm,
	Christoph Hellwig, linux-samsung-soc, Joonyoung Shim, linux-scsi,
	list@263.net:IOMMU DRIVERS, Ben Skeggs, Matt Porter,
	Linux Media Mailing List, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab, linux-arm-kernel, Thomas Bogendoerfer,
	linux-parisc, netdev, Seung-Woo Kim, Linux Kernel Mailing List,
	Kyungmin Park

On 2020-08-19 13:49, Tomasz Figa wrote:
> On Wed, Aug 19, 2020 at 1:51 PM Robin Murphy <robin.murphy@arm.com> wrote:
>>
>> Hi Tomasz,
>>
>> On 2020-08-19 12:16, Tomasz Figa wrote:
>>> Hi Christoph,
>>>
>>> On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig <hch@lst.de> wrote:
>>>>
>>>> The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused,
>>>
>>> Could you explain what makes you think it's unused? It's a feature of
>>> the UAPI generally supported by the videobuf2 framework and relied on
>>> by Chromium OS to get any kind of reasonable performance when
>>> accessing V4L2 buffers in the userspace.
>>>
>>>> and causes
>>>> weird gymnastics with the DMA_ATTR_NON_CONSISTENT flag, which is
>>>> unimplemented except on PARISC and some MIPS configs, and about to be
>>>> removed.
>>>
>>> It is implemented by the generic DMA mapping layer [1], which is used
>>> by a number of architectures including ARM64 and supposed to be used
>>> by new architectures going forward.
>>
>> AFAICS all that V4L2_FLAG_MEMORY_NON_CONSISTENT does is end up
>> controlling whether DMA_ATTR_NON_CONSISTENT is added to vb2_queue::dma_attrs.
>>
>> Please can you point to where DMA_ATTR_NON_CONSISTENT does anything at
>> all on arm64?
>>
> 
> With the default config it doesn't, but with
> CONFIG_DMA_NONCOHERENT_CACHE_SYNC enabled it makes dma_pgprot() keep
> the pgprot value as is, without enforcing coherence attributes.

How active are the PA-RISC and MIPS ports of Chromium OS?

Hacking CONFIG_DMA_NONCOHERENT_CACHE_SYNC into an architecture that 
doesn't provide dma_cache_sync() is wrong, since at worst it may break 
other drivers. If downstream is wildly misusing an API then so be it, 
but it's hardly a strong basis for an upstream argument.

>> Also, I posit that videobuf2 is not actually relying on
>> DMA_ATTR_NON_CONSISTENT anyway, since it's clearly not using it properly:
>>
>> "By using this API, you are guaranteeing to the platform
>> that you have all the correct and necessary sync points for this memory
>> in the driver should it choose to return non-consistent memory."
>>
>> $ git grep dma_cache_sync drivers/media
>> $
> 
> AFAIK dma_cache_sync() isn't the only way to perform the cache
> synchronization. The earlier patch series that I reviewed relied on
> dma_get_sgtable() and then dma_sync_sg_*() (which existed in the
> vb2-dc since forever [1]). However, it looks like with the final code
> the sgtable isn't acquired and the synchronization isn't happening, so
> you have a point.

Using the streaming sync calls on coherent allocations has also always 
been wrong per the API, regardless of the bodies of code that have 
happened to get away with it for so long.

> FWIW, I asked back in time what the plan is for non-coherent
> allocations and it seemed like DMA_ATTR_NON_CONSISTENT and
> dma_sync_*() was supposed to be the right thing to go with. [2] The
> same thread also explains why dma_alloc_pages() isn't suitable for the
> users of dma_alloc_attrs() and DMA_ATTR_NON_CONSISTENT.

AFAICS even back then Christoph was implying getting rid of 
NON_CONSISTENT and *replacing* it with something streaming-API-based - 
i.e. this series - not encouraging mixing the existing APIs. It doesn't 
seem impossible to implement a remapping version of this new 
dma_alloc_pages() for IOMMU-backed ops if it's really warranted 
(although at that point it seems like "non-coherent" vb2-dc starts to 
have significant conceptual overlap with vb2-sg).

Robin.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
  2020-08-19 13:57           ` Christoph Hellwig
@ 2020-08-19 14:11             ` Tomasz Figa
  2020-08-20  4:45               ` Christoph Hellwig
  0 siblings, 1 reply; 77+ messages in thread
From: Tomasz Figa @ 2020-08-19 14:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: alsa-devel, linux-ia64, Linux Doc Mailing List, nouveau,
	linux-nvme, linux-mips, James E.J. Bottomley, linux-mm,
	linux-samsung-soc, Joonyoung Shim, linux-scsi,
	list@263.net:IOMMU DRIVERS, Ben Skeggs, Matt Porter,
	Linux Media Mailing List, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab,
	list@263.net:IOMMU DRIVERS <iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,
	Thomas Bogendoerfer, linux-parisc, netdev, Seung-Woo Kim,
	Linux Kernel Mailing List, Kyungmin Park, Robin Murphy

On Wed, Aug 19, 2020 at 3:57 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Wed, Aug 19, 2020 at 02:49:01PM +0200, Tomasz Figa wrote:
> > With the default config it doesn't, but with
> > CONFIG_DMA_NONCOHERENT_CACHE_SYNC enabled it makes dma_pgprot() keep
> > the pgprot value as is, without enforcing coherence attributes.
>
> Which isn't selected on arm64, and that is for a good reason.
>
> > AFAIK dma_cache_sync() isn't the only way to perform the cache
> > synchronization.
>
> Yes, it is the only documented way to do it.  And if you read the whole
> series instead of screaming you'd see that it provides a proper way
> to deal with non-coherent memory which will also work with arm64.
>

I'm sorry if I have offended you in any way, but I would also appreciate
a less aggressive tone being directed towards me.

I have valid reasons to object to this patch, as stated in my previous
emails. The fact that the original feature has problems is of course
another story and, as I mentioned too, I'm willing to look into fixing
them.

I'm of course happy to review the rest of the series and even more
happy to help migrating this code to whatever is added there, as long
as the functionality is preserved.

> > By the way, as a videobuf2 reviewer, I'd appreciate being CC'd on any
> > series related to the subsystem-facing DMA API changes, since
> > videobuf2 is one of the biggest users of it.
>
> The cc list is too long - I cc lists and key maintainers.  As a reviewer
> you should watch your subsystem's lists closely.

Well, I guess we can disagree on this, because there is no clear
policy. I'm listed in the MAINTAINERS file for the subsystem and I
believe the purpose of the file is to list the people to CC on
relevant patches. We're all overloaded with work and having to look
through the huge volume of mailing lists like linux-media doesn't help
and thus I'd still appreciate being added on CC.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
  2020-08-19 14:07           ` Robin Murphy
@ 2020-08-19 14:22             ` Tomasz Figa
  2020-08-20  4:52               ` Christoph Hellwig
  2020-08-20  5:02             ` Christoph Hellwig
  1 sibling, 1 reply; 77+ messages in thread
From: Tomasz Figa @ 2020-08-19 14:22 UTC (permalink / raw)
  To: Robin Murphy
  Cc: alsa-devel, linux-ia64, Linux Doc Mailing List, nouveau,
	linux-nvme, linux-mips, James E.J. Bottomley, linux-mm,
	Christoph Hellwig, linux-samsung-soc, Joonyoung Shim, linux-scsi,
	list@263.net:IOMMU DRIVERS, Ben Skeggs, Matt Porter,
	Linux Media Mailing List, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab,
	list@263.net:IOMMU DRIVERS <iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,
	Thomas Bogendoerfer, linux-parisc, netdev, Seung-Woo Kim,
	Linux Kernel Mailing List, Kyungmin Park

On Wed, Aug 19, 2020 at 4:07 PM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2020-08-19 13:49, Tomasz Figa wrote:
> > On Wed, Aug 19, 2020 at 1:51 PM Robin Murphy <robin.murphy@arm.com> wrote:
> >>
> >> Hi Tomasz,
> >>
> >> On 2020-08-19 12:16, Tomasz Figa wrote:
> >>> Hi Christoph,
> >>>
> >>> On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig <hch@lst.de> wrote:
> >>>>
> >>>> The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused,
> >>>
> >>> Could you explain what makes you think it's unused? It's a feature of
> >>> the UAPI generally supported by the videobuf2 framework and relied on
> >>> by Chromium OS to get any kind of reasonable performance when
> >>> accessing V4L2 buffers in the userspace.
> >>>
> >>>> and causes
> >>>> weird gymnastics with the DMA_ATTR_NON_CONSISTENT flag, which is
> >>>> unimplemented except on PARISC and some MIPS configs, and about to be
> >>>> removed.
> >>>
> >>> It is implemented by the generic DMA mapping layer [1], which is used
> >>> by a number of architectures including ARM64 and supposed to be used
> >>> by new architectures going forward.
> >>
> >> AFAICS all that V4L2_FLAG_MEMORY_NON_CONSISTENT does is end up
> >> controlling whether DMA_ATTR_NON_CONSISTENT is added to vb2_queue::dma_attrs.
> >>
> >> Please can you point to where DMA_ATTR_NON_CONSISTENT does anything at
> >> all on arm64?
> >>
> >
> > With the default config it doesn't, but with
> > CONFIG_DMA_NONCOHERENT_CACHE_SYNC enabled it makes dma_pgprot() keep
> > the pgprot value as is, without enforcing coherence attributes.
>
> How active are the PA-RISC and MIPS ports of Chromium OS?

Not active. We enable CONFIG_DMA_NONCOHERENT_CACHE_SYNC for ARM64. We
did so based on the directions received back in April, when the
noncoherent memory functionality was discussed on the mailing list in
the thread I pointed to in my previous message, and because we got no
clarification on why it is disabled for ARM64 upstream, despite several
attempts to get one.

>
> Hacking CONFIG_DMA_NONCOHERENT_CACHE_SYNC into an architecture that
> doesn't provide dma_cache_sync() is wrong, since at worst it may break
> other drivers. If downstream is wildly misusing an API then so be it,
> but it's hardly a strong basis for an upstream argument.

I guess it means that we're wildly misusing the API, but it still does
work. Could you explain how it could break other drivers?

>
> >> Also, I posit that videobuf2 is not actually relying on
> >> DMA_ATTR_NON_CONSISTENT anyway, since it's clearly not using it properly:
> >>
> >> "By using this API, you are guaranteeing to the platform
> >> that you have all the correct and necessary sync points for this memory
> >> in the driver should it choose to return non-consistent memory."
> >>
> >> $ git grep dma_cache_sync drivers/media
> >> $
> >
> > AFAIK dma_cache_sync() isn't the only way to perform the cache
> > synchronization. The earlier patch series that I reviewed relied on
> > dma_get_sgtable() and then dma_sync_sg_*() (which existed in the
> > vb2-dc since forever [1]). However, it looks like with the final code
> > the sgtable isn't acquired and the synchronization isn't happening, so
> > you have a point.
>
> Using the streaming sync calls on coherent allocations has also always
> been wrong per the API, regardless of the bodies of code that have
> happened to get away with it for so long.
>
> > FWIW, I asked back in time what the plan is for non-coherent
> > allocations and it seemed like DMA_ATTR_NON_CONSISTENT and
> > dma_sync_*() was supposed to be the right thing to go with. [2] The
> > same thread also explains why dma_alloc_pages() isn't suitable for the
> > users of dma_alloc_attrs() and DMA_ATTR_NON_CONSISTENT.
>
> AFAICS even back then Christoph was implying getting rid of
> NON_CONSISTENT and *replacing* it with something streaming-API-based -

That's not how I read his reply from the thread I pointed to, but that
might of course be my misunderstanding.

> i.e. this series - not encouraging mixing the existing APIs. It doesn't
> seem impossible to implement a remapping version of this new
> dma_alloc_pages() for IOMMU-backed ops if it's really warranted
> (although at that point it seems like "non-coherent" vb2-dc starts to
> have significant conceptual overlap with vb2-sg).

No, there is no overlap between vb2-dc and vb2-sg. They differ on
another level - the former is to be used by devices without
scatter-gather or internal mapping capabilities and gives the driver a
single DMA address for the whole buffer, regardless of whether it's
IOVA-contiguous (for devices behind an IOMMU) or physically contiguous
(for the others), while the latter gives the driver an sgtable, which
of course may be DMA-contiguous internally, but doesn't have to be and
usually isn't. This model makes it possible to hide the SoC
implementation details from particular drivers, since those are very
often reused on many SoCs which differ in the availability of IOMMU,
DMA addressing restrictions and so on.
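
The distinction drawn above can be illustrated with a small userspace
sketch. It is only a model under stated assumptions: struct
model_dma_seg and model_segs_dma_contiguous() are hypothetical names,
not videobuf2 or DMA API symbols. The idea is that a vb2-dc style buffer
must look like one DMA-contiguous range to the device (so a single DMA
address is enough), while a vb2-sg style buffer may be any list of
segments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One segment as the device would see it (hypothetical type). */
struct model_dma_seg {
	uint64_t dma_addr;	/* device-visible start address */
	size_t len;		/* length in bytes */
};

/*
 * Returns true if the segments form one DMA-contiguous range, i.e. the
 * whole buffer could be described to the device by a single address.
 * An empty list is treated as "not a usable buffer".
 */
bool model_segs_dma_contiguous(const struct model_dma_seg *segs, size_t nsegs)
{
	assert(segs != NULL || nsegs == 0);

	if (nsegs == 0)
		return false;
	for (size_t i = 1; i < nsegs; i++) {
		/* Each segment must start where the previous one ended. */
		if (segs[i - 1].dma_addr + segs[i - 1].len != segs[i].dma_addr)
			return false;
	}
	return true;
}
```

Whether the contiguity comes from an IOMMU mapping or from physically
contiguous memory is invisible at this level, which is the point about
hiding SoC integration details from the driver.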

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 19/28] dma-mapping: replace DMA_ATTR_NON_CONSISTENT with dma_{alloc, free}_pages
  2020-08-19  6:55   ` [PATCH 19/28] dma-mapping: replace DMA_ATTR_NON_CONSISTENT with dma_{alloc, free}_pages Christoph Hellwig
@ 2020-08-19 15:03     ` Tomasz Figa
  2020-08-20  5:15       ` Christoph Hellwig
  0 siblings, 1 reply; 77+ messages in thread
From: Tomasz Figa @ 2020-08-19 15:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: alsa-devel, linux-ia64, Linux Doc Mailing List, nouveau,
	linux-nvme, Linux Kernel Mailing List, James E.J. Bottomley,
	linux-mm, linux-samsung-soc, Joonyoung Shim, linux-scsi,
	Kyungmin Park, Ben Skeggs, Matt Porter, Linux Media Mailing List,
	Tom Lendacky, Pawel Osciak, Mauro Carvalho Chehab,
	list@263.net:IOMMU DRIVERS <iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,
	Thomas Bogendoerfer, linux-parisc, netdev, Seung-Woo Kim,
	linux-mips

Hi Christoph,

On Wed, Aug 19, 2020 at 8:57 AM Christoph Hellwig <hch@lst.de> wrote:
>
> Add a new API to allocate and free pages that are guaranteed to be
> addressable by a device, but otherwise behave like pages allocated by
> alloc_pages.  The intended APIs to sync them for use with the device
> and cpu are dma_sync_single_for_{device,cpu} that are also used for
> streaming mappings.
>
> Switch all drivers over to this new API, but keep the usage of the
> crufty dma_cache_sync API for now, which will be cleaned up on a driver
> by driver basis.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  Documentation/core-api/dma-api.rst        | 68 +++++++++++------------
>  Documentation/core-api/dma-attributes.rst |  8 ---
>  arch/alpha/kernel/pci_iommu.c             |  2 +
>  arch/arm/mm/dma-mapping-nommu.c           |  2 +
>  arch/arm/mm/dma-mapping.c                 |  4 ++
>  arch/ia64/hp/common/sba_iommu.c           |  2 +
>  arch/mips/jazz/jazzdma.c                  |  7 +--
>  arch/powerpc/kernel/dma-iommu.c           |  2 +
>  arch/powerpc/platforms/ps3/system-bus.c   |  4 ++
>  arch/powerpc/platforms/pseries/vio.c      |  2 +
>  arch/s390/pci/pci_dma.c                   |  2 +
>  arch/x86/kernel/amd_gart_64.c             |  2 +
>  drivers/iommu/dma-iommu.c                 |  2 +
>  drivers/iommu/intel/iommu.c               |  4 ++
>  drivers/net/ethernet/i825xx/lasi_82596.c  | 13 ++---
>  drivers/net/ethernet/seeq/sgiseeq.c       | 12 ++--
>  drivers/parisc/ccio-dma.c                 |  2 +
>  drivers/parisc/sba_iommu.c                |  2 +
>  drivers/scsi/53c700.c                     |  8 +--
>  drivers/scsi/sgiwd93.c                    | 12 ++--
>  drivers/xen/swiotlb-xen.c                 |  2 +
>  include/linux/dma-direct.h                |  5 ++
>  include/linux/dma-mapping.h               | 29 ++++++++--
>  include/linux/dma-noncoherent.h           |  3 -
>  kernel/dma/direct.c                       | 51 ++++++++++++++++-
>  kernel/dma/mapping.c                      | 43 +++++++++++++-
>  kernel/dma/ops_helpers.c                  | 35 ++++++++++++
>  kernel/dma/virt.c                         |  2 +
>  sound/mips/hal2.c                         | 20 +++----
>  29 files changed, 254 insertions(+), 96 deletions(-)
>

Thanks for the patch. The general design looks quite nice, but please
see my comments inline.


> diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dma-api.rst
> index 90239348b30f6f..047fcfffa0e5cf 100644
> --- a/Documentation/core-api/dma-api.rst
> +++ b/Documentation/core-api/dma-api.rst
> @@ -516,48 +516,53 @@ routines, e.g.:::
>         }
>
>
> -Part II - Advanced dma usage
> -----------------------------
> +Part II - Non-coherent DMA allocations
> +--------------------------------------
>
> -Warning: These pieces of the DMA API should not be used in the
> -majority of cases, since they cater for unlikely corner cases that
> -don't belong in usual drivers.
> +These APIs allow to allocate pages that can be used like normal pages
> +in the kernel direct mapping, but are guaranteed to be DMA addressable.

Could we elaborate a bit more on what "like normal pages in the kernel
direct mapping" means from the driver perspective?

>
>  If you don't understand how cache line coherency works between a
>  processor and an I/O device, you should not be using this part of the
> -API at all.
> +API.
>
>  ::
>
>         void *
> -       dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
> -                       gfp_t flag, unsigned long attrs)
> +       dma_alloc_pages(struct device *dev, size_t size, dma_addr_t *dma_handle,
> +                       enum dma_data_direction dir, gfp_t gfp)
> +
> +This routine allocates a region of <size> bytes of consistent memory.  It
> +returns a pointer to the allocated region (in the processor's virtual address
> +space) or NULL if the allocation failed. The returned memory is guaranteed to
> +behave like memory allocated using alloc_pages.

There is one aspect that the existing dma_alloc_attrs() handles, but
this new function doesn't: IOMMU support. The function will always
allocate a physically-contiguous block of memory, which is a costly
operation and not even guaranteed to succeed, even if enough free
memory is available.

Modern SoCs employ IOMMUs to avoid the need to allocate
physically-contiguous memory and those happen to be also the devices
that could benefit from non-coherent allocations a lot. One of the
tasks of the DMA API was making it possible to allocate suitable
memory for a given device, without having the driver know about the
SoC integration details, such as the presence of an IOMMU.

Today, dma_alloc_attrs() uses the .alloc callback of the dma_ops
struct and the IOMMU-aware implementations, like the dma-iommu helpers
[1], would allocate discontiguous pages. Therefore, while I see the
DMA-aware page allocation functionality as useful on its own for
scatter-gather-capable hardware, I believe it is not a complete
replacement for dma_alloc_attrs() with the DMA_ATTR_NON_CONSISTENT
attribute.

[1] https://elixir.bootlin.com/linux/v5.9-rc1/source/drivers/iommu/dma-iommu.c#L510
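
To make the contiguity concern concrete, here is a toy userspace model
of a fragmented free-page map. It is a sketch only: the function names
are hypothetical and the real page allocator is far more involved. It
shows that a request for physically contiguous pages can fail even
though enough free pages exist in total, whereas an allocator that may
use discontiguous pages (as an IOMMU-backed one can) only needs the
total count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of free pages in a toy free-page map. */
size_t model_count_free(const bool *free_map, size_t npages)
{
	size_t nfree = 0;

	assert(free_map != NULL || npages == 0);
	for (size_t i = 0; i < npages; i++)
		if (free_map[i])
			nfree++;
	return nfree;
}

/*
 * Index of the first run of `want` consecutive free pages, or -1 if no
 * such run exists (or if want is 0).
 */
long model_find_contiguous(const bool *free_map, size_t npages, size_t want)
{
	size_t run = 0;

	assert(free_map != NULL || npages == 0);
	if (want == 0)
		return -1;
	for (size_t i = 0; i < npages; i++) {
		run = free_map[i] ? run + 1 : 0;
		if (run == want)
			return (long)(i + 1 - want);
	}
	return -1;
}
```

With every other page free, model_count_free() reports plenty of memory
while model_find_contiguous() fails for any request larger than one
page.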

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
  2020-08-19 13:57         ` Tomasz Figa
@ 2020-08-20  4:43           ` Christoph Hellwig
  2020-08-20  5:20             ` Christoph Hellwig
  0 siblings, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-20  4:43 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: alsa-devel, linux-ia64, Linux Doc Mailing List, nouveau,
	linux-nvme, linux-mips, James E.J. Bottomley, linux-mm,
	Christoph Hellwig, linux-samsung-soc, Joonyoung Shim, linux-scsi,
	Kyungmin Park, Ben Skeggs, Matt Porter, Linux Media Mailing List,
	Tom Lendacky, Pawel Osciak, Mauro Carvalho Chehab,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Thomas Bogendoerfer, linux-parisc, netdev, Seung-Woo Kim,
	Linux Kernel Mailing List,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,

On Wed, Aug 19, 2020 at 03:57:53PM +0200, Tomasz Figa wrote:
> > > Could you explain what makes you think it's unused? It's a feature of
> > > the UAPI generally supported by the videobuf2 framework and relied on
> > > by Chromium OS to get any kind of reasonable performance when
> > > accessing V4L2 buffers in the userspace.
> >
> > Because it doesn't do anything except on PARISC and non-coherent MIPS,
> > so by definition it isn't used by any of these media drivers.
> 
> It's still an UAPI feature, so we can't simply remove the flag, it
> must stay there as a no-op, until the problem is resolved.

Ok, I'll switch to just ignoring it for the next version.

> Also, it of course might be disputable as an out-of-tree usage, but
> selecting CONFIG_DMA_NONCOHERENT_CACHE_SYNC makes the flag actually do
> something on other platforms, including ARM64.

It isn't just disputable, but by kernel policies simply is not relevant.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
  2020-08-19 14:11             ` Tomasz Figa
@ 2020-08-20  4:45               ` Christoph Hellwig
  2020-08-20 10:09                 ` Tomasz Figa
  0 siblings, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-20  4:45 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: alsa-devel, linux-ia64, Linux Doc Mailing List, nouveau,
	linux-nvme, linux-mips, James E.J. Bottomley, linux-mm,
	Christoph Hellwig, linux-samsung-soc, Joonyoung Shim, linux-scsi,
	list@263.net:IOMMU DRIVERS, Ben Skeggs, Matt Porter,
	Linux Media Mailing List, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Thomas Bogendoerfer, linux-parisc, netdev, Seung-Woo Kim,
	Linux Kernel Mailing List, Kyungmin Park, Robin Murphy

On Wed, Aug 19, 2020 at 04:11:52PM +0200, Tomasz Figa wrote:
> > > By the way, as a videobuf2 reviewer, I'd appreciate being CC'd on any
> > > series related to the subsystem-facing DMA API changes, since
> > > videobuf2 is one of the biggest users of it.
> >
> > The cc list is too long - I cc lists and key maintainers.  As a reviewer
> > you should watch your subsystem's lists closely.
> 
> Well, I guess we can disagree on this, because there is no clear
> policy. I'm listed in the MAINTAINERS file for the subsystem and I
> believe the purpose of the file is to list the people to CC on
> relevant patches. We're all overloaded with work and having to look
> through the huge volume of mailing lists like linux-media doesn't help
> and thus I'd still appreciate being added on CC.

I'm happy to Cc any active participant in the discussion.  I'm not
going to add all reviewers because even with the trimmed CC list
I'm already hitting the number of recipients limit on various lists.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
  2020-08-19 14:22             ` Tomasz Figa
@ 2020-08-20  4:52               ` Christoph Hellwig
  0 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-20  4:52 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: alsa-devel, linux-ia64, Linux Doc Mailing List, nouveau,
	linux-nvme, linux-mips, James E.J. Bottomley, linux-mm,
	Christoph Hellwig, linux-samsung-soc, Joonyoung Shim, linux-scsi,
	list@263.net:IOMMU DRIVERS, Ben Skeggs, Matt Porter,
	Linux Media Mailing List, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Thomas Bogendoerfer, linux-parisc, netdev, Seung-Woo Kim,
	Linux Kernel Mailing List, Kyungmin Park, Robin Murphy

On Wed, Aug 19, 2020 at 04:22:29PM +0200, Tomasz Figa wrote:
> > > FWIW, I asked back in time what the plan is for non-coherent
> > > allocations and it seemed like DMA_ATTR_NON_CONSISTENT and
> > > dma_sync_*() was supposed to be the right thing to go with. [2] The
> > > same thread also explains why dma_alloc_pages() isn't suitable for the
> > > users of dma_alloc_attrs() and DMA_ATTR_NON_CONSISTENT.
> >
> > AFAICS even back then Christoph was implying getting rid of
> > NON_CONSISTENT and *replacing* it with something streaming-API-based -
> 
> That's not how I read his reply from the thread I pointed to, but that
> might of course be my misunderstanding.

Yes.  Without changes like in this series just calling dma_sync_single_*
will break in various cases, e.g. because dma_alloc_attrs returns
memory remapped in the vmalloc space, and the dma_sync_single_*
implementation can't cope with vmalloc addresses.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
  2020-08-19 14:07           ` Robin Murphy
  2020-08-19 14:22             ` Tomasz Figa
@ 2020-08-20  5:02             ` Christoph Hellwig
  2020-08-20 10:24               ` Tomasz Figa
  1 sibling, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-20  5:02 UTC (permalink / raw)
  To: Robin Murphy
  Cc: alsa-devel, linux-ia64, Linux Doc Mailing List, nouveau,
	linux-nvme, linux-mips, James E.J. Bottomley, linux-mm,
	Christoph Hellwig, linux-samsung-soc, Joonyoung Shim, linux-scsi,
	list@263.net:IOMMU DRIVERS, Ben Skeggs, Matt Porter,
	Linux Media Mailing List, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab, linux-arm-kernel, Thomas Bogendoerfer,
	linux-parisc, netdev, Seung-Woo Kim, Linux Kernel Mailing List,
	Kyungmin Park

On Wed, Aug 19, 2020 at 03:07:04PM +0100, Robin Murphy wrote:
>> FWIW, I asked back in time what the plan is for non-coherent
>> allocations and it seemed like DMA_ATTR_NON_CONSISTENT and
>> dma_sync_*() was supposed to be the right thing to go with. [2] The
>> same thread also explains why dma_alloc_pages() isn't suitable for the
>> users of dma_alloc_attrs() and DMA_ATTR_NON_CONSISTENT.
>
> AFAICS even back then Christoph was implying getting rid of NON_CONSISTENT 
> and *replacing* it with something streaming-API-based - i.e. this series - 
> not encouraging mixing the existing APIs. It doesn't seem impossible to 
> implement a remapping version of this new dma_alloc_pages() for 
> IOMMU-backed ops if it's really warranted (although at that point it seems 
> like "non-coherent" vb2-dc starts to have significant conceptual overlap 
> with vb2-sg).

You can always vmap the returned pages from dma_alloc_pages, but it will
make cache invalidation hell - you'll need to use
invalidate_kernel_vmap_range and flush_kernel_vmap_range to properly
handle virtually indexed caches.

Or by remapping do you mean using the iommu to de-scatter/gather?

You can trivially implement that yourself for the iommu
case:

{
	merge_boundary = dma_get_merge_boundary(dev);
	if (!merge_boundary || merge_boundary > chunk_size - 1) {
		/* can't coalesce */
		return -EINVAL;
	}

	
	nents = DIV_ROUND_UP(total_size, chunk_size);
	sg = sgl_alloc();
	for_each_sgl() {
		sg->page = __alloc_pages(get_order(chunk_size));
		sg->len = chunk_size;
	}
	dma_map_sg(sg, DMA_ATTR_SKIP_CPU_SYNC);
	// you are guaranteed to get a single dma_addr out
}

Of course this still uses the scatterlist structure with its annoying
mix of input and output parameters, so I'd rather not expose it as
an official API at the DMA layer.
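
As a rough illustration of the arithmetic in the sketch above, the
following user-space C fragment mirrors the coalescing check and the
entry count.  The function name sg_entries_needed() is hypothetical, and
the fragment assumes merge_boundary is a mask (boundary minus one, with 0
meaning the device cannot merge segments), which is how
dma_get_merge_boundary() appears to report it.  It calls no kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel macro of the same name. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper, not a kernel API.
 *
 * Returns the number of chunk_size-sized scatterlist entries needed to
 * cover total_size, or -1 if the chunks cannot be coalesced into a
 * single DMA address range.  merge_boundary is assumed to be a mask
 * (boundary - 1), with 0 meaning "no merging possible".
 */
static long sg_entries_needed(unsigned long merge_boundary,
			      size_t total_size, size_t chunk_size)
{
	assert(chunk_size > 0);

	/* Same test as in the pseudocode: can't coalesce. */
	if (!merge_boundary || merge_boundary > chunk_size - 1)
		return -1;

	return (long)DIV_ROUND_UP(total_size, chunk_size);
}
```

With a 4 KiB merge boundary (mask 4095) and 4 KiB chunks, a 1 MiB buffer
needs 256 entries.  With a merge boundary of 0, or one larger than the
chunk size, the helper reports that coalescing is not possible.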

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 19/28] dma-mapping: replace DMA_ATTR_NON_CONSISTENT with dma_{alloc, free}_pages
  2020-08-19 15:03     ` Tomasz Figa
@ 2020-08-20  5:15       ` Christoph Hellwig
  0 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-20  5:15 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: alsa-devel, linux-ia64, Linux Doc Mailing List, nouveau,
	linux-nvme, linux-mips, James E.J. Bottomley, linux-mm,
	Christoph Hellwig, linux-samsung-soc, Joonyoung Shim, linux-scsi,
	Kyungmin Park, Ben Skeggs, Matt Porter, Linux Media Mailing List,
	Tom Lendacky, Pawel Osciak, Mauro Carvalho Chehab,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Thomas Bogendoerfer, linux-parisc, netdev, Seung-Woo Kim,
	Linux Kernel Mailing List,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,

On Wed, Aug 19, 2020 at 05:03:52PM +0200, Tomasz Figa wrote:
> >
> > -Warning: These pieces of the DMA API should not be used in the
> > -majority of cases, since they cater for unlikely corner cases that
> > -don't belong in usual drivers.
> > +These APIs allow to allocate pages that can be used like normal pages
> > +in the kernel direct mapping, but are guaranteed to be DMA addressable.
> 
> Could we elaborate a bit more on what "like normal pages in kernel
> direct mapping" mean from the driver perspective?

It mostly means you can call virt_to_page and then do anything you'd
do with a page struct.  This is unlike dma_alloc_attrs, which just returns
an opaque virtual address that the caller is not allowed to poke into.

> There is one aspect that the existing dma_alloc_attrs() handles, but
> this new function doesn't: IOMMU support. The function will always
> allocate a physically-contiguous block of memory, which is a costly
> operation and not even guaranteed to succeed, even if enough free
> memory is available.
> 
> Modern SoCs employ IOMMUs to avoid the need to allocate
> physically-contiguous memory and those happen to be also the devices
> that could benefit from non-coherent allocations a lot. One of the
> tasks of the DMA API was making it possible to allocate suitable
> memory for a given device, without having the driver know about the
> SoC integration details, such as the presence of an IOMMU.

This is completely out of scope for this API exactly because it
guarantees a page in the direct mapping.  But see my previous mail
in reply to Robin on how you can implement the functionality you
want right now without any help from the dma-mapping subsystem.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
  2020-08-20  4:43           ` Christoph Hellwig
@ 2020-08-20  5:20             ` Christoph Hellwig
  2020-08-20 10:05               ` Tomasz Figa
  0 siblings, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-20  5:20 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: alsa-devel, linux-ia64, Linux Doc Mailing List, nouveau,
	linux-nvme, Linux Kernel Mailing List, James E.J. Bottomley,
	linux-mm, Christoph Hellwig, linux-samsung-soc, Joonyoung Shim,
	linux-scsi,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Ben Skeggs, Matt Porter, Linux Media Mailing List, Tom Lendacky,
	Pawel Osciak, Mauro Carvalho Chehab,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Thomas Bogendoerfer, linux-parisc, netdev, Seung-Woo Kim,
	linux-mips, Kyungmin Park

On Thu, Aug 20, 2020 at 06:43:47AM +0200, Christoph Hellwig wrote:
> On Wed, Aug 19, 2020 at 03:57:53PM +0200, Tomasz Figa wrote:
> > > > Could you explain what makes you think it's unused? It's a feature of
> > > > the UAPI generally supported by the videobuf2 framework and relied on
> > > > by Chromium OS to get any kind of reasonable performance when
> > > > accessing V4L2 buffers in the userspace.
> > >
> > > Because it doesn't do anything except on PARISC and non-coherent MIPS,
> > > so by definition it isn't used by any of these media drivers.
> > 
> > It's still an UAPI feature, so we can't simply remove the flag, it
> > must stay there as a no-op, until the problem is resolved.
> 
> Ok, I'll switch to just ignoring it for the next version.

So I took a deeper look.  I don't really think it qualifies as a UAPI
in our traditional sense.  For one it only appeared in 5.9-rc1, so we
can trivially expedite the patch into 5.9-rc and not actually make it
show up in any released kernel version.  And even as of the current
Linus' tree the only user is a test driver.  So I really think the best
way to go ahead is to just revert it ASAP as the design wasn't thought
out at all.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
  2020-08-20  5:20             ` Christoph Hellwig
@ 2020-08-20 10:05               ` Tomasz Figa
  2020-08-20 16:54                 ` Christoph Hellwig
  0 siblings, 1 reply; 77+ messages in thread
From: Tomasz Figa @ 2020-08-20 10:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: alsa-devel, linux-ia64, Linux Doc Mailing List, nouveau,
	linux-nvme, Linux Kernel Mailing List, James E.J. Bottomley,
	linux-mm, linux-samsung-soc, Joonyoung Shim, linux-scsi,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Ben Skeggs, Matt Porter, Linux Media Mailing List, Tom Lendacky,
	Pawel Osciak, Mauro Carvalho Chehab,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Thomas Bogendoerfer, linux-parisc, netdev, Seung-Woo Kim,
	linux-mips, Kyungmin Park

On Thu, Aug 20, 2020 at 7:20 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Thu, Aug 20, 2020 at 06:43:47AM +0200, Christoph Hellwig wrote:
> > On Wed, Aug 19, 2020 at 03:57:53PM +0200, Tomasz Figa wrote:
> > > > > Could you explain what makes you think it's unused? It's a feature of
> > > > > the UAPI generally supported by the videobuf2 framework and relied on
> > > > > by Chromium OS to get any kind of reasonable performance when
> > > > > accessing V4L2 buffers in the userspace.
> > > >
> > > > Because it doesn't do anything except on PARISC and non-coherent MIPS,
> > > > so by definition it isn't used by any of these media drivers.
> > >
> > > It's still an UAPI feature, so we can't simply remove the flag, it
> > > must stay there as a no-op, until the problem is resolved.
> >
> > Ok, I'll switch to just ignoring it for the next version.
>
> So I took a deeper look.  I don't really think it qualifies as a UAPI
> in our traditional sense.  For one it only appeared in 5.9-rc1, so we
> can trivially expedite the patch into 5.9-rc and not actually make it
> show up in any released kernel version.  And even as of the current
> Linus' tree the only user is a test driver.  So I really think the best
> way to go ahead is to just revert it ASAP as the design wasn't thought
> out at all.

The UAPI and V4L2/videobuf2 changes are in good shape and the only
wrong part is the use of DMA API, which was based on an earlier email
guidance anyway, and a change to the synchronization part. I find
conclusions like the above insulting for people who put many hours
into designing and implementing the related functionality, given the
complexity of the videobuf2 framework and how ill-defined the DMA API
was, and would feel better if such could be avoided in future
communication.

That said, we can revert it on the basis of the implementation issues,
but I feel like we wouldn't get anything by doing so, because as I
said, the design is sane and most of the implementation is fine as
well. Instead, I'd suggest simply removing the use of the attribute
being removed, so that the feature stays no-op until the DMA API
provides a way to implement it or we just migrate videobuf2 to stop
using the DMA API as much as possible, like many drivers in the DRM
subsystem did.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
  2020-08-20  4:45               ` Christoph Hellwig
@ 2020-08-20 10:09                 ` Tomasz Figa
  2020-08-20 16:51                   ` Christoph Hellwig
  0 siblings, 1 reply; 77+ messages in thread
From: Tomasz Figa @ 2020-08-20 10:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: alsa-devel, linux-ia64, Linux Doc Mailing List, nouveau,
	linux-nvme, linux-mips, James E.J. Bottomley, linux-mm,
	linux-samsung-soc, Joonyoung Shim, linux-scsi,
	list@263.net:IOMMU DRIVERS, Ben Skeggs, Matt Porter,
	Linux Media Mailing List, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Thomas Bogendoerfer, linux-parisc, netdev, Seung-Woo Kim,
	Linux Kernel Mailing List, Kyungmin Park, Robin Murphy

On Thu, Aug 20, 2020 at 6:45 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Wed, Aug 19, 2020 at 04:11:52PM +0200, Tomasz Figa wrote:
> > > > By the way, as a videobuf2 reviewer, I'd appreciate being CC'd on any
> > > > series related to the subsystem-facing DMA API changes, since
> > > > videobuf2 is one of the biggest users of it.
> > >
> > > The cc list is too long - I cc lists and key maintainers.  As a reviewer
> > > you should watch your subsystem's lists closely.
> >
> > Well, I guess we can disagree on this, because there is no clear
> > policy. I'm listed in the MAINTAINERS file for the subsystem and I
> > believe the purpose of the file is to list the people to CC on
> > relevant patches. We're all overloaded with work and having to look
> > through the huge volume of mailing lists like linux-media doesn't help
> > and thus I'd still appreciate being added on CC.
>
> I'm happy to Cc any active participant in the discussion.  I'm not
> going to add all reviewers because even with the trimmed CC list
> I'm already hitting the number of recipients limit on various lists.

Fair enough.

We'll make your job easier and just turn my MAINTAINERS entry into a
maintainer. :)

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
  2020-08-20  5:02             ` Christoph Hellwig
@ 2020-08-20 10:24               ` Tomasz Figa
  2020-08-20 16:52                 ` Christoph Hellwig
  0 siblings, 1 reply; 77+ messages in thread
From: Tomasz Figa @ 2020-08-20 10:24 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: alsa-devel, linux-ia64, Linux Doc Mailing List, nouveau,
	linux-nvme, linux-mips, James E.J. Bottomley, linux-mm,
	linux-samsung-soc, Joonyoung Shim, linux-scsi,
	list@263.net:IOMMU DRIVERS, Ben Skeggs, Matt Porter,
	Linux Media Mailing List, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Thomas Bogendoerfer, linux-parisc, netdev, Seung-Woo Kim,
	Linux Kernel Mailing List, Kyungmin Park, Robin Murphy

On Thu, Aug 20, 2020 at 7:02 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Wed, Aug 19, 2020 at 03:07:04PM +0100, Robin Murphy wrote:
> >> FWIW, I asked back in time what the plan is for non-coherent
> >> allocations and it seemed like DMA_ATTR_NON_CONSISTENT and
> >> dma_sync_*() was supposed to be the right thing to go with. [2] The
> >> same thread also explains why dma_alloc_pages() isn't suitable for the
> >> users of dma_alloc_attrs() and DMA_ATTR_NON_CONSISTENT.
> >
> > AFAICS even back then Christoph was implying getting rid of NON_CONSISTENT
> > and *replacing* it with something streaming-API-based - i.e. this series -
> > not encouraging mixing the existing APIs. It doesn't seem impossible to
> > implement a remapping version of this new dma_alloc_pages() for
> > IOMMU-backed ops if it's really warranted (although at that point it seems
> > like "non-coherent" vb2-dc starts to have significant conceptual overlap
> > with vb2-sg).
>
> You can always vmap the returned pages from dma_alloc_pages, but it will
> make cache invalidation hell - you'll need to use
> invalidate_kernel_vmap_range and flush_kernel_vmap_range to properly
> handle virtually indexed caches.
>
> Or by remapping do you mean using the iommu to de-scatter/gather?

Ideally, both.

For remapping in the CPU sense, there are drivers which rely on a
contiguous kernel mapping of the vb2 buffers, which was provided by
dma_alloc_attrs(). I think they could be reworked to work on single
pages, but that would significantly complicate the code. At the same
time, such drivers would actually benefit from a cached mapping,
because they often have non-bursty, random access patterns.

Then, in the IOMMU sense, the whole idea of videobuf2-dma-contig is to
rely on the DMA API to always provide device-contiguous memory, as
required by the hardware which only has a single pointer and size.
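
To make "device-contiguous" concrete: after mapping, every segment must
start where the previous one ended in DMA address space, so that the
hardware can be given one pointer and one size.  A minimal user-space
sketch of that check follows.  It assumes a hypothetical struct
mapped_seg in place of the kernel's scatterlist; neither the struct nor
the function is a real kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one mapped scatterlist segment. */
struct mapped_seg {
	uint64_t dma_addr;	/* device-visible start address */
	size_t len;		/* segment length in bytes */
};

/*
 * Returns true if the n segments form one contiguous range in device
 * address space, i.e. each segment begins exactly where the previous
 * one ends.  An empty list is not considered contiguous.
 */
static bool segs_device_contiguous(const struct mapped_seg *segs, size_t n)
{
	assert(segs != NULL || n == 0);

	for (size_t i = 1; i < n; i++) {
		if (segs[i].dma_addr != segs[i - 1].dma_addr + segs[i - 1].len)
			return false;
	}
	return n > 0;
}
```

Two 4 KiB segments at 0x1000 and 0x2000 pass the check, while segments
at 0x1000 and 0x3000 leave a hole and fail it.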

>
> You can trivially implement that yourself for the iommu
> case:
>
> {
>         merge_boundary = dma_get_merge_boundary(dev);
>         if (!merge_boundary || merge_boundary > chunk_size - 1) {
>                 /* can't coalesce */
>                 return -EINVAL;
>         }
>
>
>         nents = DIV_ROUND_UP(total_size, chunk_size);
>         sg = sgl_alloc();
>         for_each_sgl() {
>                 sg->page = __alloc_pages(get_order(chunk_size))
>                 sg->len = chunk_size;
>         }
>         dma_map_sg(sg, DMA_ATTR_SKIP_CPU_SYNC);
>         // you are guaranteed to get a single dma_addr out
> }
>
> Of course this still uses the scatterlist structure with its annoying
> mix of input and output parameters, so I'd rather not expose it as
> an official API at the DMA layer.

The problem with the above open coded approach is that it requires
explicit handling of the non-IOMMU and IOMMU cases and this is exactly
what we don't want to have in vb2 and what was actually the job of the
DMA API to hide. Is the plan to actually move the IOMMU handling out
of the DMA API?

Do you think we could instead turn it into a dma_alloc_noncoherent()
helper, which has similar semantics to dma_alloc_attrs() and handles
the various corner cases (e.g. invalidate_kernel_vmap_range and
flush_kernel_vmap_range) to achieve the desired functionality without
delegating the "hell", as you called it, to the users?

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
  2020-08-20 10:09                 ` Tomasz Figa
@ 2020-08-20 16:51                   ` Christoph Hellwig
  0 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-20 16:51 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: alsa-devel, linux-ia64, Linux Doc Mailing List, nouveau,
	linux-nvme, linux-mips, James E.J. Bottomley, linux-mm,
	Christoph Hellwig, linux-samsung-soc, Joonyoung Shim, linux-scsi,
	list@263.net:IOMMU DRIVERS, Ben Skeggs, Matt Porter,
	Linux Media Mailing List, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Thomas Bogendoerfer, linux-parisc, netdev, Seung-Woo Kim,
	Linux Kernel Mailing List, Kyungmin Park, Robin Murphy

On Thu, Aug 20, 2020 at 12:09:34PM +0200, Tomasz Figa wrote:
> > I'm happy to Cc any active participant in the discussion.  I'm not
> > going to add all reviewers because even with the trimmed CC list
> > I'm already hitting the number of recipients limit on various lists.
> 
> Fair enough.
> 
> We'll make your job easier and just turn my MAINTAINERS entry into a
> maintainer. :)

Sounds like a plan.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
  2020-08-20 10:24               ` Tomasz Figa
@ 2020-08-20 16:52                 ` Christoph Hellwig
  2020-08-20 17:41                   ` Tomasz Figa
  0 siblings, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-20 16:52 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: alsa-devel, linux-ia64, Linux Doc Mailing List, nouveau,
	linux-nvme, linux-mips, James E.J. Bottomley, linux-mm,
	Christoph Hellwig, linux-samsung-soc, Joonyoung Shim, linux-scsi,
	list@263.net:IOMMU DRIVERS, Ben Skeggs, Matt Porter,
	Linux Media Mailing List, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Thomas Bogendoerfer, linux-parisc, netdev, Seung-Woo Kim,
	Linux Kernel Mailing List, Kyungmin Park, Robin Murphy

On Thu, Aug 20, 2020 at 12:24:31PM +0200, Tomasz Figa wrote:
> > Of course this still uses the scatterlist structure with its annoying
> > mix of input and output parameters, so I'd rather not expose it as
> > an official API at the DMA layer.
> 
> The problem with the above open coded approach is that it requires
> explicit handling of the non-IOMMU and IOMMU cases and this is exactly
> what we don't want to have in vb2 and what was actually the job of the
> DMA API to hide. Is the plan to actually move the IOMMU handling out
> of the DMA API?
> 
> Do you think we could instead turn it into a dma_alloc_noncoherent()
> helper, which has similar semantics to dma_alloc_attrs() and handles
> the various corner cases (e.g. invalidate_kernel_vmap_range and
> flush_kernel_vmap_range) to achieve the desired functionality without
> delegating the "hell", as you called it, to the users?

Yes, I guess I could do something in that direction.  At least for
dma-iommu, which thanks to Robin should be all you'll need in the
foreseeable future.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
  2020-08-20 10:05               ` Tomasz Figa
@ 2020-08-20 16:54                 ` Christoph Hellwig
  2020-08-20 17:33                   ` Tomasz Figa
  0 siblings, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-20 16:54 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: alsa-devel, linux-ia64, Linux Doc Mailing List, nouveau,
	linux-nvme, Linux Kernel Mailing List, James E.J. Bottomley,
	linux-mm, Christoph Hellwig, linux-samsung-soc, Joonyoung Shim,
	linux-scsi,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Ben Skeggs, Matt Porter, Linux Media Mailing List, Tom Lendacky,
	Pawel Osciak, Mauro Carvalho Chehab,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Thomas Bogendoerfer, linux-parisc, netdev, Seung-Woo Kim,
	linux-mips, Kyungmin Park

On Thu, Aug 20, 2020 at 12:05:29PM +0200, Tomasz Figa wrote:
> The UAPI and V4L2/videobuf2 changes are in good shape and the only
> wrong part is the use of DMA API, which was based on an earlier email
> guidance anyway, and a change to the synchronization part. I find
> conclusions like the above insulting for people who put many hours
> into designing and implementing the related functionality, given the
> complexity of the videobuf2 framework and how ill-defined the DMA API
> was, and would feel better if such could be avoided in future
> communication.

It wasn't meant to be too insulting, but I found this out when trying
to figure out how to just disable it.  But it also ends up using
the actual dma attr flags for its own consistency checks, so just
not setting the flag did not turn out to work that easily.

But in general it helps to add a few more people to the Cc list for
changes that do unusual things.  Especially if you think you did
it based on the advice of those people.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
  2020-08-20 16:54                 ` Christoph Hellwig
@ 2020-08-20 17:33                   ` Tomasz Figa
  2020-09-01 11:06                     ` Christoph Hellwig
  0 siblings, 1 reply; 77+ messages in thread
From: Tomasz Figa @ 2020-08-20 17:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: alsa-devel, linux-ia64, Linux Doc Mailing List, nouveau,
	linux-nvme, Linux Kernel Mailing List, James E.J. Bottomley,
	linux-mm, linux-samsung-soc, Joonyoung Shim, linux-scsi,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Ben Skeggs, Matt Porter, Linux Media Mailing List, Tom Lendacky,
	Pawel Osciak, Mauro Carvalho Chehab,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Thomas Bogendoerfer, linux-parisc, netdev, Seung-Woo Kim,
	linux-mips, Kyungmin Park

On Thu, Aug 20, 2020 at 6:54 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Thu, Aug 20, 2020 at 12:05:29PM +0200, Tomasz Figa wrote:
> > The UAPI and V4L2/videobuf2 changes are in good shape and the only
> > wrong part is the use of DMA API, which was based on an earlier email
> > guidance anyway, and a change to the synchronization part. I find
> > conclusions like the above insulting for people who put many hours
> > into designing and implementing the related functionality, given the
> > complexity of the videobuf2 framework and how ill-defined the DMA API
> > was, and would feel better if such could be avoided in future
> > communication.
>
> It wasn't meant to be too insulting, but I found this out when trying
> to figure out how to just disable it.  But it also ends up using
> the actual dma attr flags for its own consistency checks, so just
> not setting the flag did not turn out to work that easily.
>

Yes, sadly videobuf2 ended up becoming quite counterintuitive
after growing over the years and that is reflected in the design
of this feature as well. I think we need to do something about it.

> But in general it helps to add a few more people to the Cc list for
> such things that do stranger things.  Especially if you think you did
> it based on the advice of those people.

Indeed, we should have CCed you and other DMA folks. Sergey who worked
on this series is quite new to these areas of the kernel (although not
to the kernel itself) and it's my fault for not explicitly letting him
know to do that.

Best regards,
Tomasz
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
  2020-08-20 16:52                 ` Christoph Hellwig
@ 2020-08-20 17:41                   ` Tomasz Figa
  0 siblings, 0 replies; 77+ messages in thread
From: Tomasz Figa @ 2020-08-20 17:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: alsa-devel, linux-ia64, Linux Doc Mailing List, nouveau,
	linux-nvme, linux-mips, James E.J. Bottomley, linux-mm,
	linux-samsung-soc, Joonyoung Shim, linux-scsi, iommu,
	Joerg Roedel, Ben Skeggs, Matt Porter,
	Linux Media Mailing List, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab, Thomas Bogendoerfer, linux-parisc,
	netdev, Seung-Woo Kim, Linux Kernel Mailing List,
	Kyungmin Park, Robin Murphy

On Thu, Aug 20, 2020 at 6:52 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Thu, Aug 20, 2020 at 12:24:31PM +0200, Tomasz Figa wrote:
> > > Of course this still uses the scatterlist structure with its annoying
> > > mix of input and output parameters, so I'd rather not expose it as
> > > an official API at the DMA layer.
> >
> > The problem with the above open coded approach is that it requires
> > explicit handling of the non-IOMMU and IOMMU cases and this is exactly
> > what we don't want to have in vb2 and what was actually the job of the
> > DMA API to hide. Is the plan to actually move the IOMMU handling out
> > of the DMA API?
> >
> > Do you think we could instead turn it into a dma_alloc_noncoherent()
> > helper, which has similar semantics as dma_alloc_attrs() and handles
> > the various corner cases (e.g. invalidate_kernel_vmap_range and
> > flush_kernel_vmap_range) to achieve the desired functionality without
> > delegating the "hell", as you called it, to the users?
>
> Yes, I guess I could do something in that direction.  At least for
> dma-iommu, which thanks to Robin should be all you'll need in the
> foreseeable future.

That would be really great. Let me know if we can help by testing with
V4L2/vb2 or in any other way.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: a saner API for allocating DMA addressable pages
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
                     ` (27 preceding siblings ...)
  2020-08-19  6:55   ` [PATCH 28/28] nvme-pci: use dma_alloc_pages backed dmapools Christoph Hellwig
@ 2020-08-25 11:30   ` Marek Szyprowski
  2020-08-25 13:26     ` Christoph Hellwig
  2020-08-29  9:46   ` Helge Deller
  29 siblings, 1 reply; 77+ messages in thread
From: Marek Szyprowski @ 2020-08-25 11:30 UTC (permalink / raw)
  To: Christoph Hellwig, Mauro Carvalho Chehab, Thomas Bogendoerfer,
	James E.J. Bottomley, Joonyoung Shim, Seung-Woo Kim,
	Kyungmin Park, Ben Skeggs, Pawel Osciak, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

Hi Christoph,

On 19.08.2020 08:55, Christoph Hellwig wrote:
> this series replaces the DMA_ATTR_NON_CONSISTENT flag to dma_alloc_attrs
> with a separate new dma_alloc_pages API, which is available on all
> platforms.  In addition to cleaning up the convoluted code path, this
> ensures that other drivers that have asked for better support for
> non-coherent DMA to pages without incurring bounce buffering overhead
> can finally be properly supported.
>
> I'm still a little unsure about the API naming, as alloc_pages sort of
> implies a struct page return value, but we return a kernel virtual
> address.  The other alternative would be to name the API
> dma_alloc_noncoherent, but the whole non-coherent naming seems to put
> people off.  As a follow up I plan to move the implementation of the
> DMA_ATTR_NO_KERNEL_MAPPING flag over to this framework as well, given
> that it is also a fundamentally non-coherent allocation.  The replacement
> for that flag would then return a struct page, as it is allowed to
> actually return pages without a kernel mapping as the name suggested
> (although most of the time they will actually have a kernel mapping..)
>
> In addition to the conversions of the existing non-coherent DMA users
> the last three patches also convert the DMA coherent allocations in
> the NVMe driver to use this new framework through a dmapool addition.
> This was both to give me a good testing vehicle, but also because it
> should speed up the NVMe driver on platforms with non-coherent DMA
> nicely, without a downside on platforms with cache coherent DMA.

I really wonder what the difference is between this new API and
alloc_pages(GFP_DMA, n). Is this API really needed? I thought that this
is a legacy thing to be removed one day...

Maybe it would make more sense to convert the few remaining drivers to
regular dma_map_page()/dma_sync_*()/dma_unmap_page(), or have I missed
something?

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: a saner API for allocating DMA addressable pages
  2020-08-25 11:30   ` a saner API for allocating DMA addressable pages Marek Szyprowski
@ 2020-08-25 13:26     ` Christoph Hellwig
  0 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2020-08-25 13:26 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: alsa-devel, linux-ia64, linux-doc, nouveau, linux-nvme,
	linux-kernel, James E.J. Bottomley, linux-mm, Christoph Hellwig,
	linux-samsung-soc, Joonyoung Shim, linux-scsi, Kyungmin Park,
	Ben Skeggs, Matt Porter, linux-media, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab, linux-arm-kernel, Thomas Bogendoerfer,
	linux-parisc, netdev, Seung-Woo Kim, linux-mips, iommu

On Tue, Aug 25, 2020 at 01:30:41PM +0200, Marek Szyprowski wrote:
> I really wonder what the difference is between this new API and
> alloc_pages(GFP_DMA, n). Is this API really needed? I thought that this
> is a legacy thing to be removed one day...

The difference is that the pages returned are guaranteed to be addressable
by the device.  This is a very important difference that matters for
a lot of use cases.
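
To make the distinction concrete, here is a minimal userspace sketch (not
kernel code, and not taken from the series).  alloc_pages(GFP_DMA, n) only
selects a memory zone, while the guarantee described above is relative to
what one particular device can address.  The toy_device structure and the
toy_* helpers are hypothetical names invented for this illustration; the
only assumption is that a device's reach can be modelled as a bit mask,
similar in spirit to the kernel's DMA_BIT_MASK().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a device's DMA addressing limit. */
struct toy_device {
	uint64_t dma_mask;	/* highest bus address the device can reach */
};

/* Build a mask with the low n bits set (assumed model of a DMA mask). */
static uint64_t toy_bit_mask(unsigned int n)
{
	return n >= 64 ? ~UINT64_C(0) : (UINT64_C(1) << n) - 1;
}

/* True if the whole buffer [addr, addr + size) is reachable by the device. */
static bool toy_dma_capable(const struct toy_device *dev,
			    uint64_t addr, uint64_t size)
{
	uint64_t end;

	if (size == 0)
		return false;
	end = addr + size - 1;
	if (end < addr)		/* address range wrapped around */
		return false;
	return end <= dev->dma_mask;
}
```

An allocator that gives the guarantee discussed here would only hand out
buffers for which a check like toy_dma_capable() holds for the device in
question, which a zone-based allocation alone cannot promise.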

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: a saner API for allocating DMA addressable pages
  2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
                     ` (28 preceding siblings ...)
  2020-08-25 11:30   ` a saner API for allocating DMA addressable pages Marek Szyprowski
@ 2020-08-29  9:46   ` Helge Deller
  29 siblings, 0 replies; 77+ messages in thread
From: Helge Deller @ 2020-08-29  9:46 UTC (permalink / raw)
  To: Christoph Hellwig, Mauro Carvalho Chehab, Thomas Bogendoerfer,
	James E.J. Bottomley, Joonyoung Shim, Seung-Woo Kim,
	Kyungmin Park, Ben Skeggs, Pawel Osciak, Marek Szyprowski,
	Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

Hi Christoph,

On 19.08.20 08:55, Christoph Hellwig wrote:
> this series replaces the DMA_ATTR_NON_CONSISTENT flag to dma_alloc_attrs
> with a separate new dma_alloc_pages API, which is available on all
> platforms.  In addition to cleaning up the convoluted code path, this
> ensures that other drivers that have asked for better support for
> non-coherent DMA to pages without incurring bounce buffering overhead
> can finally be properly supported.
> ....
> A git tree is available here:
>
>     git://git.infradead.org/users/hch/misc.git dma_alloc_pages

I've tested this tree on my parisc machine which uses the 53c700
and lasi_82596 drivers.
Everything worked as expected, so you may add:

Tested-by: Helge Deller <deller@gmx.de> # parisc

Thanks!
Helge

>
> Gitweb:
>
>     http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma_alloc_pages
>
>
> Diffstat:
>  Documentation/core-api/dma-api.rst                       |   92 ++----
>  Documentation/core-api/dma-attributes.rst                |    8
>  Documentation/userspace-api/media/v4l/buffer.rst         |   17 -
>  Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst |    1
>  arch/alpha/kernel/pci_iommu.c                            |    2
>  arch/arm/include/asm/dma-direct.h                        |    4
>  arch/arm/mm/dma-mapping-nommu.c                          |    2
>  arch/arm/mm/dma-mapping.c                                |    4
>  arch/ia64/Kconfig                                        |    3
>  arch/ia64/hp/common/sba_iommu.c                          |    2
>  arch/ia64/kernel/dma-mapping.c                           |   14
>  arch/ia64/mm/init.c                                      |    3
>  arch/mips/Kconfig                                        |    1
>  arch/mips/bmips/dma.c                                    |    4
>  arch/mips/cavium-octeon/dma-octeon.c                     |    4
>  arch/mips/include/asm/dma-direct.h                       |    4
>  arch/mips/include/asm/jazzdma.h                          |    2
>  arch/mips/jazz/jazzdma.c                                 |  102 +------
>  arch/mips/loongson2ef/fuloong-2e/dma.c                   |    4
>  arch/mips/loongson2ef/lemote-2f/dma.c                    |    4
>  arch/mips/loongson64/dma.c                               |    4
>  arch/mips/mm/dma-noncoherent.c                           |   48 +--
>  arch/mips/pci/pci-ar2315.c                               |    4
>  arch/mips/pci/pci-xtalk-bridge.c                         |    4
>  arch/mips/sgi-ip32/ip32-dma.c                            |    4
>  arch/parisc/Kconfig                                      |    1
>  arch/parisc/kernel/pci-dma.c                             |    6
>  arch/powerpc/include/asm/dma-direct.h                    |    4
>  arch/powerpc/kernel/dma-iommu.c                          |    2
>  arch/powerpc/platforms/ps3/system-bus.c                  |    4
>  arch/powerpc/platforms/pseries/vio.c                     |    2
>  arch/s390/pci/pci_dma.c                                  |    2
>  arch/x86/kernel/amd_gart_64.c                            |    8
>  drivers/gpu/drm/exynos/exynos_drm_gem.c                  |    2
>  drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c      |    3
>  drivers/iommu/dma-iommu.c                                |    2
>  drivers/iommu/intel/iommu.c                              |    6
>  drivers/media/common/videobuf2/videobuf2-core.c          |   36 --
>  drivers/media/common/videobuf2/videobuf2-dma-contig.c    |   19 -
>  drivers/media/common/videobuf2/videobuf2-dma-sg.c        |    3
>  drivers/media/common/videobuf2/videobuf2-v4l2.c          |   12
>  drivers/net/ethernet/amd/au1000_eth.c                    |   15 -
>  drivers/net/ethernet/i825xx/lasi_82596.c                 |   36 +-
>  drivers/net/ethernet/i825xx/lib82596.c                   |  148 +++++-----
>  drivers/net/ethernet/i825xx/sni_82596.c                  |   23 -
>  drivers/net/ethernet/seeq/sgiseeq.c                      |   24 -
>  drivers/nvme/host/pci.c                                  |   79 ++---
>  drivers/parisc/ccio-dma.c                                |    2
>  drivers/parisc/sba_iommu.c                               |    2
>  drivers/scsi/53c700.c                                    |  120 ++++----
>  drivers/scsi/53c700.h                                    |    9
>  drivers/scsi/sgiwd93.c                                   |   14
>  drivers/xen/swiotlb-xen.c                                |    2
>  include/linux/dma-direct.h                               |   55 ++-
>  include/linux/dma-mapping.h                              |   32 +-
>  include/linux/dma-noncoherent.h                          |   21 -
>  include/linux/dmapool.h                                  |   23 +
>  include/linux/gfp.h                                      |    6
>  include/media/videobuf2-core.h                           |    3
>  include/uapi/linux/videodev2.h                           |    2
>  kernel/dma/Kconfig                                       |    9
>  kernel/dma/Makefile                                      |    1
>  kernel/dma/coherent.c                                    |   17 +
>  kernel/dma/direct.c                                      |  112 +++++--
>  kernel/dma/mapping.c                                     |  104 ++-----
>  kernel/dma/ops_helpers.c                                 |   86 ++++++
>  kernel/dma/pool.c                                        |    2
>  kernel/dma/swiotlb.c                                     |    4
>  kernel/dma/virt.c                                        |    2
>  mm/dmapool.c                                             |  211 +++++++++------
>  sound/mips/hal2.c                                        |   58 +---
>  71 files changed, 872 insertions(+), 803 deletions(-)
>


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
  2020-08-20 17:33                   ` Tomasz Figa
@ 2020-09-01 11:06                     ` Christoph Hellwig
  2020-09-01 15:02                       ` Tomasz Figa
  0 siblings, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2020-09-01 11:06 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: alsa-devel, linux-ia64, Linux Doc Mailing List, nouveau,
	linux-nvme, Linux Kernel Mailing List, James E.J. Bottomley,
	linux-mm, Christoph Hellwig, linux-samsung-soc, Joonyoung Shim,
	linux-scsi, iommu, Joerg Roedel, Ben Skeggs, Matt Porter,
	Linux Media Mailing List, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab, Thomas Bogendoerfer, linux-parisc,
	netdev, Seung-Woo Kim, linux-mips, Kyungmin Park

On Thu, Aug 20, 2020 at 07:33:48PM +0200, Tomasz Figa wrote:
> > It wasn't meant to be too insulting, but I found this out when trying
> > to figure out how to just disable it.  But it also ends up using
> > the actual dma attr flags for its own consistency checks, so just
> > not setting the flag did not turn out to work that easily.
> >
> 
> Yes, sadly videobuf2 ended up becoming quite counterintuitive after
> growing over many years, and that is reflected in the design of this
> feature as well. I think we need to do something about it.

So I'm about to respin the series and wonder how we should proceed.
I've failed to come up with a clean patch to keep the flag and make
it a no-op.  Can you or your team give it a spin?

Also I wonder if the flag should be renamed from NON_CONSISTENT
to NON_COHERENT - the consistent thing is a weird wart from the times
of the old PCI DMA API, which is mostly gone now.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 06/28] lib82596: move DMA allocation into the callers of i82596_probe
  2020-08-19  6:55   ` [PATCH 06/28] lib82596: move DMA allocation into the callers of i82596_probe Christoph Hellwig
@ 2020-09-01 13:29     ` Thomas Bogendoerfer
  0 siblings, 0 replies; 77+ messages in thread
From: Thomas Bogendoerfer @ 2020-09-01 13:29 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: alsa-devel, linux-ia64, linux-doc, nouveau, linux-nvme,
	linux-kernel, James E.J. Bottomley, linux-mm, linux-samsung-soc,
	Joonyoung Shim, linux-scsi, Kyungmin Park, Ben Skeggs,
	Matt Porter, linux-media, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab, linux-arm-kernel, linux-parisc, netdev,
	Seung-Woo Kim, linux-mips, iommu

On Wed, Aug 19, 2020 at 08:55:33AM +0200, Christoph Hellwig wrote:
> This allows us to get rid of the LIB82596_DMA_ATTR define and prepare
> for untangling the coherent vs non-coherent DMA allocation API.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/net/ethernet/i825xx/lasi_82596.c | 24 ++++++++++------
>  drivers/net/ethernet/i825xx/lib82596.c   | 36 ++++++++----------------
>  drivers/net/ethernet/i825xx/sni_82596.c  | 19 +++++++++----
>  3 files changed, 40 insertions(+), 39 deletions(-)
> 
> [...]
> diff --git a/drivers/net/ethernet/i825xx/sni_82596.c b/drivers/net/ethernet/i825xx/sni_82596.c
> index 22f5887578b2bd..e80e790ffbd4d4 100644
> --- a/drivers/net/ethernet/i825xx/sni_82596.c
> +++ b/drivers/net/ethernet/i825xx/sni_82596.c
> @@ -24,8 +24,6 @@
>  
>  static const char sni_82596_string[] = "snirm_82596";
>  
> -#define LIB82596_DMA_ATTR	0
> -
>  #define DMA_WBACK(priv, addr, len)     do { } while (0)
>  #define DMA_INV(priv, addr, len)       do { } while (0)
>  #define DMA_WBACK_INV(priv, addr, len) do { } while (0)
> @@ -134,10 +132,19 @@ static int sni_82596_probe(struct platform_device *dev)
>  	lp->ca = ca_addr;
>  	lp->mpu_port = mpu_addr;
>  
> +	lp->dma = dma_alloc_coherent(dev->dev.parent, sizeof(struct i596_dma),
> +				     &lp->dma_addr, GFP_KERNEL);

This needs to use &dev->dev as the device argument, otherwise I get a

WARNING: CPU: 0 PID: 1 at linux/kernel/dma/mapping.c:416 dma_alloc_attrs+0x64/0x98

(coherent_dma_mask is set correctly).

dev->dev.parent was correct when going from netdevice to underlying device,
but now allocation is done via platform_device probe. I wonder why this works
for parisc.
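
The situation can be illustrated with a small userspace model (a sketch
only, not the driver code).  It assumes that the warning above comes from a
sanity check on the coherent DMA mask of whichever device is passed to the
allocator; toy_device, toy_alloc_coherent() and toy_warnings are
hypothetical names made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of a device with an optional parent. */
struct toy_device {
	struct toy_device *parent;
	uint64_t coherent_dma_mask;	/* 0 means "never configured" */
};

static int toy_warnings;	/* counts how often the sanity check fired */

/*
 * Assumed behaviour: complain when the device handed in has no coherent
 * DMA mask, but still carry on with the allocation.
 */
static void *toy_alloc_coherent(struct toy_device *dev, size_t size)
{
	if (!dev || dev->coherent_dma_mask == 0)
		toy_warnings++;
	return calloc(1, size);
}
```

In this model the platform device carries the mask and its parent does not,
so passing the parent trips the check while passing the device itself does
not.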

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 09/28] MIPS/jazzdma: remove the unused vdma_remap function
  2020-08-19  6:55   ` [PATCH 09/28] MIPS/jazzdma: remove the unused vdma_remap function Christoph Hellwig
@ 2020-09-01 13:49     ` Thomas Bogendoerfer
  0 siblings, 0 replies; 77+ messages in thread
From: Thomas Bogendoerfer @ 2020-09-01 13:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: alsa-devel, linux-ia64, linux-doc, nouveau, linux-nvme,
	linux-kernel, James E.J. Bottomley, linux-mm, linux-samsung-soc,
	Joonyoung Shim, linux-scsi, Kyungmin Park, Ben Skeggs,
	Matt Porter, linux-media, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab, linux-arm-kernel, linux-parisc, netdev,
	Seung-Woo Kim, linux-mips, iommu

On Wed, Aug 19, 2020 at 08:55:36AM +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  arch/mips/include/asm/jazzdma.h |  2 -
>  arch/mips/jazz/jazzdma.c        | 70 ---------------------------------
>  2 files changed, 72 deletions(-)

Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 10/28] MIPS/jazzdma: decouple from dma-direct
  2020-08-19  6:55   ` [PATCH 10/28] MIPS/jazzdma: decouple from dma-direct Christoph Hellwig
@ 2020-09-01 13:49     ` Thomas Bogendoerfer
  0 siblings, 0 replies; 77+ messages in thread
From: Thomas Bogendoerfer @ 2020-09-01 13:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: alsa-devel, linux-ia64, linux-doc, nouveau, linux-nvme,
	linux-kernel, James E.J. Bottomley, linux-mm, linux-samsung-soc,
	Joonyoung Shim, linux-scsi, Kyungmin Park, Ben Skeggs,
	Matt Porter, linux-media, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab, linux-arm-kernel, linux-parisc, netdev,
	Seung-Woo Kim, linux-mips, iommu

On Wed, Aug 19, 2020 at 08:55:37AM +0200, Christoph Hellwig wrote:
> The jazzdma ops implement support for a very basic IOMMU.  Thus we really
> should not use the dma-direct code that takes physical address limits
> into account.  This survived through the great MIPS DMA ops cleanup mostly
> because I was lazy, but now it is time to fully split the implementations.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  arch/mips/jazz/jazzdma.c | 32 +++++++++++++++++++++-----------
>  1 file changed, 21 insertions(+), 11 deletions(-)

Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 08/28] MIPS: make dma_sync_*_for_cpu a little less overzealous
  2020-08-19  6:55   ` [PATCH 08/28] MIPS: make dma_sync_*_for_cpu a little less overzealous Christoph Hellwig
@ 2020-09-01 13:53     ` Thomas Bogendoerfer
  0 siblings, 0 replies; 77+ messages in thread
From: Thomas Bogendoerfer @ 2020-09-01 13:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: alsa-devel, linux-ia64, linux-doc, nouveau, linux-nvme,
	linux-kernel, James E.J. Bottomley, linux-mm, linux-samsung-soc,
	Joonyoung Shim, linux-scsi, Kyungmin Park, Ben Skeggs,
	Matt Porter, linux-media, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab, linux-arm-kernel, linux-parisc, netdev,
	Seung-Woo Kim, linux-mips, iommu

On Wed, Aug 19, 2020 at 08:55:35AM +0200, Christoph Hellwig wrote:
> When transferring DMA ownership back to the CPU there should never
> be any writeback from the cache, as the buffer was owned by the
> device until now.  Instead it should just be invalidated for the
> mapping directions where the device could have written data.
> Note that the changes rely on the fact that kmap_atomic is stubbed
> out for the !HIGHMEM case to simplify the code a bit.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  arch/mips/mm/dma-noncoherent.c | 44 +++++++++++++++++++++-------------
>  1 file changed, 28 insertions(+), 16 deletions(-)

Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
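
For readers following along, the rule stated in the commit message can be
sketched as a small decision table.  This is a userspace illustration, not
the code from arch/mips/mm/dma-noncoherent.c; the toy_* enums and functions
are hypothetical, and the for-device half is included only as an assumed
contrast.

```c
/* Hypothetical mirror of the DMA transfer directions. */
enum toy_dma_dir { TOY_BIDIRECTIONAL, TOY_TO_DEVICE, TOY_FROM_DEVICE };

/* Cache maintenance operations the sketch can choose from. */
enum toy_cache_op {
	TOY_CACHE_NONE,
	TOY_CACHE_WBACK,
	TOY_CACHE_INV,
	TOY_CACHE_WBACK_INV,
};

/*
 * Handing the buffer back to the CPU: never write back, because the device
 * owned the buffer until now.  Invalidate only when the device may have
 * written to it.
 */
static enum toy_cache_op toy_sync_for_cpu(enum toy_dma_dir dir)
{
	switch (dir) {
	case TOY_FROM_DEVICE:
	case TOY_BIDIRECTIONAL:
		return TOY_CACHE_INV;
	case TOY_TO_DEVICE:
	default:
		return TOY_CACHE_NONE;
	}
}

/* Assumed counterpart when handing the buffer to the device. */
static enum toy_cache_op toy_sync_for_device(enum toy_dma_dir dir)
{
	switch (dir) {
	case TOY_TO_DEVICE:
		return TOY_CACHE_WBACK;
	case TOY_FROM_DEVICE:
		return TOY_CACHE_INV;
	case TOY_BIDIRECTIONAL:
	default:
		return TOY_CACHE_WBACK_INV;
	}
}
```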

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 07/28] 53c700: improve non-coherent DMA handling
  2020-08-19  6:55   ` [PATCH 07/28] 53c700: improve non-coherent DMA handling Christoph Hellwig
@ 2020-09-01 14:52     ` James Bottomley
  2020-09-01 15:05       ` Matthew Wilcox
  0 siblings, 1 reply; 77+ messages in thread
From: James Bottomley @ 2020-09-01 14:52 UTC (permalink / raw)
  To: Christoph Hellwig, Mauro Carvalho Chehab, Thomas Bogendoerfer,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Ben Skeggs,
	Pawel Osciak, Marek Szyprowski, Matt Porter, iommu
  Cc: Tom Lendacky, alsa-devel, linux-samsung-soc, linux-ia64,
	linux-scsi, linux-parisc, linux-doc, nouveau, linux-kernel,
	linux-nvme, linux-mips, linux-mm, netdev, linux-arm-kernel,
	linux-media

On Wed, 2020-08-19 at 08:55 +0200, Christoph Hellwig wrote:
> Switch the 53c700 driver to only use non-coherent descriptor memory
> if it really has to because dma_alloc_coherent fails.  This doesn't
> matter for any of the platforms it runs on currently, but that will
> change soon.
> 
> To help with this two new helpers to transfer ownership to and from
> the device are added that abstract the syncing of the non-coherent
> memory. The two current bidirectional cases are mapped to transfers
> to the device, as that appears to be what they are used for.  Note that
> for parisc, which is the only architecture this driver needs to use
> non-coherent memory on, the direction argument of dma_cache_sync is
> ignored, so this will not change behavior in any way.

I think this looks mostly OK, except for one misnamed parameter below. 
Unfortunately, the last non-coherent parisc was the 700 series and I no
longer own a box, so I can't test that part of it (I can fire up the
C360 to test it on a coherent arch).

[...]
> diff --git a/drivers/scsi/53c700.h b/drivers/scsi/53c700.h
> index 05fe439b66afe5..0f545b05fe611d 100644
> --- a/drivers/scsi/53c700.h
> +++ b/drivers/scsi/53c700.h
> @@ -209,6 +209,7 @@ struct NCR_700_Host_Parameters {
>  #endif
>  	__u32	chip710:1;	/* set if really a 710 not 700 */
>  	__u32	burst_length:4;	/* set to 0 to disable 710 bursting */
> +	__u32	noncoherent:1;	/* needs to use non-coherent DMA */
>  
>  	/* NOTHING BELOW HERE NEEDS ALTERING */
>  	__u32	fast:1;		/* if we can alter the SCSI bus clock
> @@ -429,7 +430,7 @@ struct NCR_700_Host_Parameters {
>  	for(i=0; i< (sizeof(A_##symbol##_used) / sizeof(__u32)); i++) { \
>  		__u32 val = bS_to_cpu((script)[A_##symbol##_used[i]]) + da; \
>  		(script)[A_##symbol##_used[i]] = bS_to_host(val); \
> -		dma_cache_sync((dev), &(script)[A_##symbol##_used[i]], 4, DMA_TO_DEVICE); \
> +		dma_sync_to_dev((dev), &(script)[A_##symbol##_used[i]], 4); \
>  		DEBUG((" script, patching %s at %d to %pad\n", \
>  		       #symbol, A_##symbol##_used[i], &da)); \
>  	} \
> @@ -441,7 +442,7 @@ struct NCR_700_Host_Parameters {
>  	dma_addr_t da = value; \
>  	for(i=0; i< (sizeof(A_##symbol##_used) / sizeof(__u32)); i++) { \
>  		(script)[A_##symbol##_used[i]] = bS_to_host(da); \
> -		dma_cache_sync((dev), &(script)[A_##symbol##_used[i]], 4, DMA_TO_DEVICE); \
> +		dma_sync_to_dev((dev), &(script)[A_##symbol##_used[i]], 4); \
>  		DEBUG((" script, patching %s at %d to %pad\n", \
>  		       #symbol, A_##symbol##_used[i], &da)); \
>  	} \
> @@ -456,7 +457,7 @@ struct NCR_700_Host_Parameters {
>  		val &= 0xff00ffff; \
>  		val |= ((value) & 0xff) << 16; \
>  		(script)[A_##symbol##_used[i]] = bS_to_host(val); \
> -		dma_cache_sync((dev), &(script)[A_##symbol##_used[i]], 4, DMA_TO_DEVICE); \
> +		dma_sync_to_dev((dev), &(script)[A_##symbol##_used[i]], 4); \
>  		DEBUG((" script, patching ID field %s at %d to 0x%x\n", \
>  		       #symbol, A_##symbol##_used[i], val)); \
>  	} \
> @@ -470,7 +471,7 @@ struct NCR_700_Host_Parameters {
>  		val &= 0xffff0000; \
>  		val |= ((value) & 0xffff); \
>  		(script)[A_##symbol##_used[i]] = bS_to_host(val); \
> -		dma_cache_sync((dev), &(script)[A_##symbol##_used[i]], 4, DMA_TO_DEVICE); \
> +		dma_sync_to_dev((dev), &(script)[A_##symbol##_used[i]], 4); \
>  		DEBUG((" script, patching short field %s at %d to 0x%x\n", \
>  		       #symbol, A_##symbol##_used[i], val)); \
>  	} \

These macro arguments need updating.  Since you changed the input from
hostdata->dev to hostdata, leaving the macro argument as dev is simply
misleading.  It needs to become hostdata or h.
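
As a rough illustration of the suggested rename, here is a self-contained
userspace sketch (not the 53c700 code).  It assumes the new helper only
syncs when the host is flagged non-coherent; toy_host, toy_sync_to_dev()
and TOY_PATCH_SCRIPT() are hypothetical names, and the sync itself is
modelled as a counter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for the host parameters structure. */
struct toy_host {
	unsigned int noncoherent:1;	/* needs explicit cache syncing */
	unsigned int syncs;		/* how many syncs were issued */
};

/* Sync towards the device only if the host uses non-coherent memory. */
static void toy_sync_to_dev(struct toy_host *h, void *addr, size_t len)
{
	(void)addr;
	(void)len;
	if (h->noncoherent)
		h->syncs++;
}

/*
 * The first macro argument is named h because it is the host structure,
 * not a struct device, which is the point of the review comment above.
 */
#define TOY_PATCH_SCRIPT(h, script, idx, value) do {			\
	(script)[(idx)] = (value);					\
	toy_sync_to_dev((h), &(script)[(idx)], sizeof((script)[0]));	\
} while (0)
```

With the argument named after what is actually passed in, a call such as
TOY_PATCH_SCRIPT(hostdata, script, i, val) reads the same at the call site
and inside the macro.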

James


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
  2020-09-01 11:06                     ` Christoph Hellwig
@ 2020-09-01 15:02                       ` Tomasz Figa
  0 siblings, 0 replies; 77+ messages in thread
From: Tomasz Figa @ 2020-09-01 15:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: alsa-devel, linux-ia64, Linux Doc Mailing List, nouveau,
	linux-nvme, Linux Kernel Mailing List, James E.J. Bottomley,
	linux-mm, linux-samsung-soc, Joonyoung Shim, linux-scsi,
	iommu, Joerg Roedel, Ben Skeggs, Matt Porter,
	Linux Media Mailing List, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab, Thomas Bogendoerfer, linux-parisc,
	netdev, Seung-Woo Kim, linux-mips, Kyungmin Park

On Tue, Sep 1, 2020 at 1:06 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Thu, Aug 20, 2020 at 07:33:48PM +0200, Tomasz Figa wrote:
> > > It wasn't meant to be too insulting, but I found this out when trying
> > > to figure out how to just disable it.  But it also ends up using
> > > the actual dma attr flags for its own consistency checks, so just
> > > not setting the flag did not turn out to work that easily.
> > >
> >
> > Yes, sadly videobuf2 ended up becoming quite counterintuitive after
> > growing over many years, and that is reflected in the design of this
> > feature as well. I think we need to do something about it.
>
> So I'm about to respin the series and wonder how we should proceed.
> I've failed to come up with a clean patch to keep the flag and make
> it a no-op.  Can you or your team give it a spin?
>

Okay, I'll take a look.

> Also I wonder if the flag should be renamed from NON_CONSISTENT
> to NON_COHERENT - the consistent thing is a weird wart from the times
> of the old PCI DMA API, which is mostly gone now.

It originated from the DMA_ATTR_NON_CONSISTENT flag, but agreed that
NON_COHERENT would be more consistent (pun not intended) with the rest
of the DMA API given the removal of that flag. Let me see if we can
still change it.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 07/28] 53c700: improve non-coherent DMA handling
  2020-09-01 14:52     ` James Bottomley
@ 2020-09-01 15:05       ` Matthew Wilcox
  2020-09-01 15:22         ` James Bottomley
  0 siblings, 1 reply; 77+ messages in thread
From: Matthew Wilcox @ 2020-09-01 15:05 UTC (permalink / raw)
  To: James Bottomley
  Cc: alsa-devel, linux-ia64, linux-doc, nouveau, linux-nvme,
	linux-kernel, linux-mm, Christoph Hellwig, linux-samsung-soc,
	Joonyoung Shim, linux-scsi, Kyungmin Park, Ben Skeggs,
	Matt Porter, linux-media, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab, linux-arm-kernel, Thomas Bogendoerfer,
	linux-parisc, netdev, Seung-Woo Kim, linux-mips, iommu

On Tue, Sep 01, 2020 at 07:52:40AM -0700, James Bottomley wrote:
> I think this looks mostly OK, except for one misnamed parameter below. 
> Unfortunately, the last non-coherent parisc was the 700 series and I no
> longer own a box, so I can't test that part of it (I can fire up the
> C360 to test it on a coherent arch).

I have a 715/50 that probably hasn't been powered on in 15 years if you
need something that old to test on (I believe the 725/100 uses the 7100LC
and so is coherent).  I'll need to set up a cross-compiler ...

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 22/28] sgiseeq: convert from dma_cache_sync to dma_sync_single_for_device
  2020-08-19  6:55   ` [PATCH 22/28] sgiseeq: " Christoph Hellwig
@ 2020-09-01 15:22     ` Thomas Bogendoerfer
  2020-09-01 17:12       ` Thomas Bogendoerfer
  0 siblings, 1 reply; 77+ messages in thread
From: Thomas Bogendoerfer @ 2020-09-01 15:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: alsa-devel, linux-ia64, linux-doc, nouveau, linux-nvme,
	linux-kernel, James E.J. Bottomley, linux-mm, linux-samsung-soc,
	Joonyoung Shim, linux-scsi, Kyungmin Park, Ben Skeggs,
	Matt Porter, linux-media, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab, linux-arm-kernel, linux-parisc, netdev,
	Seung-Woo Kim, linux-mips, iommu

On Wed, Aug 19, 2020 at 08:55:49AM +0200, Christoph Hellwig wrote:
> Use the proper modern API to transfer cache ownership for incoherent DMA.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/net/ethernet/seeq/sgiseeq.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/seeq/sgiseeq.c b/drivers/net/ethernet/seeq/sgiseeq.c
> index 39599bbb5d45b6..f91dae16d69a19 100644
> --- a/drivers/net/ethernet/seeq/sgiseeq.c
> +++ b/drivers/net/ethernet/seeq/sgiseeq.c
> @@ -112,14 +112,18 @@ struct sgiseeq_private {
>  
>  static inline void dma_sync_desc_cpu(struct net_device *dev, void *addr)
>  {
> -	dma_cache_sync(dev->dev.parent, addr, sizeof(struct sgiseeq_rx_desc),
> -		       DMA_FROM_DEVICE);
> +	struct sgiseeq_private *sp = netdev_priv(dev);
> +
> +	dma_sync_single_for_cpu(dev->dev.parent, VIRT_TO_DMA(sp, addr),
> +			sizeof(struct sgiseeq_rx_desc), DMA_BIDIRECTIONAL);
>  }
>  
>  static inline void dma_sync_desc_dev(struct net_device *dev, void *addr)
>  {
> -	dma_cache_sync(dev->dev.parent, addr, sizeof(struct sgiseeq_rx_desc),
> -		       DMA_TO_DEVICE);
> +	struct sgiseeq_private *sp = netdev_priv(dev);
> +
> +	dma_sync_single_for_device(dev->dev.parent, VIRT_TO_DMA(sp, addr),
> +			sizeof(struct sgiseeq_rx_desc), DMA_BIDIRECTIONAL);
>  }

this breaks ethernet on IP22 completely, but I haven't figured out why, yet.

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 07/28] 53c700: improve non-coherent DMA handling
  2020-09-01 15:05       ` Matthew Wilcox
@ 2020-09-01 15:22         ` James Bottomley
  2020-09-01 16:21           ` Helge Deller
  0 siblings, 1 reply; 77+ messages in thread
From: James Bottomley @ 2020-09-01 15:22 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: alsa-devel, linux-ia64, linux-doc, nouveau, linux-nvme,
	linux-kernel, linux-mm, Christoph Hellwig, linux-samsung-soc,
	Joonyoung Shim, linux-scsi, Kyungmin Park, Ben Skeggs,
	Matt Porter, linux-media, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab, linux-arm-kernel, Thomas Bogendoerfer,
	linux-parisc, netdev, Seung-Woo Kim, linux-mips, iommu

On Tue, 2020-09-01 at 16:05 +0100, Matthew Wilcox wrote:
> On Tue, Sep 01, 2020 at 07:52:40AM -0700, James Bottomley wrote:
> > I think this looks mostly OK, except for one misnamed parameter
> > below. Unfortunately, the last non-coherent parisc was the 700
> > series and I no longer own a box, so I can't test that part of it
> > (I can fire up the C360 to test it on a coherent arch).
> 
> I have a 715/50 that probably hasn't been powered on in 15 years if
> you need something that old to test on (I believe the 725/100 uses
> the 7100LC and so is coherent).  I'll need to set up a cross-compiler 
> ...

I'm not going to say no to actual testing, but it's going to be a world
of pain getting something so old going.  I do have a box of older
systems I keep for architectural testing that I need to rummage around
in ... I just have a vague memory that my 715 actually caught fire a
decade ago and had to be disposed of.

James


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 07/28] 53c700: improve non-coherent DMA handling
  2020-09-01 15:22         ` James Bottomley
@ 2020-09-01 16:21           ` Helge Deller
  2020-09-01 16:41             ` Helge Deller
  0 siblings, 1 reply; 77+ messages in thread
From: Helge Deller @ 2020-09-01 16:21 UTC (permalink / raw)
  To: James Bottomley, Matthew Wilcox
  Cc: alsa-devel, linux-ia64, linux-doc, nouveau, linux-nvme,
	linux-kernel, linux-mm, Christoph Hellwig, linux-samsung-soc,
	Joonyoung Shim, linux-scsi, Kyungmin Park, Ben Skeggs,
	Matt Porter, linux-media, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab, linux-arm-kernel, Thomas Bogendoerfer,
	linux-parisc, netdev, Seung-Woo Kim, linux-mips, iommu

On 01.09.20 17:22, James Bottomley wrote:
> On Tue, 2020-09-01 at 16:05 +0100, Matthew Wilcox wrote:
>> On Tue, Sep 01, 2020 at 07:52:40AM -0700, James Bottomley wrote:
>>> I think this looks mostly OK, except for one misnamed parameter
>>> below. Unfortunately, the last non-coherent parisc was the 700
>>> series and I no longer own a box, so I can't test that part of it
>>> (I can fire up the C360 to test it on a coherent arch).
>>
>> I have a 715/50 that probably hasn't been powered on in 15 years if
>> you need something that old to test on (I believe the 725/100 uses
>> the 7100LC and so is coherent).  I'll need to set up a cross-compiler
>> ...
>
> I'm not going to say no to actual testing, but it's going to be a world
> of pain getting something so old going.  I do have a box of older
> systems I keep for architectural testing that I need to rummage around
> in ... I just have a vague memory that my 715 actually caught fire a
> decade ago and had to be disposed of.

I still have a zoo of machines running for such testing, including a
715/64 and two 730.
I'm going to test this git tree on the 715/64:
git://git.infradead.org/users/hch/misc.git dma_alloc_pages

Helge

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 07/28] 53c700: improve non-coherent DMA handling
  2020-09-01 16:21           ` Helge Deller
@ 2020-09-01 16:41             ` Helge Deller
  2020-09-01 16:53               ` Matthew Wilcox
  0 siblings, 1 reply; 77+ messages in thread
From: Helge Deller @ 2020-09-01 16:41 UTC (permalink / raw)
  To: James Bottomley, Matthew Wilcox
  Cc: alsa-devel, linux-ia64, linux-doc, nouveau, linux-nvme,
	linux-kernel, linux-mm, Christoph Hellwig, linux-samsung-soc,
	Joonyoung Shim, linux-scsi, Kyungmin Park, Ben Skeggs,
	Matt Porter, linux-media, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab, linux-arm-kernel, Thomas Bogendoerfer,
	linux-parisc, netdev, Seung-Woo Kim, linux-mips, iommu

On 01.09.20 18:21, Helge Deller wrote:
> On 01.09.20 17:22, James Bottomley wrote:
>> On Tue, 2020-09-01 at 16:05 +0100, Matthew Wilcox wrote:
>>> On Tue, Sep 01, 2020 at 07:52:40AM -0700, James Bottomley wrote:
>>>> I think this looks mostly OK, except for one misnamed parameter
>>>> below. Unfortunately, the last non-coherent parisc was the 700
>>>> series and I no longer own a box, so I can't test that part of it
>>>> (I can fire up the C360 to test it on a coherent arch).
>>>
>>> I have a 715/50 that probably hasn't been powered on in 15 years if
>>> you need something that old to test on (I believe the 725/100 uses
>>> the 7100LC and so is coherent).  I'll need to set up a cross-compiler
>>> ...
>>
>> I'm not going to say no to actual testing, but it's going to be a world
>> of pain getting something so old going.  I do have a box of older
>> systems I keep for architectural testing that I need to rummage around
>> in ... I just have a vague memory that my 715 actually caught fire a
>> decade ago and had to be disposed of.
>
> I still have a zoo of machines running for such testing, including a
> 715/64 and two 730.
> I'm going to test this git tree on the 715/64:
> git://git.infradead.org/users/hch/misc.git dma_alloc_pages

This tree boots nicely (up to a command prompt with i82596 nic working):

53c700: Version 2.8 By James.Bottomley@HansenPartnership.com
scsi0: 53c710 rev 2
scsi host0: LASI SCSI 53c700
scsi 0:0:6:0: Direct-Access     QUANTUM  FIREBALL_TM3200S 300X PQ: 0 ANSI: 2
scsi target0:0:6: Beginning Domain Validation
scsi 0:0:6:0: tag#56 Enabling Tag Command Queuing
scsi target0:0:6: asynchronous
scsi target0:0:6: FAST-10 SCSI 10.0 MB/s ST (100 ns, offset 8)
scsi target0:0:6: Domain Validation skipping write tests
scsi target0:0:6: Ending Domain Validation
scsi 0:0:6:1: tag#63 Disabling Tag Command Queuing
st: Version 20160209, fixed bufsize 32768, s/g segs 256
sd 0:0:6:0: Power-on or device reset occurred
sd 0:0:6:0: Attached scsi generic sg0 type 0
LASI 82596 driver - Revision: 1.30
Found i82596 at 0xf0107000, IRQ 17
eth0: 82596 at 0xf0107000, 08:00:09:c2:9e:60 IRQ 17.
sd 0:0:6:0: [sda] 6281856 512-byte logical blocks: (3.22 GB/3.00 GiB)
sd 0:0:6:0: [sda] Write Protect is off

Christoph, you may add a
Tested-by: Helge Deller <deller@gmx.de> # parisc
to the series.

Helge

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 07/28] 53c700: improve non-coherent DMA handling
  2020-09-01 16:41             ` Helge Deller
@ 2020-09-01 16:53               ` Matthew Wilcox
  2020-09-02 15:00                 ` Helge Deller
  0 siblings, 1 reply; 77+ messages in thread
From: Matthew Wilcox @ 2020-09-01 16:53 UTC (permalink / raw)
  To: Helge Deller
  Cc: alsa-devel, linux-ia64, linux-doc, nouveau, linux-nvme,
	linux-kernel, James Bottomley, linux-mm, Christoph Hellwig,
	linux-samsung-soc, Joonyoung Shim, linux-scsi, Kyungmin Park,
	Ben Skeggs, Matt Porter, linux-media, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab, linux-arm-kernel, Thomas Bogendoerfer,
	linux-parisc, netdev, Seung-Woo Kim, linux-mips, iommu

On Tue, Sep 01, 2020 at 06:41:12PM +0200, Helge Deller wrote:
> > I still have a zoo of machines running for such testing, including a
> > 715/64 and two 730.
> > I'm going to test this git tree on the 715/64:

The 715/64 is a 7100LC machine though.  I think you need to boot on
the 730 to test the non-coherent path.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 22/28] sgiseeq: convert from dma_cache_sync to dma_sync_single_for_device
  2020-09-01 15:22     ` Thomas Bogendoerfer
@ 2020-09-01 17:12       ` Thomas Bogendoerfer
  2020-09-01 17:16         ` Christoph Hellwig
  0 siblings, 1 reply; 77+ messages in thread
From: Thomas Bogendoerfer @ 2020-09-01 17:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: alsa-devel, linux-ia64, linux-doc, nouveau, linux-nvme,
	linux-mips, James E.J. Bottomley, linux-mm, linux-samsung-soc,
	Joonyoung Shim, linux-scsi, iommu, Ben Skeggs, Matt Porter,
	linux-media, Tom Lendacky, Pawel Osciak, Mauro Carvalho Chehab,
	linux-arm-kernel, linux-parisc, netdev, Seung-Woo Kim,
	linux-kernel, Kyungmin Park

On Tue, Sep 01, 2020 at 05:22:09PM +0200, Thomas Bogendoerfer wrote:
> On Wed, Aug 19, 2020 at 08:55:49AM +0200, Christoph Hellwig wrote:
> > Use the proper modern API to transfer cache ownership for incoherent DMA.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  drivers/net/ethernet/seeq/sgiseeq.c | 12 ++++++++----
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/seeq/sgiseeq.c b/drivers/net/ethernet/seeq/sgiseeq.c
> > index 39599bbb5d45b6..f91dae16d69a19 100644
> > --- a/drivers/net/ethernet/seeq/sgiseeq.c
> > +++ b/drivers/net/ethernet/seeq/sgiseeq.c
> > @@ -112,14 +112,18 @@ struct sgiseeq_private {
> >  
> >  static inline void dma_sync_desc_cpu(struct net_device *dev, void *addr)
> >  {
> > -	dma_cache_sync(dev->dev.parent, addr, sizeof(struct sgiseeq_rx_desc),
> > -		       DMA_FROM_DEVICE);
> > +	struct sgiseeq_private *sp = netdev_priv(dev);
> > +
> > +	dma_sync_single_for_cpu(dev->dev.parent, VIRT_TO_DMA(sp, addr),
> > +			sizeof(struct sgiseeq_rx_desc), DMA_BIDIRECTIONAL);
> >  }
> >  
> >  static inline void dma_sync_desc_dev(struct net_device *dev, void *addr)
> >  {
> > -	dma_cache_sync(dev->dev.parent, addr, sizeof(struct sgiseeq_rx_desc),
> > -		       DMA_TO_DEVICE);
> > +	struct sgiseeq_private *sp = netdev_priv(dev);
> > +
> > +	dma_sync_single_for_device(dev->dev.parent, VIRT_TO_DMA(sp, addr),
> > +			sizeof(struct sgiseeq_rx_desc), DMA_BIDIRECTIONAL);
> >  }
> 
> this breaks ethernet on IP22 completely, but I haven't figured out why, yet.

the problem is that dma_sync_single_for_cpu() doesn't flush anything
for IP22, because it only flushes for CPUs which do speculation. So
either MIPS arch_sync_dma_for_cpu() should always flush or sgiseeq
needs to use a different sync function when it wants to re-read descriptors
from memory.
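
To make the failure mode concrete, here is a small userspace toy model
(not kernel code; every name in it is made up for illustration).  It
assumes a single descriptor word, a one-line CPU cache that the device
does not snoop, and a sync-for-cpu step that only invalidates when the
CPU is flagged as speculating:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one descriptor word in memory plus a one-line CPU cache. */
struct toy_line {
	uint32_t mem;		/* what the device reads and writes */
	uint32_t cached;	/* the CPU's cached copy */
	bool valid;		/* does the CPU cache hold the line? */
};

/* CPU load: hits in the cache if the line is valid, otherwise fills it. */
static uint32_t cpu_read(struct toy_line *l)
{
	if (!l->valid) {
		l->cached = l->mem;
		l->valid = true;
	}
	return l->cached;
}

/* Device DMA write: updates memory only, the CPU cache is not snooped. */
static void device_write(struct toy_line *l, uint32_t v)
{
	l->mem = v;
}

/* Hand-off to the CPU: invalidates only if the CPU might have refetched. */
static void toy_sync_for_cpu(struct toy_line *l, bool cpu_speculates)
{
	if (cpu_speculates)
		l->valid = false;
}

/* Poll twice using only the sync-for-cpu step; returns the second read. */
static uint32_t poll_cpu_only(bool cpu_speculates)
{
	struct toy_line l = { .mem = 0, .cached = 0, .valid = false };

	toy_sync_for_cpu(&l, cpu_speculates);
	(void)cpu_read(&l);		/* first poll: nothing there, line cached */
	device_write(&l, 1);		/* device completes a descriptor */
	toy_sync_for_cpu(&l, cpu_speculates);
	return cpu_read(&l);		/* second poll */
}
```

In this model poll_cpu_only(false) returns the stale 0 even though the
device wrote 1, while poll_cpu_only(true) sees the update.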

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 22/28] sgiseeq: convert from dma_cache_sync to dma_sync_single_for_device
  2020-09-01 17:12       ` Thomas Bogendoerfer
@ 2020-09-01 17:16         ` Christoph Hellwig
  2020-09-01 17:38           ` Thomas Bogendoerfer
  0 siblings, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2020-09-01 17:16 UTC (permalink / raw)
  To: Thomas Bogendoerfer
  Cc: alsa-devel, linux-ia64, linux-doc, nouveau, linux-nvme,
	linux-mips, James E.J. Bottomley, linux-mm, Christoph Hellwig,
	linux-samsung-soc, Joonyoung Shim, linux-scsi, iommu, Ben Skeggs,
	Matt Porter, linux-media, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab, linux-arm-kernel, linux-parisc, netdev,
	Seung-Woo Kim, linux-kernel, Kyungmin Park

On Tue, Sep 01, 2020 at 07:12:41PM +0200, Thomas Bogendoerfer wrote:
> On Tue, Sep 01, 2020 at 05:22:09PM +0200, Thomas Bogendoerfer wrote:
> > On Wed, Aug 19, 2020 at 08:55:49AM +0200, Christoph Hellwig wrote:
> > > Use the proper modern API to transfer cache ownership for incoherent DMA.
> > > 
> > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > ---
> > >  drivers/net/ethernet/seeq/sgiseeq.c | 12 ++++++++----
> > >  1 file changed, 8 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/net/ethernet/seeq/sgiseeq.c b/drivers/net/ethernet/seeq/sgiseeq.c
> > > index 39599bbb5d45b6..f91dae16d69a19 100644
> > > --- a/drivers/net/ethernet/seeq/sgiseeq.c
> > > +++ b/drivers/net/ethernet/seeq/sgiseeq.c
> > > @@ -112,14 +112,18 @@ struct sgiseeq_private {
> > >  
> > >  static inline void dma_sync_desc_cpu(struct net_device *dev, void *addr)
> > >  {
> > > -	dma_cache_sync(dev->dev.parent, addr, sizeof(struct sgiseeq_rx_desc),
> > > -		       DMA_FROM_DEVICE);
> > > +	struct sgiseeq_private *sp = netdev_priv(dev);
> > > +
> > > +	dma_sync_single_for_cpu(dev->dev.parent, VIRT_TO_DMA(sp, addr),
> > > +			sizeof(struct sgiseeq_rx_desc), DMA_BIDIRECTIONAL);
> > >  }
> > >  
> > >  static inline void dma_sync_desc_dev(struct net_device *dev, void *addr)
> > >  {
> > > -	dma_cache_sync(dev->dev.parent, addr, sizeof(struct sgiseeq_rx_desc),
> > > -		       DMA_TO_DEVICE);
> > > +	struct sgiseeq_private *sp = netdev_priv(dev);
> > > +
> > > +	dma_sync_single_for_device(dev->dev.parent, VIRT_TO_DMA(sp, addr),
> > > +			sizeof(struct sgiseeq_rx_desc), DMA_BIDIRECTIONAL);
> > >  }
> > 
> > this breaks ethernet on IP22 completely, but I haven't figured out why, yet.
> 
> the problem is that dma_sync_single_for_cpu() doesn't flush anything
> for IP22, because it only flushes for CPUs which do speculation. So
> either MIPS arch_sync_dma_for_cpu() should always flush or sgiseeq
> needs to use a different sync function when it wants to re-read descriptors
> from memory.

Well, if IP22 doesn't speculate (which I'm pretty sure is the case),
dma_sync_single_for_cpu should indeed be a no-op.  But then there
also shouldn't be anything in the cache, as the previous
dma_sync_single_for_device should have invalidated it.  So it seems like
we are missing one (or more) ownership transfers to the device.  I'll
try to look at the ownership management in a little more detail
tomorrow.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 22/28] sgiseeq: convert from dma_cache_sync to dma_sync_single_for_device
  2020-09-01 17:16         ` Christoph Hellwig
@ 2020-09-01 17:38           ` Thomas Bogendoerfer
  2020-09-02 21:38             ` Thomas Bogendoerfer
  2020-09-03  8:43             ` Christoph Hellwig
  0 siblings, 2 replies; 77+ messages in thread
From: Thomas Bogendoerfer @ 2020-09-01 17:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: alsa-devel, linux-ia64, linux-doc, nouveau, linux-nvme,
	linux-mips, James E.J. Bottomley, linux-mm, linux-samsung-soc,
	Joonyoung Shim, linux-scsi, iommu, Ben Skeggs, Matt Porter,
	linux-media, Tom Lendacky, Pawel Osciak, Mauro Carvalho Chehab,
	linux-arm-kernel, linux-parisc, netdev, Seung-Woo Kim,
	linux-kernel, Kyungmin Park

On Tue, Sep 01, 2020 at 07:16:27PM +0200, Christoph Hellwig wrote:
> Well, if IP22 doesn't speculate (which I'm pretty sure is the case),
> dma_sync_single_for_cpu should indeed be a no-op.  But then there
> also shouldn't be anything in the cache, as the previous
> dma_sync_single_for_device should have invalidated it.  So it seems like
> we are missing one (or more) ownership transfers to the device.  I'll
> try to look at the ownership management in a little more detail
> tomorrow.

this is the problem:

       /* Always check for received packets. */
        sgiseeq_rx(dev, sp, hregs, sregs);

so the driver will look at the rx descriptor on every interrupt, so
we cache the rx descriptor on the first interrupt and if there was
no rx packet, we will only see a later one if the cache line gets flushed
for some other reason. kick_tx() does a busy loop checking tx descriptors,
with just sync_desc_cpu...

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 07/28] 53c700: improve non-coherent DMA handling
  2020-09-01 16:53               ` Matthew Wilcox
@ 2020-09-02 15:00                 ` Helge Deller
  0 siblings, 0 replies; 77+ messages in thread
From: Helge Deller @ 2020-09-02 15:00 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: alsa-devel, linux-ia64, linux-doc, nouveau, linux-nvme,
	linux-kernel, James Bottomley, linux-mm, Christoph Hellwig,
	linux-samsung-soc, Joonyoung Shim, linux-scsi, Kyungmin Park,
	Ben Skeggs, Matt Porter, linux-media, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab, linux-arm-kernel, Thomas Bogendoerfer,
	linux-parisc, netdev, Seung-Woo Kim, linux-mips, iommu

Hi Willy,

On 01.09.20 18:53, Matthew Wilcox wrote:
> On Tue, Sep 01, 2020 at 06:41:12PM +0200, Helge Deller wrote:
>>> I still have a zoo of machines running for such testing, including a
>>> 715/64 and two 730.
>>> I'm going to test this git tree on the 715/64:
>
> The 715/64 is a 7100LC machine though.  I think you need to boot on
> the 730 to test the non-coherent path.

Just tested the 730, and it works as well.

Helge

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 22/28] sgiseeq: convert from dma_cache_sync to dma_sync_single_for_device
  2020-09-01 17:38           ` Thomas Bogendoerfer
@ 2020-09-02 21:38             ` Thomas Bogendoerfer
  2020-09-03  8:42               ` Christoph Hellwig
  2020-09-03  8:43             ` Christoph Hellwig
  1 sibling, 1 reply; 77+ messages in thread
From: Thomas Bogendoerfer @ 2020-09-02 21:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: alsa-devel, linux-ia64, linux-doc, nouveau, linux-nvme,
	linux-mips, James E.J. Bottomley, linux-mm, linux-samsung-soc,
	Joonyoung Shim, linux-scsi, iommu, Ben Skeggs, Matt Porter,
	linux-media, Tom Lendacky, Pawel Osciak, Mauro Carvalho Chehab,
	linux-arm-kernel, linux-parisc, netdev, Seung-Woo Kim,
	linux-kernel, Kyungmin Park

On Tue, Sep 01, 2020 at 07:38:10PM +0200, Thomas Bogendoerfer wrote:
> On Tue, Sep 01, 2020 at 07:16:27PM +0200, Christoph Hellwig wrote:
> > Well, if IP22 doesn't speculate (which I'm pretty sure is the case),
> > dma_sync_single_for_cpu should indeed be a no-op.  But then there
> > also shouldn't be anything in the cache, as the previous
> > dma_sync_single_for_device should have invalidated it.  So it seems like
> > we are missing one (or more) ownership transfers to the device.  I'll
> > try to look at the ownership management in a little more detail
> > tomorrow.
> 
> this is the problem:
> 
>        /* Always check for received packets. */
>         sgiseeq_rx(dev, sp, hregs, sregs);
> 
> so the driver will look at the rx descriptor on every interrupt, so
> we cache the rx descriptor on the first interrupt and if there was
> no rx packet, we will only see a later one if the cache line gets flushed
> for some other reason. kick_tx() does a busy loop checking tx descriptors,
> with just sync_desc_cpu...

the patch below fixes the problem.

Thomas.


diff --git a/drivers/net/ethernet/seeq/sgiseeq.c b/drivers/net/ethernet/seeq/sgiseeq.c
index 8507ff242014..876e3700a0e4 100644
--- a/drivers/net/ethernet/seeq/sgiseeq.c
+++ b/drivers/net/ethernet/seeq/sgiseeq.c
@@ -112,14 +112,18 @@ struct sgiseeq_private {
 
 static inline void dma_sync_desc_cpu(struct net_device *dev, void *addr)
 {
-       dma_cache_sync(dev->dev.parent, addr, sizeof(struct sgiseeq_rx_desc),
-                      DMA_FROM_DEVICE);
+       struct sgiseeq_private *sp = netdev_priv(dev);
+
+       dma_sync_single_for_device(dev->dev.parent, VIRT_TO_DMA(sp, addr),
+                       sizeof(struct sgiseeq_rx_desc), DMA_FROM_DEVICE);
 }
 
 static inline void dma_sync_desc_dev(struct net_device *dev, void *addr)
 {
-       dma_cache_sync(dev->dev.parent, addr, sizeof(struct sgiseeq_rx_desc),
-                      DMA_TO_DEVICE);
+       struct sgiseeq_private *sp = netdev_priv(dev);
+
+       dma_sync_single_for_device(dev->dev.parent, VIRT_TO_DMA(sp, addr),
+                       sizeof(struct sgiseeq_rx_desc), DMA_TO_DEVICE);
 }
 
-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH 22/28] sgiseeq: convert from dma_cache_sync to dma_sync_single_for_device
  2020-09-02 21:38             ` Thomas Bogendoerfer
@ 2020-09-03  8:42               ` Christoph Hellwig
  0 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2020-09-03  8:42 UTC (permalink / raw)
  To: Thomas Bogendoerfer
  Cc: alsa-devel, linux-ia64, linux-doc, nouveau, linux-nvme,
	linux-mips, James E.J. Bottomley, linux-mm, Christoph Hellwig,
	linux-samsung-soc, Joonyoung Shim, linux-scsi, iommu, Ben Skeggs,
	Matt Porter, linux-media, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab, linux-arm-kernel, linux-parisc, netdev,
	Seung-Woo Kim, linux-kernel, Kyungmin Park

On Wed, Sep 02, 2020 at 11:38:09PM +0200, Thomas Bogendoerfer wrote:
> the patch below fixes the problem.

But is very wrong unfortunately.

>  static inline void dma_sync_desc_cpu(struct net_device *dev, void *addr)
>  {
> -       dma_cache_sync(dev->dev.parent, addr, sizeof(struct sgiseeq_rx_desc),
> -                      DMA_FROM_DEVICE);
> +       struct sgiseeq_private *sp = netdev_priv(dev);
> +
> +       dma_sync_single_for_device(dev->dev.parent, VIRT_TO_DMA(sp, addr),
> +                       sizeof(struct sgiseeq_rx_desc), DMA_FROM_DEVICE);
>  }
>  
>  static inline void dma_sync_desc_dev(struct net_device *dev, void *addr)
>  {
> -       dma_cache_sync(dev->dev.parent, addr, sizeof(struct sgiseeq_rx_desc),
> -                      DMA_TO_DEVICE);
> +       struct sgiseeq_private *sp = netdev_priv(dev);
> +
> +       dma_sync_single_for_device(dev->dev.parent, VIRT_TO_DMA(sp, addr),
> +                       sizeof(struct sgiseeq_rx_desc), DMA_TO_DEVICE);

This is not how the DMA API works.  You can only call
dma_sync_single_for_{device,cpu} with the direction that the memory
was mapped with.  The call then transfers ownership to the device or the
cpu, and the ownership of the memory is a fundamental concept that allows
for reasoning about the caching interaction.
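
The ownership rule can be sketched as a small state checker.  This is a
userspace illustration only, loosely in the spirit of what a debugging
layer might verify; the toy_* names are invented here, and the initial
device-owned state is an assumption borrowed from streaming mappings:

```c
#include <assert.h>
#include <stdbool.h>

enum toy_dir { TOY_BIDIRECTIONAL, TOY_TO_DEVICE, TOY_FROM_DEVICE };
enum toy_owner { TOY_OWNER_DEVICE, TOY_OWNER_CPU };

struct toy_mapping {
	enum toy_dir dir;	/* direction given when the memory was mapped */
	enum toy_owner owner;	/* who may touch the memory right now */
};

/* Map a buffer; in this model the device owns it until a sync-for-cpu. */
static struct toy_mapping toy_map(enum toy_dir dir)
{
	struct toy_mapping m = { .dir = dir, .owner = TOY_OWNER_DEVICE };

	return m;
}

/* Transfer ownership to the CPU; refuses a mismatched direction. */
static bool toy_sync_for_cpu(struct toy_mapping *m, enum toy_dir dir)
{
	if (dir != m->dir)
		return false;
	m->owner = TOY_OWNER_CPU;
	return true;
}

/* Transfer ownership to the device; refuses a mismatched direction. */
static bool toy_sync_for_device(struct toy_mapping *m, enum toy_dir dir)
{
	if (dir != m->dir)
		return false;
	m->owner = TOY_OWNER_DEVICE;
	return true;
}

/* The CPU may only read or write the buffer while it owns it. */
static bool toy_cpu_may_access(const struct toy_mapping *m)
{
	return m->owner == TOY_OWNER_CPU;
}
```

With a mapping created as TOY_BIDIRECTIONAL, a sync call that passes
TOY_FROM_DEVICE or TOY_TO_DEVICE is rejected and leaves the owner
unchanged; only calls using the mapped direction move ownership.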

>  }
>  
> -- 
> Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
> good idea.                                                [ RFC1925, 2.3 ]
---end quoted text---

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 22/28] sgiseeq: convert from dma_cache_sync to dma_sync_single_for_device
  2020-09-01 17:38           ` Thomas Bogendoerfer
  2020-09-02 21:38             ` Thomas Bogendoerfer
@ 2020-09-03  8:43             ` Christoph Hellwig
  2020-09-03  8:46               ` Christoph Hellwig
  1 sibling, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2020-09-03  8:43 UTC (permalink / raw)
  To: Thomas Bogendoerfer
  Cc: alsa-devel, linux-ia64, linux-doc, nouveau, linux-nvme,
	linux-mips, James E.J. Bottomley, linux-mm, Christoph Hellwig,
	linux-samsung-soc, Joonyoung Shim, linux-scsi, iommu, Ben Skeggs,
	Matt Porter, linux-media, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab, linux-arm-kernel, linux-parisc, netdev,
	Seung-Woo Kim, linux-kernel, Kyungmin Park

On Tue, Sep 01, 2020 at 07:38:10PM +0200, Thomas Bogendoerfer wrote:
> this is the problem:
> 
>        /* Always check for received packets. */
>         sgiseeq_rx(dev, sp, hregs, sregs);
> 
> so the driver will look at the rx descriptor on every interrupt, so
> we cache the rx descriptor on the first interrupt and if there was
> no rx packet, we will only see a later one if the cache line gets flushed
> for some other reason.

That means a transfer back to device ownership is missing after a
(negative) check.

> kick_tx() does a busy loop checking tx descriptors,
> with just sync_desc_cpu...
> 
> Thomas.
> 
> -- 
> Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
> good idea.                                                [ RFC1925, 2.3 ]
---end quoted text---

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 22/28] sgiseeq: convert from dma_cache_sync to dma_sync_single_for_device
  2020-09-03  8:43             ` Christoph Hellwig
@ 2020-09-03  8:46               ` Christoph Hellwig
  0 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2020-09-03  8:46 UTC (permalink / raw)
  To: Thomas Bogendoerfer
  Cc: alsa-devel, linux-ia64, linux-doc, nouveau, linux-nvme,
	linux-mips, James E.J. Bottomley, linux-mm, Christoph Hellwig,
	linux-samsung-soc, Joonyoung Shim, linux-scsi, iommu, Ben Skeggs,
	Matt Porter, linux-media, Tom Lendacky, Pawel Osciak,
	Mauro Carvalho Chehab, linux-arm-kernel, linux-parisc, netdev,
	Seung-Woo Kim, linux-kernel, Kyungmin Park

On Thu, Sep 03, 2020 at 10:43:02AM +0200, Christoph Hellwig wrote:
> On Tue, Sep 01, 2020 at 07:38:10PM +0200, Thomas Bogendoerfer wrote:
> > this is the problem:
> > 
> >        /* Always check for received packets. */
> >         sgiseeq_rx(dev, sp, hregs, sregs);
> > 
> > so the driver will look at the rx descriptor on every interrupt, so
> > we cache the rx descriptor on the first interrupt and if there was
> > no rx packet, we will only see a later one if the cache line gets flushed
> > for some other reason.
> 
> That means a transfer back to device ownership is missing after a
> (negative) check.

E.g. something like this for the particular problem, although there
might be others hiding elsewhere:

diff --git a/drivers/net/ethernet/seeq/sgiseeq.c b/drivers/net/ethernet/seeq/sgiseeq.c
index 8507ff2420143a..a1c7be8a0d1e5d 100644
--- a/drivers/net/ethernet/seeq/sgiseeq.c
+++ b/drivers/net/ethernet/seeq/sgiseeq.c
@@ -403,6 +403,8 @@ static inline void sgiseeq_rx(struct net_device *dev, struct sgiseeq_private *sp
 		rd = &sp->rx_desc[sp->rx_new];
 		dma_sync_desc_cpu(dev, rd);
 	}
+	dma_sync_desc_dev(dev, rd);
+
 	dma_sync_desc_cpu(dev, &sp->rx_desc[orig_end]);
 	sp->rx_desc[orig_end].rdma.cntinfo &= ~(HPCDMA_EOR);
 	dma_sync_desc_dev(dev, &sp->rx_desc[orig_end]);
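
As a sanity check of the idea, the following userspace sketch (again a
toy model, not the driver; TOY_DONE, toy_rx and the other names are
hypothetical) replays the polling pattern with and without the extra
hand-back after a negative check.  It assumes a non-speculating CPU, so
the sync-for-cpu step does nothing:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_RING 4
#define TOY_DONE 1u	/* hypothetical "device filled this descriptor" bit */

struct toy_desc {
	uint32_t mem;		/* status word as the device sees it */
	uint32_t cached;	/* CPU-cached copy of the status word */
	bool valid;		/* is the cached copy present? */
};

struct toy_ring {
	struct toy_desc d[TOY_RING];
	unsigned int next;	/* next descriptor the driver inspects */
	bool handoff;		/* do the extra sync-for-device after the loop? */
};

/* Ownership to the CPU: a no-op here, modelling a non-speculating CPU. */
static void toy_sync_cpu(struct toy_desc *d)
{
	(void)d;
}

/* Ownership to the device: drop the cached copy. */
static void toy_sync_dev(struct toy_desc *d)
{
	d->valid = false;
}

/* CPU load: served from the cache when possible, else filled from memory. */
static uint32_t toy_read(struct toy_desc *d)
{
	if (!d->valid) {
		d->cached = d->mem;
		d->valid = true;
	}
	return d->cached;
}

/* Device completes descriptor i by writing memory behind the cache. */
static void toy_device_rx(struct toy_ring *r, unsigned int i)
{
	r->d[i].mem |= TOY_DONE;
}

/* Polling receive path; returns the number of descriptors consumed. */
static unsigned int toy_rx(struct toy_ring *r)
{
	struct toy_desc *rd = &r->d[r->next];
	unsigned int n = 0;

	toy_sync_cpu(rd);
	while (toy_read(rd) & TOY_DONE) {
		rd->mem &= ~TOY_DONE;	/* recycle: CPU store, written back ... */
		toy_sync_dev(rd);	/* ... and handed to the device */
		n++;
		r->next = (r->next + 1) % TOY_RING;
		rd = &r->d[r->next];
		toy_sync_cpu(rd);
	}
	if (r->handoff)
		toy_sync_dev(rd);	/* negative check: give it back */
	return n;
}

/* One empty poll, then the device completes descriptor 0, then a second poll. */
static unsigned int toy_demo(bool handoff)
{
	struct toy_ring r = { .next = 0, .handoff = handoff };
	unsigned int n = toy_rx(&r);

	toy_device_rx(&r, 0);
	return n + toy_rx(&r);
}
```

Here toy_demo(false) consumes nothing, because the second poll still
hits the stale cached status, while toy_demo(true) consumes the one
completed descriptor.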

^ permalink raw reply related	[flat|nested] 77+ messages in thread

end of thread, other threads:[~2020-09-03  8:46 UTC | newest]

Thread overview: 77+ messages
     [not found] <CGME20200819065610eucas1p2fde88e81917071b1888e7cc01ba0f298@eucas1p2.samsung.com>
2020-08-19  6:55 ` a saner API for allocating DMA addressable pages Christoph Hellwig
2020-08-19  6:55   ` [PATCH 01/28] mm: turn alloc_pages into an inline function Christoph Hellwig
2020-08-19  6:55   ` [PATCH 02/28] drm/exynos: stop setting DMA_ATTR_NON_CONSISTENT Christoph Hellwig
2020-08-19  6:55   ` [PATCH 03/28] drm/nouveau/gk20a: " Christoph Hellwig
2020-08-19  6:55   ` [PATCH 04/28] net/au1000-eth: stop using DMA_ATTR_NON_CONSISTENT Christoph Hellwig
2020-08-19  6:55   ` [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT Christoph Hellwig
2020-08-19 11:16     ` Tomasz Figa
2020-08-19 11:51       ` Robin Murphy
2020-08-19 12:49         ` Tomasz Figa
2020-08-19 13:57           ` Christoph Hellwig
2020-08-19 14:11             ` Tomasz Figa
2020-08-20  4:45               ` Christoph Hellwig
2020-08-20 10:09                 ` Tomasz Figa
2020-08-20 16:51                   ` Christoph Hellwig
2020-08-19 14:07           ` Robin Murphy
2020-08-19 14:22             ` Tomasz Figa
2020-08-20  4:52               ` Christoph Hellwig
2020-08-20  5:02             ` Christoph Hellwig
2020-08-20 10:24               ` Tomasz Figa
2020-08-20 16:52                 ` Christoph Hellwig
2020-08-20 17:41                   ` Tomasz Figa
2020-08-19 13:54       ` Christoph Hellwig
2020-08-19 13:57         ` Tomasz Figa
2020-08-20  4:43           ` Christoph Hellwig
2020-08-20  5:20             ` Christoph Hellwig
2020-08-20 10:05               ` Tomasz Figa
2020-08-20 16:54                 ` Christoph Hellwig
2020-08-20 17:33                   ` Tomasz Figa
2020-09-01 11:06                     ` Christoph Hellwig
2020-09-01 15:02                       ` Tomasz Figa
2020-08-19  6:55   ` [PATCH 06/28] lib82596: move DMA allocation into the callers of i82596_probe Christoph Hellwig
2020-09-01 13:29     ` Thomas Bogendoerfer
2020-08-19  6:55   ` [PATCH 07/28] 53c700: improve non-coherent DMA handling Christoph Hellwig
2020-09-01 14:52     ` James Bottomley
2020-09-01 15:05       ` Matthew Wilcox
2020-09-01 15:22         ` James Bottomley
2020-09-01 16:21           ` Helge Deller
2020-09-01 16:41             ` Helge Deller
2020-09-01 16:53               ` Matthew Wilcox
2020-09-02 15:00                 ` Helge Deller
2020-08-19  6:55   ` [PATCH 08/28] MIPS: make dma_sync_*_for_cpu a little less overzealous Christoph Hellwig
2020-09-01 13:53     ` Thomas Bogendoerfer
2020-08-19  6:55   ` [PATCH 09/28] MIPS/jazzdma: remove the unused vdma_remap function Christoph Hellwig
2020-09-01 13:49     ` Thomas Bogendoerfer
2020-08-19  6:55   ` [PATCH 10/28] MIPS/jazzdma: decouple from dma-direct Christoph Hellwig
2020-09-01 13:49     ` Thomas Bogendoerfer
2020-08-19  6:55   ` [PATCH 11/28] dma-mapping: add (back) arch_dma_mark_clean for ia64 Christoph Hellwig
2020-08-19  6:55   ` [PATCH 12/28] dma-direct: remove dma_direct_{alloc,free}_pages Christoph Hellwig
2020-08-19  6:55   ` [PATCH 13/28] dma-direct: lift gfp_t manipulation out of__dma_direct_alloc_pages Christoph Hellwig
2020-08-19  6:55   ` [PATCH 14/28] dma-direct: use phys_to_dma_direct in dma_direct_alloc Christoph Hellwig
2020-08-19  6:55   ` [PATCH 15/28] dma-direct: remove __dma_to_phys Christoph Hellwig
2020-08-19  6:55   ` [PATCH 16/28] dma-direct: rename and cleanup __phys_to_dma Christoph Hellwig
2020-08-19  6:55   ` [PATCH 17/28] dma-mapping: move dma_common_{mmap, get_sgtable} out of mapping.c Christoph Hellwig
2020-08-19  6:55   ` [PATCH 18/28] dma-mapping: move the dma_declare_coherent_memory documentation Christoph Hellwig
2020-08-19  6:55   ` [PATCH 19/28] dma-mapping: replace DMA_ATTR_NON_CONSISTENT with dma_{alloc, free}_pages Christoph Hellwig
2020-08-19 15:03     ` Tomasz Figa
2020-08-20  5:15       ` Christoph Hellwig
2020-08-19  6:55   ` [PATCH 20/28] sgiwd93: convert from dma_cache_sync to dma_sync_single_for_device Christoph Hellwig
2020-08-19  6:55   ` [PATCH 21/28] hal2: " Christoph Hellwig
2020-08-19  6:55   ` [PATCH 22/28] sgiseeq: " Christoph Hellwig
2020-09-01 15:22     ` Thomas Bogendoerfer
2020-09-01 17:12       ` Thomas Bogendoerfer
2020-09-01 17:16         ` Christoph Hellwig
2020-09-01 17:38           ` Thomas Bogendoerfer
2020-09-02 21:38             ` Thomas Bogendoerfer
2020-09-03  8:42               ` Christoph Hellwig
2020-09-03  8:43             ` Christoph Hellwig
2020-09-03  8:46               ` Christoph Hellwig
2020-08-19  6:55   ` [PATCH 23/28] lib82596: " Christoph Hellwig
2020-08-19  6:55   ` [PATCH 24/28] 53c700: " Christoph Hellwig
2020-08-19  6:55   ` [PATCH 25/28] dma-mapping: remove dma_cache_sync Christoph Hellwig
2020-08-19  6:55   ` [PATCH 26/28] dmapool: add dma_alloc_pages support Christoph Hellwig
2020-08-19  6:55   ` [PATCH 27/28] nvme-pci: fix PRP pool size Christoph Hellwig
2020-08-19  6:55   ` [PATCH 28/28] nvme-pci: use dma_alloc_pages backed dmapools Christoph Hellwig
2020-08-25 11:30   ` a saner API for allocating DMA addressable pages Marek Szyprowski
2020-08-25 13:26     ` Christoph Hellwig
2020-08-29  9:46   ` Helge Deller
