linux-pci.vger.kernel.org archive mirror
* remove dma_virt_ops v2
@ 2020-11-06 18:19 Christoph Hellwig
  2020-11-06 18:19 ` [PATCH 01/10] RMDA/sw: don't allow drivers using dma_virt_ops on highmem configs Christoph Hellwig
                   ` (12 more replies)
  0 siblings, 13 replies; 22+ messages in thread
From: Christoph Hellwig @ 2020-11-06 18:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, Santosh Shilimkar,
	linux-rdma, rds-devel, linux-pci, iommu

Hi Jason,

this series switches the RDMA core to open-code the special case of
devices bypassing the DMA mapping in the RDMA ULPs.  The virt ops
have caused a bit of trouble because the P2P code does not work with
them, since we'd do two DMA mapping iterations for a single I/O.  They
are also a bit of a layering violation and lead to more code than
necessary.
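
To make the "bypass" concrete, here is a small userspace model in C.  It
is only a sketch, not kernel code: the model_* types and the bus_offset
field are invented for illustration.  The one idea taken from the series
is that an IB device registered without a DMA device hands out kernel
virtual addresses directly (the changelog below names the predicate
ib_uses_virt_dma).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; these are not the kernel types. */
typedef uint64_t model_dma_addr_t;

struct model_device {
	uint64_t bus_offset;	/* pretend: bus address = CPU address + offset */
};

struct model_ib_device {
	struct model_device *dma_device;	/* NULL for software RDMA drivers */
};

/* Assumption: "no DMA device" is what marks a virt-DMA (software) device. */
static inline int model_uses_virt_dma(const struct model_ib_device *dev)
{
	return dev->dma_device == NULL;
}

/* One mapping step per I/O: either a real translation or none at all. */
static inline model_dma_addr_t
model_ib_dma_map_single(const struct model_ib_device *dev, void *cpu_addr)
{
	assert(cpu_addr != NULL);
	if (model_uses_virt_dma(dev))
		return (uintptr_t)cpu_addr;	/* software device: keep the VA */
	return (uintptr_t)cpu_addr + dev->dma_device->bus_offset;
}
```

With the decision made inside the wrapper there is no second, fake
mapping layer underneath it, which is the point of the series.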

Tested with nvme-rdma over rxe.

Note that the rds changes are untested, as I could not find any
simple rds test setup.

Changes since v2:
 - simplify the INFINIBAND_VIRT_DMA dependencies
 - add a ib_uses_virt_dma helper
 - use ib_uses_virt_dma in nvmet-rdma to disable p2p for virt_dma devices
 - use ib_dma_max_seg_size in umem
 - stop using dmapool in rds

Changes since v1:
 - disable software RDMA drivers for highmem configs
 - update the PCI commit logs


* [PATCH 01/10] RMDA/sw: don't allow drivers using dma_virt_ops on highmem configs
  2020-11-06 18:19 remove dma_virt_ops v2 Christoph Hellwig
@ 2020-11-06 18:19 ` Christoph Hellwig
  2020-11-06 18:19 ` [PATCH 02/10] RDMA/umem: use ib_dma_max_seg_size instead of dma_get_max_seg_size Christoph Hellwig
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2020-11-06 18:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, Santosh Shilimkar,
	linux-rdma, rds-devel, linux-pci, iommu

dma_virt_ops requires that all pages have a kernel virtual address.
Introduce an INFINIBAND_VIRT_DMA Kconfig symbol that depends on !HIGHMEM
and make all three drivers depend on the new symbol.

Also remove the ARCH_DMA_ADDR_T_64BIT dependency, which has been
obsolete since commit 4965a68780c5 ("arch: define the
ARCH_DMA_ADDR_T_64BIT config symbol in lib/Kconfig").

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/infiniband/Kconfig           | 3 +++
 drivers/infiniband/sw/rdmavt/Kconfig | 3 ++-
 drivers/infiniband/sw/rxe/Kconfig    | 2 +-
 drivers/infiniband/sw/siw/Kconfig    | 1 +
 4 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index 32a51432ec4f73..9325e189a21536 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -73,6 +73,9 @@ config INFINIBAND_ADDR_TRANS_CONFIGFS
 	  This allows the user to config the default GID type that the CM
 	  uses for each device, when initiaing new connections.
 
+config INFINIBAND_VIRT_DMA
+	def_bool !HIGHMEM
+
 if INFINIBAND_USER_ACCESS || !INFINIBAND_USER_ACCESS
 source "drivers/infiniband/hw/mthca/Kconfig"
 source "drivers/infiniband/hw/qib/Kconfig"
diff --git a/drivers/infiniband/sw/rdmavt/Kconfig b/drivers/infiniband/sw/rdmavt/Kconfig
index 9ef5f5ce1ff6b0..c8e268082952b0 100644
--- a/drivers/infiniband/sw/rdmavt/Kconfig
+++ b/drivers/infiniband/sw/rdmavt/Kconfig
@@ -1,7 +1,8 @@
 # SPDX-License-Identifier: GPL-2.0-only
 config INFINIBAND_RDMAVT
 	tristate "RDMA verbs transport library"
-	depends on X86_64 && ARCH_DMA_ADDR_T_64BIT
+	depends on INFINIBAND_VIRT_DMA
+	depends on X86_64
 	depends on PCI
 	select DMA_VIRT_OPS
 	help
diff --git a/drivers/infiniband/sw/rxe/Kconfig b/drivers/infiniband/sw/rxe/Kconfig
index a0c6c7dfc1814f..8810bfa680495a 100644
--- a/drivers/infiniband/sw/rxe/Kconfig
+++ b/drivers/infiniband/sw/rxe/Kconfig
@@ -2,7 +2,7 @@
 config RDMA_RXE
 	tristate "Software RDMA over Ethernet (RoCE) driver"
 	depends on INET && PCI && INFINIBAND
-	depends on !64BIT || ARCH_DMA_ADDR_T_64BIT
+	depends on INFINIBAND_VIRT_DMA
 	select NET_UDP_TUNNEL
 	select CRYPTO_CRC32
 	select DMA_VIRT_OPS
diff --git a/drivers/infiniband/sw/siw/Kconfig b/drivers/infiniband/sw/siw/Kconfig
index b622fc62f2cd6d..3450ba5081df51 100644
--- a/drivers/infiniband/sw/siw/Kconfig
+++ b/drivers/infiniband/sw/siw/Kconfig
@@ -1,6 +1,7 @@
 config RDMA_SIW
 	tristate "Software RDMA over TCP/IP (iWARP) driver"
 	depends on INET && INFINIBAND && LIBCRC32C
+	depends on INFINIBAND_VIRT_DMA
 	select DMA_VIRT_OPS
 	help
 	This driver implements the iWARP RDMA transport over
-- 
2.28.0



* [PATCH 02/10] RDMA/umem: use ib_dma_max_seg_size instead of dma_get_max_seg_size
  2020-11-06 18:19 remove dma_virt_ops v2 Christoph Hellwig
  2020-11-06 18:19 ` [PATCH 01/10] RMDA/sw: don't allow drivers using dma_virt_ops on highmem configs Christoph Hellwig
@ 2020-11-06 18:19 ` Christoph Hellwig
  2020-11-06 18:19 ` [PATCH 03/10] RDMA: lift ibdev_to_node from rds to common code Christoph Hellwig
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2020-11-06 18:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, Santosh Shilimkar,
	linux-rdma, rds-devel, linux-pci, iommu, Jason Gunthorpe

RDMA ULPs must not call DMA mapping APIs directly but instead use the
ib_dma_* wrappers.

Fixes: 0c16d9635e3a ("RDMA/umem: Move to allocate SG table from pages")
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/infiniband/core/umem.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index e9fecbdf391bcc..0d4da44f30cd68 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -220,10 +220,10 @@ struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
 
 		cur_base += ret * PAGE_SIZE;
 		npages -= ret;
-		sg = __sg_alloc_table_from_pages(
-			&umem->sg_head, page_list, ret, 0, ret << PAGE_SHIFT,
-			dma_get_max_seg_size(device->dma_device), sg, npages,
-			GFP_KERNEL);
+		sg = __sg_alloc_table_from_pages(&umem->sg_head, page_list, ret,
+				0, ret << PAGE_SHIFT,
+				ib_dma_max_seg_size(device), sg, npages,
+				GFP_KERNEL);
 		umem->sg_nents = umem->sg_head.nents;
 		if (IS_ERR(sg)) {
 			unpin_user_pages_dirty_lock(page_list, ret, 0);
-- 
2.28.0



* [PATCH 03/10] RDMA: lift ibdev_to_node from rds to common code
  2020-11-06 18:19 remove dma_virt_ops v2 Christoph Hellwig
  2020-11-06 18:19 ` [PATCH 01/10] RMDA/sw: don't allow drivers using dma_virt_ops on highmem configs Christoph Hellwig
  2020-11-06 18:19 ` [PATCH 02/10] RDMA/umem: use ib_dma_max_seg_size instead of dma_get_max_seg_size Christoph Hellwig
@ 2020-11-06 18:19 ` Christoph Hellwig
  2020-11-06 18:19 ` [PATCH 04/10] nvme-rdma: use ibdev_to_node instead of dereferencing ->dma_device Christoph Hellwig
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2020-11-06 18:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, Santosh Shilimkar,
	linux-rdma, rds-devel, linux-pci, iommu

Lift the ibdev_to_node helper from rds to common code and document it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/rdma/ib_verbs.h | 13 +++++++++++++
 net/rds/ib.h            |  7 -------
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 9bf6c319a670e2..3257cc046e460f 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -4615,6 +4615,19 @@ static inline struct ib_device *rdma_device_to_ibdev(struct device *device)
 	return coredev->owner;
 }
 
+/**
+ * ibdev_to_node - return the NUMA node for a given ib_device
+ * @ibdev:	device to get the NUMA node for.
+ */
+static inline int ibdev_to_node(struct ib_device *ibdev)
+{
+	struct device *parent = ibdev->dev.parent;
+
+	if (!parent)
+		return NUMA_NO_NODE;
+	return dev_to_node(parent);
+}
+
 /**
  * rdma_device_to_drv_device - Helper macro to reach back to driver's
  *			       ib_device holder structure from device pointer.
diff --git a/net/rds/ib.h b/net/rds/ib.h
index 8dfff43cf07f46..c23a11d9ad3628 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -264,13 +264,6 @@ struct rds_ib_device {
 	int			*vector_load;
 };
 
-static inline int ibdev_to_node(struct ib_device *ibdev)
-{
-	struct device *parent;
-
-	parent = ibdev->dev.parent;
-	return parent ? dev_to_node(parent) : NUMA_NO_NODE;
-}
 #define rdsibdev_to_node(rdsibdev) ibdev_to_node(rdsibdev->dev)
 
 /* bits for i_ack_flags */
-- 
2.28.0



* [PATCH 04/10] nvme-rdma: use ibdev_to_node instead of dereferencing ->dma_device
  2020-11-06 18:19 remove dma_virt_ops v2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2020-11-06 18:19 ` [PATCH 03/10] RDMA: lift ibdev_to_node from rds to common code Christoph Hellwig
@ 2020-11-06 18:19 ` Christoph Hellwig
  2020-11-06 18:19 ` [PATCH 05/10] rds: stop using dmapool Christoph Hellwig
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2020-11-06 18:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, Santosh Shilimkar,
	linux-rdma, rds-devel, linux-pci, iommu

->dma_device is a private implementation detail of the RDMA core.  Use
the ibdev_to_node helper to get the NUMA node for an ib_device instead
of poking into ->dma_device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/rdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 541b0cba6d8019..c08625e2f21a56 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -854,7 +854,7 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
 		return error;
 
 	ctrl->device = ctrl->queues[0].device;
-	ctrl->ctrl.numa_node = dev_to_node(ctrl->device->dev->dma_device);
+	ctrl->ctrl.numa_node = ibdev_to_node(ctrl->device->dev);
 
 	/* T10-PI support */
 	if (ctrl->device->dev->attrs.device_cap_flags &
-- 
2.28.0



* [PATCH 05/10] rds: stop using dmapool
  2020-11-06 18:19 remove dma_virt_ops v2 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2020-11-06 18:19 ` [PATCH 04/10] nvme-rdma: use ibdev_to_node instead of dereferencing ->dma_device Christoph Hellwig
@ 2020-11-06 18:19 ` Christoph Hellwig
  2020-11-06 18:19 ` [PATCH 06/10] RDMA/core: remove ib_dma_{alloc,free}_coherent Christoph Hellwig
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2020-11-06 18:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, Santosh Shilimkar,
	linux-rdma, rds-devel, linux-pci, iommu

RDMA ULPs should only perform DMA through the ib_dma_* API instead of
using the hidden dma_device directly.  In addition, using the DMA
coherent API family that dmapool is a part of can be very inefficient
on platforms that are not DMA coherent.  Switch to slab allocations and
the ib_dma_* APIs instead.
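
The practical consequence of moving from coherent to streaming mappings
is that every CPU access to a header has to be bracketed by sync calls.
The following userspace model sketches that ownership hand-off.  It is
an illustration only: the model_* names and the owner field are made up,
calloc() stands in for the slab allocation, and the sync functions only
record who may touch the buffer instead of doing cache maintenance.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: who is currently allowed to touch the buffer. */
enum model_owner { OWNER_DEVICE, OWNER_CPU };

struct model_hdr_buf {
	unsigned char *hdr;
	size_t len;
	enum model_owner owner;
};

/* Allocate from the "slab" and "map" it; after mapping the device owns it. */
static int model_hdr_alloc(struct model_hdr_buf *b, size_t len)
{
	b->hdr = calloc(1, len);
	if (!b->hdr)
		return -1;
	b->len = len;
	b->owner = OWNER_DEVICE;
	return 0;
}

static void model_sync_for_cpu(struct model_hdr_buf *b)
{
	b->owner = OWNER_CPU;
}

static void model_sync_for_device(struct model_hdr_buf *b)
{
	b->owner = OWNER_DEVICE;
}

/* Fill a header: take ownership, write, hand ownership back. */
static int model_hdr_fill(struct model_hdr_buf *b, const void *src, size_t len)
{
	assert(b->hdr != NULL);
	if (len > b->len)
		return -1;
	model_sync_for_cpu(b);
	memcpy(b->hdr, src, len);
	model_sync_for_device(b);
	return 0;
}

/* "Unmap" and free; the pointer is cleared so stale use is easy to spot. */
static void model_hdr_free(struct model_hdr_buf *b)
{
	free(b->hdr);
	b->hdr = NULL;
}
```

On a DMA-coherent platform the two sync steps are expected to be cheap,
so the extra calls mainly cost something where they are actually needed.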

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 net/rds/ib.c      |  10 ----
 net/rds/ib.h      |   6 ---
 net/rds/ib_cm.c   | 128 ++++++++++++++++++++++++++++------------------
 net/rds/ib_recv.c |  18 +++++--
 net/rds/ib_send.c |   8 +++
 5 files changed, 101 insertions(+), 69 deletions(-)

diff --git a/net/rds/ib.c b/net/rds/ib.c
index deecbdcdae84ef..24c9a9005a6fba 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -30,7 +30,6 @@
  * SOFTWARE.
  *
  */
-#include <linux/dmapool.h>
 #include <linux/kernel.h>
 #include <linux/in.h>
 #include <linux/if.h>
@@ -108,7 +107,6 @@ static void rds_ib_dev_free(struct work_struct *work)
 		rds_ib_destroy_mr_pool(rds_ibdev->mr_1m_pool);
 	if (rds_ibdev->pd)
 		ib_dealloc_pd(rds_ibdev->pd);
-	dma_pool_destroy(rds_ibdev->rid_hdrs_pool);
 
 	list_for_each_entry_safe(i_ipaddr, i_next, &rds_ibdev->ipaddr_list, list) {
 		list_del(&i_ipaddr->list);
@@ -191,14 +189,6 @@ static int rds_ib_add_one(struct ib_device *device)
 		rds_ibdev->pd = NULL;
 		goto put_dev;
 	}
-	rds_ibdev->rid_hdrs_pool = dma_pool_create(device->name,
-						   device->dma_device,
-						   sizeof(struct rds_header),
-						   L1_CACHE_BYTES, 0);
-	if (!rds_ibdev->rid_hdrs_pool) {
-		ret = -ENOMEM;
-		goto put_dev;
-	}
 
 	rds_ibdev->mr_1m_pool =
 		rds_ib_create_mr_pool(rds_ibdev, RDS_IB_MR_1M_POOL);
diff --git a/net/rds/ib.h b/net/rds/ib.h
index c23a11d9ad3628..2ba71102b1f1f2 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -246,7 +246,6 @@ struct rds_ib_device {
 	struct list_head	conn_list;
 	struct ib_device	*dev;
 	struct ib_pd		*pd;
-	struct dma_pool		*rid_hdrs_pool; /* RDS headers DMA pool */
 	u8			odp_capable:1;
 
 	unsigned int		max_mrs;
@@ -380,11 +379,6 @@ int rds_ib_cm_handle_connect(struct rdma_cm_id *cm_id,
 int rds_ib_cm_initiate_connect(struct rdma_cm_id *cm_id, bool isv6);
 void rds_ib_cm_connect_complete(struct rds_connection *conn,
 				struct rdma_cm_event *event);
-struct rds_header **rds_dma_hdrs_alloc(struct ib_device *ibdev,
-				       struct dma_pool *pool,
-				       dma_addr_t **dma_addrs, u32 num_hdrs);
-void rds_dma_hdrs_free(struct dma_pool *pool, struct rds_header **hdrs,
-		       dma_addr_t *dma_addrs, u32 num_hdrs);
 
 #define rds_ib_conn_error(conn, fmt...) \
 	__rds_ib_conn_error(conn, KERN_WARNING "RDS/IB: " fmt)
diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
index b36b60668b1da9..f5cbe963cd8f78 100644
--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -30,7 +30,6 @@
  * SOFTWARE.
  *
  */
-#include <linux/dmapool.h>
 #include <linux/kernel.h>
 #include <linux/in.h>
 #include <linux/slab.h>
@@ -441,42 +440,87 @@ static inline void ibdev_put_vector(struct rds_ib_device *rds_ibdev, int index)
 	rds_ibdev->vector_load[index]--;
 }
 
+static void rds_dma_hdr_free(struct ib_device *dev, struct rds_header *hdr,
+		dma_addr_t dma_addr, enum dma_data_direction dir)
+{
+	ib_dma_unmap_single(dev, dma_addr, sizeof(*hdr), dir);
+	kfree(hdr);
+}
+
+static struct rds_header *rds_dma_hdr_alloc(struct ib_device *dev,
+		dma_addr_t *dma_addr, enum dma_data_direction dir)
+{
+	struct rds_header *hdr;
+
+	hdr = kzalloc_node(sizeof(*hdr), GFP_KERNEL, ibdev_to_node(dev));
+	if (!hdr)
+		return NULL;
+
+	*dma_addr = ib_dma_map_single(dev, hdr, sizeof(*hdr),
+				      DMA_BIDIRECTIONAL);
+	if (ib_dma_mapping_error(dev, *dma_addr)) {
+		kfree(hdr);
+		return NULL;
+	}
+
+	return hdr;
+}
+
+/* Free the DMA memory used to store struct rds_header.
+ *
+ * @dev: the RDS IB device
+ * @hdrs: pointer to the array storing DMA memory pointers
+ * @dma_addrs: pointer to the array storing DMA addresses
+ * @num_hdrs: number of headers to free.
+ */
+static void rds_dma_hdrs_free(struct rds_ib_device *dev,
+		struct rds_header **hdrs, dma_addr_t *dma_addrs, u32 num_hdrs,
+		enum dma_data_direction dir)
+{
+	u32 i;
+
+	for (i = 0; i < num_hdrs; i++)
+		rds_dma_hdr_free(dev->dev, hdrs[i], dma_addrs[i], dir);
+	kvfree(hdrs);
+	kvfree(dma_addrs);
+}
+
+
 /* Allocate DMA coherent memory to be used to store struct rds_header for
  * sending/receiving packets.  The pointers to the DMA memory and the
  * associated DMA addresses are stored in two arrays.
  *
- * @ibdev: the IB device
- * @pool: the DMA memory pool
+ * @dev: the RDS IB device
  * @dma_addrs: pointer to the array for storing DMA addresses
  * @num_hdrs: number of headers to allocate
  *
  * It returns the pointer to the array storing the DMA memory pointers.  On
  * error, NULL pointer is returned.
  */
-struct rds_header **rds_dma_hdrs_alloc(struct ib_device *ibdev,
-				       struct dma_pool *pool,
-				       dma_addr_t **dma_addrs, u32 num_hdrs)
+static struct rds_header **rds_dma_hdrs_alloc(struct rds_ib_device *dev,
+		dma_addr_t **dma_addrs, u32 num_hdrs,
+		enum dma_data_direction dir)
 {
 	struct rds_header **hdrs;
 	dma_addr_t *hdr_daddrs;
 	u32 i;
 
 	hdrs = kvmalloc_node(sizeof(*hdrs) * num_hdrs, GFP_KERNEL,
-			     ibdev_to_node(ibdev));
+			     ibdev_to_node(dev->dev));
 	if (!hdrs)
 		return NULL;
 
 	hdr_daddrs = kvmalloc_node(sizeof(*hdr_daddrs) * num_hdrs, GFP_KERNEL,
-				   ibdev_to_node(ibdev));
+				   ibdev_to_node(dev->dev));
 	if (!hdr_daddrs) {
 		kvfree(hdrs);
 		return NULL;
 	}
 
 	for (i = 0; i < num_hdrs; i++) {
-		hdrs[i] = dma_pool_zalloc(pool, GFP_KERNEL, &hdr_daddrs[i]);
+		hdrs[i] = rds_dma_hdr_alloc(dev->dev, &hdr_daddrs[i], dir);
 		if (!hdrs[i]) {
-			rds_dma_hdrs_free(pool, hdrs, hdr_daddrs, i);
+			rds_dma_hdrs_free(dev, hdrs, hdr_daddrs, i, dir);
 			return NULL;
 		}
 	}
@@ -485,24 +529,6 @@ struct rds_header **rds_dma_hdrs_alloc(struct ib_device *ibdev,
 	return hdrs;
 }
 
-/* Free the DMA memory used to store struct rds_header.
- *
- * @pool: the DMA memory pool
- * @hdrs: pointer to the array storing DMA memory pointers
- * @dma_addrs: pointer to the array storing DMA addresses
- * @num_hdars: number of headers to free.
- */
-void rds_dma_hdrs_free(struct dma_pool *pool, struct rds_header **hdrs,
-		       dma_addr_t *dma_addrs, u32 num_hdrs)
-{
-	u32 i;
-
-	for (i = 0; i < num_hdrs; i++)
-		dma_pool_free(pool, hdrs[i], dma_addrs[i]);
-	kvfree(hdrs);
-	kvfree(dma_addrs);
-}
-
 /*
  * This needs to be very careful to not leave IS_ERR pointers around for
  * cleanup to trip over.
@@ -516,7 +542,6 @@ static int rds_ib_setup_qp(struct rds_connection *conn)
 	struct rds_ib_device *rds_ibdev;
 	unsigned long max_wrs;
 	int ret, fr_queue_space;
-	struct dma_pool *pool;
 
 	/*
 	 * It's normal to see a null device if an incoming connection races
@@ -612,25 +637,26 @@ static int rds_ib_setup_qp(struct rds_connection *conn)
 		goto recv_cq_out;
 	}
 
-	pool = rds_ibdev->rid_hdrs_pool;
-	ic->i_send_hdrs = rds_dma_hdrs_alloc(dev, pool, &ic->i_send_hdrs_dma,
-					     ic->i_send_ring.w_nr);
+	ic->i_send_hdrs = rds_dma_hdrs_alloc(rds_ibdev, &ic->i_send_hdrs_dma,
+					     ic->i_send_ring.w_nr,
+					     DMA_TO_DEVICE);
 	if (!ic->i_send_hdrs) {
 		ret = -ENOMEM;
 		rdsdebug("DMA send hdrs alloc failed\n");
 		goto qp_out;
 	}
 
-	ic->i_recv_hdrs = rds_dma_hdrs_alloc(dev, pool, &ic->i_recv_hdrs_dma,
-					     ic->i_recv_ring.w_nr);
+	ic->i_recv_hdrs = rds_dma_hdrs_alloc(rds_ibdev, &ic->i_recv_hdrs_dma,
+					     ic->i_recv_ring.w_nr,
+					     DMA_FROM_DEVICE);
 	if (!ic->i_recv_hdrs) {
 		ret = -ENOMEM;
 		rdsdebug("DMA recv hdrs alloc failed\n");
 		goto send_hdrs_dma_out;
 	}
 
-	ic->i_ack = dma_pool_zalloc(pool, GFP_KERNEL,
-				    &ic->i_ack_dma);
+	ic->i_ack = rds_dma_hdr_alloc(rds_ibdev->dev, &ic->i_ack_dma,
+				      DMA_TO_DEVICE);
 	if (!ic->i_ack) {
 		ret = -ENOMEM;
 		rdsdebug("DMA ack header alloc failed\n");
@@ -666,18 +692,19 @@ static int rds_ib_setup_qp(struct rds_connection *conn)
 	vfree(ic->i_sends);
 
 ack_dma_out:
-	dma_pool_free(pool, ic->i_ack, ic->i_ack_dma);
+	rds_dma_hdr_free(rds_ibdev->dev, ic->i_ack, ic->i_ack_dma,
+			 DMA_TO_DEVICE);
 	ic->i_ack = NULL;
 
 recv_hdrs_dma_out:
-	rds_dma_hdrs_free(pool, ic->i_recv_hdrs, ic->i_recv_hdrs_dma,
-			  ic->i_recv_ring.w_nr);
+	rds_dma_hdrs_free(rds_ibdev, ic->i_recv_hdrs, ic->i_recv_hdrs_dma,
+			  ic->i_recv_ring.w_nr, DMA_FROM_DEVICE);
 	ic->i_recv_hdrs = NULL;
 	ic->i_recv_hdrs_dma = NULL;
 
 send_hdrs_dma_out:
-	rds_dma_hdrs_free(pool, ic->i_send_hdrs, ic->i_send_hdrs_dma,
-			  ic->i_send_ring.w_nr);
+	rds_dma_hdrs_free(rds_ibdev, ic->i_send_hdrs, ic->i_send_hdrs_dma,
+			  ic->i_send_ring.w_nr, DMA_TO_DEVICE);
 	ic->i_send_hdrs = NULL;
 	ic->i_send_hdrs_dma = NULL;
 
@@ -1110,29 +1137,30 @@ void rds_ib_conn_path_shutdown(struct rds_conn_path *cp)
 		}
 
 		if (ic->rds_ibdev) {
-			struct dma_pool *pool;
-
-			pool = ic->rds_ibdev->rid_hdrs_pool;
-
 			/* then free the resources that ib callbacks use */
 			if (ic->i_send_hdrs) {
-				rds_dma_hdrs_free(pool, ic->i_send_hdrs,
+				rds_dma_hdrs_free(ic->rds_ibdev,
+						  ic->i_send_hdrs,
 						  ic->i_send_hdrs_dma,
-						  ic->i_send_ring.w_nr);
+						  ic->i_send_ring.w_nr,
+						  DMA_TO_DEVICE);
 				ic->i_send_hdrs = NULL;
 				ic->i_send_hdrs_dma = NULL;
 			}
 
 			if (ic->i_recv_hdrs) {
-				rds_dma_hdrs_free(pool, ic->i_recv_hdrs,
+				rds_dma_hdrs_free(ic->rds_ibdev,
+						  ic->i_recv_hdrs,
 						  ic->i_recv_hdrs_dma,
-						  ic->i_recv_ring.w_nr);
+						  ic->i_recv_ring.w_nr,
+						  DMA_FROM_DEVICE);
 				ic->i_recv_hdrs = NULL;
 				ic->i_recv_hdrs_dma = NULL;
 			}
 
 			if (ic->i_ack) {
-				dma_pool_free(pool, ic->i_ack, ic->i_ack_dma);
+				rds_dma_hdr_free(ic->rds_ibdev->dev, ic->i_ack,
+						 ic->i_ack_dma, DMA_TO_DEVICE);
 				ic->i_ack = NULL;
 			}
 		} else {
diff --git a/net/rds/ib_recv.c b/net/rds/ib_recv.c
index 3cffcec5fb371b..6fdedd9dbbc28f 100644
--- a/net/rds/ib_recv.c
+++ b/net/rds/ib_recv.c
@@ -662,10 +662,16 @@ static void rds_ib_send_ack(struct rds_ib_connection *ic, unsigned int adv_credi
 	seq = rds_ib_get_ack(ic);
 
 	rdsdebug("send_ack: ic %p ack %llu\n", ic, (unsigned long long) seq);
+
+	ib_dma_sync_single_for_cpu(ic->rds_ibdev->dev, ic->i_ack_dma,
+				   sizeof(*hdr), DMA_TO_DEVICE);
 	rds_message_populate_header(hdr, 0, 0, 0);
 	hdr->h_ack = cpu_to_be64(seq);
 	hdr->h_credit = adv_credits;
 	rds_message_make_checksum(hdr);
+	ib_dma_sync_single_for_device(ic->rds_ibdev->dev, ic->i_ack_dma,
+				      sizeof(*hdr), DMA_TO_DEVICE);
+
 	ic->i_ack_queued = jiffies;
 
 	ret = ib_post_send(ic->i_cm_id->qp, &ic->i_ack_wr, NULL);
@@ -845,6 +851,7 @@ static void rds_ib_process_recv(struct rds_connection *conn,
 	struct rds_ib_connection *ic = conn->c_transport_data;
 	struct rds_ib_incoming *ibinc = ic->i_ibinc;
 	struct rds_header *ihdr, *hdr;
+	dma_addr_t dma_addr = ic->i_recv_hdrs_dma[recv - ic->i_recvs];
 
 	/* XXX shut down the connection if port 0,0 are seen? */
 
@@ -863,6 +870,8 @@ static void rds_ib_process_recv(struct rds_connection *conn,
 
 	ihdr = ic->i_recv_hdrs[recv - ic->i_recvs];
 
+	ib_dma_sync_single_for_cpu(ic->rds_ibdev->dev, dma_addr,
+				   sizeof(*ihdr), DMA_FROM_DEVICE);
 	/* Validate the checksum. */
 	if (!rds_message_verify_checksum(ihdr)) {
 		rds_ib_conn_error(conn, "incoming message "
@@ -870,7 +879,7 @@ static void rds_ib_process_recv(struct rds_connection *conn,
 		       "forcing a reconnect\n",
 		       &conn->c_faddr);
 		rds_stats_inc(s_recv_drop_bad_checksum);
-		return;
+		goto done;
 	}
 
 	/* Process the ACK sequence which comes with every packet */
@@ -899,7 +908,7 @@ static void rds_ib_process_recv(struct rds_connection *conn,
 		 */
 		rds_ib_frag_free(ic, recv->r_frag);
 		recv->r_frag = NULL;
-		return;
+		goto done;
 	}
 
 	/*
@@ -933,7 +942,7 @@ static void rds_ib_process_recv(struct rds_connection *conn,
 		    hdr->h_dport != ihdr->h_dport) {
 			rds_ib_conn_error(conn,
 				"fragment header mismatch; forcing reconnect\n");
-			return;
+			goto done;
 		}
 	}
 
@@ -965,6 +974,9 @@ static void rds_ib_process_recv(struct rds_connection *conn,
 
 		rds_inc_put(&ibinc->ii_inc);
 	}
+done:
+	ib_dma_sync_single_for_device(ic->rds_ibdev->dev, dma_addr,
+				      sizeof(*ihdr), DMA_FROM_DEVICE);
 }
 
 void rds_ib_recv_cqe_handler(struct rds_ib_connection *ic,
diff --git a/net/rds/ib_send.c b/net/rds/ib_send.c
index dfe778220657af..92b4a8689aae7a 100644
--- a/net/rds/ib_send.c
+++ b/net/rds/ib_send.c
@@ -638,6 +638,10 @@ int rds_ib_xmit(struct rds_connection *conn, struct rds_message *rm,
 		send->s_sge[0].length = sizeof(struct rds_header);
 		send->s_sge[0].lkey = ic->i_pd->local_dma_lkey;
 
+		ib_dma_sync_single_for_cpu(ic->rds_ibdev->dev,
+					   ic->i_send_hdrs_dma[pos],
+					   sizeof(struct rds_header),
+					   DMA_TO_DEVICE);
 		memcpy(ic->i_send_hdrs[pos], &rm->m_inc.i_hdr,
 		       sizeof(struct rds_header));
 
@@ -688,6 +692,10 @@ int rds_ib_xmit(struct rds_connection *conn, struct rds_message *rm,
 			adv_credits = 0;
 			rds_ib_stats_inc(s_ib_tx_credit_updates);
 		}
+		ib_dma_sync_single_for_device(ic->rds_ibdev->dev,
+					      ic->i_send_hdrs_dma[pos],
+					      sizeof(struct rds_header),
+					      DMA_TO_DEVICE);
 
 		if (prev)
 			prev->s_wr.next = &send->s_wr;
-- 
2.28.0



* [PATCH 06/10] RDMA/core: remove ib_dma_{alloc,free}_coherent
  2020-11-06 18:19 remove dma_virt_ops v2 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2020-11-06 18:19 ` [PATCH 05/10] rds: stop using dmapool Christoph Hellwig
@ 2020-11-06 18:19 ` Christoph Hellwig
  2020-11-06 18:19 ` [PATCH 07/10] RDMA/core: remove use of dma_virt_ops Christoph Hellwig
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2020-11-06 18:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, Santosh Shilimkar,
	linux-rdma, rds-devel, linux-pci, iommu

These two functions are entirely unused.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/rdma/ib_verbs.h | 29 -----------------------------
 1 file changed, 29 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 3257cc046e460f..453793d1d2225f 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -4098,35 +4098,6 @@ static inline void ib_dma_sync_single_for_device(struct ib_device *dev,
 	dma_sync_single_for_device(dev->dma_device, addr, size, dir);
 }
 
-/**
- * ib_dma_alloc_coherent - Allocate memory and map it for DMA
- * @dev: The device for which the DMA address is requested
- * @size: The size of the region to allocate in bytes
- * @dma_handle: A pointer for returning the DMA address of the region
- * @flag: memory allocator flags
- */
-static inline void *ib_dma_alloc_coherent(struct ib_device *dev,
-					   size_t size,
-					   dma_addr_t *dma_handle,
-					   gfp_t flag)
-{
-	return dma_alloc_coherent(dev->dma_device, size, dma_handle, flag);
-}
-
-/**
- * ib_dma_free_coherent - Free memory allocated by ib_dma_alloc_coherent()
- * @dev: The device for which the DMA addresses were allocated
- * @size: The size of the region
- * @cpu_addr: the address returned by ib_dma_alloc_coherent()
- * @dma_handle: the DMA address returned by ib_dma_alloc_coherent()
- */
-static inline void ib_dma_free_coherent(struct ib_device *dev,
-					size_t size, void *cpu_addr,
-					dma_addr_t dma_handle)
-{
-	dma_free_coherent(dev->dma_device, size, cpu_addr, dma_handle);
-}
-
 /* ib_reg_user_mr - register a memory region for virtual addresses from kernel
  * space. This function should be called when 'current' is the owning MM.
  */
-- 
2.28.0



* [PATCH 07/10] RDMA/core: remove use of dma_virt_ops
  2020-11-06 18:19 remove dma_virt_ops v2 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2020-11-06 18:19 ` [PATCH 06/10] RDMA/core: remove ib_dma_{alloc,free}_coherent Christoph Hellwig
@ 2020-11-06 18:19 ` Christoph Hellwig
  2020-11-06 18:19 ` [PATCH 08/10] PCI/P2PDMA: Remove the DMA_VIRT_OPS hacks Christoph Hellwig
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2020-11-06 18:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, Santosh Shilimkar,
	linux-rdma, rds-devel, linux-pci, iommu

Use the ib_dma_* helpers to skip the DMA translation instead.  This
removes the last user of dma_virt_ops and keeps the weird layering
violation inside the RDMA core instead of burdening the DMA mapping
subsystem with it.  This also means the software RDMA drivers no
longer have to mess with DMA parameters that are not relevant to them
at all, and that in the future we can use PCI P2P transfers even for
software RDMA, as there is no first fake layer of DMA mapping that
the P2P DMA support would have to deal with.
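
For scatterlists the "skip the translation" step amounts to copying each
entry's CPU address and length into its DMA fields.  The userspace model
below sketches that loop.  struct model_sg and its field names are
invented for this illustration and are much simpler than the kernel's
struct scatterlist; only the shape of the loop follows the helper added
in the diff below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, flattened stand-in for a scatterlist entry. */
struct model_sg {
	void *virt;			/* CPU (kernel virtual) address */
	unsigned int length;		/* length of the segment in bytes */
	uint64_t dma_address;		/* what the "device" is told to use */
	unsigned int dma_length;
};

/*
 * Virt-DMA mapping: no IOMMU, no bounce buffers, no merging of entries.
 * Every entry maps to itself, so the number of mapped entries equals nents.
 */
static int model_virt_map_sg(struct model_sg *sg, int nents)
{
	int i;

	assert(nents == 0 || sg != NULL);
	for (i = 0; i < nents; i++) {
		sg[i].dma_address = (uintptr_t)sg[i].virt;
		sg[i].dma_length = sg[i].length;
	}
	return nents;
}
```

Because the stored "DMA address" is a kernel virtual address, this only
works when every page has one, which is why the drivers depend on
!HIGHMEM through INFINIBAND_VIRT_DMA.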

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/infiniband/core/device.c      | 43 ++++++++--------
 drivers/infiniband/core/rw.c          |  5 +-
 drivers/infiniband/sw/rdmavt/Kconfig  |  1 -
 drivers/infiniband/sw/rdmavt/mr.c     |  6 +--
 drivers/infiniband/sw/rdmavt/vt.c     |  8 ---
 drivers/infiniband/sw/rxe/Kconfig     |  1 -
 drivers/infiniband/sw/rxe/rxe_verbs.c |  7 ---
 drivers/infiniband/sw/rxe/rxe_verbs.h |  1 -
 drivers/infiniband/sw/siw/Kconfig     |  1 -
 drivers/infiniband/sw/siw/siw.h       |  1 -
 drivers/infiniband/sw/siw/siw_main.c  |  7 ---
 drivers/nvme/target/rdma.c            |  3 +-
 include/rdma/ib_verbs.h               | 73 ++++++++++++++++++---------
 13 files changed, 81 insertions(+), 76 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index a3b1fc84cdcab9..562095a896bbc0 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1177,25 +1177,6 @@ static int assign_name(struct ib_device *device, const char *name)
 	return ret;
 }
 
-static void setup_dma_device(struct ib_device *device,
-			     struct device *dma_device)
-{
-	/*
-	 * If the caller does not provide a DMA capable device then the IB
-	 * device will be used. In this case the caller should fully setup the
-	 * ibdev for DMA. This usually means using dma_virt_ops.
-	 */
-#ifdef CONFIG_DMA_VIRT_OPS
-	if (!dma_device) {
-		device->dev.dma_ops = &dma_virt_ops;
-		dma_device = &device->dev;
-	}
-#endif
-	WARN_ON(!dma_device);
-	device->dma_device = dma_device;
-	WARN_ON(!device->dma_device->dma_parms);
-}
-
 /*
  * setup_device() allocates memory and sets up data that requires calling the
  * device ops, this is the only reason these actions are not done during
@@ -1341,7 +1322,14 @@ int ib_register_device(struct ib_device *device, const char *name,
 	if (ret)
 		return ret;
 
-	setup_dma_device(device, dma_device);
+	/*
+	 * If the caller does not provide a DMA capable device then the IB core
+	 * will set up ib_sge and scatterlist structures that stash the kernel
+	 * virtual address into the address field.
+	 */
+	WARN_ON(dma_device && !dma_device->dma_parms);
+	device->dma_device = dma_device;
+
 	ret = setup_device(device);
 	if (ret)
 		return ret;
@@ -2675,6 +2663,21 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 }
 EXPORT_SYMBOL(ib_set_device_ops);
 
+#ifdef CONFIG_INFINIBAND_VIRT_DMA
+int ib_dma_virt_map_sg(struct ib_device *dev, struct scatterlist *sg, int nents)
+{
+	struct scatterlist *s;
+	int i;
+
+	for_each_sg(sg, s, nents, i) {
+		sg_dma_address(s) = (uintptr_t)sg_virt(s);
+		sg_dma_len(s) = s->length;
+	}
+	return nents;
+}
+EXPORT_SYMBOL(ib_dma_virt_map_sg);
+#endif /* CONFIG_INFINIBAND_VIRT_DMA */
+
 static const struct rdma_nl_cbs ibnl_ls_cb_table[RDMA_NL_LS_NUM_OPS] = {
 	[RDMA_NL_LS_OP_RESOLVE] = {
 		.doit = ib_nl_handle_resolve_resp,
diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index 13f43ab7220b05..a96030b784eb21 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -285,8 +285,11 @@ static void rdma_rw_unmap_sg(struct ib_device *dev, struct scatterlist *sg,
 static int rdma_rw_map_sg(struct ib_device *dev, struct scatterlist *sg,
 			  u32 sg_cnt, enum dma_data_direction dir)
 {
-	if (is_pci_p2pdma_page(sg_page(sg)))
+	if (is_pci_p2pdma_page(sg_page(sg))) {
+		if (WARN_ON_ONCE(ib_uses_virt_dma(dev)))
+			return 0;
 		return pci_p2pdma_map_sg(dev->dma_device, sg, sg_cnt, dir);
+	}
 	return ib_dma_map_sg(dev, sg, sg_cnt, dir);
 }
 
diff --git a/drivers/infiniband/sw/rdmavt/Kconfig b/drivers/infiniband/sw/rdmavt/Kconfig
index c8e268082952b0..0df48b3a6b56c5 100644
--- a/drivers/infiniband/sw/rdmavt/Kconfig
+++ b/drivers/infiniband/sw/rdmavt/Kconfig
@@ -4,6 +4,5 @@ config INFINIBAND_RDMAVT
 	depends on INFINIBAND_VIRT_DMA
 	depends on X86_64
 	depends on PCI
-	select DMA_VIRT_OPS
 	help
 	This is a common software verbs provider for RDMA networks.
diff --git a/drivers/infiniband/sw/rdmavt/mr.c b/drivers/infiniband/sw/rdmavt/mr.c
index 8490fdb9c91e50..90fc234f489acd 100644
--- a/drivers/infiniband/sw/rdmavt/mr.c
+++ b/drivers/infiniband/sw/rdmavt/mr.c
@@ -324,8 +324,6 @@ static void __rvt_free_mr(struct rvt_mr *mr)
  * @acc: access flags
  *
  * Return: the memory region on success, otherwise returns an errno.
- * Note that all DMA addresses should be created via the functions in
- * struct dma_virt_ops.
  */
 struct ib_mr *rvt_get_dma_mr(struct ib_pd *pd, int acc)
 {
@@ -766,7 +764,7 @@ int rvt_lkey_ok(struct rvt_lkey_table *rkt, struct rvt_pd *pd,
 
 	/*
 	 * We use LKEY == zero for kernel virtual addresses
-	 * (see rvt_get_dma_mr() and dma_virt_ops).
+	 * (see rvt_get_dma_mr()).
 	 */
 	if (sge->lkey == 0) {
 		struct rvt_dev_info *dev = ib_to_rvt(pd->ibpd.device);
@@ -877,7 +875,7 @@ int rvt_rkey_ok(struct rvt_qp *qp, struct rvt_sge *sge,
 
 	/*
 	 * We use RKEY == zero for kernel virtual addresses
-	 * (see rvt_get_dma_mr() and dma_virt_ops).
+	 * (see rvt_get_dma_mr()).
 	 */
 	rcu_read_lock();
 	if (rkey == 0) {
diff --git a/drivers/infiniband/sw/rdmavt/vt.c b/drivers/infiniband/sw/rdmavt/vt.c
index 670a9623b46e11..d1bbe66610cfe4 100644
--- a/drivers/infiniband/sw/rdmavt/vt.c
+++ b/drivers/infiniband/sw/rdmavt/vt.c
@@ -524,7 +524,6 @@ static noinline int check_support(struct rvt_dev_info *rdi, int verb)
 int rvt_register_device(struct rvt_dev_info *rdi)
 {
 	int ret = 0, i;
-	u64 dma_mask;
 
 	if (!rdi)
 		return -EINVAL;
@@ -579,13 +578,6 @@ int rvt_register_device(struct rvt_dev_info *rdi)
 	/* Completion queues */
 	spin_lock_init(&rdi->n_cqs_lock);
 
-	/* DMA Operations */
-	rdi->ibdev.dev.dma_parms = rdi->ibdev.dev.parent->dma_parms;
-	dma_mask = IS_ENABLED(CONFIG_64BIT) ? DMA_BIT_MASK(64) : DMA_BIT_MASK(32);
-	ret = dma_coerce_mask_and_coherent(&rdi->ibdev.dev, dma_mask);
-	if (ret)
-		goto bail_wss;
-
 	/* Protection Domain */
 	spin_lock_init(&rdi->n_pds_lock);
 	rdi->n_pds_allocated = 0;
diff --git a/drivers/infiniband/sw/rxe/Kconfig b/drivers/infiniband/sw/rxe/Kconfig
index 8810bfa680495a..4521490667925f 100644
--- a/drivers/infiniband/sw/rxe/Kconfig
+++ b/drivers/infiniband/sw/rxe/Kconfig
@@ -5,7 +5,6 @@ config RDMA_RXE
 	depends on INFINIBAND_VIRT_DMA
 	select NET_UDP_TUNNEL
 	select CRYPTO_CRC32
-	select DMA_VIRT_OPS
 	help
 	This driver implements the InfiniBand RDMA transport over
 	the Linux network stack. It enables a system with a
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index f9c832e82552f9..9c66f76545b3c2 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1118,7 +1118,6 @@ int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name)
 	int err;
 	struct ib_device *dev = &rxe->ib_dev;
 	struct crypto_shash *tfm;
-	u64 dma_mask;
 
 	strlcpy(dev->node_desc, "rxe", sizeof(dev->node_desc));
 
@@ -1129,12 +1128,6 @@ int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name)
 	dev->local_dma_lkey = 0;
 	addrconf_addr_eui48((unsigned char *)&dev->node_guid,
 			    rxe->ndev->dev_addr);
-	dev->dev.dma_parms = &rxe->dma_parms;
-	dma_set_max_seg_size(&dev->dev, UINT_MAX);
-	dma_mask = IS_ENABLED(CONFIG_64BIT) ? DMA_BIT_MASK(64) : DMA_BIT_MASK(32);
-	err = dma_coerce_mask_and_coherent(&dev->dev, dma_mask);
-	if (err)
-		return err;
 
 	dev->uverbs_cmd_mask = BIT_ULL(IB_USER_VERBS_CMD_GET_CONTEXT)
 	    | BIT_ULL(IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL)
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 3414b341b7091f..4bf5d85a1ab3ce 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -352,7 +352,6 @@ struct rxe_port {
 struct rxe_dev {
 	struct ib_device	ib_dev;
 	struct ib_device_attr	attr;
-	struct device_dma_parameters dma_parms;
 	int			max_ucontext;
 	int			max_inline_data;
 	struct mutex	usdev_lock;
diff --git a/drivers/infiniband/sw/siw/Kconfig b/drivers/infiniband/sw/siw/Kconfig
index 3450ba5081df51..1b5105cbabaeed 100644
--- a/drivers/infiniband/sw/siw/Kconfig
+++ b/drivers/infiniband/sw/siw/Kconfig
@@ -2,7 +2,6 @@ config RDMA_SIW
 	tristate "Software RDMA over TCP/IP (iWARP) driver"
 	depends on INET && INFINIBAND && LIBCRC32C
 	depends on INFINIBAND_VIRT_DMA
-	select DMA_VIRT_OPS
 	help
 	This driver implements the iWARP RDMA transport over
 	the Linux TCP/IP network stack. It enables a system with a
diff --git a/drivers/infiniband/sw/siw/siw.h b/drivers/infiniband/sw/siw/siw.h
index e9753831ac3f33..adda7899621962 100644
--- a/drivers/infiniband/sw/siw/siw.h
+++ b/drivers/infiniband/sw/siw/siw.h
@@ -69,7 +69,6 @@ struct siw_pd {
 
 struct siw_device {
 	struct ib_device base_dev;
-	struct device_dma_parameters dma_parms;
 	struct net_device *netdev;
 	struct siw_dev_cap attrs;
 
diff --git a/drivers/infiniband/sw/siw/siw_main.c b/drivers/infiniband/sw/siw/siw_main.c
index 181e06c1c43d7e..c62a7a0d423c0e 100644
--- a/drivers/infiniband/sw/siw/siw_main.c
+++ b/drivers/infiniband/sw/siw/siw_main.c
@@ -306,7 +306,6 @@ static struct siw_device *siw_device_create(struct net_device *netdev)
 	struct siw_device *sdev = NULL;
 	struct ib_device *base_dev;
 	struct device *parent = netdev->dev.parent;
-	u64 dma_mask;
 	int rv;
 
 	if (!parent) {
@@ -383,12 +382,6 @@ static struct siw_device *siw_device_create(struct net_device *netdev)
 	 */
 	base_dev->phys_port_cnt = 1;
 	base_dev->dev.parent = parent;
-	base_dev->dev.dma_parms = &sdev->dma_parms;
-	dma_set_max_seg_size(&base_dev->dev, UINT_MAX);
-	dma_mask = IS_ENABLED(CONFIG_64BIT) ? DMA_BIT_MASK(64) : DMA_BIT_MASK(32);
-	if (dma_coerce_mask_and_coherent(&base_dev->dev, dma_mask))
-		goto error;
-
 	base_dev->num_comp_vectors = num_possible_cpus();
 
 	xa_init_flags(&sdev->qp_xa, XA_FLAGS_ALLOC1);
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index ae6620489457d6..5c1e7cb7fe0dee 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -414,7 +414,8 @@ static int nvmet_rdma_alloc_rsp(struct nvmet_rdma_device *ndev,
 	if (ib_dma_mapping_error(ndev->device, r->send_sge.addr))
 		goto out_free_rsp;
 
-	r->req.p2p_client = &ndev->device->dev;
+	if (!ib_uses_virt_dma(ndev->device))
+		r->req.p2p_client = &ndev->device->dev;
 	r->send_sge.length = sizeof(*r->req.cqe);
 	r->send_sge.lkey = ndev->pd->local_dma_lkey;
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 453793d1d2225f..18c67ba5c3b3e6 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -3943,6 +3943,16 @@ static inline int ib_req_ncomp_notif(struct ib_cq *cq, int wc_cnt)
 		-ENOSYS;
 }
 
+/*
+ * Drivers that don't need a DMA mapping at the RDMA layer, set dma_device to
+ * NULL. This causes the ib_dma* helpers to just stash the kernel virtual
+ * address into the dma address.
+ */
+static inline bool ib_uses_virt_dma(struct ib_device *dev)
+{
+	return IS_ENABLED(CONFIG_INFINIBAND_VIRT_DMA) && !dev->dma_device;
+}
+
 /**
  * ib_dma_mapping_error - check a DMA addr for error
  * @dev: The device for which the dma_addr was created
@@ -3950,6 +3960,8 @@ static inline int ib_req_ncomp_notif(struct ib_cq *cq, int wc_cnt)
  */
 static inline int ib_dma_mapping_error(struct ib_device *dev, u64 dma_addr)
 {
+	if (ib_uses_virt_dma(dev))
+		return 0;
 	return dma_mapping_error(dev->dma_device, dma_addr);
 }
 
@@ -3964,6 +3976,8 @@ static inline u64 ib_dma_map_single(struct ib_device *dev,
 				    void *cpu_addr, size_t size,
 				    enum dma_data_direction direction)
 {
+	if (ib_uses_virt_dma(dev))
+		return (uintptr_t)cpu_addr;
 	return dma_map_single(dev->dma_device, cpu_addr, size, direction);
 }
 
@@ -3978,7 +3992,8 @@ static inline void ib_dma_unmap_single(struct ib_device *dev,
 				       u64 addr, size_t size,
 				       enum dma_data_direction direction)
 {
-	dma_unmap_single(dev->dma_device, addr, size, direction);
+	if (!ib_uses_virt_dma(dev))
+		dma_unmap_single(dev->dma_device, addr, size, direction);
 }
 
 /**
@@ -3995,6 +4010,8 @@ static inline u64 ib_dma_map_page(struct ib_device *dev,
 				  size_t size,
 					 enum dma_data_direction direction)
 {
+	if (ib_uses_virt_dma(dev))
+		return (uintptr_t)(page_address(page) + offset);
 	return dma_map_page(dev->dma_device, page, offset, size, direction);
 }
 
@@ -4009,7 +4026,30 @@ static inline void ib_dma_unmap_page(struct ib_device *dev,
 				     u64 addr, size_t size,
 				     enum dma_data_direction direction)
 {
-	dma_unmap_page(dev->dma_device, addr, size, direction);
+	if (!ib_uses_virt_dma(dev))
+		dma_unmap_page(dev->dma_device, addr, size, direction);
+}
+
+int ib_dma_virt_map_sg(struct ib_device *dev, struct scatterlist *sg, int nents);
+static inline int ib_dma_map_sg_attrs(struct ib_device *dev,
+				      struct scatterlist *sg, int nents,
+				      enum dma_data_direction direction,
+				      unsigned long dma_attrs)
+{
+	if (ib_uses_virt_dma(dev))
+		return ib_dma_virt_map_sg(dev, sg, nents);
+	return dma_map_sg_attrs(dev->dma_device, sg, nents, direction,
+				dma_attrs);
+}
+
+static inline void ib_dma_unmap_sg_attrs(struct ib_device *dev,
+					 struct scatterlist *sg, int nents,
+					 enum dma_data_direction direction,
+					 unsigned long dma_attrs)
+{
+	if (!ib_uses_virt_dma(dev))
+		dma_unmap_sg_attrs(dev->dma_device, sg, nents, direction,
+				   dma_attrs);
 }
 
 /**
@@ -4023,7 +4063,7 @@ static inline int ib_dma_map_sg(struct ib_device *dev,
 				struct scatterlist *sg, int nents,
 				enum dma_data_direction direction)
 {
-	return dma_map_sg(dev->dma_device, sg, nents, direction);
+	return ib_dma_map_sg_attrs(dev, sg, nents, direction, 0);
 }
 
 /**
@@ -4037,24 +4077,7 @@ static inline void ib_dma_unmap_sg(struct ib_device *dev,
 				   struct scatterlist *sg, int nents,
 				   enum dma_data_direction direction)
 {
-	dma_unmap_sg(dev->dma_device, sg, nents, direction);
-}
-
-static inline int ib_dma_map_sg_attrs(struct ib_device *dev,
-				      struct scatterlist *sg, int nents,
-				      enum dma_data_direction direction,
-				      unsigned long dma_attrs)
-{
-	return dma_map_sg_attrs(dev->dma_device, sg, nents, direction,
-				dma_attrs);
-}
-
-static inline void ib_dma_unmap_sg_attrs(struct ib_device *dev,
-					 struct scatterlist *sg, int nents,
-					 enum dma_data_direction direction,
-					 unsigned long dma_attrs)
-{
-	dma_unmap_sg_attrs(dev->dma_device, sg, nents, direction, dma_attrs);
+	ib_dma_unmap_sg_attrs(dev, sg, nents, direction, 0);
 }
 
 /**
@@ -4065,6 +4088,8 @@ static inline void ib_dma_unmap_sg_attrs(struct ib_device *dev,
  */
 static inline unsigned int ib_dma_max_seg_size(struct ib_device *dev)
 {
+	if (ib_uses_virt_dma(dev))
+		return UINT_MAX;
 	return dma_get_max_seg_size(dev->dma_device);
 }
 
@@ -4080,7 +4105,8 @@ static inline void ib_dma_sync_single_for_cpu(struct ib_device *dev,
 					      size_t size,
 					      enum dma_data_direction dir)
 {
-	dma_sync_single_for_cpu(dev->dma_device, addr, size, dir);
+	if (!ib_uses_virt_dma(dev))
+		dma_sync_single_for_cpu(dev->dma_device, addr, size, dir);
 }
 
 /**
@@ -4095,7 +4121,8 @@ static inline void ib_dma_sync_single_for_device(struct ib_device *dev,
 						 size_t size,
 						 enum dma_data_direction dir)
 {
-	dma_sync_single_for_device(dev->dma_device, addr, size, dir);
+	if (!ib_uses_virt_dma(dev))
+		dma_sync_single_for_device(dev->dma_device, addr, size, dir);
 }
 
 /* ib_reg_user_mr - register a memory region for virtual addresses from kernel
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread
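
Editorial sketch of the NULL dma_device convention used by the diff
above. This is a minimal userspace illustration, not the kernel code:
struct fake_dma_backend, struct fake_ib_device and the fake_* helpers
are hypothetical stand-ins for struct ib_device, ib_uses_virt_dma(),
ib_dma_map_single() and ib_dma_max_seg_size(). The
IS_ENABLED(CONFIG_INFINIBAND_VIRT_DMA) part of the real check is left
out, and the "real DMA" branch is modelled by a callback because no DMA
API exists in userspace.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a real DMA mapping backend. */
struct fake_dma_backend {
	uint64_t (*map_single)(void *cpu_addr, size_t size);
	unsigned int max_seg_size;
};

/* Hypothetical stand-in for struct ib_device; only dma_device matters. */
struct fake_ib_device {
	const struct fake_dma_backend *dma_device; /* NULL: no mapping here */
};

/* Same idea as ib_uses_virt_dma(): a NULL dma_device selects the bypass. */
static inline bool fake_uses_virt_dma(const struct fake_ib_device *dev)
{
	return dev->dma_device == NULL;
}

/* Bypass: hand back the CPU virtual address as the "DMA address". */
static inline uint64_t fake_dma_map_single(const struct fake_ib_device *dev,
					   void *cpu_addr, size_t size)
{
	if (fake_uses_virt_dma(dev))
		return (uintptr_t)cpu_addr;
	assert(dev->dma_device->map_single != NULL);
	return dev->dma_device->map_single(cpu_addr, size);
}

/* Bypass: no device limit applies, so report the largest segment size. */
static inline unsigned int
fake_dma_max_seg_size(const struct fake_ib_device *dev)
{
	if (fake_uses_virt_dma(dev))
		return UINT_MAX;
	return dev->dma_device->max_seg_size;
}
```

The point of the pattern is that every helper makes the same one-line
decision up front, so a software device needs no dma_map_ops instance,
no dma_parms and no DMA mask of its own.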

* [PATCH 08/10] PCI/P2PDMA: Remove the DMA_VIRT_OPS hacks
  2020-11-06 18:19 remove dma_virt_ops v2 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2020-11-06 18:19 ` [PATCH 07/10] RDMA/core: remove use of dma_virt_ops Christoph Hellwig
@ 2020-11-06 18:19 ` Christoph Hellwig
  2020-11-06 18:19 ` [PATCH 09/10] PCI/P2PDMA: Cleanup __pci_p2pdma_map_sg a bit Christoph Hellwig
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2020-11-06 18:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, Santosh Shilimkar,
	linux-rdma, rds-devel, linux-pci, iommu

Now that all users of dma_virt_ops are gone we can remove the workaround
for it in the PCI peer to peer code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/p2pdma.c | 20 --------------------
 1 file changed, 20 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index de1c331dbed43f..b07018af53876c 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -556,15 +556,6 @@ int pci_p2pdma_distance_many(struct pci_dev *provider, struct device **clients,
 		return -1;
 
 	for (i = 0; i < num_clients; i++) {
-#ifdef CONFIG_DMA_VIRT_OPS
-		if (clients[i]->dma_ops == &dma_virt_ops) {
-			if (verbose)
-				dev_warn(clients[i],
-					 "cannot be used for peer-to-peer DMA because the driver makes use of dma_virt_ops\n");
-			return -1;
-		}
-#endif
-
 		pci_client = find_parent_pci_dev(clients[i]);
 		if (!pci_client) {
 			if (verbose)
@@ -837,17 +828,6 @@ static int __pci_p2pdma_map_sg(struct pci_p2pdma_pagemap *p2p_pgmap,
 	phys_addr_t paddr;
 	int i;
 
-	/*
-	 * p2pdma mappings are not compatible with devices that use
-	 * dma_virt_ops. If the upper layers do the right thing
-	 * this should never happen because it will be prevented
-	 * by the check in pci_p2pdma_distance_many()
-	 */
-#ifdef CONFIG_DMA_VIRT_OPS
-	if (WARN_ON_ONCE(dev->dma_ops == &dma_virt_ops))
-		return 0;
-#endif
-
 	for_each_sg(sg, s, nents, i) {
 		paddr = sg_phys(s);
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread
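
Editorial sketch of where the P2P guard lives once the workaround above
is gone: in the caller, as in the rdma_rw_map_sg() hunk of patch 07. It
is a minimal userspace illustration under stated assumptions, not the
kernel code: struct sketch_ib_device and sketch_rw_map_sg() are
hypothetical names, the P2P test is passed in as a flag instead of
calling is_pci_p2pdma_page(), and the actual mapping is replaced by a
placeholder that reports every entry as mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ib_device. */
struct sketch_ib_device {
	void *dma_device;	/* NULL: device bypasses DMA mapping */
};

/*
 * Returns the number of mapped entries, with 0 meaning failure (the
 * value the diff returns from rdma_rw_map_sg() in the rejected case).
 */
static inline int sketch_rw_map_sg(const struct sketch_ib_device *dev,
				   bool first_page_is_p2p, int nents)
{
	assert(nents > 0);
	if (first_page_is_p2p && dev->dma_device == NULL)
		return 0;	/* P2P pages are rejected for virt-DMA devices */
	return nents;		/* placeholder for the real mapping call */
}
```

Keeping the check next to the code that knows about dma_device means
the PCI layer no longer has to compare dma_ops pointers to detect
software RDMA devices.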

* [PATCH 09/10] PCI/P2PDMA: Cleanup __pci_p2pdma_map_sg a bit
  2020-11-06 18:19 remove dma_virt_ops v2 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2020-11-06 18:19 ` [PATCH 08/10] PCI/P2PDMA: Remove the DMA_VIRT_OPS hacks Christoph Hellwig
@ 2020-11-06 18:19 ` Christoph Hellwig
  2020-11-06 18:19 ` [PATCH 10/10] dma-mapping: remove dma_virt_ops Christoph Hellwig
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2020-11-06 18:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, Santosh Shilimkar,
	linux-rdma, rds-devel, linux-pci, iommu

Remove the pointless paddr variable that was only used once.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/p2pdma.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index b07018af53876c..afd792cc272832 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -825,13 +825,10 @@ static int __pci_p2pdma_map_sg(struct pci_p2pdma_pagemap *p2p_pgmap,
 		struct device *dev, struct scatterlist *sg, int nents)
 {
 	struct scatterlist *s;
-	phys_addr_t paddr;
 	int i;
 
 	for_each_sg(sg, s, nents, i) {
-		paddr = sg_phys(s);
-
-		s->dma_address = paddr - p2p_pgmap->bus_offset;
+		s->dma_address = sg_phys(s) - p2p_pgmap->bus_offset;
 		sg_dma_len(s) = s->length;
 	}
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 10/10] dma-mapping: remove dma_virt_ops
  2020-11-06 18:19 remove dma_virt_ops v2 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2020-11-06 18:19 ` [PATCH 09/10] PCI/P2PDMA: Cleanup __pci_p2pdma_map_sg a bit Christoph Hellwig
@ 2020-11-06 18:19 ` Christoph Hellwig
  2020-11-12  9:40 ` remove dma_virt_ops v2 Christoph Hellwig
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2020-11-06 18:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, Santosh Shilimkar,
	linux-rdma, rds-devel, linux-pci, iommu

Now that the RDMA core deals with devices that only do DMA mapping in
lower layers properly, there is no user for dma_virt_ops and it can
be removed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/dma-mapping.h |  2 --
 kernel/dma/Kconfig          |  5 ---
 kernel/dma/Makefile         |  1 -
 kernel/dma/virt.c           | 61 -------------------------------------
 4 files changed, 69 deletions(-)
 delete mode 100644 kernel/dma/virt.c

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 956151052d4542..2aaed35b556df4 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -565,6 +565,4 @@ static inline int dma_mmap_wc(struct device *dev,
 int dma_direct_set_offset(struct device *dev, phys_addr_t cpu_start,
 		dma_addr_t dma_start, u64 size);
 
-extern const struct dma_map_ops dma_virt_ops;
-
 #endif /* _LINUX_DMA_MAPPING_H */
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index c99de4a2145889..fd2db2665fc691 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -75,11 +75,6 @@ config ARCH_HAS_DMA_PREP_COHERENT
 config ARCH_HAS_FORCE_DMA_UNENCRYPTED
 	bool
 
-config DMA_VIRT_OPS
-	bool
-	depends on HAS_DMA
-	select DMA_OPS
-
 config SWIOTLB
 	bool
 	select NEED_DMA_MAP_STATE
diff --git a/kernel/dma/Makefile b/kernel/dma/Makefile
index dc755ab68aabf9..cd1d86358a7a62 100644
--- a/kernel/dma/Makefile
+++ b/kernel/dma/Makefile
@@ -5,7 +5,6 @@ obj-$(CONFIG_DMA_OPS)			+= ops_helpers.o
 obj-$(CONFIG_DMA_OPS)			+= dummy.o
 obj-$(CONFIG_DMA_CMA)			+= contiguous.o
 obj-$(CONFIG_DMA_DECLARE_COHERENT)	+= coherent.o
-obj-$(CONFIG_DMA_VIRT_OPS)		+= virt.o
 obj-$(CONFIG_DMA_API_DEBUG)		+= debug.o
 obj-$(CONFIG_SWIOTLB)			+= swiotlb.o
 obj-$(CONFIG_DMA_COHERENT_POOL)		+= pool.o
diff --git a/kernel/dma/virt.c b/kernel/dma/virt.c
deleted file mode 100644
index 59d32317dd574a..00000000000000
--- a/kernel/dma/virt.c
+++ /dev/null
@@ -1,61 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * DMA operations that map to virtual addresses without flushing memory.
- */
-#include <linux/export.h>
-#include <linux/mm.h>
-#include <linux/dma-map-ops.h>
-#include <linux/scatterlist.h>
-
-static void *dma_virt_alloc(struct device *dev, size_t size,
-			    dma_addr_t *dma_handle, gfp_t gfp,
-			    unsigned long attrs)
-{
-	void *ret;
-
-	ret = (void *)__get_free_pages(gfp | __GFP_ZERO, get_order(size));
-	if (ret)
-		*dma_handle = (uintptr_t)ret;
-	return ret;
-}
-
-static void dma_virt_free(struct device *dev, size_t size,
-			  void *cpu_addr, dma_addr_t dma_addr,
-			  unsigned long attrs)
-{
-	free_pages((unsigned long)cpu_addr, get_order(size));
-}
-
-static dma_addr_t dma_virt_map_page(struct device *dev, struct page *page,
-				    unsigned long offset, size_t size,
-				    enum dma_data_direction dir,
-				    unsigned long attrs)
-{
-	return (uintptr_t)(page_address(page) + offset);
-}
-
-static int dma_virt_map_sg(struct device *dev, struct scatterlist *sgl,
-			   int nents, enum dma_data_direction dir,
-			   unsigned long attrs)
-{
-	int i;
-	struct scatterlist *sg;
-
-	for_each_sg(sgl, sg, nents, i) {
-		BUG_ON(!sg_page(sg));
-		sg_dma_address(sg) = (uintptr_t)sg_virt(sg);
-		sg_dma_len(sg) = sg->length;
-	}
-
-	return nents;
-}
-
-const struct dma_map_ops dma_virt_ops = {
-	.alloc			= dma_virt_alloc,
-	.free			= dma_virt_free,
-	.map_page		= dma_virt_map_page,
-	.map_sg			= dma_virt_map_sg,
-	.alloc_pages		= dma_common_alloc_pages,
-	.free_pages		= dma_common_free_pages,
-};
-EXPORT_SYMBOL(dma_virt_ops);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: remove dma_virt_ops v2
  2020-11-06 18:19 remove dma_virt_ops v2 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2020-11-06 18:19 ` [PATCH 10/10] dma-mapping: remove dma_virt_ops Christoph Hellwig
@ 2020-11-12  9:40 ` Christoph Hellwig
  2020-11-12 13:23   ` Jason Gunthorpe
  2020-11-12 16:59 ` Jason Gunthorpe
  2020-11-17 19:41 ` Jason Gunthorpe
  12 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2020-11-12  9:40 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, Santosh Shilimkar,
	linux-rdma, rds-devel, linux-pci, iommu

ping?

On Fri, Nov 06, 2020 at 07:19:31PM +0100, Christoph Hellwig wrote:
> Hi Jason,
> 
> this series switches the RDMA core to opencode the special case of
> devices bypassing the DMA mapping in the RDMA ULPs.  The virt ops
> have caused a bit of trouble due to the P2P code node working with
> them due to the fact that we'd do two dma mapping iterations for a
> single I/O, but also are a bit of layering violation and lead to
> more code than necessary.
> 
> Tested with nvme-rdma over rxe.
> 
> Note that the rds changes are untested, as I could not find any
> simple rds test setup.
> 
> Changes since v2:
>  - simplify the INFINIBAND_VIRT_DMA dependencies
>  - add a ib_uses_virt_dma helper
>  - use ib_uses_virt_dma in nvmet-rdma to disable p2p for virt_dma devices
>  - use ib_dma_max_seg_size in umem
>  - stop using dmapool in rds
> 
> Changes since v1:
>  - disable software RDMA drivers for highmem configs
>  - update the PCI commit logs
---end quoted text---

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: remove dma_virt_ops v2
  2020-11-12  9:40 ` remove dma_virt_ops v2 Christoph Hellwig
@ 2020-11-12 13:23   ` Jason Gunthorpe
  2020-11-12 17:36     ` santosh.shilimkar
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2020-11-12 13:23 UTC (permalink / raw)
  To: Christoph Hellwig, Santosh Shilimkar
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, linux-rdma, rds-devel,
	linux-pci, iommu

On Thu, Nov 12, 2020 at 10:40:30AM +0100, Christoph Hellwig wrote:
> ping?
> 
> On Fri, Nov 06, 2020 at 07:19:31PM +0100, Christoph Hellwig wrote:
> > Hi Jason,
> > 
> > this series switches the RDMA core to opencode the special case of
> > devices bypassing the DMA mapping in the RDMA ULPs.  The virt ops
> > have caused a bit of trouble due to the P2P code node working with
> > them due to the fact that we'd do two dma mapping iterations for a
> > single I/O, but also are a bit of layering violation and lead to
> > more code than necessary.
> > 
> > Tested with nvme-rdma over rxe.
> > 
> > Note that the rds changes are untested, as I could not find any
> > simple rds test setup.
> > 
> > Changes since v2:
> >  - simplify the INFINIBAND_VIRT_DMA dependencies
> >  - add a ib_uses_virt_dma helper
> >  - use ib_uses_virt_dma in nvmet-rdma to disable p2p for virt_dma devices
> >  - use ib_dma_max_seg_size in umem
> >  - stop using dmapool in rds
> > 
> > Changes since v1:
> >  - disable software RDMA drivers for highmem configs
> >  - update the PCI commit logs

Santosh, can you please check the RDS parts?

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: remove dma_virt_ops v2
  2020-11-06 18:19 remove dma_virt_ops v2 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2020-11-12  9:40 ` remove dma_virt_ops v2 Christoph Hellwig
@ 2020-11-12 16:59 ` Jason Gunthorpe
  2020-11-12 17:09   ` Christoph Hellwig
  2020-11-17 19:41 ` Jason Gunthorpe
  12 siblings, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2020-11-12 16:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, Santosh Shilimkar,
	linux-rdma, rds-devel, linux-pci, iommu

On Fri, Nov 06, 2020 at 07:19:31PM +0100, Christoph Hellwig wrote:
> Hi Jason,
> 
> this series switches the RDMA core to opencode the special case of
> devices bypassing the DMA mapping in the RDMA ULPs.  The virt ops
> have caused a bit of trouble due to the P2P code node working with
> them due to the fact that we'd do two dma mapping iterations for a
> single I/O, but also are a bit of layering violation and lead to
> more code than necessary.
> 
> Tested with nvme-rdma over rxe.
> 
> Note that the rds changes are untested, as I could not find any
> simple rds test setup.
> 
> Changes since v2:
>  - simplify the INFINIBAND_VIRT_DMA dependencies
>  - add a ib_uses_virt_dma helper
>  - use ib_uses_virt_dma in nvmet-rdma to disable p2p for virt_dma devices
>  - use ib_dma_max_seg_size in umem
>  - stop using dmapool in rds
> 
> Changes since v1:
>  - disable software RDMA drivers for highmem configs
>  - update the PCI commit logs

Let's give Santosh a little longer for RDS. I've grabbed the precursor
parts into for-next for now:

 nvme-rdma: Use ibdev_to_node instead of dereferencing ->dma_device
 RDMA: Lift ibdev_to_node from rds to common code
 RDMA/core: Remove ib_dma_{alloc,free}_coherent
 RDMA/umem: Use ib_dma_max_seg_size instead of dma_get_max_seg_size
 RMDA/sw: Don't allow drivers using dma_virt_ops on highmem configs

Will get the rest next week regardless.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: remove dma_virt_ops v2
  2020-11-12 16:59 ` Jason Gunthorpe
@ 2020-11-12 17:09   ` Christoph Hellwig
  2020-11-12 17:39     ` Jason Gunthorpe
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2020-11-12 17:09 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Bjorn Helgaas, Bernard Metzler, Zhu Yanjun,
	Logan Gunthorpe, Dennis Dalessandro, Mike Marciniszyn,
	Santosh Shilimkar, linux-rdma, rds-devel, linux-pci, iommu

On Thu, Nov 12, 2020 at 12:59:35PM -0400, Jason Gunthorpe wrote:
>  RMDA/sw: Don't allow drivers using dma_virt_ops on highmem configs

I think this one actually is something needed in 5.10 and -stable.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: remove dma_virt_ops v2
  2020-11-12 13:23   ` Jason Gunthorpe
@ 2020-11-12 17:36     ` santosh.shilimkar
  2020-11-17 10:50       ` Ka-Cheong Poon
  0 siblings, 1 reply; 22+ messages in thread
From: santosh.shilimkar @ 2020-11-12 17:36 UTC (permalink / raw)
  To: Jason Gunthorpe, Ka-Cheong Poon
  Cc: Christoph Hellwig, Bjorn Helgaas, Bernard Metzler, Zhu Yanjun,
	Logan Gunthorpe, Dennis Dalessandro, Mike Marciniszyn,
	linux-rdma, rds-devel, linux-pci, iommu

+ Ka-Cheong

On 11/12/20 5:23 AM, Jason Gunthorpe wrote:
> On Thu, Nov 12, 2020 at 10:40:30AM +0100, Christoph Hellwig wrote:
>> ping?
>>
>> On Fri, Nov 06, 2020 at 07:19:31PM +0100, Christoph Hellwig wrote:
>>> Hi Jason,
>>>
>>> this series switches the RDMA core to opencode the special case of
>>> devices bypassing the DMA mapping in the RDMA ULPs.  The virt ops
>>> have caused a bit of trouble due to the P2P code node working with
>>> them due to the fact that we'd do two dma mapping iterations for a
>>> single I/O, but also are a bit of layering violation and lead to
>>> more code than necessary.
>>>
>>> Tested with nvme-rdma over rxe.
>>>
>>> Note that the rds changes are untested, as I could not find any
>>> simple rds test setup.
>>>
>>> Changes since v2:
>>>   - simplify the INFINIBAND_VIRT_DMA dependencies
>>>   - add a ib_uses_virt_dma helper
>>>   - use ib_uses_virt_dma in nvmet-rdma to disable p2p for virt_dma devices
>>>   - use ib_dma_max_seg_size in umem
>>>   - stop using dmapool in rds
>>>
>>> Changes since v1:
>>>   - disable software RDMA drivers for highmem configs
>>>   - update the PCI commit logs
> 
> Santosh, can you please check the RDS parts?
> 

Hi Ka-Cheong,

Can you please check Christoph's change [1], which cleans up the dma-pool
usage to use ib_dma_* and the slab allocator? The dma-pool usage was added
as part of your "net/rds: Use DMA memory pool allocation for rds_header"
commit.


Regards,
Santosh

[1] https://www.spinics.net/lists/linux-pci/msg101547.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: remove dma_virt_ops v2
  2020-11-12 17:09   ` Christoph Hellwig
@ 2020-11-12 17:39     ` Jason Gunthorpe
  2020-11-13  8:50       ` Christoph Hellwig
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2020-11-12 17:39 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, Santosh Shilimkar,
	linux-rdma, rds-devel, linux-pci, iommu

On Thu, Nov 12, 2020 at 06:09:56PM +0100, Christoph Hellwig wrote:
> On Thu, Nov 12, 2020 at 12:59:35PM -0400, Jason Gunthorpe wrote:
> >  RMDA/sw: Don't allow drivers using dma_virt_ops on highmem configs
> 
> I think this one actually is something needed in 5.10 and -stable.

Done, I added a

Fixes: 551199aca1c3 ("lib/dma-virt: Add dma_virt_ops")

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: remove dma_virt_ops v2
  2020-11-12 17:39     ` Jason Gunthorpe
@ 2020-11-13  8:50       ` Christoph Hellwig
  2020-11-17 14:01         ` Mike Marciniszyn
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2020-11-13  8:50 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Bjorn Helgaas, Bernard Metzler, Zhu Yanjun,
	Logan Gunthorpe, Dennis Dalessandro, Mike Marciniszyn,
	Santosh Shilimkar, linux-rdma, rds-devel, linux-pci, iommu

On Thu, Nov 12, 2020 at 01:39:06PM -0400, Jason Gunthorpe wrote:
> On Thu, Nov 12, 2020 at 06:09:56PM +0100, Christoph Hellwig wrote:
> > On Thu, Nov 12, 2020 at 12:59:35PM -0400, Jason Gunthorpe wrote:
> > >  RMDA/sw: Don't allow drivers using dma_virt_ops on highmem configs
> > 
> > I think this one actually is something needed in 5.10 and -stable.
> 
> Done, I added a
> 
> Fixes: 551199aca1c3 ("lib/dma-virt: Add dma_virt_ops")

Note that the drivers had open-coded versions of this earlier.  I think
this goes back to the addition of the qib driver, which is now gone,
or to the addition of the hfi1 or rxe drivers for something that still
matters.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: remove dma_virt_ops v2
  2020-11-12 17:36     ` santosh.shilimkar
@ 2020-11-17 10:50       ` Ka-Cheong Poon
  2020-11-17 19:10         ` santosh.shilimkar
  0 siblings, 1 reply; 22+ messages in thread
From: Ka-Cheong Poon @ 2020-11-17 10:50 UTC (permalink / raw)
  To: santosh.shilimkar, Jason Gunthorpe
  Cc: Christoph Hellwig, Bjorn Helgaas, Bernard Metzler, Zhu Yanjun,
	Logan Gunthorpe, Dennis Dalessandro, Mike Marciniszyn,
	linux-rdma, rds-devel, linux-pci, iommu

On 11/13/20 1:36 AM, santosh.shilimkar@oracle.com wrote:
> + Ka-Cheong
> 
> On 11/12/20 5:23 AM, Jason Gunthorpe wrote:
>> On Thu, Nov 12, 2020 at 10:40:30AM +0100, Christoph Hellwig wrote:
>>> ping?
>>>
>>> On Fri, Nov 06, 2020 at 07:19:31PM +0100, Christoph Hellwig wrote:
>>>> Hi Jason,
>>>>
>>>> this series switches the RDMA core to opencode the special case of
>>>> devices bypassing the DMA mapping in the RDMA ULPs.  The virt ops
>>>> have caused a bit of trouble due to the P2P code node working with
>>>> them due to the fact that we'd do two dma mapping iterations for a
>>>> single I/O, but also are a bit of layering violation and lead to
>>>> more code than necessary.
>>>>
>>>> Tested with nvme-rdma over rxe.
>>>>
>>>> Note that the rds changes are untested, as I could not find any
>>>> simple rds test setup.
>>>>
>>>> Changes since v2:
>>>>   - simplify the INFINIBAND_VIRT_DMA dependencies
>>>>   - add a ib_uses_virt_dma helper
>>>>   - use ib_uses_virt_dma in nvmet-rdma to disable p2p for virt_dma devices
>>>>   - use ib_dma_max_seg_size in umem
>>>>   - stop using dmapool in rds
>>>>
>>>> Changes since v1:
>>>>   - disable software RDMA drivers for highmem configs
>>>>   - update the PCI commit logs
>>
>> Santosh, can you please check the RDS parts?
>>
> 
> Hi Ka-Cheong,
> 
> Can you please check Christoph's change [1], which cleans up the
> dma-pool API to use ib_dma_* and the slab allocator? This was added
> as part of your "net/rds: Use DMA memory pool allocation for rds_header"
> commit.


I applied the patch and ran some basic testing.  And it seems to
work fine.

Thanks.


-- 
K. Poon
ka-cheong.poon@oracle.com




* Re: remove dma_virt_ops v2
  2020-11-13  8:50       ` Christoph Hellwig
@ 2020-11-17 14:01         ` Mike Marciniszyn
  0 siblings, 0 replies; 22+ messages in thread
From: Mike Marciniszyn @ 2020-11-17 14:01 UTC (permalink / raw)
  To: Christoph Hellwig, Jason Gunthorpe
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Santosh Shilimkar, linux-rdma, rds-devel,
	linux-pci, iommu

>> Fixes: 551199aca1c3 ("lib/dma-virt: Add dma_virt_ops")
> 
> Note that the drivers had open-coded versions of this earlier.  I think
> this goes back to the addition of the qib driver, which is now gone,
> or to the addition of the hfi1 or rxe drivers for something that still
> matters.

Christoph, Jason,

I built a branch using the following recipe:
https://patchwork.kernel.org/project/linux-rdma/patch/:
20201106181941.1878556-11-hch@lst.de/ dma-mapping: remove dma_virt_ops
20201106181941.1878556-10-hch@lst.de/ PCI/P2PDMA: Cleanup __pci_p2pdma_map_sg a bit
20201106181941.1878556-9-hch@lst.de/  PCI/P2PDMA: Remove the DMA_VIRT_OPS hacks
20201106181941.1878556-8-hch@lst.de/  RDMA/core: remove use of dma_virt_ops
20201106181941.1878556-7-hch@lst.de/  RDMA/core: remove ib_dma_{alloc,free}_coherent
20201106181941.1878556-6-hch@lst.de/  rds: stop using dmapool
20201106181941.1878556-5-hch@lst.de/  nvme-rdma: use ibdev_to_node instead of dereferencing ->dma_device
20201106181941.1878556-4-hch@lst.de/  RDMA: lift ibdev_to_node from rds to common code
20201106181941.1878556-3-hch@lst.de/  RDMA/umem: use ib_dma_max_seg_size instead of dma_get_max_seg_size
rdma/for-rc dabbd6abcdbe, which has RMDA/sw: don't allow drivers using dma_virt_ops on highmem configs

All of our rdmavt/hfi1 tests passed.

So I can at least vouch for "RDMA/core: remove use of dma_virt_ops".

Mike
Tested-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>




* Re: remove dma_virt_ops v2
  2020-11-17 10:50       ` Ka-Cheong Poon
@ 2020-11-17 19:10         ` santosh.shilimkar
  0 siblings, 0 replies; 22+ messages in thread
From: santosh.shilimkar @ 2020-11-17 19:10 UTC (permalink / raw)
  To: Ka-Cheong Poon, Jason Gunthorpe
  Cc: Christoph Hellwig, Bjorn Helgaas, Bernard Metzler, Zhu Yanjun,
	Logan Gunthorpe, Dennis Dalessandro, Mike Marciniszyn,
	linux-rdma, rds-devel, linux-pci, iommu

On 11/17/20 2:50 AM, Ka-Cheong Poon wrote:
> On 11/13/20 1:36 AM, santosh.shilimkar@oracle.com wrote:
>> + Ka-Cheong
>>
>> On 11/12/20 5:23 AM, Jason Gunthorpe wrote:
>>> On Thu, Nov 12, 2020 at 10:40:30AM +0100, Christoph Hellwig wrote:
>>>> ping?
>>>>
>>>> On Fri, Nov 06, 2020 at 07:19:31PM +0100, Christoph Hellwig wrote:
>>>>> Hi Jason,
>>>>>
>>>>> this series switches the RDMA core to opencode the special case of
>>>>> devices bypassing the DMA mapping in the RDMA ULPs.  The virt ops
>>>>> have caused a bit of trouble due to the P2P code not working with
>>>>> them due to the fact that we'd do two dma mapping iterations for a
>>>>> single I/O, but also are a bit of layering violation and lead to
>>>>> more code than necessary.
>>>>>
>>>>> Tested with nvme-rdma over rxe.
>>>>>
>>>>> Note that the rds changes are untested, as I could not find any
>>>>> simple rds test setup.
>>>>>
>>>>> Changes since v2:
>>>>>   - simplify the INFINIBAND_VIRT_DMA dependencies
>>>>>   - add a ib_uses_virt_dma helper
>>>>>   - use ib_uses_virt_dma in nvmet-rdma to disable p2p for virt_dma 
>>>>> devices
>>>>>   - use ib_dma_max_seg_size in umem
>>>>>   - stop using dmapool in rds
>>>>>
>>>>> Changes since v1:
>>>>>   - disable software RDMA drivers for highmem configs
>>>>>   - update the PCI commit logs
>>>
>>> Santosh, can you please check the RDS parts?
>>>
>>
>> Hi Ka-Cheong,
>>
>> Can you please check Christoph's change [1], which cleans up the
>> dma-pool API to use ib_dma_* and the slab allocator? This was added
>> as part of your "net/rds: Use DMA memory pool allocation for rds_header"
>> commit.
> 
> 
> I applied the patch and ran some basic testing.  And it seems to
> work fine.
> 
Thanks Ka-Cheong.

Jason, feel free to add an ack for the RDS part.

Regards,
Santosh




* Re: remove dma_virt_ops v2
  2020-11-06 18:19 remove dma_virt_ops v2 Christoph Hellwig
                   ` (11 preceding siblings ...)
  2020-11-12 16:59 ` Jason Gunthorpe
@ 2020-11-17 19:41 ` Jason Gunthorpe
  12 siblings, 0 replies; 22+ messages in thread
From: Jason Gunthorpe @ 2020-11-17 19:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, Santosh Shilimkar,
	linux-rdma, rds-devel, linux-pci, iommu

On Fri, Nov 06, 2020 at 07:19:31PM +0100, Christoph Hellwig wrote:
> Hi Jason,
> 
> this series switches the RDMA core to opencode the special case of
> devices bypassing the DMA mapping in the RDMA ULPs.  The virt ops
> have caused a bit of trouble due to the P2P code not working with
> them due to the fact that we'd do two dma mapping iterations for a
> single I/O, but also are a bit of layering violation and lead to
> more code than necessary.
> 
> Tested with nvme-rdma over rxe.
> 
> Note that the rds changes are untested, as I could not find any
> simple rds test setup.
> 
> Changes since v2:
>  - simplify the INFINIBAND_VIRT_DMA dependencies
>  - add a ib_uses_virt_dma helper
>  - use ib_uses_virt_dma in nvmet-rdma to disable p2p for virt_dma devices
>  - use ib_dma_max_seg_size in umem
>  - stop using dmapool in rds
> 
> Changes since v1:
>  - disable software RDMA drivers for highmem configs
>  - update the PCI commit logs

All applied to for-next, thanks everyone

Jason


end of thread, other threads:[~2020-11-17 19:42 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-06 18:19 remove dma_virt_ops v2 Christoph Hellwig
2020-11-06 18:19 ` [PATCH 01/10] RMDA/sw: don't allow drivers using dma_virt_ops on highmem configs Christoph Hellwig
2020-11-06 18:19 ` [PATCH 02/10] RDMA/umem: use ib_dma_max_seg_size instead of dma_get_max_seg_size Christoph Hellwig
2020-11-06 18:19 ` [PATCH 03/10] RDMA: lift ibdev_to_node from rds to common code Christoph Hellwig
2020-11-06 18:19 ` [PATCH 04/10] nvme-rdma: use ibdev_to_node instead of dereferencing ->dma_device Christoph Hellwig
2020-11-06 18:19 ` [PATCH 05/10] rds: stop using dmapool Christoph Hellwig
2020-11-06 18:19 ` [PATCH 06/10] RDMA/core: remove ib_dma_{alloc,free}_coherent Christoph Hellwig
2020-11-06 18:19 ` [PATCH 07/10] RDMA/core: remove use of dma_virt_ops Christoph Hellwig
2020-11-06 18:19 ` [PATCH 08/10] PCI/P2PDMA: Remove the DMA_VIRT_OPS hacks Christoph Hellwig
2020-11-06 18:19 ` [PATCH 09/10] PCI/P2PDMA: Cleanup __pci_p2pdma_map_sg a bit Christoph Hellwig
2020-11-06 18:19 ` [PATCH 10/10] dma-mapping: remove dma_virt_ops Christoph Hellwig
2020-11-12  9:40 ` remove dma_virt_ops v2 Christoph Hellwig
2020-11-12 13:23   ` Jason Gunthorpe
2020-11-12 17:36     ` santosh.shilimkar
2020-11-17 10:50       ` Ka-Cheong Poon
2020-11-17 19:10         ` santosh.shilimkar
2020-11-12 16:59 ` Jason Gunthorpe
2020-11-12 17:09   ` Christoph Hellwig
2020-11-12 17:39     ` Jason Gunthorpe
2020-11-13  8:50       ` Christoph Hellwig
2020-11-17 14:01         ` Mike Marciniszyn
2020-11-17 19:41 ` Jason Gunthorpe
