linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* remove dma_virt_ops v2
@ 2020-11-05  7:41 Christoph Hellwig
  2020-11-05  7:42 ` [PATCH 1/6] RMDA/sw: don't allow drivers using dma_virt_ops on highmem configs Christoph Hellwig
                   ` (6 more replies)
  0 siblings, 7 replies; 26+ messages in thread
From: Christoph Hellwig @ 2020-11-05  7:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, linux-rdma, linux-pci,
	iommu

Hi Jason,

this series switches the RDMA core to opencode the special case of
devices bypassing the DMA mapping in the RDMA ULPs.  The virt ops
have caused a bit of trouble due to the P2P code node working with
them due to the fact that we'd do two dma mapping iterations for a
single I/O, but also are a bit of layering violation and lead to
more code than necessary.

Tested with nvme-rdma over rxe.

Changes since v1:
 - disable software RDMA drivers for highmem configs
 - update the PCI commit logs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/6] RMDA/sw: don't allow drivers using dma_virt_ops on highmem configs
  2020-11-05  7:41 remove dma_virt_ops v2 Christoph Hellwig
@ 2020-11-05  7:42 ` Christoph Hellwig
  2020-11-05 12:15   ` Robin Murphy
  2020-11-05 14:41   ` Jason Gunthorpe
  2020-11-05  7:42 ` [PATCH 2/6] RDMA/core: remove ib_dma_{alloc,free}_coherent Christoph Hellwig
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 26+ messages in thread
From: Christoph Hellwig @ 2020-11-05  7:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, linux-rdma, linux-pci,
	iommu

dma_virt_ops requires that all pages have a kernel virtual address.
Introduce a INFINIBAND_VIRT_DMA Kconfig symbol that depends on !HIGHMEM
and a large enough dma_addr_t, and make all three driver depend on the
new symbol.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/infiniband/Kconfig           | 6 ++++++
 drivers/infiniband/sw/rdmavt/Kconfig | 3 ++-
 drivers/infiniband/sw/rxe/Kconfig    | 2 +-
 drivers/infiniband/sw/siw/Kconfig    | 1 +
 4 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index 32a51432ec4f73..81acaf5fb5be67 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -73,6 +73,12 @@ config INFINIBAND_ADDR_TRANS_CONFIGFS
 	  This allows the user to config the default GID type that the CM
 	  uses for each device, when initiaing new connections.
 
+config INFINIBAND_VIRT_DMA
+	bool
+	default y
+	depends on !HIGHMEM
+	depends on !64BIT || ARCH_DMA_ADDR_T_64BIT
+
 if INFINIBAND_USER_ACCESS || !INFINIBAND_USER_ACCESS
 source "drivers/infiniband/hw/mthca/Kconfig"
 source "drivers/infiniband/hw/qib/Kconfig"
diff --git a/drivers/infiniband/sw/rdmavt/Kconfig b/drivers/infiniband/sw/rdmavt/Kconfig
index 9ef5f5ce1ff6b0..c8e268082952b0 100644
--- a/drivers/infiniband/sw/rdmavt/Kconfig
+++ b/drivers/infiniband/sw/rdmavt/Kconfig
@@ -1,7 +1,8 @@
 # SPDX-License-Identifier: GPL-2.0-only
 config INFINIBAND_RDMAVT
 	tristate "RDMA verbs transport library"
-	depends on X86_64 && ARCH_DMA_ADDR_T_64BIT
+	depends on INFINIBAND_VIRT_DMA
+	depends on X86_64
 	depends on PCI
 	select DMA_VIRT_OPS
 	help
diff --git a/drivers/infiniband/sw/rxe/Kconfig b/drivers/infiniband/sw/rxe/Kconfig
index a0c6c7dfc1814f..8810bfa680495a 100644
--- a/drivers/infiniband/sw/rxe/Kconfig
+++ b/drivers/infiniband/sw/rxe/Kconfig
@@ -2,7 +2,7 @@
 config RDMA_RXE
 	tristate "Software RDMA over Ethernet (RoCE) driver"
 	depends on INET && PCI && INFINIBAND
-	depends on !64BIT || ARCH_DMA_ADDR_T_64BIT
+	depends on INFINIBAND_VIRT_DMA
 	select NET_UDP_TUNNEL
 	select CRYPTO_CRC32
 	select DMA_VIRT_OPS
diff --git a/drivers/infiniband/sw/siw/Kconfig b/drivers/infiniband/sw/siw/Kconfig
index b622fc62f2cd6d..3450ba5081df51 100644
--- a/drivers/infiniband/sw/siw/Kconfig
+++ b/drivers/infiniband/sw/siw/Kconfig
@@ -1,6 +1,7 @@
 config RDMA_SIW
 	tristate "Software RDMA over TCP/IP (iWARP) driver"
 	depends on INET && INFINIBAND && LIBCRC32C
+	depends on INFINIBAND_VIRT_DMA
 	select DMA_VIRT_OPS
 	help
 	This driver implements the iWARP RDMA transport over
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/6] RDMA/core: remove ib_dma_{alloc,free}_coherent
  2020-11-05  7:41 remove dma_virt_ops v2 Christoph Hellwig
  2020-11-05  7:42 ` [PATCH 1/6] RMDA/sw: don't allow drivers using dma_virt_ops on highmem configs Christoph Hellwig
@ 2020-11-05  7:42 ` Christoph Hellwig
  2020-11-05  7:42 ` [PATCH 3/6] RDMA/core: remove use of dma_virt_ops Christoph Hellwig
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2020-11-05  7:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, linux-rdma, linux-pci,
	iommu

These two functions are entirely unused.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/rdma/ib_verbs.h | 29 -----------------------------
 1 file changed, 29 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 9bf6c319a670e2..5f8fd7976034e0 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -4098,35 +4098,6 @@ static inline void ib_dma_sync_single_for_device(struct ib_device *dev,
 	dma_sync_single_for_device(dev->dma_device, addr, size, dir);
 }
 
-/**
- * ib_dma_alloc_coherent - Allocate memory and map it for DMA
- * @dev: The device for which the DMA address is requested
- * @size: The size of the region to allocate in bytes
- * @dma_handle: A pointer for returning the DMA address of the region
- * @flag: memory allocator flags
- */
-static inline void *ib_dma_alloc_coherent(struct ib_device *dev,
-					   size_t size,
-					   dma_addr_t *dma_handle,
-					   gfp_t flag)
-{
-	return dma_alloc_coherent(dev->dma_device, size, dma_handle, flag);
-}
-
-/**
- * ib_dma_free_coherent - Free memory allocated by ib_dma_alloc_coherent()
- * @dev: The device for which the DMA addresses were allocated
- * @size: The size of the region
- * @cpu_addr: the address returned by ib_dma_alloc_coherent()
- * @dma_handle: the DMA address returned by ib_dma_alloc_coherent()
- */
-static inline void ib_dma_free_coherent(struct ib_device *dev,
-					size_t size, void *cpu_addr,
-					dma_addr_t dma_handle)
-{
-	dma_free_coherent(dev->dma_device, size, cpu_addr, dma_handle);
-}
-
 /* ib_reg_user_mr - register a memory region for virtual addresses from kernel
  * space. This function should be called when 'current' is the owning MM.
  */
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/6] RDMA/core: remove use of dma_virt_ops
  2020-11-05  7:41 remove dma_virt_ops v2 Christoph Hellwig
  2020-11-05  7:42 ` [PATCH 1/6] RMDA/sw: don't allow drivers using dma_virt_ops on highmem configs Christoph Hellwig
  2020-11-05  7:42 ` [PATCH 2/6] RDMA/core: remove ib_dma_{alloc,free}_coherent Christoph Hellwig
@ 2020-11-05  7:42 ` Christoph Hellwig
  2020-11-05 14:34   ` Jason Gunthorpe
  2020-11-05 17:52   ` Jason Gunthorpe
  2020-11-05  7:42 ` [PATCH 4/6] PCI/P2PDMA: Remove the DMA_VIRT_OPS hacks Christoph Hellwig
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 26+ messages in thread
From: Christoph Hellwig @ 2020-11-05  7:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, linux-rdma, linux-pci,
	iommu

Use the ib_dma_* helpers to skip the DMA translation instead.  This
removes the last user if dma_virt_ops and keeps the weird layering
violation inside the RDMA core instead of burderning the DMA mapping
subsystems with it.  This also means the software RDMA drivers now
don't have to mess with DMA parameters that are not relevant to them
at all, and that in the future we can use PCI P2P transfers even for
software RDMA, as there is no first fake layer of DMA mapping that
the P2P DMA support.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/infiniband/core/device.c      | 41 ++++++++++---------
 drivers/infiniband/core/rw.c          |  2 +-
 drivers/infiniband/sw/rdmavt/Kconfig  |  1 -
 drivers/infiniband/sw/rdmavt/mr.c     |  6 +--
 drivers/infiniband/sw/rdmavt/vt.c     |  8 ----
 drivers/infiniband/sw/rxe/Kconfig     |  1 -
 drivers/infiniband/sw/rxe/rxe_verbs.c |  7 ----
 drivers/infiniband/sw/rxe/rxe_verbs.h |  1 -
 drivers/infiniband/sw/siw/Kconfig     |  1 -
 drivers/infiniband/sw/siw/siw.h       |  1 -
 drivers/infiniband/sw/siw/siw_main.c  |  7 ----
 include/rdma/ib_verbs.h               | 59 ++++++++++++++++++---------
 12 files changed, 64 insertions(+), 71 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index a3b1fc84cdcab9..528de41487521b 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1177,25 +1177,6 @@ static int assign_name(struct ib_device *device, const char *name)
 	return ret;
 }
 
-static void setup_dma_device(struct ib_device *device,
-			     struct device *dma_device)
-{
-	/*
-	 * If the caller does not provide a DMA capable device then the IB
-	 * device will be used. In this case the caller should fully setup the
-	 * ibdev for DMA. This usually means using dma_virt_ops.
-	 */
-#ifdef CONFIG_DMA_VIRT_OPS
-	if (!dma_device) {
-		device->dev.dma_ops = &dma_virt_ops;
-		dma_device = &device->dev;
-	}
-#endif
-	WARN_ON(!dma_device);
-	device->dma_device = dma_device;
-	WARN_ON(!device->dma_device->dma_parms);
-}
-
 /*
  * setup_device() allocates memory and sets up data that requires calling the
  * device ops, this is the only reason these actions are not done during
@@ -1341,7 +1322,14 @@ int ib_register_device(struct ib_device *device, const char *name,
 	if (ret)
 		return ret;
 
-	setup_dma_device(device, dma_device);
+	/*
+	 * If the caller does not provide a DMA capable device then the IB core
+	 * will set up ib_sge and scatterlist structures that stash the kernel
+	 * virtual address into the address field.
+	 */
+	device->dma_device = dma_device;
+	WARN_ON(dma_device && !dma_device->dma_parms);
+
 	ret = setup_device(device);
 	if (ret)
 		return ret;
@@ -2675,6 +2663,19 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 }
 EXPORT_SYMBOL(ib_set_device_ops);
 
+int ib_dma_virt_map_sg(struct ib_device *dev, struct scatterlist *sg, int nents)
+{
+	struct scatterlist *s;
+	int i;
+
+	for_each_sg(sg, s, nents, i) {
+		sg_dma_address(s) = (uintptr_t)sg_virt(s);
+		sg_dma_len(s) = s->length;
+	}
+	return nents;
+}
+EXPORT_SYMBOL(ib_dma_virt_map_sg);
+
 static const struct rdma_nl_cbs ibnl_ls_cb_table[RDMA_NL_LS_NUM_OPS] = {
 	[RDMA_NL_LS_OP_RESOLVE] = {
 		.doit = ib_nl_handle_resolve_resp,
diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index 13f43ab7220b05..ae5a629ecc6772 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -276,7 +276,7 @@ static int rdma_rw_init_single_wr(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
 static void rdma_rw_unmap_sg(struct ib_device *dev, struct scatterlist *sg,
 			     u32 sg_cnt, enum dma_data_direction dir)
 {
-	if (is_pci_p2pdma_page(sg_page(sg)))
+	if (dev->dma_device && is_pci_p2pdma_page(sg_page(sg)))
 		pci_p2pdma_unmap_sg(dev->dma_device, sg, sg_cnt, dir);
 	else
 		ib_dma_unmap_sg(dev, sg, sg_cnt, dir);
diff --git a/drivers/infiniband/sw/rdmavt/Kconfig b/drivers/infiniband/sw/rdmavt/Kconfig
index c8e268082952b0..0df48b3a6b56c5 100644
--- a/drivers/infiniband/sw/rdmavt/Kconfig
+++ b/drivers/infiniband/sw/rdmavt/Kconfig
@@ -4,6 +4,5 @@ config INFINIBAND_RDMAVT
 	depends on INFINIBAND_VIRT_DMA
 	depends on X86_64
 	depends on PCI
-	select DMA_VIRT_OPS
 	help
 	This is a common software verbs provider for RDMA networks.
diff --git a/drivers/infiniband/sw/rdmavt/mr.c b/drivers/infiniband/sw/rdmavt/mr.c
index 8490fdb9c91e50..90fc234f489acd 100644
--- a/drivers/infiniband/sw/rdmavt/mr.c
+++ b/drivers/infiniband/sw/rdmavt/mr.c
@@ -324,8 +324,6 @@ static void __rvt_free_mr(struct rvt_mr *mr)
  * @acc: access flags
  *
  * Return: the memory region on success, otherwise returns an errno.
- * Note that all DMA addresses should be created via the functions in
- * struct dma_virt_ops.
  */
 struct ib_mr *rvt_get_dma_mr(struct ib_pd *pd, int acc)
 {
@@ -766,7 +764,7 @@ int rvt_lkey_ok(struct rvt_lkey_table *rkt, struct rvt_pd *pd,
 
 	/*
 	 * We use LKEY == zero for kernel virtual addresses
-	 * (see rvt_get_dma_mr() and dma_virt_ops).
+	 * (see rvt_get_dma_mr()).
 	 */
 	if (sge->lkey == 0) {
 		struct rvt_dev_info *dev = ib_to_rvt(pd->ibpd.device);
@@ -877,7 +875,7 @@ int rvt_rkey_ok(struct rvt_qp *qp, struct rvt_sge *sge,
 
 	/*
 	 * We use RKEY == zero for kernel virtual addresses
-	 * (see rvt_get_dma_mr() and dma_virt_ops).
+	 * (see rvt_get_dma_mr()).
 	 */
 	rcu_read_lock();
 	if (rkey == 0) {
diff --git a/drivers/infiniband/sw/rdmavt/vt.c b/drivers/infiniband/sw/rdmavt/vt.c
index 670a9623b46e11..d1bbe66610cfe4 100644
--- a/drivers/infiniband/sw/rdmavt/vt.c
+++ b/drivers/infiniband/sw/rdmavt/vt.c
@@ -524,7 +524,6 @@ static noinline int check_support(struct rvt_dev_info *rdi, int verb)
 int rvt_register_device(struct rvt_dev_info *rdi)
 {
 	int ret = 0, i;
-	u64 dma_mask;
 
 	if (!rdi)
 		return -EINVAL;
@@ -579,13 +578,6 @@ int rvt_register_device(struct rvt_dev_info *rdi)
 	/* Completion queues */
 	spin_lock_init(&rdi->n_cqs_lock);
 
-	/* DMA Operations */
-	rdi->ibdev.dev.dma_parms = rdi->ibdev.dev.parent->dma_parms;
-	dma_mask = IS_ENABLED(CONFIG_64BIT) ? DMA_BIT_MASK(64) : DMA_BIT_MASK(32);
-	ret = dma_coerce_mask_and_coherent(&rdi->ibdev.dev, dma_mask);
-	if (ret)
-		goto bail_wss;
-
 	/* Protection Domain */
 	spin_lock_init(&rdi->n_pds_lock);
 	rdi->n_pds_allocated = 0;
diff --git a/drivers/infiniband/sw/rxe/Kconfig b/drivers/infiniband/sw/rxe/Kconfig
index 8810bfa680495a..4521490667925f 100644
--- a/drivers/infiniband/sw/rxe/Kconfig
+++ b/drivers/infiniband/sw/rxe/Kconfig
@@ -5,7 +5,6 @@ config RDMA_RXE
 	depends on INFINIBAND_VIRT_DMA
 	select NET_UDP_TUNNEL
 	select CRYPTO_CRC32
-	select DMA_VIRT_OPS
 	help
 	This driver implements the InfiniBand RDMA transport over
 	the Linux network stack. It enables a system with a
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index f9c832e82552f9..9c66f76545b3c2 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1118,7 +1118,6 @@ int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name)
 	int err;
 	struct ib_device *dev = &rxe->ib_dev;
 	struct crypto_shash *tfm;
-	u64 dma_mask;
 
 	strlcpy(dev->node_desc, "rxe", sizeof(dev->node_desc));
 
@@ -1129,12 +1128,6 @@ int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name)
 	dev->local_dma_lkey = 0;
 	addrconf_addr_eui48((unsigned char *)&dev->node_guid,
 			    rxe->ndev->dev_addr);
-	dev->dev.dma_parms = &rxe->dma_parms;
-	dma_set_max_seg_size(&dev->dev, UINT_MAX);
-	dma_mask = IS_ENABLED(CONFIG_64BIT) ? DMA_BIT_MASK(64) : DMA_BIT_MASK(32);
-	err = dma_coerce_mask_and_coherent(&dev->dev, dma_mask);
-	if (err)
-		return err;
 
 	dev->uverbs_cmd_mask = BIT_ULL(IB_USER_VERBS_CMD_GET_CONTEXT)
 	    | BIT_ULL(IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL)
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 3414b341b7091f..4bf5d85a1ab3ce 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -352,7 +352,6 @@ struct rxe_port {
 struct rxe_dev {
 	struct ib_device	ib_dev;
 	struct ib_device_attr	attr;
-	struct device_dma_parameters dma_parms;
 	int			max_ucontext;
 	int			max_inline_data;
 	struct mutex	usdev_lock;
diff --git a/drivers/infiniband/sw/siw/Kconfig b/drivers/infiniband/sw/siw/Kconfig
index 3450ba5081df51..1b5105cbabaeed 100644
--- a/drivers/infiniband/sw/siw/Kconfig
+++ b/drivers/infiniband/sw/siw/Kconfig
@@ -2,7 +2,6 @@ config RDMA_SIW
 	tristate "Software RDMA over TCP/IP (iWARP) driver"
 	depends on INET && INFINIBAND && LIBCRC32C
 	depends on INFINIBAND_VIRT_DMA
-	select DMA_VIRT_OPS
 	help
 	This driver implements the iWARP RDMA transport over
 	the Linux TCP/IP network stack. It enables a system with a
diff --git a/drivers/infiniband/sw/siw/siw.h b/drivers/infiniband/sw/siw/siw.h
index e9753831ac3f33..adda7899621962 100644
--- a/drivers/infiniband/sw/siw/siw.h
+++ b/drivers/infiniband/sw/siw/siw.h
@@ -69,7 +69,6 @@ struct siw_pd {
 
 struct siw_device {
 	struct ib_device base_dev;
-	struct device_dma_parameters dma_parms;
 	struct net_device *netdev;
 	struct siw_dev_cap attrs;
 
diff --git a/drivers/infiniband/sw/siw/siw_main.c b/drivers/infiniband/sw/siw/siw_main.c
index 181e06c1c43d7e..c62a7a0d423c0e 100644
--- a/drivers/infiniband/sw/siw/siw_main.c
+++ b/drivers/infiniband/sw/siw/siw_main.c
@@ -306,7 +306,6 @@ static struct siw_device *siw_device_create(struct net_device *netdev)
 	struct siw_device *sdev = NULL;
 	struct ib_device *base_dev;
 	struct device *parent = netdev->dev.parent;
-	u64 dma_mask;
 	int rv;
 
 	if (!parent) {
@@ -383,12 +382,6 @@ static struct siw_device *siw_device_create(struct net_device *netdev)
 	 */
 	base_dev->phys_port_cnt = 1;
 	base_dev->dev.parent = parent;
-	base_dev->dev.dma_parms = &sdev->dma_parms;
-	dma_set_max_seg_size(&base_dev->dev, UINT_MAX);
-	dma_mask = IS_ENABLED(CONFIG_64BIT) ? DMA_BIT_MASK(64) : DMA_BIT_MASK(32);
-	if (dma_coerce_mask_and_coherent(&base_dev->dev, dma_mask))
-		goto error;
-
 	base_dev->num_comp_vectors = num_possible_cpus();
 
 	xa_init_flags(&sdev->qp_xa, XA_FLAGS_ALLOC1);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 5f8fd7976034e0..661e4a22b3cb81 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -3950,6 +3950,8 @@ static inline int ib_req_ncomp_notif(struct ib_cq *cq, int wc_cnt)
  */
 static inline int ib_dma_mapping_error(struct ib_device *dev, u64 dma_addr)
 {
+	if (!dev->dma_device)
+		return 0;
 	return dma_mapping_error(dev->dma_device, dma_addr);
 }
 
@@ -3964,6 +3966,8 @@ static inline u64 ib_dma_map_single(struct ib_device *dev,
 				    void *cpu_addr, size_t size,
 				    enum dma_data_direction direction)
 {
+	if (!dev->dma_device)
+		return (uintptr_t)cpu_addr;
 	return dma_map_single(dev->dma_device, cpu_addr, size, direction);
 }
 
@@ -3978,6 +3982,8 @@ static inline void ib_dma_unmap_single(struct ib_device *dev,
 				       u64 addr, size_t size,
 				       enum dma_data_direction direction)
 {
+	if (!dev->dma_device)
+		return;
 	dma_unmap_single(dev->dma_device, addr, size, direction);
 }
 
@@ -3995,6 +4001,8 @@ static inline u64 ib_dma_map_page(struct ib_device *dev,
 				  size_t size,
 					 enum dma_data_direction direction)
 {
+	if (!dev->dma_device)
+		return (uintptr_t)(page_address(page) + offset);
 	return dma_map_page(dev->dma_device, page, offset, size, direction);
 }
 
@@ -4009,9 +4017,33 @@ static inline void ib_dma_unmap_page(struct ib_device *dev,
 				     u64 addr, size_t size,
 				     enum dma_data_direction direction)
 {
+	if (!dev->dma_device)
+		return;
 	dma_unmap_page(dev->dma_device, addr, size, direction);
 }
 
+int ib_dma_virt_map_sg(struct ib_device *dev, struct scatterlist *sg, int nents);
+static inline int ib_dma_map_sg_attrs(struct ib_device *dev,
+				      struct scatterlist *sg, int nents,
+				      enum dma_data_direction direction,
+				      unsigned long dma_attrs)
+{
+	if (!dev->dma_device)
+		return ib_dma_virt_map_sg(dev, sg, nents);
+	return dma_map_sg_attrs(dev->dma_device, sg, nents, direction,
+				dma_attrs);
+}
+
+static inline void ib_dma_unmap_sg_attrs(struct ib_device *dev,
+					 struct scatterlist *sg, int nents,
+					 enum dma_data_direction direction,
+					 unsigned long dma_attrs)
+{
+	if (!dev->dma_device)
+		return;
+	dma_unmap_sg_attrs(dev->dma_device, sg, nents, direction, dma_attrs);
+}
+
 /**
  * ib_dma_map_sg - Map a scatter/gather list to DMA addresses
  * @dev: The device for which the DMA addresses are to be created
@@ -4023,7 +4055,7 @@ static inline int ib_dma_map_sg(struct ib_device *dev,
 				struct scatterlist *sg, int nents,
 				enum dma_data_direction direction)
 {
-	return dma_map_sg(dev->dma_device, sg, nents, direction);
+	return ib_dma_map_sg_attrs(dev, sg, nents, direction, 0);
 }
 
 /**
@@ -4037,24 +4069,7 @@ static inline void ib_dma_unmap_sg(struct ib_device *dev,
 				   struct scatterlist *sg, int nents,
 				   enum dma_data_direction direction)
 {
-	dma_unmap_sg(dev->dma_device, sg, nents, direction);
-}
-
-static inline int ib_dma_map_sg_attrs(struct ib_device *dev,
-				      struct scatterlist *sg, int nents,
-				      enum dma_data_direction direction,
-				      unsigned long dma_attrs)
-{
-	return dma_map_sg_attrs(dev->dma_device, sg, nents, direction,
-				dma_attrs);
-}
-
-static inline void ib_dma_unmap_sg_attrs(struct ib_device *dev,
-					 struct scatterlist *sg, int nents,
-					 enum dma_data_direction direction,
-					 unsigned long dma_attrs)
-{
-	dma_unmap_sg_attrs(dev->dma_device, sg, nents, direction, dma_attrs);
+	ib_dma_unmap_sg_attrs(dev, sg, nents, direction, 0);
 }
 
 /**
@@ -4065,6 +4080,8 @@ static inline void ib_dma_unmap_sg_attrs(struct ib_device *dev,
  */
 static inline unsigned int ib_dma_max_seg_size(struct ib_device *dev)
 {
+	if (!dev->dma_device)
+		return UINT_MAX;
 	return dma_get_max_seg_size(dev->dma_device);
 }
 
@@ -4080,6 +4097,8 @@ static inline void ib_dma_sync_single_for_cpu(struct ib_device *dev,
 					      size_t size,
 					      enum dma_data_direction dir)
 {
+	if (!dev->dma_device)
+		return;
 	dma_sync_single_for_cpu(dev->dma_device, addr, size, dir);
 }
 
@@ -4095,6 +4114,8 @@ static inline void ib_dma_sync_single_for_device(struct ib_device *dev,
 						 size_t size,
 						 enum dma_data_direction dir)
 {
+	if (!dev->dma_device)
+		return;
 	dma_sync_single_for_device(dev->dma_device, addr, size, dir);
 }
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 4/6] PCI/P2PDMA: Remove the DMA_VIRT_OPS hacks
  2020-11-05  7:41 remove dma_virt_ops v2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2020-11-05  7:42 ` [PATCH 3/6] RDMA/core: remove use of dma_virt_ops Christoph Hellwig
@ 2020-11-05  7:42 ` Christoph Hellwig
  2020-11-05 14:34   ` Jason Gunthorpe
  2020-11-05  7:42 ` [PATCH 5/6] PCI/P2PDMA: Cleanup __pci_p2pdma_map_sg a bit Christoph Hellwig
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2020-11-05  7:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, linux-rdma, linux-pci,
	iommu

Now that all users of dma_virt_ops are gone we can remove the workaround
for it in the PCI peer to peer code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/p2pdma.c | 20 --------------------
 1 file changed, 20 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index de1c331dbed43f..b07018af53876c 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -556,15 +556,6 @@ int pci_p2pdma_distance_many(struct pci_dev *provider, struct device **clients,
 		return -1;
 
 	for (i = 0; i < num_clients; i++) {
-#ifdef CONFIG_DMA_VIRT_OPS
-		if (clients[i]->dma_ops == &dma_virt_ops) {
-			if (verbose)
-				dev_warn(clients[i],
-					 "cannot be used for peer-to-peer DMA because the driver makes use of dma_virt_ops\n");
-			return -1;
-		}
-#endif
-
 		pci_client = find_parent_pci_dev(clients[i]);
 		if (!pci_client) {
 			if (verbose)
@@ -837,17 +828,6 @@ static int __pci_p2pdma_map_sg(struct pci_p2pdma_pagemap *p2p_pgmap,
 	phys_addr_t paddr;
 	int i;
 
-	/*
-	 * p2pdma mappings are not compatible with devices that use
-	 * dma_virt_ops. If the upper layers do the right thing
-	 * this should never happen because it will be prevented
-	 * by the check in pci_p2pdma_distance_many()
-	 */
-#ifdef CONFIG_DMA_VIRT_OPS
-	if (WARN_ON_ONCE(dev->dma_ops == &dma_virt_ops))
-		return 0;
-#endif
-
 	for_each_sg(sg, s, nents, i) {
 		paddr = sg_phys(s);
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 5/6] PCI/P2PDMA: Cleanup __pci_p2pdma_map_sg a bit
  2020-11-05  7:41 remove dma_virt_ops v2 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2020-11-05  7:42 ` [PATCH 4/6] PCI/P2PDMA: Remove the DMA_VIRT_OPS hacks Christoph Hellwig
@ 2020-11-05  7:42 ` Christoph Hellwig
  2020-11-05  7:42 ` [PATCH 6/6] dma-mapping: remove dma_virt_ops Christoph Hellwig
  2020-11-05 20:32 ` [PATCH 1/6] RMDA/sw: don't allow drivers using dma_virt_ops on highmem configs Bernard Metzler
  6 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2020-11-05  7:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, linux-rdma, linux-pci,
	iommu

Remove the pointless paddr variable that was only used once.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/p2pdma.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index b07018af53876c..afd792cc272832 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -825,13 +825,10 @@ static int __pci_p2pdma_map_sg(struct pci_p2pdma_pagemap *p2p_pgmap,
 		struct device *dev, struct scatterlist *sg, int nents)
 {
 	struct scatterlist *s;
-	phys_addr_t paddr;
 	int i;
 
 	for_each_sg(sg, s, nents, i) {
-		paddr = sg_phys(s);
-
-		s->dma_address = paddr - p2p_pgmap->bus_offset;
+		s->dma_address = sg_phys(s) - p2p_pgmap->bus_offset;
 		sg_dma_len(s) = s->length;
 	}
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 6/6] dma-mapping: remove dma_virt_ops
  2020-11-05  7:41 remove dma_virt_ops v2 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2020-11-05  7:42 ` [PATCH 5/6] PCI/P2PDMA: Cleanup __pci_p2pdma_map_sg a bit Christoph Hellwig
@ 2020-11-05  7:42 ` Christoph Hellwig
  2020-11-05 20:32 ` [PATCH 1/6] RMDA/sw: don't allow drivers using dma_virt_ops on highmem configs Bernard Metzler
  6 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2020-11-05  7:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, linux-rdma, linux-pci,
	iommu

Now that the RDMA core deals with devices that only do DMA mapping in
lower layers properly, there is no user for dma_virt_ops and it can
be removed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/dma-mapping.h |  2 --
 kernel/dma/Kconfig          |  5 ---
 kernel/dma/Makefile         |  1 -
 kernel/dma/virt.c           | 61 -------------------------------------
 4 files changed, 69 deletions(-)
 delete mode 100644 kernel/dma/virt.c

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 956151052d4542..2aaed35b556df4 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -565,6 +565,4 @@ static inline int dma_mmap_wc(struct device *dev,
 int dma_direct_set_offset(struct device *dev, phys_addr_t cpu_start,
 		dma_addr_t dma_start, u64 size);
 
-extern const struct dma_map_ops dma_virt_ops;
-
 #endif /* _LINUX_DMA_MAPPING_H */
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index c99de4a2145889..fd2db2665fc691 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -75,11 +75,6 @@ config ARCH_HAS_DMA_PREP_COHERENT
 config ARCH_HAS_FORCE_DMA_UNENCRYPTED
 	bool
 
-config DMA_VIRT_OPS
-	bool
-	depends on HAS_DMA
-	select DMA_OPS
-
 config SWIOTLB
 	bool
 	select NEED_DMA_MAP_STATE
diff --git a/kernel/dma/Makefile b/kernel/dma/Makefile
index dc755ab68aabf9..cd1d86358a7a62 100644
--- a/kernel/dma/Makefile
+++ b/kernel/dma/Makefile
@@ -5,7 +5,6 @@ obj-$(CONFIG_DMA_OPS)			+= ops_helpers.o
 obj-$(CONFIG_DMA_OPS)			+= dummy.o
 obj-$(CONFIG_DMA_CMA)			+= contiguous.o
 obj-$(CONFIG_DMA_DECLARE_COHERENT)	+= coherent.o
-obj-$(CONFIG_DMA_VIRT_OPS)		+= virt.o
 obj-$(CONFIG_DMA_API_DEBUG)		+= debug.o
 obj-$(CONFIG_SWIOTLB)			+= swiotlb.o
 obj-$(CONFIG_DMA_COHERENT_POOL)		+= pool.o
diff --git a/kernel/dma/virt.c b/kernel/dma/virt.c
deleted file mode 100644
index 59d32317dd574a..00000000000000
--- a/kernel/dma/virt.c
+++ /dev/null
@@ -1,61 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * DMA operations that map to virtual addresses without flushing memory.
- */
-#include <linux/export.h>
-#include <linux/mm.h>
-#include <linux/dma-map-ops.h>
-#include <linux/scatterlist.h>
-
-static void *dma_virt_alloc(struct device *dev, size_t size,
-			    dma_addr_t *dma_handle, gfp_t gfp,
-			    unsigned long attrs)
-{
-	void *ret;
-
-	ret = (void *)__get_free_pages(gfp | __GFP_ZERO, get_order(size));
-	if (ret)
-		*dma_handle = (uintptr_t)ret;
-	return ret;
-}
-
-static void dma_virt_free(struct device *dev, size_t size,
-			  void *cpu_addr, dma_addr_t dma_addr,
-			  unsigned long attrs)
-{
-	free_pages((unsigned long)cpu_addr, get_order(size));
-}
-
-static dma_addr_t dma_virt_map_page(struct device *dev, struct page *page,
-				    unsigned long offset, size_t size,
-				    enum dma_data_direction dir,
-				    unsigned long attrs)
-{
-	return (uintptr_t)(page_address(page) + offset);
-}
-
-static int dma_virt_map_sg(struct device *dev, struct scatterlist *sgl,
-			   int nents, enum dma_data_direction dir,
-			   unsigned long attrs)
-{
-	int i;
-	struct scatterlist *sg;
-
-	for_each_sg(sgl, sg, nents, i) {
-		BUG_ON(!sg_page(sg));
-		sg_dma_address(sg) = (uintptr_t)sg_virt(sg);
-		sg_dma_len(sg) = sg->length;
-	}
-
-	return nents;
-}
-
-const struct dma_map_ops dma_virt_ops = {
-	.alloc			= dma_virt_alloc,
-	.free			= dma_virt_free,
-	.map_page		= dma_virt_map_page,
-	.map_sg			= dma_virt_map_sg,
-	.alloc_pages		= dma_common_alloc_pages,
-	.free_pages		= dma_common_free_pages,
-};
-EXPORT_SYMBOL(dma_virt_ops);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] RMDA/sw: don't allow drivers using dma_virt_ops on highmem configs
  2020-11-05  7:42 ` [PATCH 1/6] RMDA/sw: don't allow drivers using dma_virt_ops on highmem configs Christoph Hellwig
@ 2020-11-05 12:15   ` Robin Murphy
  2020-11-05 17:00     ` Christoph Hellwig
  2020-11-05 14:41   ` Jason Gunthorpe
  1 sibling, 1 reply; 26+ messages in thread
From: Robin Murphy @ 2020-11-05 12:15 UTC (permalink / raw)
  To: Christoph Hellwig, Jason Gunthorpe
  Cc: Zhu Yanjun, Dennis Dalessandro, linux-rdma, linux-pci,
	Mike Marciniszyn, iommu, Bjorn Helgaas, Bernard Metzler,
	Logan Gunthorpe

On 2020-11-05 07:42, Christoph Hellwig wrote:
> dma_virt_ops requires that all pages have a kernel virtual address.
> Introduce a INFINIBAND_VIRT_DMA Kconfig symbol that depends on !HIGHMEM
> and a large enough dma_addr_t, and make all three driver depend on the
> new symbol.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/infiniband/Kconfig           | 6 ++++++
>   drivers/infiniband/sw/rdmavt/Kconfig | 3 ++-
>   drivers/infiniband/sw/rxe/Kconfig    | 2 +-
>   drivers/infiniband/sw/siw/Kconfig    | 1 +
>   4 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
> index 32a51432ec4f73..81acaf5fb5be67 100644
> --- a/drivers/infiniband/Kconfig
> +++ b/drivers/infiniband/Kconfig
> @@ -73,6 +73,12 @@ config INFINIBAND_ADDR_TRANS_CONFIGFS
>   	  This allows the user to config the default GID type that the CM
>   	  uses for each device, when initiaing new connections.
>   
> +config INFINIBAND_VIRT_DMA
> +	bool
> +	default y
> +	depends on !HIGHMEM
> +	depends on !64BIT || ARCH_DMA_ADDR_T_64BIT

Isn't that effectively always true now since 4965a68780c5? I had a quick 
try of manually overriding CONFIG_ARCH_DMA_ADDR_T_64BIT in my .config, 
and the build just forces it back to "=y".

Robin.

> +
>   if INFINIBAND_USER_ACCESS || !INFINIBAND_USER_ACCESS
>   source "drivers/infiniband/hw/mthca/Kconfig"
>   source "drivers/infiniband/hw/qib/Kconfig"
> diff --git a/drivers/infiniband/sw/rdmavt/Kconfig b/drivers/infiniband/sw/rdmavt/Kconfig
> index 9ef5f5ce1ff6b0..c8e268082952b0 100644
> --- a/drivers/infiniband/sw/rdmavt/Kconfig
> +++ b/drivers/infiniband/sw/rdmavt/Kconfig
> @@ -1,7 +1,8 @@
>   # SPDX-License-Identifier: GPL-2.0-only
>   config INFINIBAND_RDMAVT
>   	tristate "RDMA verbs transport library"
> -	depends on X86_64 && ARCH_DMA_ADDR_T_64BIT
> +	depends on INFINIBAND_VIRT_DMA
> +	depends on X86_64
>   	depends on PCI
>   	select DMA_VIRT_OPS
>   	help
> diff --git a/drivers/infiniband/sw/rxe/Kconfig b/drivers/infiniband/sw/rxe/Kconfig
> index a0c6c7dfc1814f..8810bfa680495a 100644
> --- a/drivers/infiniband/sw/rxe/Kconfig
> +++ b/drivers/infiniband/sw/rxe/Kconfig
> @@ -2,7 +2,7 @@
>   config RDMA_RXE
>   	tristate "Software RDMA over Ethernet (RoCE) driver"
>   	depends on INET && PCI && INFINIBAND
> -	depends on !64BIT || ARCH_DMA_ADDR_T_64BIT
> +	depends on INFINIBAND_VIRT_DMA
>   	select NET_UDP_TUNNEL
>   	select CRYPTO_CRC32
>   	select DMA_VIRT_OPS
> diff --git a/drivers/infiniband/sw/siw/Kconfig b/drivers/infiniband/sw/siw/Kconfig
> index b622fc62f2cd6d..3450ba5081df51 100644
> --- a/drivers/infiniband/sw/siw/Kconfig
> +++ b/drivers/infiniband/sw/siw/Kconfig
> @@ -1,6 +1,7 @@
>   config RDMA_SIW
>   	tristate "Software RDMA over TCP/IP (iWARP) driver"
>   	depends on INET && INFINIBAND && LIBCRC32C
> +	depends on INFINIBAND_VIRT_DMA
>   	select DMA_VIRT_OPS
>   	help
>   	This driver implements the iWARP RDMA transport over
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/6] RDMA/core: remove use of dma_virt_ops
  2020-11-05  7:42 ` [PATCH 3/6] RDMA/core: remove use of dma_virt_ops Christoph Hellwig
@ 2020-11-05 14:34   ` Jason Gunthorpe
  2020-11-05 17:09     ` Christoph Hellwig
  2020-11-05 17:52   ` Jason Gunthorpe
  1 sibling, 1 reply; 26+ messages in thread
From: Jason Gunthorpe @ 2020-11-05 14:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, linux-rdma, linux-pci,
	iommu

On Thu, Nov 05, 2020 at 08:42:02AM +0100, Christoph Hellwig wrote:
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 5f8fd7976034e0..661e4a22b3cb81 100644
> +++ b/include/rdma/ib_verbs.h
> @@ -3950,6 +3950,8 @@ static inline int ib_req_ncomp_notif(struct ib_cq *cq, int wc_cnt)
>   */
>  static inline int ib_dma_mapping_error(struct ib_device *dev, u64 dma_addr)
>  {
> +	if (!dev->dma_device)
> +		return 0;

How about:

static inline bool ib_uses_virt_dma(struct ib_device *dev)
{
	return IS_ENABLED(CONFIG_INFINIBAND_VIRT_DMA) && !dev->dma_device;
}

Which is a a little more guidance that driver authors need to set this
config symbol.

Jason

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/6] PCI/P2PDMA: Remove the DMA_VIRT_OPS hacks
  2020-11-05  7:42 ` [PATCH 4/6] PCI/P2PDMA: Remove the DMA_VIRT_OPS hacks Christoph Hellwig
@ 2020-11-05 14:34   ` Jason Gunthorpe
  2020-11-05 17:08     ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Gunthorpe @ 2020-11-05 14:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, linux-rdma, linux-pci,
	iommu

On Thu, Nov 05, 2020 at 08:42:03AM +0100, Christoph Hellwig wrote:
> Now that all users of dma_virt_ops are gone we can remove the workaround
> for it in the PCI peer to peer code.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>  drivers/pci/p2pdma.c | 20 --------------------
>  1 file changed, 20 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index de1c331dbed43f..b07018af53876c 100644
> +++ b/drivers/pci/p2pdma.c
> @@ -556,15 +556,6 @@ int pci_p2pdma_distance_many(struct pci_dev *provider, struct device **clients,
>  		return -1;
>  
>  	for (i = 0; i < num_clients; i++) {
> -#ifdef CONFIG_DMA_VIRT_OPS
> -		if (clients[i]->dma_ops == &dma_virt_ops) {
> -			if (verbose)
> -				dev_warn(clients[i],
> -					 "cannot be used for peer-to-peer DMA because the driver makes use of dma_virt_ops\n");
> -			return -1;
> -		}
> -#endif
> -
>  		pci_client = find_parent_pci_dev(clients[i]);
>  		if (!pci_client) {
>  			if (verbose)
> @@ -837,17 +828,6 @@ static int __pci_p2pdma_map_sg(struct pci_p2pdma_pagemap *p2p_pgmap,
>  	phys_addr_t paddr;
>  	int i;
>  
> -	/*
> -	 * p2pdma mappings are not compatible with devices that use
> -	 * dma_virt_ops. If the upper layers do the right thing
> -	 * this should never happen because it will be prevented
> -	 * by the check in pci_p2pdma_distance_many()
> -	 */
> -#ifdef CONFIG_DMA_VIRT_OPS
> -	if (WARN_ON_ONCE(dev->dma_ops == &dma_virt_ops))
> -		return 0;
> -#endif

The check is removed here, but I didn't see a matching check added to
the IB side? Something like:

static int rdma_rw_map_sg(struct ib_device *dev, struct scatterlist *sg,
			  u32 sg_cnt, enum dma_data_direction dir)
{
	if (is_pci_p2pdma_page(sg_page(sg))) {
		if (ib_uses_virt_dma(dev))
			return 0;
		return pci_p2pdma_map_sg(dev->dma_device, sg, sg_cnt, dir);
	}
	return ib_dma_map_sg(dev, sg, sg_cnt, dir);
}

I think the change to rdma_rw_unmap_sg() should probably be dropped in
favour of the above?

Jason

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] RMDA/sw: don't allow drivers using dma_virt_ops on highmem configs
  2020-11-05  7:42 ` [PATCH 1/6] RMDA/sw: don't allow drivers using dma_virt_ops on highmem configs Christoph Hellwig
  2020-11-05 12:15   ` Robin Murphy
@ 2020-11-05 14:41   ` Jason Gunthorpe
  2020-11-05 15:29     ` Robin Murphy
  1 sibling, 1 reply; 26+ messages in thread
From: Jason Gunthorpe @ 2020-11-05 14:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, linux-rdma, linux-pci,
	iommu

On Thu, Nov 05, 2020 at 08:42:00AM +0100, Christoph Hellwig wrote:
> dma_virt_ops requires that all pages have a kernel virtual address.
> Introduce a INFINIBAND_VIRT_DMA Kconfig symbol that depends on !HIGHMEM
> and a large enough dma_addr_t, and make all three driver depend on the
> new symbol.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
>  drivers/infiniband/Kconfig           | 6 ++++++
>  drivers/infiniband/sw/rdmavt/Kconfig | 3 ++-
>  drivers/infiniband/sw/rxe/Kconfig    | 2 +-
>  drivers/infiniband/sw/siw/Kconfig    | 1 +
>  4 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
> index 32a51432ec4f73..81acaf5fb5be67 100644
> +++ b/drivers/infiniband/Kconfig
> @@ -73,6 +73,12 @@ config INFINIBAND_ADDR_TRANS_CONFIGFS
>  	  This allows the user to config the default GID type that the CM
>  	  uses for each device, when initiaing new connections.
>  
> +config INFINIBAND_VIRT_DMA
> +	bool
> +	default y

Oh, I haven't seen this kconfig trick with default before..

> +	depends on !HIGHMEM
> +	depends on !64BIT || ARCH_DMA_ADDR_T_64BIT
> +
>  if INFINIBAND_USER_ACCESS || !INFINIBAND_USER_ACCESS
>  source "drivers/infiniband/hw/mthca/Kconfig"
>  source "drivers/infiniband/hw/qib/Kconfig"
> diff --git a/drivers/infiniband/sw/rdmavt/Kconfig b/drivers/infiniband/sw/rdmavt/Kconfig
> index 9ef5f5ce1ff6b0..c8e268082952b0 100644
> +++ b/drivers/infiniband/sw/rdmavt/Kconfig
> @@ -1,7 +1,8 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  config INFINIBAND_RDMAVT
>  	tristate "RDMA verbs transport library"
> -	depends on X86_64 && ARCH_DMA_ADDR_T_64BIT
> +	depends on INFINIBAND_VIRT_DMA

Usually I would expect a non-menu item to be used with select not
'depends on' - is the use of default avoiding that?

This looks nice

Jason

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] RMDA/sw: don't allow drivers using dma_virt_ops on highmem configs
  2020-11-05 14:41   ` Jason Gunthorpe
@ 2020-11-05 15:29     ` Robin Murphy
  2020-11-05 17:03       ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Robin Murphy @ 2020-11-05 15:29 UTC (permalink / raw)
  To: Jason Gunthorpe, Christoph Hellwig
  Cc: Zhu Yanjun, Dennis Dalessandro, linux-rdma, linux-pci,
	Mike Marciniszyn, iommu, Bjorn Helgaas, Bernard Metzler,
	Logan Gunthorpe

On 2020-11-05 14:41, Jason Gunthorpe wrote:
> On Thu, Nov 05, 2020 at 08:42:00AM +0100, Christoph Hellwig wrote:
>> dma_virt_ops requires that all pages have a kernel virtual address.
>> Introduce a INFINIBAND_VIRT_DMA Kconfig symbol that depends on !HIGHMEM
>> and a large enough dma_addr_t, and make all three driver depend on the
>> new symbol.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>   drivers/infiniband/Kconfig           | 6 ++++++
>>   drivers/infiniband/sw/rdmavt/Kconfig | 3 ++-
>>   drivers/infiniband/sw/rxe/Kconfig    | 2 +-
>>   drivers/infiniband/sw/siw/Kconfig    | 1 +
>>   4 files changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
>> index 32a51432ec4f73..81acaf5fb5be67 100644
>> +++ b/drivers/infiniband/Kconfig
>> @@ -73,6 +73,12 @@ config INFINIBAND_ADDR_TRANS_CONFIGFS
>>   	  This allows the user to config the default GID type that the CM
>>   	  uses for each device, when initiaing new connections.
>>   
>> +config INFINIBAND_VIRT_DMA
>> +	bool
>> +	default y
> 
> Oh, I haven't seen this kconfig trick with default before..

It's commonly done using the "def_bool" shorthand. I fact, I think 
simply "def_bool !HIGHMEM" would suffice for the fundamental definition 
here.

>> +	depends on !HIGHMEM
>> +	depends on !64BIT || ARCH_DMA_ADDR_T_64BIT
>> +
>>   if INFINIBAND_USER_ACCESS || !INFINIBAND_USER_ACCESS
>>   source "drivers/infiniband/hw/mthca/Kconfig"
>>   source "drivers/infiniband/hw/qib/Kconfig"
>> diff --git a/drivers/infiniband/sw/rdmavt/Kconfig b/drivers/infiniband/sw/rdmavt/Kconfig
>> index 9ef5f5ce1ff6b0..c8e268082952b0 100644
>> +++ b/drivers/infiniband/sw/rdmavt/Kconfig
>> @@ -1,7 +1,8 @@
>>   # SPDX-License-Identifier: GPL-2.0-only
>>   config INFINIBAND_RDMAVT
>>   	tristate "RDMA verbs transport library"
>> -	depends on X86_64 && ARCH_DMA_ADDR_T_64BIT
>> +	depends on INFINIBAND_VIRT_DMA
> 
> Usually I would expect a non-menu item to be used with select not
> 'depends on' - is the use of default avoiding that?

A select wouldn't make any sense here - if the user chooses to enable 
the subsystem it can't automatically pull in "the absence of highmem" 
from the arch code; there's still a literal dependency on certain 
conditions being met for the option to be available. The intermediate 
config symbol just abstracts that set of conditions.

Robin.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] RMDA/sw: don't allow drivers using dma_virt_ops on highmem configs
  2020-11-05 12:15   ` Robin Murphy
@ 2020-11-05 17:00     ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2020-11-05 17:00 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Christoph Hellwig, Jason Gunthorpe, Zhu Yanjun,
	Dennis Dalessandro, linux-rdma, linux-pci, Mike Marciniszyn,
	iommu, Bjorn Helgaas, Bernard Metzler, Logan Gunthorpe

On Thu, Nov 05, 2020 at 12:15:46PM +0000, Robin Murphy wrote:
> On 2020-11-05 07:42, Christoph Hellwig wrote:
>> dma_virt_ops requires that all pages have a kernel virtual address.
>> Introduce a INFINIBAND_VIRT_DMA Kconfig symbol that depends on !HIGHMEM
>> and a large enough dma_addr_t, and make all three driver depend on the
>> new symbol.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> ---
>>   drivers/infiniband/Kconfig           | 6 ++++++
>>   drivers/infiniband/sw/rdmavt/Kconfig | 3 ++-
>>   drivers/infiniband/sw/rxe/Kconfig    | 2 +-
>>   drivers/infiniband/sw/siw/Kconfig    | 1 +
>>   4 files changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
>> index 32a51432ec4f73..81acaf5fb5be67 100644
>> --- a/drivers/infiniband/Kconfig
>> +++ b/drivers/infiniband/Kconfig
>> @@ -73,6 +73,12 @@ config INFINIBAND_ADDR_TRANS_CONFIGFS
>>   	  This allows the user to config the default GID type that the CM
>>   	  uses for each device, when initiaing new connections.
>>   +config INFINIBAND_VIRT_DMA
>> +	bool
>> +	default y
>> +	depends on !HIGHMEM
>> +	depends on !64BIT || ARCH_DMA_ADDR_T_64BIT
>
> Isn't that effectively always true now since 4965a68780c5? I had a quick 
> try of manually overriding CONFIG_ARCH_DMA_ADDR_T_64BIT in my .config, and 
> the build just forces it back to "=y".

True.  The guy who did the commit should have really told me about it :)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] RMDA/sw: don't allow drivers using dma_virt_ops on highmem configs
  2020-11-05 15:29     ` Robin Murphy
@ 2020-11-05 17:03       ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2020-11-05 17:03 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Jason Gunthorpe, Christoph Hellwig, Zhu Yanjun,
	Dennis Dalessandro, linux-rdma, linux-pci, Mike Marciniszyn,
	iommu, Bjorn Helgaas, Bernard Metzler, Logan Gunthorpe

On Thu, Nov 05, 2020 at 03:29:58PM +0000, Robin Murphy wrote:
> It's commonly done using the "def_bool" shorthand. I fact, I think simply 
> "def_bool !HIGHMEM" would suffice for the fundamental definition here.

Indeed, I'll switch it over.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/6] PCI/P2PDMA: Remove the DMA_VIRT_OPS hacks
  2020-11-05 14:34   ` Jason Gunthorpe
@ 2020-11-05 17:08     ` Christoph Hellwig
  2020-11-05 17:23       ` Jason Gunthorpe
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2020-11-05 17:08 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Bjorn Helgaas, Bernard Metzler, Zhu Yanjun,
	Logan Gunthorpe, Dennis Dalessandro, Mike Marciniszyn,
	linux-rdma, linux-pci, iommu

On Thu, Nov 05, 2020 at 10:34:18AM -0400, Jason Gunthorpe wrote:
> The check is removed here, but I didn't see a matching check added to
> the IB side? Something like:
> 
> static int rdma_rw_map_sg(struct ib_device *dev, struct scatterlist *sg,
> 			  u32 sg_cnt, enum dma_data_direction dir)
> {
> 	if (is_pci_p2pdma_page(sg_page(sg))) {
> 		if (ib_uses_virt_dma(dev))
> 			return 0;
> 		return pci_p2pdma_map_sg(dev->dma_device, sg, sg_cnt, dir);
> 	}
> 	return ib_dma_map_sg(dev, sg, sg_cnt, dir);
> }

We should never get a P2P page into the rdma_rw_map_sg or other ib_dma*
routines for the software drivers, as their struct devices don't connect
to a PCІ device underneath, and thus no valid P2P distance can be
retourned.  That being said IFF we want to implement P2P for those
we'd need somethign like the above check, except that we still need
to cal ib_dma_map_sg, i.e.:

static int rdma_rw_map_sg(struct ib_device *dev, struct scatterlist *sg,
 			  u32 sg_cnt, enum dma_data_direction dir)
{
	if (is_pci_p2pdma_page(sg_page(sg)) && !ib_uses_virt_dma(dev))
		return pci_p2pdma_map_sg(dev->dma_device, sg, sg_cnt, dir);
	return ib_dma_map_sg(dev, sg, sg_cnt, dir);
}

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/6] RDMA/core: remove use of dma_virt_ops
  2020-11-05 14:34   ` Jason Gunthorpe
@ 2020-11-05 17:09     ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2020-11-05 17:09 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Bjorn Helgaas, Bernard Metzler, Zhu Yanjun,
	Logan Gunthorpe, Dennis Dalessandro, Mike Marciniszyn,
	linux-rdma, linux-pci, iommu

On Thu, Nov 05, 2020 at 10:34:15AM -0400, Jason Gunthorpe wrote:
> On Thu, Nov 05, 2020 at 08:42:02AM +0100, Christoph Hellwig wrote:
> > diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> > index 5f8fd7976034e0..661e4a22b3cb81 100644
> > +++ b/include/rdma/ib_verbs.h
> > @@ -3950,6 +3950,8 @@ static inline int ib_req_ncomp_notif(struct ib_cq *cq, int wc_cnt)
> >   */
> >  static inline int ib_dma_mapping_error(struct ib_device *dev, u64 dma_addr)
> >  {
> > +	if (!dev->dma_device)
> > +		return 0;
> 
> How about:
> 
> static inline bool ib_uses_virt_dma(struct ib_device *dev)
> {
> 	return IS_ENABLED(CONFIG_INFINIBAND_VIRT_DMA) && !dev->dma_device;
> }
> 
> Which is a a little more guidance that driver authors need to set this
> config symbol.

Sure, I'll do that.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/6] PCI/P2PDMA: Remove the DMA_VIRT_OPS hacks
  2020-11-05 17:08     ` Christoph Hellwig
@ 2020-11-05 17:23       ` Jason Gunthorpe
  2020-11-05 17:29         ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Gunthorpe @ 2020-11-05 17:23 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, linux-rdma, linux-pci,
	iommu

On Thu, Nov 05, 2020 at 06:08:16PM +0100, Christoph Hellwig wrote:
> On Thu, Nov 05, 2020 at 10:34:18AM -0400, Jason Gunthorpe wrote:
> > The check is removed here, but I didn't see a matching check added to
> > the IB side? Something like:
> > 
> > static int rdma_rw_map_sg(struct ib_device *dev, struct scatterlist *sg,
> > 			  u32 sg_cnt, enum dma_data_direction dir)
> > {
> > 	if (is_pci_p2pdma_page(sg_page(sg))) {
> > 		if (ib_uses_virt_dma(dev))
> > 			return 0;
> > 		return pci_p2pdma_map_sg(dev->dma_device, sg, sg_cnt, dir);
> > 	}
> > 	return ib_dma_map_sg(dev, sg, sg_cnt, dir);
> > }
> 
> We should never get a P2P page into the rdma_rw_map_sg or other ib_dma*
> routines for the software drivers, as their struct devices don't connect
> to a PCІ device underneath, and thus no valid P2P distance can be
> retourned.  

But that depends on the calling driver doing this properly, and we
don't expose an API to get the PCI device of the struct ib_device
.. how does nvme even work here?

If we can't get here then why did you add the check to the unmap side?

Why did this code in p2pdma exist at all?

> That being said IFF we want to implement P2P for those we'd need
> somethign like the above check, except that we still need to cal
> ib_dma_map_sg, i.e.:
> 
> static int rdma_rw_map_sg(struct ib_device *dev, struct scatterlist *sg,
>  			  u32 sg_cnt, enum dma_data_direction dir)
> {
> 	if (is_pci_p2pdma_page(sg_page(sg)) && !ib_uses_virt_dma(dev))
> 		return pci_p2pdma_map_sg(dev->dma_device, sg, sg_cnt, dir);
> 	return ib_dma_map_sg(dev, sg, sg_cnt, dir);
> }

The SW drivers can't handle PCI pages at all, they are going to try to
memcpy them or something else not __iomem, so we really do need to
prevent P2P pages going into them.

Jason

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/6] PCI/P2PDMA: Remove the DMA_VIRT_OPS hacks
  2020-11-05 17:23       ` Jason Gunthorpe
@ 2020-11-05 17:29         ` Christoph Hellwig
  2020-11-05 17:39           ` Jason Gunthorpe
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2020-11-05 17:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Bjorn Helgaas, Bernard Metzler, Zhu Yanjun,
	Logan Gunthorpe, Dennis Dalessandro, Mike Marciniszyn,
	linux-rdma, linux-pci, iommu

On Thu, Nov 05, 2020 at 01:23:57PM -0400, Jason Gunthorpe wrote:
> But that depends on the calling driver doing this properly, and we
> don't expose an API to get the PCI device of the struct ib_device
> .. how does nvme even work here?

The PCI p2pdma APIs walk the parent chains of a struct device until
they find a PCI device.  And the ib_device eventually ends up there.

> 
> If we can't get here then why did you add the check to the unmap side?

Because I added them to the map and unmap side, but forgot to commit
the map side.  Mostly to be prepared for the case where we could
end up there.  And thinking out loud I actually need to double check
rdmavt if that is true there as well.  It certainly is for rxe and
siw as I checked it on a live system.

> The SW drivers can't handle PCI pages at all, they are going to try to
> memcpy them or something else not __iomem, so we really do need to
> prevent P2P pages going into them.

Ok, let's prevent it for now.  And if someone wants to do it there
they have to do all the work.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/6] PCI/P2PDMA: Remove the DMA_VIRT_OPS hacks
  2020-11-05 17:29         ` Christoph Hellwig
@ 2020-11-05 17:39           ` Jason Gunthorpe
  2020-11-05 17:43             ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Gunthorpe @ 2020-11-05 17:39 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, linux-rdma, linux-pci,
	iommu

On Thu, Nov 05, 2020 at 06:29:21PM +0100, Christoph Hellwig wrote:
> On Thu, Nov 05, 2020 at 01:23:57PM -0400, Jason Gunthorpe wrote:
> > But that depends on the calling driver doing this properly, and we
> > don't expose an API to get the PCI device of the struct ib_device
> > .. how does nvme even work here?
> 
> The PCI p2pdma APIs walk the parent chains of a struct device until
> they find a PCI device.  And the ib_device eventually ends up there.

Hmm. This works for real devices like mlx5, but it means the three SW
devices will also resolve to a real PCI device that is not the DMA
device.

If nvme wants to do something like this it should walk from the
ibdev->dma_device, after these patches to make dma_device NULL.

eg rxe is like:

$ sudo rdma link add rxe0 type rxe netdev eth1

lrwxrwxrwx 1 root root 0 Nov  5 17:34 /sys/class/infiniband/rxe0/device -> ../../../0000:00:09.0/

I think this is a bug, these virtual devices should have NULL
parents...

> > If we can't get here then why did you add the check to the unmap side?
> 
> Because I added them to the map and unmap side, but forgot to commit
> the map side.  Mostly to be prepared for the case where we could
> end up there.  And thinking out loud I actually need to double check
> rdmavt if that is true there as well.  It certainly is for rxe and
> siw as I checked it on a live system.

rdmavt parents itself to the HFI/QIB PCI device, so the walk above
should also find a real PCI device

> > The SW drivers can't handle PCI pages at all, they are going to try to
> > memcpy them or something else not __iomem, so we really do need to
> > prevent P2P pages going into them.
> 
> Ok, let's prevent it for now.  And if someone wants to do it there
> they have to do all the work.

Yes, that is the safest - just block the SW devices from ever touch
P2P pages.

Jason

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/6] PCI/P2PDMA: Remove the DMA_VIRT_OPS hacks
  2020-11-05 17:39           ` Jason Gunthorpe
@ 2020-11-05 17:43             ` Christoph Hellwig
  2020-11-05 17:56               ` Jason Gunthorpe
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2020-11-05 17:43 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Bjorn Helgaas, Bernard Metzler, Zhu Yanjun,
	Logan Gunthorpe, Dennis Dalessandro, Mike Marciniszyn,
	linux-rdma, linux-pci, iommu

On Thu, Nov 05, 2020 at 01:39:30PM -0400, Jason Gunthorpe wrote:
> Hmm. This works for real devices like mlx5, but it means the three SW
> devices will also resolve to a real PCI device that is not the DMA
> device.

Does it?  When I followed the links on my system rxe was a virtual
device without a physical parent.  Weird.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/6] RDMA/core: remove use of dma_virt_ops
  2020-11-05  7:42 ` [PATCH 3/6] RDMA/core: remove use of dma_virt_ops Christoph Hellwig
  2020-11-05 14:34   ` Jason Gunthorpe
@ 2020-11-05 17:52   ` Jason Gunthorpe
  2020-11-05 17:58     ` Jason Gunthorpe
  2020-11-06 10:01     ` Christoph Hellwig
  1 sibling, 2 replies; 26+ messages in thread
From: Jason Gunthorpe @ 2020-11-05 17:52 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, linux-rdma, linux-pci,
	iommu

On Thu, Nov 05, 2020 at 08:42:02AM +0100, Christoph Hellwig wrote:
> @@ -1341,7 +1322,14 @@ int ib_register_device(struct ib_device *device, const char *name,
>  	if (ret)
>  		return ret;
>  
> -	setup_dma_device(device, dma_device);
> +	/*
> +	 * If the caller does not provide a DMA capable device then the IB core
> +	 * will set up ib_sge and scatterlist structures that stash the kernel
> +	 * virtual address into the address field.
> +	 */
> +	device->dma_device = dma_device;
> +	WARN_ON(dma_device && !dma_device->dma_parms);

I noticed there were a couple of places expecting dma_device to be set
to !NULL:

drivers/infiniband/core/umem.c:                 dma_get_max_seg_size(device->dma_device), sg, npages,
drivers/nvme/host/rdma.c:       ctrl->ctrl.numa_node = dev_to_node(ctrl->device->dev->dma_device);
net/rds/ib.c:                                              device->dma_device,

No sure what to do about RDS..

Jason

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/6] PCI/P2PDMA: Remove the DMA_VIRT_OPS hacks
  2020-11-05 17:43             ` Christoph Hellwig
@ 2020-11-05 17:56               ` Jason Gunthorpe
  0 siblings, 0 replies; 26+ messages in thread
From: Jason Gunthorpe @ 2020-11-05 17:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, linux-rdma, linux-pci,
	iommu

On Thu, Nov 05, 2020 at 06:43:06PM +0100, Christoph Hellwig wrote:
> On Thu, Nov 05, 2020 at 01:39:30PM -0400, Jason Gunthorpe wrote:
> > Hmm. This works for real devices like mlx5, but it means the three SW
> > devices will also resolve to a real PCI device that is not the DMA
> > device.
> 
> Does it?  When I followed the links on my system rxe was a virtual
> device without a physical parent.  Weird.

Yes, Bob also might have seen it oddly be virtual too.. No idea why.

This seems like the right thing to do, looks like virtual is only
possible if the netdev is virtual too..

diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 575e1a4ec82121..2b4238cdeab953 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -20,18 +20,6 @@
 
 static struct rxe_recv_sockets recv_sockets;
 
-struct device *rxe_dma_device(struct rxe_dev *rxe)
-{
-	struct net_device *ndev;
-
-	ndev = rxe->ndev;
-
-	if (is_vlan_dev(ndev))
-		ndev = vlan_dev_real_dev(ndev);
-
-	return ndev->dev.parent;
-}
-
 int rxe_mcast_add(struct rxe_dev *rxe, union ib_gid *mgid)
 {
 	int err;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 209c7b3fab97a2..0cc4116d9a1fa6 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1134,7 +1134,6 @@ int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name)
 	dev->node_type = RDMA_NODE_IB_CA;
 	dev->phys_port_cnt = 1;
 	dev->num_comp_vectors = num_possible_cpus();
-	dev->dev.parent = rxe_dma_device(rxe);
 	dev->local_dma_lkey = 0;
 	addrconf_addr_eui48((unsigned char *)&dev->node_guid,
 			    rxe->ndev->dev_addr);

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/6] RDMA/core: remove use of dma_virt_ops
  2020-11-05 17:52   ` Jason Gunthorpe
@ 2020-11-05 17:58     ` Jason Gunthorpe
  2020-11-06 14:18       ` Christoph Hellwig
  2020-11-06 10:01     ` Christoph Hellwig
  1 sibling, 1 reply; 26+ messages in thread
From: Jason Gunthorpe @ 2020-11-05 17:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bjorn Helgaas, Bernard Metzler, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, linux-rdma, linux-pci,
	iommu

On Thu, Nov 05, 2020 at 01:52:53PM -0400, Jason Gunthorpe wrote:
> On Thu, Nov 05, 2020 at 08:42:02AM +0100, Christoph Hellwig wrote:
> > @@ -1341,7 +1322,14 @@ int ib_register_device(struct ib_device *device, const char *name,
> >  	if (ret)
> >  		return ret;
> >  
> > -	setup_dma_device(device, dma_device);
> > +	/*
> > +	 * If the caller does not provide a DMA capable device then the IB core
> > +	 * will set up ib_sge and scatterlist structures that stash the kernel
> > +	 * virtual address into the address field.
> > +	 */
> > +	device->dma_device = dma_device;
> > +	WARN_ON(dma_device && !dma_device->dma_parms);
> 
> I noticed there were a couple of places expecting dma_device to be set
> to !NULL:
> 
> drivers/infiniband/core/umem.c:                 dma_get_max_seg_size(device->dma_device), sg, npages,
> drivers/nvme/host/rdma.c:       ctrl->ctrl.numa_node = dev_to_node(ctrl->device->dev->dma_device);

Don't know much about NUMA, but do you think the ib device setup
should autocopy the numa node from the dma_device to the ib_device and
this usage should just refer to the ib_device?

Jason

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re:  [PATCH 1/6] RMDA/sw: don't allow drivers using dma_virt_ops on highmem configs
  2020-11-05  7:41 remove dma_virt_ops v2 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2020-11-05  7:42 ` [PATCH 6/6] dma-mapping: remove dma_virt_ops Christoph Hellwig
@ 2020-11-05 20:32 ` Bernard Metzler
  6 siblings, 0 replies; 26+ messages in thread
From: Bernard Metzler @ 2020-11-05 20:32 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jason Gunthorpe, Bjorn Helgaas, Zhu Yanjun, Logan Gunthorpe,
	Dennis Dalessandro, Mike Marciniszyn, linux-rdma, linux-pci,
	iommu

-----"Christoph Hellwig" <hch@lst.de> wrote: -----

>To: "Jason Gunthorpe" <jgg@ziepe.ca>
>From: "Christoph Hellwig" <hch@lst.de>
>Date: 11/05/2020 08:46AM
>Cc: "Bjorn Helgaas" <bhelgaas@google.com>, "Bernard Metzler"
><bmt@zurich.ibm.com>, "Zhu Yanjun" <yanjunz@nvidia.com>, "Logan
>Gunthorpe" <logang@deltatee.com>, "Dennis Dalessandro"
><dennis.dalessandro@cornelisnetworks.com>, "Mike Marciniszyn"
><mike.marciniszyn@cornelisnetworks.com>, linux-rdma@vger.kernel.org,
>linux-pci@vger.kernel.org, iommu@lists.linux-foundation.org
>Subject: [EXTERNAL] [PATCH 1/6] RMDA/sw: don't allow drivers using
>dma_virt_ops on highmem configs
>
>dma_virt_ops requires that all pages have a kernel virtual address.
>Introduce a INFINIBAND_VIRT_DMA Kconfig symbol that depends on
>!HIGHMEM
>and a large enough dma_addr_t, and make all three driver depend on
>the
>new symbol.
>
>Signed-off-by: Christoph Hellwig <hch@lst.de>
>---
> drivers/infiniband/Kconfig           | 6 ++++++
> drivers/infiniband/sw/rdmavt/Kconfig | 3 ++-
> drivers/infiniband/sw/rxe/Kconfig    | 2 +-
> drivers/infiniband/sw/siw/Kconfig    | 1 +
> 4 files changed, 10 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
>index 32a51432ec4f73..81acaf5fb5be67 100644
>--- a/drivers/infiniband/Kconfig
>+++ b/drivers/infiniband/Kconfig
>@@ -73,6 +73,12 @@ config INFINIBAND_ADDR_TRANS_CONFIGFS
> 	  This allows the user to config the default GID type that the CM
> 	  uses for each device, when initiaing new connections.
> 
>+config INFINIBAND_VIRT_DMA
>+	bool
>+	default y
>+	depends on !HIGHMEM
>+	depends on !64BIT || ARCH_DMA_ADDR_T_64BIT
>+
> if INFINIBAND_USER_ACCESS || !INFINIBAND_USER_ACCESS
> source "drivers/infiniband/hw/mthca/Kconfig"
> source "drivers/infiniband/hw/qib/Kconfig"
>diff --git a/drivers/infiniband/sw/rdmavt/Kconfig
>b/drivers/infiniband/sw/rdmavt/Kconfig
>index 9ef5f5ce1ff6b0..c8e268082952b0 100644
>--- a/drivers/infiniband/sw/rdmavt/Kconfig
>+++ b/drivers/infiniband/sw/rdmavt/Kconfig
>@@ -1,7 +1,8 @@
> # SPDX-License-Identifier: GPL-2.0-only
> config INFINIBAND_RDMAVT
> 	tristate "RDMA verbs transport library"
>-	depends on X86_64 && ARCH_DMA_ADDR_T_64BIT
>+	depends on INFINIBAND_VIRT_DMA
>+	depends on X86_64
> 	depends on PCI
> 	select DMA_VIRT_OPS
> 	help
>diff --git a/drivers/infiniband/sw/rxe/Kconfig
>b/drivers/infiniband/sw/rxe/Kconfig
>index a0c6c7dfc1814f..8810bfa680495a 100644
>--- a/drivers/infiniband/sw/rxe/Kconfig
>+++ b/drivers/infiniband/sw/rxe/Kconfig
>@@ -2,7 +2,7 @@
> config RDMA_RXE
> 	tristate "Software RDMA over Ethernet (RoCE) driver"
> 	depends on INET && PCI && INFINIBAND
>-	depends on !64BIT || ARCH_DMA_ADDR_T_64BIT
>+	depends on INFINIBAND_VIRT_DMA
> 	select NET_UDP_TUNNEL
> 	select CRYPTO_CRC32
> 	select DMA_VIRT_OPS
>diff --git a/drivers/infiniband/sw/siw/Kconfig
>b/drivers/infiniband/sw/siw/Kconfig
>index b622fc62f2cd6d..3450ba5081df51 100644
>--- a/drivers/infiniband/sw/siw/Kconfig
>+++ b/drivers/infiniband/sw/siw/Kconfig
>@@ -1,6 +1,7 @@
> config RDMA_SIW
> 	tristate "Software RDMA over TCP/IP (iWARP) driver"
> 	depends on INET && INFINIBAND && LIBCRC32C
>+	depends on INFINIBAND_VIRT_DMA
> 	select DMA_VIRT_OPS
> 	help
> 	This driver implements the iWARP RDMA transport over
>-- 
>2.28.0
>
>


The complete patch set works for siw. Tested with siw as
nvmef target.

Tested-by: Bernard Metzler <bmt@zurich.ibm.com>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/6] RDMA/core: remove use of dma_virt_ops
  2020-11-05 17:52   ` Jason Gunthorpe
  2020-11-05 17:58     ` Jason Gunthorpe
@ 2020-11-06 10:01     ` Christoph Hellwig
  1 sibling, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2020-11-06 10:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Bjorn Helgaas, Bernard Metzler, Zhu Yanjun,
	Logan Gunthorpe, Dennis Dalessandro, Mike Marciniszyn,
	linux-rdma, linux-pci, iommu

On Thu, Nov 05, 2020 at 01:52:53PM -0400, Jason Gunthorpe wrote:
> On Thu, Nov 05, 2020 at 08:42:02AM +0100, Christoph Hellwig wrote:
> > @@ -1341,7 +1322,14 @@ int ib_register_device(struct ib_device *device, const char *name,
> >  	if (ret)
> >  		return ret;
> >  
> > -	setup_dma_device(device, dma_device);
> > +	/*
> > +	 * If the caller does not provide a DMA capable device then the IB core
> > +	 * will set up ib_sge and scatterlist structures that stash the kernel
> > +	 * virtual address into the address field.
> > +	 */
> > +	device->dma_device = dma_device;
> > +	WARN_ON(dma_device && !dma_device->dma_parms);
> 
> I noticed there were a couple of places expecting dma_device to be set
> to !NULL:
> 
> drivers/infiniband/core/umem.c:                 dma_get_max_seg_size(device->dma_device), sg, npages,

This needs to use ib_dma_max_seg_size.

> drivers/nvme/host/rdma.c:       ctrl->ctrl.numa_node = dev_to_node(ctrl->device->dev->dma_device);

> Don't know much about NUMA, but do you think the ib device setup
> should autocopy the numa node from the dma_device to the ib_device and
> this usage should just refer to the ib_device?

IMHO we could add a ib_device_get_numa_node API or something like that,
which uses dev_to_node on the DMA device is present and otherwise returns
-1.  If needed we can refine that later.

> net/rds/ib.c:                                              device->dma_device,
> 
> No sure what to do about RDS..

Yikes, this is completely broken.  We either need a wrapper for the
dma_pool API, or get rid of it.  Let me dig into that.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/6] RDMA/core: remove use of dma_virt_ops
  2020-11-05 17:58     ` Jason Gunthorpe
@ 2020-11-06 14:18       ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2020-11-06 14:18 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Bjorn Helgaas, Bernard Metzler, Zhu Yanjun,
	Logan Gunthorpe, Dennis Dalessandro, Mike Marciniszyn,
	linux-rdma, linux-pci, iommu

On Thu, Nov 05, 2020 at 01:58:16PM -0400, Jason Gunthorpe wrote:
> > I noticed there were a couple of places expecting dma_device to be set
> > to !NULL:
> > 
> > drivers/infiniband/core/umem.c:                 dma_get_max_seg_size(device->dma_device), sg, npages,
> > drivers/nvme/host/rdma.c:       ctrl->ctrl.numa_node = dev_to_node(ctrl->device->dev->dma_device);
> 
> Don't know much about NUMA, but do you think the ib device setup
> should autocopy the numa node from the dma_device to the ib_device and
> this usage should just refer to the ib_device?

FYI, I ended up just lifting the ibdev_to_node from rds to ib_verbs.h.  That uses
the parent pointer in the ib_device and should generally work ok.  If not we can
improve іt as we now have a proper abstraction.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2020-11-06 14:18 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-05  7:41 remove dma_virt_ops v2 Christoph Hellwig
2020-11-05  7:42 ` [PATCH 1/6] RMDA/sw: don't allow drivers using dma_virt_ops on highmem configs Christoph Hellwig
2020-11-05 12:15   ` Robin Murphy
2020-11-05 17:00     ` Christoph Hellwig
2020-11-05 14:41   ` Jason Gunthorpe
2020-11-05 15:29     ` Robin Murphy
2020-11-05 17:03       ` Christoph Hellwig
2020-11-05  7:42 ` [PATCH 2/6] RDMA/core: remove ib_dma_{alloc,free}_coherent Christoph Hellwig
2020-11-05  7:42 ` [PATCH 3/6] RDMA/core: remove use of dma_virt_ops Christoph Hellwig
2020-11-05 14:34   ` Jason Gunthorpe
2020-11-05 17:09     ` Christoph Hellwig
2020-11-05 17:52   ` Jason Gunthorpe
2020-11-05 17:58     ` Jason Gunthorpe
2020-11-06 14:18       ` Christoph Hellwig
2020-11-06 10:01     ` Christoph Hellwig
2020-11-05  7:42 ` [PATCH 4/6] PCI/P2PDMA: Remove the DMA_VIRT_OPS hacks Christoph Hellwig
2020-11-05 14:34   ` Jason Gunthorpe
2020-11-05 17:08     ` Christoph Hellwig
2020-11-05 17:23       ` Jason Gunthorpe
2020-11-05 17:29         ` Christoph Hellwig
2020-11-05 17:39           ` Jason Gunthorpe
2020-11-05 17:43             ` Christoph Hellwig
2020-11-05 17:56               ` Jason Gunthorpe
2020-11-05  7:42 ` [PATCH 5/6] PCI/P2PDMA: Cleanup __pci_p2pdma_map_sg a bit Christoph Hellwig
2020-11-05  7:42 ` [PATCH 6/6] dma-mapping: remove dma_virt_ops Christoph Hellwig
2020-11-05 20:32 ` [PATCH 1/6] RMDA/sw: don't allow drivers using dma_virt_ops on highmem configs Bernard Metzler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).