linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device
@ 2020-09-22  8:27 Leon Romanovsky
  2020-09-22  8:58 ` Bernard Metzler
  2020-09-23  5:38 ` Christoph Hellwig
  0 siblings, 2 replies; 23+ messages in thread
From: Leon Romanovsky @ 2020-09-22  8:27 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Adit Ranadive, Ariel Elior, Bernard Metzler, Christian Benvenuti,
	Dennis Dalessandro, Devesh Sharma, Faisal Latif, Gal Pressman,
	Lijun Ou, linux-rdma, Michal Kalderon, Mike Marciniszyn,
	Naresh Kumar PBS, Nelson Escobar, Parav Pandit, Parvi Kaustubhi,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem, Somnath Kotur,
	Sriharsha Basavapatna, VMware PV-Drivers, Weihang Li,
	Wei Hu(Xavier),
	Yishai Hadas, Zhu Yanjun

From: Jason Gunthorpe <jgg@nvidia.com>

The current design is convoulted where the IB core makes assumptions
that a real DMA device will usually come from parent unless it looks
like the ib_device is partially setup for DMA, in which case the
ib_device itself is used for DMA, but somethings might still come
from the parent.

Make this clearer by having the caller explicitly specify what the
DMA device should be. The caller is always responsible to fully
setup the DMA device it specifies. If NULL is used then the
ib_device will be used as the DMA device, but the caller must
still set it up completely.

rvt is the only driver that did not fully setup the DMA device
before registering. Move the rvt specific code out of
setup_dma_device() into rvt and set the dma_mask's directly.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/device.c              | 73 +++++++------------
 drivers/infiniband/hw/bnxt_re/main.c          |  2 +-
 drivers/infiniband/hw/cxgb4/provider.c        |  3 +-
 drivers/infiniband/hw/efa/efa_main.c          |  2 +-
 drivers/infiniband/hw/hns/hns_roce_main.c     |  2 +-
 drivers/infiniband/hw/i40iw/i40iw_verbs.c     |  2 +-
 drivers/infiniband/hw/mlx4/main.c             |  3 +-
 drivers/infiniband/hw/mlx5/main.c             |  2 +-
 drivers/infiniband/hw/mthca/mthca_provider.c  |  2 +-
 drivers/infiniband/hw/ocrdma/ocrdma_main.c    |  3 +-
 drivers/infiniband/hw/qedr/main.c             |  2 +-
 drivers/infiniband/hw/usnic/usnic_ib_main.c   |  2 +-
 .../infiniband/hw/vmw_pvrdma/pvrdma_main.c    |  2 +-
 drivers/infiniband/sw/rdmavt/vt.c             |  8 +-
 drivers/infiniband/sw/rxe/rxe_verbs.c         |  2 +-
 drivers/infiniband/sw/siw/siw_main.c          |  4 +-
 include/rdma/ib_verbs.h                       |  3 +-
 17 files changed, 52 insertions(+), 65 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index ec3becf85cac..417d93bbdaca 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1177,58 +1177,34 @@ static int assign_name(struct ib_device *device, const char *name)
 	return ret;
 }
 
-static void setup_dma_device(struct ib_device *device)
+static void setup_dma_device(struct ib_device *device,
+			     struct device *dma_device)
 {
-	struct device *parent = device->dev.parent;
-
-	WARN_ON_ONCE(device->dma_device);
-
-#ifdef CONFIG_DMA_OPS
-	if (device->dev.dma_ops) {
+	if (!dma_device) {
 		/*
-		 * The caller provided custom DMA operations. Copy the
-		 * DMA-related fields that are used by e.g. dma_alloc_coherent()
-		 * into device->dev.
+		 * If the caller does not provide a DMA capable device then the
+		 * IB device will be used. In this case the caller should fully
+		 * setup the ibdev for DMA. This usually means using
+		 * dma_virt_ops.
 		 */
-		device->dma_device = &device->dev;
-		if (!device->dev.dma_mask) {
-			if (parent)
-				device->dev.dma_mask = parent->dma_mask;
-			else
-				WARN_ON_ONCE(true);
-		}
-		if (!device->dev.coherent_dma_mask) {
-			if (parent)
-				device->dev.coherent_dma_mask =
-					parent->coherent_dma_mask;
-			else
-				WARN_ON_ONCE(true);
-		}
-	} else
-#endif /* CONFIG_DMA_OPS */
-	{
+#ifdef CONFIG_DMA_OPS
+		if (WARN_ON(!device->dev.dma_ops))
+			return;
+#endif
+		if (WARN_ON(!device->dev.dma_parms))
+			return;
+
+		dma_device = &device->dev;
+	} else {
+		device->dev.dma_parms = dma_device->dma_parms;
 		/*
-		 * The caller did not provide custom DMA operations. Use the
-		 * DMA mapping operations of the parent device.
+		 * Auto setup the segment size if a DMA device was passed in.
+		 * The PCI core sets the maximum segment size to 64 KB. Increase
+		 * this parameter to 2 GB.
 		 */
-		WARN_ON_ONCE(!parent);
-		device->dma_device = parent;
-	}
-
-	if (!device->dev.dma_parms) {
-		if (parent) {
-			/*
-			 * The caller did not provide DMA parameters, so
-			 * 'parent' probably represents a PCI device. The PCI
-			 * core sets the maximum segment size to 64
-			 * KB. Increase this parameter to 2 GB.
-			 */
-			device->dev.dma_parms = parent->dma_parms;
-			dma_set_max_seg_size(device->dma_device, SZ_2G);
-		} else {
-			WARN_ON_ONCE(true);
-		}
+		dma_set_max_seg_size(dma_device, SZ_2G);
 	}
+	device->dma_device = dma_device;
 }
 
 /*
@@ -1241,7 +1217,6 @@ static int setup_device(struct ib_device *device)
 	struct ib_udata uhw = {.outlen = 0, .inlen = 0};
 	int ret;
 
-	setup_dma_device(device);
 	ib_device_check_mandatory(device);
 
 	ret = setup_port_data(device);
@@ -1361,7 +1336,8 @@ static void prevent_dealloc_device(struct ib_device *ib_dev)
  * asynchronously then the device pointer may become freed as soon as this
  * function returns.
  */
-int ib_register_device(struct ib_device *device, const char *name)
+int ib_register_device(struct ib_device *device, const char *name,
+		       struct device *dma_device)
 {
 	int ret;
 
@@ -1369,6 +1345,7 @@ int ib_register_device(struct ib_device *device, const char *name)
 	if (ret)
 		return ret;
 
+	setup_dma_device(device, dma_device);
 	ret = setup_device(device);
 	if (ret)
 		return ret;
diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index 53aee5a42ab8..b3bc62021039 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -736,7 +736,7 @@ static int bnxt_re_register_ib(struct bnxt_re_dev *rdev)
 	if (ret)
 		return ret;
 
-	return ib_register_device(ibdev, "bnxt_re%d");
+	return ib_register_device(ibdev, "bnxt_re%d", &rdev->en_dev->pdev->dev);
 }
 
 static void bnxt_re_dev_remove(struct bnxt_re_dev *rdev)
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index 4b76f2f3f4e4..5f4f3abf41e4 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -570,7 +570,8 @@ void c4iw_register_device(struct work_struct *work)
 	ret = set_netdevs(&dev->ibdev, &dev->rdev);
 	if (ret)
 		goto err_dealloc_ctx;
-	ret = ib_register_device(&dev->ibdev, "cxgb4_%d");
+	ret = ib_register_device(&dev->ibdev, "cxgb4_%d",
+				 &dev->rdev.lldi.pdev->dev);
 	if (ret)
 		goto err_dealloc_ctx;
 	return;
diff --git a/drivers/infiniband/hw/efa/efa_main.c b/drivers/infiniband/hw/efa/efa_main.c
index 92d701146320..4de5be3e1dfe 100644
--- a/drivers/infiniband/hw/efa/efa_main.c
+++ b/drivers/infiniband/hw/efa/efa_main.c
@@ -331,7 +331,7 @@ static int efa_ib_device_add(struct efa_dev *dev)
 
 	ib_set_device_ops(&dev->ibdev, &efa_dev_ops);
 
-	err = ib_register_device(&dev->ibdev, "efa_%d");
+	err = ib_register_device(&dev->ibdev, "efa_%d", &pdev->dev);
 	if (err)
 		goto err_release_doorbell_bar;
 
diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index 2b4d75733e72..1b5f895d7daf 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -547,7 +547,7 @@ static int hns_roce_register_device(struct hns_roce_dev *hr_dev)
 		if (ret)
 			return ret;
 	}
-	ret = ib_register_device(ib_dev, "hns_%d");
+	ret = ib_register_device(ib_dev, "hns_%d", dev);
 	if (ret) {
 		dev_err(dev, "ib_register_device failed!\n");
 		return ret;
diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
index e53f6c0dc12e..945d30a86bbc 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
@@ -2748,7 +2748,7 @@ int i40iw_register_rdma_device(struct i40iw_device *iwdev)
 	if (ret)
 		goto error;
 
-	ret = ib_register_device(&iwibdev->ibdev, "i40iw%d");
+	ret = ib_register_device(&iwibdev->ibdev, "i40iw%d", &iwdev->hw.pcidev->dev);
 	if (ret)
 		goto error;
 
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 753c70402498..cd0fba6b0964 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2841,7 +2841,8 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 		goto err_steer_free_bitmap;
 
 	rdma_set_device_sysfs_group(&ibdev->ib_dev, &mlx4_attr_group);
-	if (ib_register_device(&ibdev->ib_dev, "mlx4_%d"))
+	if (ib_register_device(&ibdev->ib_dev, "mlx4_%d",
+			       &dev->persist->pdev->dev))
 		goto err_diag_counters;
 
 	if (mlx4_ib_mad_init(ibdev))
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 3ae681a6ae3b..bca57c7661eb 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -4404,7 +4404,7 @@ static int mlx5_ib_stage_ib_reg_init(struct mlx5_ib_dev *dev)
 		name = "mlx5_%d";
 	else
 		name = "mlx5_bond_%d";
-	return ib_register_device(&dev->ib_dev, name);
+	return ib_register_device(&dev->ib_dev, name, &dev->mdev->pdev->dev);
 }
 
 static void mlx5_ib_stage_pre_ib_reg_umr_cleanup(struct mlx5_ib_dev *dev)
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index 31b558ff8218..c4d9cdc4ee97 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -1206,7 +1206,7 @@ int mthca_register_device(struct mthca_dev *dev)
 	mutex_init(&dev->cap_mask_mutex);
 
 	rdma_set_device_sysfs_group(&dev->ib_dev, &mthca_attr_group);
-	ret = ib_register_device(&dev->ib_dev, "mthca%d");
+	ret = ib_register_device(&dev->ib_dev, "mthca%d", &dev->pdev->dev);
 	if (ret)
 		return ret;
 
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index d8c47d24d6d6..60416186f1d0 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -255,7 +255,8 @@ static int ocrdma_register_device(struct ocrdma_dev *dev)
 	if (ret)
 		return ret;
 
-	return ib_register_device(&dev->ibdev, "ocrdma%d");
+	return ib_register_device(&dev->ibdev, "ocrdma%d",
+				  &dev->nic_info.pdev->dev);
 }
 
 static int ocrdma_alloc_resources(struct ocrdma_dev *dev)
diff --git a/drivers/infiniband/hw/qedr/main.c b/drivers/infiniband/hw/qedr/main.c
index 7c0aac3e635b..464becdd41f7 100644
--- a/drivers/infiniband/hw/qedr/main.c
+++ b/drivers/infiniband/hw/qedr/main.c
@@ -293,7 +293,7 @@ static int qedr_register_device(struct qedr_dev *dev)
 	if (rc)
 		return rc;
 
-	return ib_register_device(&dev->ibdev, "qedr%d");
+	return ib_register_device(&dev->ibdev, "qedr%d", &dev->pdev->dev);
 }
 
 /* This function allocates fast-path status block memory */
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_main.c b/drivers/infiniband/hw/usnic/usnic_ib_main.c
index 462ed71abf53..6c23a5472168 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_main.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_main.c
@@ -425,7 +425,7 @@ static void *usnic_ib_device_add(struct pci_dev *dev)
 	if (ret)
 		goto err_fwd_dealloc;
 
-	if (ib_register_device(&us_ibdev->ib_dev, "usnic_%d"))
+	if (ib_register_device(&us_ibdev->ib_dev, "usnic_%d", &dev->dev))
 		goto err_fwd_dealloc;
 
 	usnic_fwd_set_mtu(us_ibdev->ufdev, us_ibdev->netdev->mtu);
diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c
index 780fd2dfc07e..5b2c94441125 100644
--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c
+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c
@@ -270,7 +270,7 @@ static int pvrdma_register_device(struct pvrdma_dev *dev)
 	spin_lock_init(&dev->srq_tbl_lock);
 	rdma_set_device_sysfs_group(&dev->ib_dev, &pvrdma_attr_group);
 
-	ret = ib_register_device(&dev->ib_dev, "vmw_pvrdma%d");
+	ret = ib_register_device(&dev->ib_dev, "vmw_pvrdma%d", &dev->pdev->dev);
 	if (ret)
 		goto err_srq_free;
 
diff --git a/drivers/infiniband/sw/rdmavt/vt.c b/drivers/infiniband/sw/rdmavt/vt.c
index f904bb34477a..2f117ac11c8b 100644
--- a/drivers/infiniband/sw/rdmavt/vt.c
+++ b/drivers/infiniband/sw/rdmavt/vt.c
@@ -581,7 +581,11 @@ int rvt_register_device(struct rvt_dev_info *rdi)
 	spin_lock_init(&rdi->n_cqs_lock);
 
 	/* DMA Operations */
-	rdi->ibdev.dev.dma_ops = rdi->ibdev.dev.dma_ops ? : &dma_virt_ops;
+	rdi->ibdev.dev.dma_ops = &dma_virt_ops;
+	rdi->ibdev.dev.dma_parms = rdi->ibdev.dev.parent->dma_parms;
+	rdi->ibdev.dev.dma_mask = rdi->ibdev.dev.parent->dma_mask;
+	rdi->ibdev.dev.coherent_dma_mask =
+		rdi->ibdev.dev.parent->coherent_dma_mask;
 
 	/* Protection Domain */
 	spin_lock_init(&rdi->n_pds_lock);
@@ -629,7 +633,7 @@ int rvt_register_device(struct rvt_dev_info *rdi)
 		rdi->ibdev.num_comp_vectors = 1;
 
 	/* We are now good to announce we exist */
-	ret = ib_register_device(&rdi->ibdev, dev_name(&rdi->ibdev.dev));
+	ret = ib_register_device(&rdi->ibdev, dev_name(&rdi->ibdev.dev), NULL);
 	if (ret) {
 		rvt_pr_err(rdi, "Failed to register driver with ib core.\n");
 		goto bail_wss;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index f368dc16281a..37fee72755be 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1182,7 +1182,7 @@ int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name)
 	rxe->tfm = tfm;
 
 	rdma_set_device_sysfs_group(dev, &rxe_attr_group);
-	err = ib_register_device(dev, ibdev_name);
+	err = ib_register_device(dev, ibdev_name, NULL);
 	if (err)
 		pr_warn("%s failed with error %d\n", __func__, err);
 
diff --git a/drivers/infiniband/sw/siw/siw_main.c b/drivers/infiniband/sw/siw/siw_main.c
index d862bec84376..0362d57b4db8 100644
--- a/drivers/infiniband/sw/siw/siw_main.c
+++ b/drivers/infiniband/sw/siw/siw_main.c
@@ -69,7 +69,7 @@ static int siw_device_register(struct siw_device *sdev, const char *name)
 
 	sdev->vendor_part_id = dev_id++;
 
-	rv = ib_register_device(base_dev, name);
+	rv = ib_register_device(base_dev, name, NULL);
 	if (rv) {
 		pr_warn("siw: device registration error %d\n", rv);
 		return rv;
@@ -386,6 +386,8 @@ static struct siw_device *siw_device_create(struct net_device *netdev)
 	base_dev->dev.dma_parms = &sdev->dma_parms;
 	sdev->dma_parms = (struct device_dma_parameters)
 		{ .max_segment_size = SZ_2G };
+	dma_coerce_mask_and_coherent(&base_dev->dev,
+				     dma_get_required_mask(&base_dev->dev));
 	base_dev->num_comp_vectors = num_possible_cpus();
 
 	xa_init_flags(&sdev->qp_xa, XA_FLAGS_ALLOC1);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index b585db4ef9b4..7fb09a36b654 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2788,7 +2788,8 @@ void ib_dealloc_device(struct ib_device *device);
 
 void ib_get_device_fw_str(struct ib_device *device, char *str);
 
-int ib_register_device(struct ib_device *device, const char *name);
+int ib_register_device(struct ib_device *device, const char *name,
+		       struct device *dma_device);
 void ib_unregister_device(struct ib_device *device);
 void ib_unregister_driver(enum rdma_driver_id driver_id);
 void ib_unregister_device_and_put(struct ib_device *device);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re:  [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device
  2020-09-22  8:27 [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device Leon Romanovsky
@ 2020-09-22  8:58 ` Bernard Metzler
  2020-09-22 10:14   ` Leon Romanovsky
  2020-09-22 14:22   ` Bernard Metzler
  2020-09-23  5:38 ` Christoph Hellwig
  1 sibling, 2 replies; 23+ messages in thread
From: Bernard Metzler @ 2020-09-22  8:58 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Jason Gunthorpe, Adit Ranadive, Ariel Elior,
	Christian Benvenuti, Dennis Dalessandro, Devesh Sharma,
	Faisal Latif, Gal Pressman, Lijun Ou, linux-rdma,
	Michal Kalderon, Mike Marciniszyn, Naresh Kumar PBS,
	Nelson Escobar, Parav Pandit, Parvi Kaustubhi,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem, Somnath Kotur,
	Sriharsha Basavapatna, VMware PV-Drivers, Weihang Li,
	Wei Hu(Xavier),
	Yishai Hadas, Zhu Yanjun

-----"Leon Romanovsky" <leon@kernel.org> wrote: -----

>To: "Doug Ledford" <dledford@redhat.com>, "Jason Gunthorpe"
><jgg@nvidia.com>
>From: "Leon Romanovsky" <leon@kernel.org>
>Date: 09/22/2020 10:28AM
>Cc: "Adit Ranadive" <aditr@vmware.com>, "Ariel Elior"
><aelior@marvell.com>, "Bernard Metzler" <bmt@zurich.ibm.com>,
>"Christian Benvenuti" <benve@cisco.com>, "Dennis Dalessandro"
><dennis.dalessandro@intel.com>, "Devesh Sharma"
><devesh.sharma@broadcom.com>, "Faisal Latif"
><faisal.latif@intel.com>, "Gal Pressman" <galpress@amazon.com>,
>"Lijun Ou" <oulijun@huawei.com>, linux-rdma@vger.kernel.org, "Michal
>Kalderon" <mkalderon@marvell.com>, "Mike Marciniszyn"
><mike.marciniszyn@intel.com>, "Naresh Kumar PBS"
><nareshkumar.pbs@broadcom.com>, "Nelson Escobar"
><neescoba@cisco.com>, "Parav Pandit" <parav@nvidia.com>, "Parvi
>Kaustubhi" <pkaustub@cisco.com>, "Potnuri Bharat Teja"
><bharat@chelsio.com>, "Selvin Xavier" <selvin.xavier@broadcom.com>,
>"Shiraz Saleem" <shiraz.saleem@intel.com>, "Somnath Kotur"
><somnath.kotur@broadcom.com>, "Sriharsha Basavapatna"
><sriharsha.basavapatna@broadcom.com>, "VMware PV-Drivers"
><pv-drivers@vmware.com>, "Weihang Li" <liweihang@huawei.com>, "Wei
>Hu(Xavier)" <huwei87@hisilicon.com>, "Yishai Hadas"
><yishaih@nvidia.com>, "Zhu Yanjun" <yanjunz@nvidia.com>
>Subject: [EXTERNAL] [PATCH rdma-next] RDMA: Explicitly pass in the
>dma_device to ib_register_device
>
>From: Jason Gunthorpe <jgg@nvidia.com>
>
>The current design is convoulted where the IB core makes assumptions
>that a real DMA device will usually come from parent unless it looks
>like the ib_device is partially setup for DMA, in which case the
>ib_device itself is used for DMA, but somethings might still come
>from the parent.
>
>Make this clearer by having the caller explicitly specify what the
>DMA device should be. The caller is always responsible to fully
>setup the DMA device it specifies. If NULL is used then the
>ib_device will be used as the DMA device, but the caller must
>still set it up completely.
>
>rvt is the only driver that did not fully setup the DMA device
>before registering. Move the rvt specific code out of
>setup_dma_device() into rvt and set the dma_mask's directly.
>
>Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>Reviewed-by: Parav Pandit <parav@nvidia.com>
>Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
>---
> drivers/infiniband/core/device.c              | 73
>+++++++------------
> drivers/infiniband/hw/bnxt_re/main.c          |  2 +-
> drivers/infiniband/hw/cxgb4/provider.c        |  3 +-
> drivers/infiniband/hw/efa/efa_main.c          |  2 +-
> drivers/infiniband/hw/hns/hns_roce_main.c     |  2 +-
> drivers/infiniband/hw/i40iw/i40iw_verbs.c     |  2 +-
> drivers/infiniband/hw/mlx4/main.c             |  3 +-
> drivers/infiniband/hw/mlx5/main.c             |  2 +-
> drivers/infiniband/hw/mthca/mthca_provider.c  |  2 +-
> drivers/infiniband/hw/ocrdma/ocrdma_main.c    |  3 +-
> drivers/infiniband/hw/qedr/main.c             |  2 +-
> drivers/infiniband/hw/usnic/usnic_ib_main.c   |  2 +-
> .../infiniband/hw/vmw_pvrdma/pvrdma_main.c    |  2 +-
> drivers/infiniband/sw/rdmavt/vt.c             |  8 +-
> drivers/infiniband/sw/rxe/rxe_verbs.c         |  2 +-
> drivers/infiniband/sw/siw/siw_main.c          |  4 +-
> include/rdma/ib_verbs.h                       |  3 +-
> 17 files changed, 52 insertions(+), 65 deletions(-)
>
>diff --git a/drivers/infiniband/core/device.c
>b/drivers/infiniband/core/device.c
>index ec3becf85cac..417d93bbdaca 100644
>--- a/drivers/infiniband/core/device.c
>+++ b/drivers/infiniband/core/device.c
>@@ -1177,58 +1177,34 @@ static int assign_name(struct ib_device
>*device, const char *name)
> 	return ret;
> }
> 
>-static void setup_dma_device(struct ib_device *device)
>+static void setup_dma_device(struct ib_device *device,
>+			     struct device *dma_device)
> {
>-	struct device *parent = device->dev.parent;
>-
>-	WARN_ON_ONCE(device->dma_device);
>-
>-#ifdef CONFIG_DMA_OPS
>-	if (device->dev.dma_ops) {
>+	if (!dma_device) {
> 		/*
>-		 * The caller provided custom DMA operations. Copy the
>-		 * DMA-related fields that are used by e.g. dma_alloc_coherent()
>-		 * into device->dev.
>+		 * If the caller does not provide a DMA capable device then the
>+		 * IB device will be used. In this case the caller should fully
>+		 * setup the ibdev for DMA. This usually means using
>+		 * dma_virt_ops.
> 		 */
>-		device->dma_device = &device->dev;
>-		if (!device->dev.dma_mask) {
>-			if (parent)
>-				device->dev.dma_mask = parent->dma_mask;
>-			else
>-				WARN_ON_ONCE(true);
>-		}
>-		if (!device->dev.coherent_dma_mask) {
>-			if (parent)
>-				device->dev.coherent_dma_mask =
>-					parent->coherent_dma_mask;
>-			else
>-				WARN_ON_ONCE(true);
>-		}
>-	} else
>-#endif /* CONFIG_DMA_OPS */
>-	{
>+#ifdef CONFIG_DMA_OPS
>+		if (WARN_ON(!device->dev.dma_ops))
>+			return;
>+#endif
>+		if (WARN_ON(!device->dev.dma_parms))
>+			return;
>+
>+		dma_device = &device->dev;
>+	} else {
>+		device->dev.dma_parms = dma_device->dma_parms;
> 		/*
>-		 * The caller did not provide custom DMA operations. Use the
>-		 * DMA mapping operations of the parent device.
>+		 * Auto setup the segment size if a DMA device was passed in.
>+		 * The PCI core sets the maximum segment size to 64 KB. Increase
>+		 * this parameter to 2 GB.
> 		 */
>-		WARN_ON_ONCE(!parent);
>-		device->dma_device = parent;
>-	}
>-
>-	if (!device->dev.dma_parms) {
>-		if (parent) {
>-			/*
>-			 * The caller did not provide DMA parameters, so
>-			 * 'parent' probably represents a PCI device. The PCI
>-			 * core sets the maximum segment size to 64
>-			 * KB. Increase this parameter to 2 GB.
>-			 */
>-			device->dev.dma_parms = parent->dma_parms;
>-			dma_set_max_seg_size(device->dma_device, SZ_2G);
>-		} else {
>-			WARN_ON_ONCE(true);
>-		}
>+		dma_set_max_seg_size(dma_device, SZ_2G);
> 	}
>+	device->dma_device = dma_device;
> }
> 
> /*
>@@ -1241,7 +1217,6 @@ static int setup_device(struct ib_device
>*device)
> 	struct ib_udata uhw = {.outlen = 0, .inlen = 0};
> 	int ret;
> 
>-	setup_dma_device(device);
> 	ib_device_check_mandatory(device);
> 
> 	ret = setup_port_data(device);
>@@ -1361,7 +1336,8 @@ static void prevent_dealloc_device(struct
>ib_device *ib_dev)
>  * asynchronously then the device pointer may become freed as soon
>as this
>  * function returns.
>  */
>-int ib_register_device(struct ib_device *device, const char *name)
>+int ib_register_device(struct ib_device *device, const char *name,
>+		       struct device *dma_device)
> {
> 	int ret;
> 
>@@ -1369,6 +1345,7 @@ int ib_register_device(struct ib_device
>*device, const char *name)
> 	if (ret)
> 		return ret;
> 
>+	setup_dma_device(device, dma_device);
> 	ret = setup_device(device);
> 	if (ret)
> 		return ret;
>diff --git a/drivers/infiniband/hw/bnxt_re/main.c
>b/drivers/infiniband/hw/bnxt_re/main.c
>index 53aee5a42ab8..b3bc62021039 100644
>--- a/drivers/infiniband/hw/bnxt_re/main.c
>+++ b/drivers/infiniband/hw/bnxt_re/main.c
>@@ -736,7 +736,7 @@ static int bnxt_re_register_ib(struct bnxt_re_dev
>*rdev)
> 	if (ret)
> 		return ret;
> 
>-	return ib_register_device(ibdev, "bnxt_re%d");
>+	return ib_register_device(ibdev, "bnxt_re%d",
>&rdev->en_dev->pdev->dev);
> }
> 
> static void bnxt_re_dev_remove(struct bnxt_re_dev *rdev)
>diff --git a/drivers/infiniband/hw/cxgb4/provider.c
>b/drivers/infiniband/hw/cxgb4/provider.c
>index 4b76f2f3f4e4..5f4f3abf41e4 100644
>--- a/drivers/infiniband/hw/cxgb4/provider.c
>+++ b/drivers/infiniband/hw/cxgb4/provider.c
>@@ -570,7 +570,8 @@ void c4iw_register_device(struct work_struct
>*work)
> 	ret = set_netdevs(&dev->ibdev, &dev->rdev);
> 	if (ret)
> 		goto err_dealloc_ctx;
>-	ret = ib_register_device(&dev->ibdev, "cxgb4_%d");
>+	ret = ib_register_device(&dev->ibdev, "cxgb4_%d",
>+				 &dev->rdev.lldi.pdev->dev);
> 	if (ret)
> 		goto err_dealloc_ctx;
> 	return;
>diff --git a/drivers/infiniband/hw/efa/efa_main.c
>b/drivers/infiniband/hw/efa/efa_main.c
>index 92d701146320..4de5be3e1dfe 100644
>--- a/drivers/infiniband/hw/efa/efa_main.c
>+++ b/drivers/infiniband/hw/efa/efa_main.c
>@@ -331,7 +331,7 @@ static int efa_ib_device_add(struct efa_dev *dev)
> 
> 	ib_set_device_ops(&dev->ibdev, &efa_dev_ops);
> 
>-	err = ib_register_device(&dev->ibdev, "efa_%d");
>+	err = ib_register_device(&dev->ibdev, "efa_%d", &pdev->dev);
> 	if (err)
> 		goto err_release_doorbell_bar;
> 
>diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c
>b/drivers/infiniband/hw/hns/hns_roce_main.c
>index 2b4d75733e72..1b5f895d7daf 100644
>--- a/drivers/infiniband/hw/hns/hns_roce_main.c
>+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
>@@ -547,7 +547,7 @@ static int hns_roce_register_device(struct
>hns_roce_dev *hr_dev)
> 		if (ret)
> 			return ret;
> 	}
>-	ret = ib_register_device(ib_dev, "hns_%d");
>+	ret = ib_register_device(ib_dev, "hns_%d", dev);
> 	if (ret) {
> 		dev_err(dev, "ib_register_device failed!\n");
> 		return ret;
>diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
>b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
>index e53f6c0dc12e..945d30a86bbc 100644
>--- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
>+++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
>@@ -2748,7 +2748,7 @@ int i40iw_register_rdma_device(struct
>i40iw_device *iwdev)
> 	if (ret)
> 		goto error;
> 
>-	ret = ib_register_device(&iwibdev->ibdev, "i40iw%d");
>+	ret = ib_register_device(&iwibdev->ibdev, "i40iw%d",
>&iwdev->hw.pcidev->dev);
> 	if (ret)
> 		goto error;
> 
>diff --git a/drivers/infiniband/hw/mlx4/main.c
>b/drivers/infiniband/hw/mlx4/main.c
>index 753c70402498..cd0fba6b0964 100644
>--- a/drivers/infiniband/hw/mlx4/main.c
>+++ b/drivers/infiniband/hw/mlx4/main.c
>@@ -2841,7 +2841,8 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
> 		goto err_steer_free_bitmap;
> 
> 	rdma_set_device_sysfs_group(&ibdev->ib_dev, &mlx4_attr_group);
>-	if (ib_register_device(&ibdev->ib_dev, "mlx4_%d"))
>+	if (ib_register_device(&ibdev->ib_dev, "mlx4_%d",
>+			       &dev->persist->pdev->dev))
> 		goto err_diag_counters;
> 
> 	if (mlx4_ib_mad_init(ibdev))
>diff --git a/drivers/infiniband/hw/mlx5/main.c
>b/drivers/infiniband/hw/mlx5/main.c
>index 3ae681a6ae3b..bca57c7661eb 100644
>--- a/drivers/infiniband/hw/mlx5/main.c
>+++ b/drivers/infiniband/hw/mlx5/main.c
>@@ -4404,7 +4404,7 @@ static int mlx5_ib_stage_ib_reg_init(struct
>mlx5_ib_dev *dev)
> 		name = "mlx5_%d";
> 	else
> 		name = "mlx5_bond_%d";
>-	return ib_register_device(&dev->ib_dev, name);
>+	return ib_register_device(&dev->ib_dev, name,
>&dev->mdev->pdev->dev);
> }
> 
> static void mlx5_ib_stage_pre_ib_reg_umr_cleanup(struct mlx5_ib_dev
>*dev)
>diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c
>b/drivers/infiniband/hw/mthca/mthca_provider.c
>index 31b558ff8218..c4d9cdc4ee97 100644
>--- a/drivers/infiniband/hw/mthca/mthca_provider.c
>+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
>@@ -1206,7 +1206,7 @@ int mthca_register_device(struct mthca_dev
>*dev)
> 	mutex_init(&dev->cap_mask_mutex);
> 
> 	rdma_set_device_sysfs_group(&dev->ib_dev, &mthca_attr_group);
>-	ret = ib_register_device(&dev->ib_dev, "mthca%d");
>+	ret = ib_register_device(&dev->ib_dev, "mthca%d", &dev->pdev->dev);
> 	if (ret)
> 		return ret;
> 
>diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
>b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
>index d8c47d24d6d6..60416186f1d0 100644
>--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
>+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
>@@ -255,7 +255,8 @@ static int ocrdma_register_device(struct
>ocrdma_dev *dev)
> 	if (ret)
> 		return ret;
> 
>-	return ib_register_device(&dev->ibdev, "ocrdma%d");
>+	return ib_register_device(&dev->ibdev, "ocrdma%d",
>+				  &dev->nic_info.pdev->dev);
> }
> 
> static int ocrdma_alloc_resources(struct ocrdma_dev *dev)
>diff --git a/drivers/infiniband/hw/qedr/main.c
>b/drivers/infiniband/hw/qedr/main.c
>index 7c0aac3e635b..464becdd41f7 100644
>--- a/drivers/infiniband/hw/qedr/main.c
>+++ b/drivers/infiniband/hw/qedr/main.c
>@@ -293,7 +293,7 @@ static int qedr_register_device(struct qedr_dev
>*dev)
> 	if (rc)
> 		return rc;
> 
>-	return ib_register_device(&dev->ibdev, "qedr%d");
>+	return ib_register_device(&dev->ibdev, "qedr%d", &dev->pdev->dev);
> }
> 
> /* This function allocates fast-path status block memory */
>diff --git a/drivers/infiniband/hw/usnic/usnic_ib_main.c
>b/drivers/infiniband/hw/usnic/usnic_ib_main.c
>index 462ed71abf53..6c23a5472168 100644
>--- a/drivers/infiniband/hw/usnic/usnic_ib_main.c
>+++ b/drivers/infiniband/hw/usnic/usnic_ib_main.c
>@@ -425,7 +425,7 @@ static void *usnic_ib_device_add(struct pci_dev
>*dev)
> 	if (ret)
> 		goto err_fwd_dealloc;
> 
>-	if (ib_register_device(&us_ibdev->ib_dev, "usnic_%d"))
>+	if (ib_register_device(&us_ibdev->ib_dev, "usnic_%d", &dev->dev))
> 		goto err_fwd_dealloc;
> 
> 	usnic_fwd_set_mtu(us_ibdev->ufdev, us_ibdev->netdev->mtu);
>diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c
>b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c
>index 780fd2dfc07e..5b2c94441125 100644
>--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c
>+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c
>@@ -270,7 +270,7 @@ static int pvrdma_register_device(struct
>pvrdma_dev *dev)
> 	spin_lock_init(&dev->srq_tbl_lock);
> 	rdma_set_device_sysfs_group(&dev->ib_dev, &pvrdma_attr_group);
> 
>-	ret = ib_register_device(&dev->ib_dev, "vmw_pvrdma%d");
>+	ret = ib_register_device(&dev->ib_dev, "vmw_pvrdma%d",
>&dev->pdev->dev);
> 	if (ret)
> 		goto err_srq_free;
> 
>diff --git a/drivers/infiniband/sw/rdmavt/vt.c
>b/drivers/infiniband/sw/rdmavt/vt.c
>index f904bb34477a..2f117ac11c8b 100644
>--- a/drivers/infiniband/sw/rdmavt/vt.c
>+++ b/drivers/infiniband/sw/rdmavt/vt.c
>@@ -581,7 +581,11 @@ int rvt_register_device(struct rvt_dev_info
>*rdi)
> 	spin_lock_init(&rdi->n_cqs_lock);
> 
> 	/* DMA Operations */
>-	rdi->ibdev.dev.dma_ops = rdi->ibdev.dev.dma_ops ? : &dma_virt_ops;
>+	rdi->ibdev.dev.dma_ops = &dma_virt_ops;
>+	rdi->ibdev.dev.dma_parms = rdi->ibdev.dev.parent->dma_parms;
>+	rdi->ibdev.dev.dma_mask = rdi->ibdev.dev.parent->dma_mask;
>+	rdi->ibdev.dev.coherent_dma_mask =
>+		rdi->ibdev.dev.parent->coherent_dma_mask;
> 
> 	/* Protection Domain */
> 	spin_lock_init(&rdi->n_pds_lock);
>@@ -629,7 +633,7 @@ int rvt_register_device(struct rvt_dev_info *rdi)
> 		rdi->ibdev.num_comp_vectors = 1;
> 
> 	/* We are now good to announce we exist */
>-	ret = ib_register_device(&rdi->ibdev, dev_name(&rdi->ibdev.dev));
>+	ret = ib_register_device(&rdi->ibdev, dev_name(&rdi->ibdev.dev),
>NULL);
> 	if (ret) {
> 		rvt_pr_err(rdi, "Failed to register driver with ib core.\n");
> 		goto bail_wss;
>diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c
>b/drivers/infiniband/sw/rxe/rxe_verbs.c
>index f368dc16281a..37fee72755be 100644
>--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
>+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
>@@ -1182,7 +1182,7 @@ int rxe_register_device(struct rxe_dev *rxe,
>const char *ibdev_name)
> 	rxe->tfm = tfm;
> 
> 	rdma_set_device_sysfs_group(dev, &rxe_attr_group);
>-	err = ib_register_device(dev, ibdev_name);
>+	err = ib_register_device(dev, ibdev_name, NULL);
> 	if (err)
> 		pr_warn("%s failed with error %d\n", __func__, err);
> 
>diff --git a/drivers/infiniband/sw/siw/siw_main.c
>b/drivers/infiniband/sw/siw/siw_main.c
>index d862bec84376..0362d57b4db8 100644
>--- a/drivers/infiniband/sw/siw/siw_main.c
>+++ b/drivers/infiniband/sw/siw/siw_main.c
>@@ -69,7 +69,7 @@ static int siw_device_register(struct siw_device
>*sdev, const char *name)
> 
> 	sdev->vendor_part_id = dev_id++;
> 
>-	rv = ib_register_device(base_dev, name);
>+	rv = ib_register_device(base_dev, name, NULL);
> 	if (rv) {
> 		pr_warn("siw: device registration error %d\n", rv);
> 		return rv;
>@@ -386,6 +386,8 @@ static struct siw_device
>*siw_device_create(struct net_device *netdev)
> 	base_dev->dev.dma_parms = &sdev->dma_parms;
> 	sdev->dma_parms = (struct device_dma_parameters)
> 		{ .max_segment_size = SZ_2G };
>+	dma_coerce_mask_and_coherent(&base_dev->dev,
>+				     dma_get_required_mask(&base_dev->dev));

Leon, can you please help me to understand this
additional logic? Do we need to setup the DMA device
for (software) RDMA devices which rely on dma_virt_ops
in the end, or better leave it untouched?

Thanks very much!
Bernard.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device
  2020-09-22  8:58 ` Bernard Metzler
@ 2020-09-22 10:14   ` Leon Romanovsky
  2020-09-22 14:22   ` Bernard Metzler
  1 sibling, 0 replies; 23+ messages in thread
From: Leon Romanovsky @ 2020-09-22 10:14 UTC (permalink / raw)
  To: Bernard Metzler
  Cc: Doug Ledford, Jason Gunthorpe, Adit Ranadive, Ariel Elior,
	Christian Benvenuti, Dennis Dalessandro, Devesh Sharma,
	Faisal Latif, Gal Pressman, Lijun Ou, linux-rdma,
	Michal Kalderon, Mike Marciniszyn, Naresh Kumar PBS,
	Nelson Escobar, Parav Pandit, Parvi Kaustubhi,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem, Somnath Kotur,
	Sriharsha Basavapatna, VMware PV-Drivers, Weihang Li,
	Wei Hu(Xavier),
	Yishai Hadas, Zhu Yanjun

On Tue, Sep 22, 2020 at 08:58:06AM +0000, Bernard Metzler wrote:
> -----"Leon Romanovsky" <leon@kernel.org> wrote: -----
>
> >To: "Doug Ledford" <dledford@redhat.com>, "Jason Gunthorpe"
> ><jgg@nvidia.com>
> >From: "Leon Romanovsky" <leon@kernel.org>
> >Date: 09/22/2020 10:28AM
> >Cc: "Adit Ranadive" <aditr@vmware.com>, "Ariel Elior"
> ><aelior@marvell.com>, "Bernard Metzler" <bmt@zurich.ibm.com>,
> >"Christian Benvenuti" <benve@cisco.com>, "Dennis Dalessandro"
> ><dennis.dalessandro@intel.com>, "Devesh Sharma"
> ><devesh.sharma@broadcom.com>, "Faisal Latif"
> ><faisal.latif@intel.com>, "Gal Pressman" <galpress@amazon.com>,
> >"Lijun Ou" <oulijun@huawei.com>, linux-rdma@vger.kernel.org, "Michal
> >Kalderon" <mkalderon@marvell.com>, "Mike Marciniszyn"
> ><mike.marciniszyn@intel.com>, "Naresh Kumar PBS"
> ><nareshkumar.pbs@broadcom.com>, "Nelson Escobar"
> ><neescoba@cisco.com>, "Parav Pandit" <parav@nvidia.com>, "Parvi
> >Kaustubhi" <pkaustub@cisco.com>, "Potnuri Bharat Teja"
> ><bharat@chelsio.com>, "Selvin Xavier" <selvin.xavier@broadcom.com>,
> >"Shiraz Saleem" <shiraz.saleem@intel.com>, "Somnath Kotur"
> ><somnath.kotur@broadcom.com>, "Sriharsha Basavapatna"
> ><sriharsha.basavapatna@broadcom.com>, "VMware PV-Drivers"
> ><pv-drivers@vmware.com>, "Weihang Li" <liweihang@huawei.com>, "Wei
> >Hu(Xavier)" <huwei87@hisilicon.com>, "Yishai Hadas"
> ><yishaih@nvidia.com>, "Zhu Yanjun" <yanjunz@nvidia.com>
> >Subject: [EXTERNAL] [PATCH rdma-next] RDMA: Explicitly pass in the
> >dma_device to ib_register_device
> >
> >From: Jason Gunthorpe <jgg@nvidia.com>
> >
> >The current design is convoulted where the IB core makes assumptions
> >that a real DMA device will usually come from parent unless it looks
> >like the ib_device is partially setup for DMA, in which case the
> >ib_device itself is used for DMA, but somethings might still come
> >from the parent.
> >
> >Make this clearer by having the caller explicitly specify what the
> >DMA device should be. The caller is always responsible to fully
> >setup the DMA device it specifies. If NULL is used then the
> >ib_device will be used as the DMA device, but the caller must
> >still set it up completely.
> >
> >rvt is the only driver that did not fully setup the DMA device
> >before registering. Move the rvt specific code out of
> >setup_dma_device() into rvt and set the dma_mask's directly.
> >
> >Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> >Reviewed-by: Parav Pandit <parav@nvidia.com>
> >Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> >---
> > drivers/infiniband/core/device.c              | 73
> >+++++++------------
> > drivers/infiniband/hw/bnxt_re/main.c          |  2 +-
> > drivers/infiniband/hw/cxgb4/provider.c        |  3 +-
> > drivers/infiniband/hw/efa/efa_main.c          |  2 +-
> > drivers/infiniband/hw/hns/hns_roce_main.c     |  2 +-
> > drivers/infiniband/hw/i40iw/i40iw_verbs.c     |  2 +-
> > drivers/infiniband/hw/mlx4/main.c             |  3 +-
> > drivers/infiniband/hw/mlx5/main.c             |  2 +-
> > drivers/infiniband/hw/mthca/mthca_provider.c  |  2 +-
> > drivers/infiniband/hw/ocrdma/ocrdma_main.c    |  3 +-
> > drivers/infiniband/hw/qedr/main.c             |  2 +-
> > drivers/infiniband/hw/usnic/usnic_ib_main.c   |  2 +-
> > .../infiniband/hw/vmw_pvrdma/pvrdma_main.c    |  2 +-
> > drivers/infiniband/sw/rdmavt/vt.c             |  8 +-
> > drivers/infiniband/sw/rxe/rxe_verbs.c         |  2 +-
> > drivers/infiniband/sw/siw/siw_main.c          |  4 +-
> > include/rdma/ib_verbs.h                       |  3 +-
> > 17 files changed, 52 insertions(+), 65 deletions(-)
> >
> >diff --git a/drivers/infiniband/core/device.c
> >b/drivers/infiniband/core/device.c
> >index ec3becf85cac..417d93bbdaca 100644
> >--- a/drivers/infiniband/core/device.c
> >+++ b/drivers/infiniband/core/device.c
> >@@ -1177,58 +1177,34 @@ static int assign_name(struct ib_device
> >*device, const char *name)
> > 	return ret;
> > }
> >
> >-static void setup_dma_device(struct ib_device *device)
> >+static void setup_dma_device(struct ib_device *device,
> >+			     struct device *dma_device)
> > {
> >-	struct device *parent = device->dev.parent;
> >-
> >-	WARN_ON_ONCE(device->dma_device);
> >-
> >-#ifdef CONFIG_DMA_OPS
> >-	if (device->dev.dma_ops) {
> >+	if (!dma_device) {
> > 		/*
> >-		 * The caller provided custom DMA operations. Copy the
> >-		 * DMA-related fields that are used by e.g. dma_alloc_coherent()
> >-		 * into device->dev.
> >+		 * If the caller does not provide a DMA capable device then the
> >+		 * IB device will be used. In this case the caller should fully
> >+		 * setup the ibdev for DMA. This usually means using
> >+		 * dma_virt_ops.
> > 		 */
> >-		device->dma_device = &device->dev;
> >-		if (!device->dev.dma_mask) {
> >-			if (parent)
> >-				device->dev.dma_mask = parent->dma_mask;
> >-			else
> >-				WARN_ON_ONCE(true);
> >-		}
> >-		if (!device->dev.coherent_dma_mask) {
> >-			if (parent)
> >-				device->dev.coherent_dma_mask =
> >-					parent->coherent_dma_mask;
> >-			else
> >-				WARN_ON_ONCE(true);
> >-		}
> >-	} else
> >-#endif /* CONFIG_DMA_OPS */
> >-	{
> >+#ifdef CONFIG_DMA_OPS
> >+		if (WARN_ON(!device->dev.dma_ops))
> >+			return;
> >+#endif
> >+		if (WARN_ON(!device->dev.dma_parms))
> >+			return;
> >+
> >+		dma_device = &device->dev;
> >+	} else {
> >+		device->dev.dma_parms = dma_device->dma_parms;
> > 		/*
> >-		 * The caller did not provide custom DMA operations. Use the
> >-		 * DMA mapping operations of the parent device.
> >+		 * Auto setup the segment size if a DMA device was passed in.
> >+		 * The PCI core sets the maximum segment size to 64 KB. Increase
> >+		 * this parameter to 2 GB.
> > 		 */
> >-		WARN_ON_ONCE(!parent);
> >-		device->dma_device = parent;
> >-	}
> >-
> >-	if (!device->dev.dma_parms) {
> >-		if (parent) {
> >-			/*
> >-			 * The caller did not provide DMA parameters, so
> >-			 * 'parent' probably represents a PCI device. The PCI
> >-			 * core sets the maximum segment size to 64
> >-			 * KB. Increase this parameter to 2 GB.
> >-			 */
> >-			device->dev.dma_parms = parent->dma_parms;
> >-			dma_set_max_seg_size(device->dma_device, SZ_2G);
> >-		} else {
> >-			WARN_ON_ONCE(true);
> >-		}
> >+		dma_set_max_seg_size(dma_device, SZ_2G);
> > 	}
> >+	device->dma_device = dma_device;
> > }
> >
> > /*
> >@@ -1241,7 +1217,6 @@ static int setup_device(struct ib_device
> >*device)
> > 	struct ib_udata uhw = {.outlen = 0, .inlen = 0};
> > 	int ret;
> >
> >-	setup_dma_device(device);
> > 	ib_device_check_mandatory(device);
> >
> > 	ret = setup_port_data(device);
> >@@ -1361,7 +1336,8 @@ static void prevent_dealloc_device(struct
> >ib_device *ib_dev)
> >  * asynchronously then the device pointer may become freed as soon
> >as this
> >  * function returns.
> >  */
> >-int ib_register_device(struct ib_device *device, const char *name)
> >+int ib_register_device(struct ib_device *device, const char *name,
> >+		       struct device *dma_device)
> > {
> > 	int ret;
> >
> >@@ -1369,6 +1345,7 @@ int ib_register_device(struct ib_device
> >*device, const char *name)
> > 	if (ret)
> > 		return ret;
> >
> >+	setup_dma_device(device, dma_device);
> > 	ret = setup_device(device);
> > 	if (ret)
> > 		return ret;
> >diff --git a/drivers/infiniband/hw/bnxt_re/main.c
> >b/drivers/infiniband/hw/bnxt_re/main.c
> >index 53aee5a42ab8..b3bc62021039 100644
> >--- a/drivers/infiniband/hw/bnxt_re/main.c
> >+++ b/drivers/infiniband/hw/bnxt_re/main.c
> >@@ -736,7 +736,7 @@ static int bnxt_re_register_ib(struct bnxt_re_dev
> >*rdev)
> > 	if (ret)
> > 		return ret;
> >
> >-	return ib_register_device(ibdev, "bnxt_re%d");
> >+	return ib_register_device(ibdev, "bnxt_re%d",
> >&rdev->en_dev->pdev->dev);
> > }
> >
> > static void bnxt_re_dev_remove(struct bnxt_re_dev *rdev)
> >diff --git a/drivers/infiniband/hw/cxgb4/provider.c
> >b/drivers/infiniband/hw/cxgb4/provider.c
> >index 4b76f2f3f4e4..5f4f3abf41e4 100644
> >--- a/drivers/infiniband/hw/cxgb4/provider.c
> >+++ b/drivers/infiniband/hw/cxgb4/provider.c
> >@@ -570,7 +570,8 @@ void c4iw_register_device(struct work_struct
> >*work)
> > 	ret = set_netdevs(&dev->ibdev, &dev->rdev);
> > 	if (ret)
> > 		goto err_dealloc_ctx;
> >-	ret = ib_register_device(&dev->ibdev, "cxgb4_%d");
> >+	ret = ib_register_device(&dev->ibdev, "cxgb4_%d",
> >+				 &dev->rdev.lldi.pdev->dev);
> > 	if (ret)
> > 		goto err_dealloc_ctx;
> > 	return;
> >diff --git a/drivers/infiniband/hw/efa/efa_main.c
> >b/drivers/infiniband/hw/efa/efa_main.c
> >index 92d701146320..4de5be3e1dfe 100644
> >--- a/drivers/infiniband/hw/efa/efa_main.c
> >+++ b/drivers/infiniband/hw/efa/efa_main.c
> >@@ -331,7 +331,7 @@ static int efa_ib_device_add(struct efa_dev *dev)
> >
> > 	ib_set_device_ops(&dev->ibdev, &efa_dev_ops);
> >
> >-	err = ib_register_device(&dev->ibdev, "efa_%d");
> >+	err = ib_register_device(&dev->ibdev, "efa_%d", &pdev->dev);
> > 	if (err)
> > 		goto err_release_doorbell_bar;
> >
> >diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c
> >b/drivers/infiniband/hw/hns/hns_roce_main.c
> >index 2b4d75733e72..1b5f895d7daf 100644
> >--- a/drivers/infiniband/hw/hns/hns_roce_main.c
> >+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
> >@@ -547,7 +547,7 @@ static int hns_roce_register_device(struct
> >hns_roce_dev *hr_dev)
> > 		if (ret)
> > 			return ret;
> > 	}
> >-	ret = ib_register_device(ib_dev, "hns_%d");
> >+	ret = ib_register_device(ib_dev, "hns_%d", dev);
> > 	if (ret) {
> > 		dev_err(dev, "ib_register_device failed!\n");
> > 		return ret;
> >diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
> >b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
> >index e53f6c0dc12e..945d30a86bbc 100644
> >--- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
> >+++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
> >@@ -2748,7 +2748,7 @@ int i40iw_register_rdma_device(struct
> >i40iw_device *iwdev)
> > 	if (ret)
> > 		goto error;
> >
> >-	ret = ib_register_device(&iwibdev->ibdev, "i40iw%d");
> >+	ret = ib_register_device(&iwibdev->ibdev, "i40iw%d",
> >&iwdev->hw.pcidev->dev);
> > 	if (ret)
> > 		goto error;
> >
> >diff --git a/drivers/infiniband/hw/mlx4/main.c
> >b/drivers/infiniband/hw/mlx4/main.c
> >index 753c70402498..cd0fba6b0964 100644
> >--- a/drivers/infiniband/hw/mlx4/main.c
> >+++ b/drivers/infiniband/hw/mlx4/main.c
> >@@ -2841,7 +2841,8 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
> > 		goto err_steer_free_bitmap;
> >
> > 	rdma_set_device_sysfs_group(&ibdev->ib_dev, &mlx4_attr_group);
> >-	if (ib_register_device(&ibdev->ib_dev, "mlx4_%d"))
> >+	if (ib_register_device(&ibdev->ib_dev, "mlx4_%d",
> >+			       &dev->persist->pdev->dev))
> > 		goto err_diag_counters;
> >
> > 	if (mlx4_ib_mad_init(ibdev))
> >diff --git a/drivers/infiniband/hw/mlx5/main.c
> >b/drivers/infiniband/hw/mlx5/main.c
> >index 3ae681a6ae3b..bca57c7661eb 100644
> >--- a/drivers/infiniband/hw/mlx5/main.c
> >+++ b/drivers/infiniband/hw/mlx5/main.c
> >@@ -4404,7 +4404,7 @@ static int mlx5_ib_stage_ib_reg_init(struct
> >mlx5_ib_dev *dev)
> > 		name = "mlx5_%d";
> > 	else
> > 		name = "mlx5_bond_%d";
> >-	return ib_register_device(&dev->ib_dev, name);
> >+	return ib_register_device(&dev->ib_dev, name,
> >&dev->mdev->pdev->dev);
> > }
> >
> > static void mlx5_ib_stage_pre_ib_reg_umr_cleanup(struct mlx5_ib_dev
> >*dev)
> >diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c
> >b/drivers/infiniband/hw/mthca/mthca_provider.c
> >index 31b558ff8218..c4d9cdc4ee97 100644
> >--- a/drivers/infiniband/hw/mthca/mthca_provider.c
> >+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
> >@@ -1206,7 +1206,7 @@ int mthca_register_device(struct mthca_dev
> >*dev)
> > 	mutex_init(&dev->cap_mask_mutex);
> >
> > 	rdma_set_device_sysfs_group(&dev->ib_dev, &mthca_attr_group);
> >-	ret = ib_register_device(&dev->ib_dev, "mthca%d");
> >+	ret = ib_register_device(&dev->ib_dev, "mthca%d", &dev->pdev->dev);
> > 	if (ret)
> > 		return ret;
> >
> >diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
> >b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
> >index d8c47d24d6d6..60416186f1d0 100644
> >--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
> >+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
> >@@ -255,7 +255,8 @@ static int ocrdma_register_device(struct
> >ocrdma_dev *dev)
> > 	if (ret)
> > 		return ret;
> >
> >-	return ib_register_device(&dev->ibdev, "ocrdma%d");
> >+	return ib_register_device(&dev->ibdev, "ocrdma%d",
> >+				  &dev->nic_info.pdev->dev);
> > }
> >
> > static int ocrdma_alloc_resources(struct ocrdma_dev *dev)
> >diff --git a/drivers/infiniband/hw/qedr/main.c
> >b/drivers/infiniband/hw/qedr/main.c
> >index 7c0aac3e635b..464becdd41f7 100644
> >--- a/drivers/infiniband/hw/qedr/main.c
> >+++ b/drivers/infiniband/hw/qedr/main.c
> >@@ -293,7 +293,7 @@ static int qedr_register_device(struct qedr_dev
> >*dev)
> > 	if (rc)
> > 		return rc;
> >
> >-	return ib_register_device(&dev->ibdev, "qedr%d");
> >+	return ib_register_device(&dev->ibdev, "qedr%d", &dev->pdev->dev);
> > }
> >
> > /* This function allocates fast-path status block memory */
> >diff --git a/drivers/infiniband/hw/usnic/usnic_ib_main.c
> >b/drivers/infiniband/hw/usnic/usnic_ib_main.c
> >index 462ed71abf53..6c23a5472168 100644
> >--- a/drivers/infiniband/hw/usnic/usnic_ib_main.c
> >+++ b/drivers/infiniband/hw/usnic/usnic_ib_main.c
> >@@ -425,7 +425,7 @@ static void *usnic_ib_device_add(struct pci_dev
> >*dev)
> > 	if (ret)
> > 		goto err_fwd_dealloc;
> >
> >-	if (ib_register_device(&us_ibdev->ib_dev, "usnic_%d"))
> >+	if (ib_register_device(&us_ibdev->ib_dev, "usnic_%d", &dev->dev))
> > 		goto err_fwd_dealloc;
> >
> > 	usnic_fwd_set_mtu(us_ibdev->ufdev, us_ibdev->netdev->mtu);
> >diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c
> >b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c
> >index 780fd2dfc07e..5b2c94441125 100644
> >--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c
> >+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c
> >@@ -270,7 +270,7 @@ static int pvrdma_register_device(struct
> >pvrdma_dev *dev)
> > 	spin_lock_init(&dev->srq_tbl_lock);
> > 	rdma_set_device_sysfs_group(&dev->ib_dev, &pvrdma_attr_group);
> >
> >-	ret = ib_register_device(&dev->ib_dev, "vmw_pvrdma%d");
> >+	ret = ib_register_device(&dev->ib_dev, "vmw_pvrdma%d",
> >&dev->pdev->dev);
> > 	if (ret)
> > 		goto err_srq_free;
> >
> >diff --git a/drivers/infiniband/sw/rdmavt/vt.c
> >b/drivers/infiniband/sw/rdmavt/vt.c
> >index f904bb34477a..2f117ac11c8b 100644
> >--- a/drivers/infiniband/sw/rdmavt/vt.c
> >+++ b/drivers/infiniband/sw/rdmavt/vt.c
> >@@ -581,7 +581,11 @@ int rvt_register_device(struct rvt_dev_info
> >*rdi)
> > 	spin_lock_init(&rdi->n_cqs_lock);
> >
> > 	/* DMA Operations */
> >-	rdi->ibdev.dev.dma_ops = rdi->ibdev.dev.dma_ops ? : &dma_virt_ops;
> >+	rdi->ibdev.dev.dma_ops = &dma_virt_ops;
> >+	rdi->ibdev.dev.dma_parms = rdi->ibdev.dev.parent->dma_parms;
> >+	rdi->ibdev.dev.dma_mask = rdi->ibdev.dev.parent->dma_mask;
> >+	rdi->ibdev.dev.coherent_dma_mask =
> >+		rdi->ibdev.dev.parent->coherent_dma_mask;
> >
> > 	/* Protection Domain */
> > 	spin_lock_init(&rdi->n_pds_lock);
> >@@ -629,7 +633,7 @@ int rvt_register_device(struct rvt_dev_info *rdi)
> > 		rdi->ibdev.num_comp_vectors = 1;
> >
> > 	/* We are now good to announce we exist */
> >-	ret = ib_register_device(&rdi->ibdev, dev_name(&rdi->ibdev.dev));
> >+	ret = ib_register_device(&rdi->ibdev, dev_name(&rdi->ibdev.dev),
> >NULL);
> > 	if (ret) {
> > 		rvt_pr_err(rdi, "Failed to register driver with ib core.\n");
> > 		goto bail_wss;
> >diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c
> >b/drivers/infiniband/sw/rxe/rxe_verbs.c
> >index f368dc16281a..37fee72755be 100644
> >--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
> >+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
> >@@ -1182,7 +1182,7 @@ int rxe_register_device(struct rxe_dev *rxe,
> >const char *ibdev_name)
> > 	rxe->tfm = tfm;
> >
> > 	rdma_set_device_sysfs_group(dev, &rxe_attr_group);
> >-	err = ib_register_device(dev, ibdev_name);
> >+	err = ib_register_device(dev, ibdev_name, NULL);
> > 	if (err)
> > 		pr_warn("%s failed with error %d\n", __func__, err);
> >
> >diff --git a/drivers/infiniband/sw/siw/siw_main.c
> >b/drivers/infiniband/sw/siw/siw_main.c
> >index d862bec84376..0362d57b4db8 100644
> >--- a/drivers/infiniband/sw/siw/siw_main.c
> >+++ b/drivers/infiniband/sw/siw/siw_main.c
> >@@ -69,7 +69,7 @@ static int siw_device_register(struct siw_device
> >*sdev, const char *name)
> >
> > 	sdev->vendor_part_id = dev_id++;
> >
> >-	rv = ib_register_device(base_dev, name);
> >+	rv = ib_register_device(base_dev, name, NULL);
> > 	if (rv) {
> > 		pr_warn("siw: device registration error %d\n", rv);
> > 		return rv;
> >@@ -386,6 +386,8 @@ static struct siw_device
> >*siw_device_create(struct net_device *netdev)
> > 	base_dev->dev.dma_parms = &sdev->dma_parms;
> > 	sdev->dma_parms = (struct device_dma_parameters)
> > 		{ .max_segment_size = SZ_2G };
> >+	dma_coerce_mask_and_coherent(&base_dev->dev,
> >+				     dma_get_required_mask(&base_dev->dev));
>
> Leon, can you please help me to understand this
> additional logic? Do we need to setup the DMA device
> for (software) RDMA devices which rely on dma_virt_ops
> in the end, or better leave it untouched?

The logic that driver is responsible to give right DMA device,
so yes, you are setting here mask from dma_virt_ops, as RXE did.

Thanks

>
> Thanks very much!
> Bernard.
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device
  2020-09-22  8:58 ` Bernard Metzler
  2020-09-22 10:14   ` Leon Romanovsky
@ 2020-09-22 14:22   ` Bernard Metzler
  2020-09-22 16:22     ` Jason Gunthorpe
  1 sibling, 1 reply; 23+ messages in thread
From: Bernard Metzler @ 2020-09-22 14:22 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Jason Gunthorpe, Adit Ranadive, Ariel Elior,
	Christian Benvenuti, Dennis Dalessandro, Devesh Sharma,
	Faisal Latif, Gal Pressman, Lijun Ou, linux-rdma,
	Michal Kalderon, Mike Marciniszyn, Naresh Kumar PBS,
	Nelson Escobar, Parav Pandit, Parvi Kaustubhi,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem, Somnath Kotur,
	Sriharsha Basavapatna, VMware PV-Drivers, Weihang Li,
	Wei Hu(Xavier),
	Yishai Hadas, Zhu Yanjun

...

>> >diff --git a/drivers/infiniband/sw/siw/siw_main.c
>> >b/drivers/infiniband/sw/siw/siw_main.c
>> >index d862bec84376..0362d57b4db8 100644
>> >--- a/drivers/infiniband/sw/siw/siw_main.c
>> >+++ b/drivers/infiniband/sw/siw/siw_main.c
>> >@@ -69,7 +69,7 @@ static int siw_device_register(struct siw_device
>> >*sdev, const char *name)
>> >
>> > 	sdev->vendor_part_id = dev_id++;
>> >
>> >-	rv = ib_register_device(base_dev, name);
>> >+	rv = ib_register_device(base_dev, name, NULL);
>> > 	if (rv) {
>> > 		pr_warn("siw: device registration error %d\n", rv);
>> > 		return rv;
>> >@@ -386,6 +386,8 @@ static struct siw_device
>> >*siw_device_create(struct net_device *netdev)
>> > 	base_dev->dev.dma_parms = &sdev->dma_parms;
>> > 	sdev->dma_parms = (struct device_dma_parameters)
>> > 		{ .max_segment_size = SZ_2G };
>> >+	dma_coerce_mask_and_coherent(&base_dev->dev,
>> >+				     dma_get_required_mask(&base_dev->dev));
>>
>> Leon, can you please help me to understand this
>> additional logic? Do we need to setup the DMA device
>> for (software) RDMA devices which rely on dma_virt_ops
>> in the end, or better leave it untouched?
>
>The logic that driver is responsible to give right DMA device,
>so yes, you are setting here mask from dma_virt_ops, as RXE did.
>
Thanks Leon!

I wonder how this was working w/o that before!

Many thanks,
Bernard.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device
  2020-09-22 14:22   ` Bernard Metzler
@ 2020-09-22 16:22     ` Jason Gunthorpe
  2020-09-23  5:39       ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Jason Gunthorpe @ 2020-09-22 16:22 UTC (permalink / raw)
  To: Bernard Metzler
  Cc: Leon Romanovsky, Doug Ledford, Adit Ranadive, Ariel Elior,
	Christian Benvenuti, Dennis Dalessandro, Devesh Sharma,
	Faisal Latif, Gal Pressman, Lijun Ou, linux-rdma,
	Michal Kalderon, Mike Marciniszyn, Naresh Kumar PBS,
	Nelson Escobar, Parav Pandit, Parvi Kaustubhi,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem, Somnath Kotur,
	Sriharsha Basavapatna, VMware PV-Drivers, Weihang Li,
	Wei Hu(Xavier),
	Yishai Hadas, Zhu Yanjun

On Tue, Sep 22, 2020 at 02:22:01PM +0000, Bernard Metzler wrote:
> ...
> 
> >> >diff --git a/drivers/infiniband/sw/siw/siw_main.c
> >> >b/drivers/infiniband/sw/siw/siw_main.c
> >> >index d862bec84376..0362d57b4db8 100644
> >> >+++ b/drivers/infiniband/sw/siw/siw_main.c
> >> >@@ -69,7 +69,7 @@ static int siw_device_register(struct siw_device
> >> >*sdev, const char *name)
> >> >
> >> > 	sdev->vendor_part_id = dev_id++;
> >> >
> >> >-	rv = ib_register_device(base_dev, name);
> >> >+	rv = ib_register_device(base_dev, name, NULL);
> >> > 	if (rv) {
> >> > 		pr_warn("siw: device registration error %d\n", rv);
> >> > 		return rv;
> >> >@@ -386,6 +386,8 @@ static struct siw_device
> >> >*siw_device_create(struct net_device *netdev)
> >> > 	base_dev->dev.dma_parms = &sdev->dma_parms;
> >> > 	sdev->dma_parms = (struct device_dma_parameters)
> >> > 		{ .max_segment_size = SZ_2G };
> >> >+	dma_coerce_mask_and_coherent(&base_dev->dev,
> >> >+				     dma_get_required_mask(&base_dev->dev));
> >>
> >> Leon, can you please help me to understand this
> >> additional logic? Do we need to setup the DMA device
> >> for (software) RDMA devices which rely on dma_virt_ops
> >> in the end, or better leave it untouched?
> >
> >The logic that driver is responsible to give right DMA device,
> >so yes, you are setting here mask from dma_virt_ops, as RXE did.
> >
> Thanks Leon!
> 
> I wonder how this was working w/o that before!

I wonder if dma_virt_ops ignores the masking.. Still seems best to set
it consistently when using dma_virt_ops.

Jason

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device
  2020-09-22  8:27 [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device Leon Romanovsky
  2020-09-22  8:58 ` Bernard Metzler
@ 2020-09-23  5:38 ` Christoph Hellwig
  2020-09-23  6:45   ` Leon Romanovsky
  2020-09-23 18:34   ` Jason Gunthorpe
  1 sibling, 2 replies; 23+ messages in thread
From: Christoph Hellwig @ 2020-09-23  5:38 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Jason Gunthorpe, Adit Ranadive, Ariel Elior,
	Bernard Metzler, Christian Benvenuti, Dennis Dalessandro,
	Devesh Sharma, Faisal Latif, Gal Pressman, Lijun Ou, linux-rdma,
	Michal Kalderon, Mike Marciniszyn, Naresh Kumar PBS,
	Nelson Escobar, Parav Pandit, Parvi Kaustubhi,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem, Somnath Kotur,
	Sriharsha Basavapatna, VMware PV-Drivers, Weihang Li,
	Wei Hu(Xavier),
	Yishai Hadas, Zhu Yanjun

> +static void setup_dma_device(struct ib_device *device,
> +			     struct device *dma_device)
>  {
> +	if (!dma_device) {
>  		/*
> +		 * If the caller does not provide a DMA capable device then the
> +		 * IB device will be used. In this case the caller should fully
> +		 * setup the ibdev for DMA. This usually means using
> +		 * dma_virt_ops.
>  		 */
> +#ifdef CONFIG_DMA_OPS
> +		if (WARN_ON(!device->dev.dma_ops))
> +			return;
> +#endif

dma ops are entirely optiona and NULL for the most common case
(direct mapping without an IOMMU).

> +		if (WARN_ON(!device->dev.dma_parms))
> +			return;
> +
> +		dma_device = &device->dev;
> +	} else {
> +		device->dev.dma_parms = dma_device->dma_parms;
>  		/*
> +		 * Auto setup the segment size if a DMA device was passed in.
> +		 * The PCI core sets the maximum segment size to 64 KB. Increase
> +		 * this parameter to 2 GB.
>  		 */
> +		dma_set_max_seg_size(dma_device, SZ_2G);

You can't just inherity DMA properties like this this.  Please
fix all code that looks at the seg size to look at the DMA device.

Btw, where does the magic 2G come from?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device
  2020-09-22 16:22     ` Jason Gunthorpe
@ 2020-09-23  5:39       ` Christoph Hellwig
  2020-09-23 18:35         ` Jason Gunthorpe
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2020-09-23  5:39 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bernard Metzler, Leon Romanovsky, Doug Ledford, Adit Ranadive,
	Ariel Elior, Christian Benvenuti, Dennis Dalessandro,
	Devesh Sharma, Faisal Latif, Gal Pressman, Lijun Ou, linux-rdma,
	Michal Kalderon, Mike Marciniszyn, Naresh Kumar PBS,
	Nelson Escobar, Parav Pandit, Parvi Kaustubhi,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem, Somnath Kotur,
	Sriharsha Basavapatna, VMware PV-Drivers, Weihang Li,
	Wei Hu(Xavier),
	Yishai Hadas, Zhu Yanjun

On Tue, Sep 22, 2020 at 01:22:06PM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 22, 2020 at 02:22:01PM +0000, Bernard Metzler wrote:
> > ...
> > 
> > >> >diff --git a/drivers/infiniband/sw/siw/siw_main.c
> > >> >b/drivers/infiniband/sw/siw/siw_main.c
> > >> >index d862bec84376..0362d57b4db8 100644
> > >> >+++ b/drivers/infiniband/sw/siw/siw_main.c
> > >> >@@ -69,7 +69,7 @@ static int siw_device_register(struct siw_device
> > >> >*sdev, const char *name)
> > >> >
> > >> > 	sdev->vendor_part_id = dev_id++;
> > >> >
> > >> >-	rv = ib_register_device(base_dev, name);
> > >> >+	rv = ib_register_device(base_dev, name, NULL);
> > >> > 	if (rv) {
> > >> > 		pr_warn("siw: device registration error %d\n", rv);
> > >> > 		return rv;
> > >> >@@ -386,6 +386,8 @@ static struct siw_device
> > >> >*siw_device_create(struct net_device *netdev)
> > >> > 	base_dev->dev.dma_parms = &sdev->dma_parms;
> > >> > 	sdev->dma_parms = (struct device_dma_parameters)
> > >> > 		{ .max_segment_size = SZ_2G };
> > >> >+	dma_coerce_mask_and_coherent(&base_dev->dev,
> > >> >+				     dma_get_required_mask(&base_dev->dev));
> > >>
> > >> Leon, can you please help me to understand this
> > >> additional logic? Do we need to setup the DMA device
> > >> for (software) RDMA devices which rely on dma_virt_ops
> > >> in the end, or better leave it untouched?
> > >
> > >The logic that driver is responsible to give right DMA device,
> > >so yes, you are setting here mask from dma_virt_ops, as RXE did.
> > >
> > Thanks Leon!
> > 
> > I wonder how this was working w/o that before!
> 
> I wonder if dma_virt_ops ignores the masking.. Still seems best to set
> it consistently when using dma_virt_ops.

dma_virt_ops doesn't look at the mask.  But in retrospective
dma_virt_ops has been a really bad idea.  I'd much rather move the
handling of non-physical devices back into the RDMA core in the long
run.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device
  2020-09-23  5:38 ` Christoph Hellwig
@ 2020-09-23  6:45   ` Leon Romanovsky
  2020-09-23  6:54     ` Christoph Hellwig
  2020-09-23 18:34   ` Jason Gunthorpe
  1 sibling, 1 reply; 23+ messages in thread
From: Leon Romanovsky @ 2020-09-23  6:45 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Doug Ledford, Jason Gunthorpe, Adit Ranadive, Ariel Elior,
	Bernard Metzler, Christian Benvenuti, Dennis Dalessandro,
	Devesh Sharma, Faisal Latif, Gal Pressman, Lijun Ou, linux-rdma,
	Michal Kalderon, Mike Marciniszyn, Naresh Kumar PBS,
	Nelson Escobar, Parav Pandit, Parvi Kaustubhi,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem, Somnath Kotur,
	Sriharsha Basavapatna, VMware PV-Drivers, Weihang Li,
	Wei Hu(Xavier),
	Yishai Hadas, Zhu Yanjun

On Wed, Sep 23, 2020 at 06:38:40AM +0100, Christoph Hellwig wrote:
> > +static void setup_dma_device(struct ib_device *device,
> > +			     struct device *dma_device)
> >  {
> > +	if (!dma_device) {
> >  		/*
> > +		 * If the caller does not provide a DMA capable device then the
> > +		 * IB device will be used. In this case the caller should fully
> > +		 * setup the ibdev for DMA. This usually means using
> > +		 * dma_virt_ops.
> >  		 */
> > +#ifdef CONFIG_DMA_OPS
> > +		if (WARN_ON(!device->dev.dma_ops))
> > +			return;
> > +#endif
>
> dma ops are entirely optiona and NULL for the most common case
> (direct mapping without an IOMMU).

IMHO we don't support such mode (without IOMMU).

>
> > +		if (WARN_ON(!device->dev.dma_parms))
> > +			return;
> > +
> > +		dma_device = &device->dev;
> > +	} else {
> > +		device->dev.dma_parms = dma_device->dma_parms;
> >  		/*
> > +		 * Auto setup the segment size if a DMA device was passed in.
> > +		 * The PCI core sets the maximum segment size to 64 KB. Increase
> > +		 * this parameter to 2 GB.
> >  		 */
> > +		dma_set_max_seg_size(dma_device, SZ_2G);
>
> You can't just inherity DMA properties like this this.  Please
> fix all code that looks at the seg size to look at the DMA device.
>
> Btw, where does the magic 2G come from?

It comes from patch d10bcf947a3e ("RDMA/umem: Combine contiguous
PAGE_SIZE regions in SGEs"), I can't say about all devices, but this is
the limit for mlx5, rxe and SIW devices.

Thanks

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device
  2020-09-23  6:45   ` Leon Romanovsky
@ 2020-09-23  6:54     ` Christoph Hellwig
  2020-09-23  7:10       ` Leon Romanovsky
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2020-09-23  6:54 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Christoph Hellwig, Doug Ledford, Jason Gunthorpe, Adit Ranadive,
	Ariel Elior, Bernard Metzler, Christian Benvenuti,
	Dennis Dalessandro, Devesh Sharma, Faisal Latif, Gal Pressman,
	Lijun Ou, linux-rdma, Michal Kalderon, Mike Marciniszyn,
	Naresh Kumar PBS, Nelson Escobar, Parav Pandit, Parvi Kaustubhi,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem, Somnath Kotur,
	Sriharsha Basavapatna, VMware PV-Drivers, Weihang Li,
	Wei Hu(Xavier),
	Yishai Hadas, Zhu Yanjun

On Wed, Sep 23, 2020 at 09:45:52AM +0300, Leon Romanovsky wrote:
> On Wed, Sep 23, 2020 at 06:38:40AM +0100, Christoph Hellwig wrote:
> > > +static void setup_dma_device(struct ib_device *device,
> > > +			     struct device *dma_device)
> > >  {
> > > +	if (!dma_device) {
> > >  		/*
> > > +		 * If the caller does not provide a DMA capable device then the
> > > +		 * IB device will be used. In this case the caller should fully
> > > +		 * setup the ibdev for DMA. This usually means using
> > > +		 * dma_virt_ops.
> > >  		 */
> > > +#ifdef CONFIG_DMA_OPS
> > > +		if (WARN_ON(!device->dev.dma_ops))
> > > +			return;
> > > +#endif
> >
> > dma ops are entirely optiona and NULL for the most common case
> > (direct mapping without an IOMMU).
> 
> IMHO we don't support such mode (without IOMMU).

This seems weird.  Of course I can pass in the PCI dev here at least
in theory.  Maybe it doesn't actually happen in practice, but the check
seems totally bogus.

> > > +	} else {
> > > +		device->dev.dma_parms = dma_device->dma_parms;
> > >  		/*
> > > +		 * Auto setup the segment size if a DMA device was passed in.
> > > +		 * The PCI core sets the maximum segment size to 64 KB. Increase
> > > +		 * this parameter to 2 GB.
> > >  		 */
> > > +		dma_set_max_seg_size(dma_device, SZ_2G);
> >
> > You can't just inherity DMA properties like this this.  Please
> > fix all code that looks at the seg size to look at the DMA device.
> >
> > Btw, where does the magic 2G come from?
> 
> It comes from patch d10bcf947a3e ("RDMA/umem: Combine contiguous
> PAGE_SIZE regions in SGEs"), I can't say about all devices, but this is
> the limit for mlx5, rxe and SIW devices.

If you touch this anyway I think you absolutely should move this setting
into the drivers, and not apply a random one in the core code.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device
  2020-09-23  6:54     ` Christoph Hellwig
@ 2020-09-23  7:10       ` Leon Romanovsky
  2020-09-23  7:21         ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Leon Romanovsky @ 2020-09-23  7:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Doug Ledford, Jason Gunthorpe, Adit Ranadive, Ariel Elior,
	Bernard Metzler, Christian Benvenuti, Dennis Dalessandro,
	Devesh Sharma, Faisal Latif, Gal Pressman, Lijun Ou, linux-rdma,
	Michal Kalderon, Mike Marciniszyn, Naresh Kumar PBS,
	Nelson Escobar, Parav Pandit, Parvi Kaustubhi,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem, Somnath Kotur,
	Sriharsha Basavapatna, VMware PV-Drivers, Weihang Li,
	Wei Hu(Xavier),
	Yishai Hadas, Zhu Yanjun

On Wed, Sep 23, 2020 at 07:54:16AM +0100, Christoph Hellwig wrote:
> On Wed, Sep 23, 2020 at 09:45:52AM +0300, Leon Romanovsky wrote:
> > On Wed, Sep 23, 2020 at 06:38:40AM +0100, Christoph Hellwig wrote:
> > > > +static void setup_dma_device(struct ib_device *device,
> > > > +			     struct device *dma_device)
> > > >  {
> > > > +	if (!dma_device) {
> > > >  		/*
> > > > +		 * If the caller does not provide a DMA capable device then the
> > > > +		 * IB device will be used. In this case the caller should fully
> > > > +		 * setup the ibdev for DMA. This usually means using
> > > > +		 * dma_virt_ops.
> > > >  		 */
> > > > +#ifdef CONFIG_DMA_OPS
> > > > +		if (WARN_ON(!device->dev.dma_ops))
> > > > +			return;
> > > > +#endif
> > >
> > > dma ops are entirely optiona and NULL for the most common case
> > > (direct mapping without an IOMMU).
> >
> > IMHO we don't support such mode (without IOMMU).
>
> This seems weird.  Of course I can pass in the PCI dev here at least
> in theory.  Maybe it doesn't actually happen in practice, but the check
> seems totally bogus.

No problem, I will check what can be done.

>
> > > > +	} else {
> > > > +		device->dev.dma_parms = dma_device->dma_parms;
> > > >  		/*
> > > > +		 * Auto setup the segment size if a DMA device was passed in.
> > > > +		 * The PCI core sets the maximum segment size to 64 KB. Increase
> > > > +		 * this parameter to 2 GB.
> > > >  		 */
> > > > +		dma_set_max_seg_size(dma_device, SZ_2G);
> > >
> > > You can't just inherity DMA properties like this this.  Please
> > > fix all code that looks at the seg size to look at the DMA device.
> > >
> > > Btw, where does the magic 2G come from?
> >
> > It comes from patch d10bcf947a3e ("RDMA/umem: Combine contiguous
> > PAGE_SIZE regions in SGEs"), I can't say about all devices, but this is
> > the limit for mlx5, rxe and SIW devices.
>
> If you touch this anyway I think you absolutely should move this setting
> into the drivers, and not apply a random one in the core code.

I have no idea what will be the value for other drivers, because it was
"global 2G size" before and no one complained.

Thanks

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device
  2020-09-23  7:10       ` Leon Romanovsky
@ 2020-09-23  7:21         ` Christoph Hellwig
  0 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2020-09-23  7:21 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Christoph Hellwig, Doug Ledford, Jason Gunthorpe, Adit Ranadive,
	Ariel Elior, Bernard Metzler, Christian Benvenuti,
	Dennis Dalessandro, Devesh Sharma, Faisal Latif, Gal Pressman,
	Lijun Ou, linux-rdma, Michal Kalderon, Mike Marciniszyn,
	Naresh Kumar PBS, Nelson Escobar, Parav Pandit, Parvi Kaustubhi,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem, Somnath Kotur,
	Sriharsha Basavapatna, VMware PV-Drivers, Weihang Li,
	Wei Hu(Xavier),
	Yishai Hadas, Zhu Yanjun

On Wed, Sep 23, 2020 at 10:10:35AM +0300, Leon Romanovsky wrote:
> > If you touch this anyway I think you absolutely should move this setting
> > into the drivers, and not apply a random one in the core code.
> 
> I have no idea what will be the value for other drivers, because it was
> "global 2G size" before and no one complained.

Just lift it into all drivers into this series.  That increases the
chance that the driver maintainers will eventually notice it :)

> 
> Thanks
---end quoted text---

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device
  2020-09-23  5:38 ` Christoph Hellwig
  2020-09-23  6:45   ` Leon Romanovsky
@ 2020-09-23 18:34   ` Jason Gunthorpe
  2020-09-24  5:49     ` Christoph Hellwig
  1 sibling, 1 reply; 23+ messages in thread
From: Jason Gunthorpe @ 2020-09-23 18:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Leon Romanovsky, Doug Ledford, Adit Ranadive, Ariel Elior,
	Bernard Metzler, Christian Benvenuti, Dennis Dalessandro,
	Devesh Sharma, Faisal Latif, Gal Pressman, Lijun Ou, linux-rdma,
	Michal Kalderon, Mike Marciniszyn, Naresh Kumar PBS,
	Nelson Escobar, Parav Pandit, Parvi Kaustubhi,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem, Somnath Kotur,
	Sriharsha Basavapatna, VMware PV-Drivers, Weihang Li,
	Wei Hu(Xavier),
	Yishai Hadas, Zhu Yanjun

On Wed, Sep 23, 2020 at 06:38:40AM +0100, Christoph Hellwig wrote:
> > +static void setup_dma_device(struct ib_device *device,
> > +			     struct device *dma_device)
> >  {
> > +	if (!dma_device) {
> >  		/*
> > +		 * If the caller does not provide a DMA capable device then the
> > +		 * IB device will be used. In this case the caller should fully
> > +		 * setup the ibdev for DMA. This usually means using
> > +		 * dma_virt_ops.
> >  		 */
> > +#ifdef CONFIG_DMA_OPS
> > +		if (WARN_ON(!device->dev.dma_ops))
> > +			return;
> > +#endif
> 
> dma ops are entirely optiona and NULL for the most common case
> (direct mapping without an IOMMU).

This is the case where:

 +		dma_device = &device->dev;

device == struct ib_device we just allocated

The only use of this configuration is to override the dma ops with
dma_virt_ops, so drivers that don't do that are buggy. A ib_device
itself cannot do DMA otherwise. This should probably be clarified to
just fail if !CONIFG_DMA_OPS

All other cases should point dma_device at some kind of DMA capable
struct device like a pci_device, which can have a NULL ops.

> > +	} else {
> > +		device->dev.dma_parms = dma_device->dma_parms;
> >  		/*
> > +		 * Auto setup the segment size if a DMA device was passed in.
> > +		 * The PCI core sets the maximum segment size to 64 KB. Increase
> > +		 * this parameter to 2 GB.
> >  		 */
> > +		dma_set_max_seg_size(dma_device, SZ_2G);
> 
> You can't just inherity DMA properties like this this.  Please
> fix all code that looks at the seg size to look at the DMA device.

Inherit? This is overriding the PCI default of 64K to be 2G for RDMA
devices.

The closest thing RDMA has to segment size is the length of a IB
scatter/gather WR element in verbs. This is 32 bits by spec.

Even if a SGL > 32 bits was required the ULP should switch to use RDMA
MRs instead of inline IB SG.

So really there is no segment size limitation and the intention here
is to just disable segment size at IOMMU layer.

Since this is universal, by spec, not HW specific, it doesn't make
much sense to put in the drivers.

> Btw, where does the magic 2G come from?

2G is the largest power of two that will fit in a struct
scatterlist->length or the ib_sge->length.

Jason

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device
  2020-09-23  5:39       ` Christoph Hellwig
@ 2020-09-23 18:35         ` Jason Gunthorpe
  2020-09-24  5:53           ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Jason Gunthorpe @ 2020-09-23 18:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bernard Metzler, Leon Romanovsky, Doug Ledford, Adit Ranadive,
	Ariel Elior, Christian Benvenuti, Dennis Dalessandro,
	Devesh Sharma, Faisal Latif, Gal Pressman, Lijun Ou, linux-rdma,
	Michal Kalderon, Mike Marciniszyn, Naresh Kumar PBS,
	Nelson Escobar, Parav Pandit, Parvi Kaustubhi,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem, Somnath Kotur,
	Sriharsha Basavapatna, VMware PV-Drivers, Weihang Li,
	Wei Hu(Xavier),
	Yishai Hadas, Zhu Yanjun

On Wed, Sep 23, 2020 at 06:39:55AM +0100, Christoph Hellwig wrote:
> > I wonder if dma_virt_ops ignores the masking.. Still seems best to set
> > it consistently when using dma_virt_ops.
> 
> dma_virt_ops doesn't look at the mask.  But in retrospective
> dma_virt_ops has been a really bad idea.  I'd much rather move the
> handling of non-physical devices back into the RDMA core in the long
> run.

Doesn't that mean we need to put all the ugly IB DMA ops wrappers back
though?

Why was dma_virt_ops such a bad idea?

Jason

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device
  2020-09-23 18:34   ` Jason Gunthorpe
@ 2020-09-24  5:49     ` Christoph Hellwig
  2020-09-24 11:49       ` Jason Gunthorpe
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2020-09-24  5:49 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Leon Romanovsky, Doug Ledford, Adit Ranadive,
	Ariel Elior, Bernard Metzler, Christian Benvenuti,
	Dennis Dalessandro, Devesh Sharma, Faisal Latif, Gal Pressman,
	Lijun Ou, linux-rdma, Michal Kalderon, Mike Marciniszyn,
	Naresh Kumar PBS, Nelson Escobar, Parav Pandit, Parvi Kaustubhi,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem, Somnath Kotur,
	Sriharsha Basavapatna, VMware PV-Drivers, Weihang Li,
	Wei Hu(Xavier),
	Yishai Hadas, Zhu Yanjun

On Wed, Sep 23, 2020 at 03:34:09PM -0300, Jason Gunthorpe wrote:
> device == struct ib_device we just allocated
> 
> The only use of this configuration is to override the dma ops with
> dma_virt_ops, so drivers that don't do that are buggy. A ib_device
> itself cannot do DMA otherwise. This should probably be clarified to
> just fail if !CONIFG_DMA_OPS
> 
> All other cases should point dma_device at some kind of DMA capable
> struct device like a pci_device, which can have a NULL ops.

This makes some sense.  Of course this is really weird by keeping
the virt_ops assignment in the driver while this logic is here, creating
a bit of a mess.

> > > +	} else {
> > > +		device->dev.dma_parms = dma_device->dma_parms;
> > >  		/*
> > > +		 * Auto setup the segment size if a DMA device was passed in.
> > > +		 * The PCI core sets the maximum segment size to 64 KB. Increase
> > > +		 * this parameter to 2 GB.
> > >  		 */
> > > +		dma_set_max_seg_size(dma_device, SZ_2G);
> > 
> > You can't just inherity DMA properties like this this.  Please
> > fix all code that looks at the seg size to look at the DMA device.
> 
> Inherit? This is overriding the PCI default of 64K to be 2G for RDMA
> devices.

With inherit I mean the

		device->dev.dma_parms = dma_device->dma_parms;

line, which is completely bogus.  All DMA mapping is done on the
dma_device in the RDMA core and ULPs, so it also can't have an effect.

> The closest thing RDMA has to segment size is the length of a IB
> scatter/gather WR element in verbs. This is 32 bits by spec.
> 
> Even if a SGL > 32 bits was required the ULP should switch to use RDMA
> MRs instead of inline IB SG.
> 
> So really there is no segment size limitation and the intention here
> is to just disable segment size at IOMMU layer.
> 
> Since this is universal, by spec, not HW specific, it doesn't make
> much sense to put in the drivers.

What if your DMA device is shared by non-RDMA functionality such
as a network or storage device which would like an even larger limit?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device
  2020-09-23 18:35         ` Jason Gunthorpe
@ 2020-09-24  5:53           ` Christoph Hellwig
  2020-09-24  6:13             ` Christoph Hellwig
  2020-09-24  7:31             ` Parav Pandit
  0 siblings, 2 replies; 23+ messages in thread
From: Christoph Hellwig @ 2020-09-24  5:53 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Bernard Metzler, Leon Romanovsky,
	Doug Ledford, Adit Ranadive, Ariel Elior, Christian Benvenuti,
	Dennis Dalessandro, Devesh Sharma, Faisal Latif, Gal Pressman,
	Lijun Ou, linux-rdma, Michal Kalderon, Mike Marciniszyn,
	Naresh Kumar PBS, Nelson Escobar, Parav Pandit, Parvi Kaustubhi,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem, Somnath Kotur,
	Sriharsha Basavapatna, VMware PV-Drivers, Weihang Li,
	Wei Hu(Xavier),
	Yishai Hadas, Zhu Yanjun

On Wed, Sep 23, 2020 at 03:35:56PM -0300, Jason Gunthorpe wrote:
> On Wed, Sep 23, 2020 at 06:39:55AM +0100, Christoph Hellwig wrote:
> > > I wonder if dma_virt_ops ignores the masking.. Still seems best to set
> > > it consistently when using dma_virt_ops.
> > 
> > dma_virt_ops doesn't look at the mask.  But in retrospective
> > dma_virt_ops has been a really bad idea.  I'd much rather move the
> > handling of non-physical devices back into the RDMA core in the long
> > run.
> 
> Doesn't that mean we need to put all the ugly IB DMA ops wrappers back
> though?

All the wrappers still exists, and as far a I can tell are used by the
core and ULPs properly.

> Why was dma_virt_ops such a bad idea?

Because it isn't DMA, and not we need crappy workarounds like the one
in the PCIe P2P code.  It also enforces the same size for dma_addr_t
and a pointer, which is not true in various cases.  More importantly
it forces a very strange design in the IB APIs (actually it seems the
other way around - the weird IB Verbs APIs forced this decision, but
it cements it in).  For example most modern Mellanox cards can include
immediate data in the command capsule (sorry NVMe terms, I'm pretty
sure you guys use something else) that is just copied into the BAR
using MMIO.  But the IB API design still forces the ULP to dma map
it, which is idiotic.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device
  2020-09-24  5:53           ` Christoph Hellwig
@ 2020-09-24  6:13             ` Christoph Hellwig
  2020-09-24 11:55               ` Jason Gunthorpe
  2020-09-24  7:31             ` Parav Pandit
  1 sibling, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2020-09-24  6:13 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Bernard Metzler, Leon Romanovsky,
	Doug Ledford, Adit Ranadive, Ariel Elior, Christian Benvenuti,
	Dennis Dalessandro, Devesh Sharma, Faisal Latif, Gal Pressman,
	Lijun Ou, linux-rdma, Michal Kalderon, Mike Marciniszyn,
	Naresh Kumar PBS, Nelson Escobar, Parav Pandit, Parvi Kaustubhi,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem, Somnath Kotur,
	Sriharsha Basavapatna, VMware PV-Drivers, Weihang Li,
	Wei Hu(Xavier),
	Yishai Hadas, Zhu Yanjun

On Thu, Sep 24, 2020 at 06:53:45AM +0100, Christoph Hellwig wrote:
> Because it isn't DMA, and not we need crappy workarounds like the one
> in the PCIe P2P code.  It also enforces the same size for dma_addr_t
> and a pointer, which is not true in various cases.  More importantly
> it forces a very strange design in the IB APIs (actually it seems the
> other way around - the weird IB Verbs APIs forced this decision, but
> it cements it in).  For example most modern Mellanox cards can include
> immediate data in the command capsule (sorry NVMe terms, I'm pretty
> sure you guys use something else) that is just copied into the BAR
> using MMIO.  But the IB API design still forces the ULP to dma map
> it, which is idiotic.

FYI, here is the quick patch to kill of dma_virt_ops on top of the
patch under discussion here.  I think the diffstat alone speaks for
itself, never mind the fact that we avoid indirect calls for the
drivers that don't want the DMA mapping.


diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index f8fc41ef921868..036b01bf047231 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1173,36 +1173,6 @@ static int assign_name(struct ib_device *device, const char *name)
 	return ret;
 }
 
-static void setup_dma_device(struct ib_device *device,
-			     struct device *dma_device)
-{
-	if (!dma_device) {
-		/*
-		 * If the caller does not provide a DMA capable device then the
-		 * IB device will be used. In this case the caller should fully
-		 * setup the ibdev for DMA. This usually means using
-		 * dma_virt_ops.
-		 */
-#ifdef CONFIG_DMA_OPS
-		if (WARN_ON(!device->dev.dma_ops))
-			return;
-#endif
-		if (WARN_ON(!device->dev.dma_parms))
-			return;
-
-		dma_device = &device->dev;
-	} else {
-		device->dev.dma_parms = dma_device->dma_parms;
-		/*
-		 * Auto setup the segment size if a DMA device was passed in.
-		 * The PCI core sets the maximum segment size to 64 KB. Increase
-		 * this parameter to 2 GB.
-		 */
-		dma_set_max_seg_size(dma_device, SZ_2G);
-	}
-	device->dma_device = dma_device;
-}
-
 /*
  * setup_device() allocates memory and sets up data that requires calling the
  * device ops, this is the only reason these actions are not done during
@@ -1345,7 +1315,18 @@ int ib_register_device(struct ib_device *device, const char *name,
 	if (ret)
 		return ret;
 
-	setup_dma_device(device, dma_device);
+	/*
+	 * Auto setup the segment size if a DMA device was passed in.  The PCI
+	 * core sets the maximum segment size to 64 KB.  Increase this parameter
+	 * to 2 GB.
+	 *
+	 * XXX: this really should move into the drivers, as some might support
+	 * a larger segment size for non-RDMA purposes.
+	 */
+	device->dma_device = dma_device;
+	if (dma_device)
+		dma_set_max_seg_size(dma_device, SZ_2G);
+
 	ret = setup_device(device);
 	if (ret)
 		return ret;
diff --git a/drivers/infiniband/sw/rdmavt/Kconfig b/drivers/infiniband/sw/rdmavt/Kconfig
index 9ef5f5ce1ff6b0..2de31692885cf0 100644
--- a/drivers/infiniband/sw/rdmavt/Kconfig
+++ b/drivers/infiniband/sw/rdmavt/Kconfig
@@ -1,8 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 config INFINIBAND_RDMAVT
 	tristate "RDMA verbs transport library"
-	depends on X86_64 && ARCH_DMA_ADDR_T_64BIT
+	depends on X86_64
 	depends on PCI
-	select DMA_VIRT_OPS
 	help
 	This is a common software verbs provider for RDMA networks.
diff --git a/drivers/infiniband/sw/rdmavt/mr.c b/drivers/infiniband/sw/rdmavt/mr.c
index 2f7c25fea44a9d..e60a11b2f78b70 100644
--- a/drivers/infiniband/sw/rdmavt/mr.c
+++ b/drivers/infiniband/sw/rdmavt/mr.c
@@ -324,8 +324,6 @@ static void __rvt_free_mr(struct rvt_mr *mr)
  * @acc: access flags
  *
  * Return: the memory region on success, otherwise returns an errno.
- * Note that all DMA addresses should be created via the functions in
- * struct dma_virt_ops.
  */
 struct ib_mr *rvt_get_dma_mr(struct ib_pd *pd, int acc)
 {
@@ -766,7 +764,7 @@ int rvt_lkey_ok(struct rvt_lkey_table *rkt, struct rvt_pd *pd,
 
 	/*
 	 * We use LKEY == zero for kernel virtual addresses
-	 * (see rvt_get_dma_mr() and dma_virt_ops).
+	 * (see rvt_get_dma_mr()).
 	 */
 	if (sge->lkey == 0) {
 		struct rvt_dev_info *dev = ib_to_rvt(pd->ibpd.device);
@@ -877,7 +875,7 @@ int rvt_rkey_ok(struct rvt_qp *qp, struct rvt_sge *sge,
 
 	/*
 	 * We use RKEY == zero for kernel virtual addresses
-	 * (see rvt_get_dma_mr() and dma_virt_ops).
+	 * (see rvt_get_dma_mr()).
 	 */
 	rcu_read_lock();
 	if (rkey == 0) {
diff --git a/drivers/infiniband/sw/rdmavt/vt.c b/drivers/infiniband/sw/rdmavt/vt.c
index 2f117ac11c8b86..8dc9d980b172d8 100644
--- a/drivers/infiniband/sw/rdmavt/vt.c
+++ b/drivers/infiniband/sw/rdmavt/vt.c
@@ -580,13 +580,6 @@ int rvt_register_device(struct rvt_dev_info *rdi)
 	/* Completion queues */
 	spin_lock_init(&rdi->n_cqs_lock);
 
-	/* DMA Operations */
-	rdi->ibdev.dev.dma_ops = &dma_virt_ops;
-	rdi->ibdev.dev.dma_parms = rdi->ibdev.dev.parent->dma_parms;
-	rdi->ibdev.dev.dma_mask = rdi->ibdev.dev.parent->dma_mask;
-	rdi->ibdev.dev.coherent_dma_mask =
-		rdi->ibdev.dev.parent->coherent_dma_mask;
-
 	/* Protection Domain */
 	spin_lock_init(&rdi->n_pds_lock);
 	rdi->n_pds_allocated = 0;
diff --git a/drivers/infiniband/sw/rxe/Kconfig b/drivers/infiniband/sw/rxe/Kconfig
index a0c6c7dfc1814f..f12bb088a44453 100644
--- a/drivers/infiniband/sw/rxe/Kconfig
+++ b/drivers/infiniband/sw/rxe/Kconfig
@@ -2,10 +2,8 @@
 config RDMA_RXE
 	tristate "Software RDMA over Ethernet (RoCE) driver"
 	depends on INET && PCI && INFINIBAND
-	depends on !64BIT || ARCH_DMA_ADDR_T_64BIT
 	select NET_UDP_TUNNEL
 	select CRYPTO_CRC32
-	select DMA_VIRT_OPS
 	help
 	This driver implements the InfiniBand RDMA transport over
 	the Linux network stack. It enables a system with a
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 37fee72755be76..df4a5de6456906 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1128,12 +1128,6 @@ int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name)
 	dev->local_dma_lkey = 0;
 	addrconf_addr_eui48((unsigned char *)&dev->node_guid,
 			    rxe->ndev->dev_addr);
-	dev->dev.dma_ops = &dma_virt_ops;
-	dev->dev.dma_parms = &rxe->dma_parms;
-	rxe->dma_parms = (struct device_dma_parameters)
-		{ .max_segment_size = SZ_2G };
-	dma_coerce_mask_and_coherent(&dev->dev,
-				     dma_get_required_mask(&dev->dev));
 
 	dev->uverbs_cmd_mask = BIT_ULL(IB_USER_VERBS_CMD_GET_CONTEXT)
 	    | BIT_ULL(IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL)
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 560a610bb0aa83..3cd58195191ddb 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -356,7 +356,6 @@ struct rxe_port {
 struct rxe_dev {
 	struct ib_device	ib_dev;
 	struct ib_device_attr	attr;
-	struct device_dma_parameters dma_parms;
 	int			max_ucontext;
 	int			max_inline_data;
 	struct mutex	usdev_lock;
diff --git a/drivers/infiniband/sw/siw/Kconfig b/drivers/infiniband/sw/siw/Kconfig
index b622fc62f2cd6d..8582ac6050c743 100644
--- a/drivers/infiniband/sw/siw/Kconfig
+++ b/drivers/infiniband/sw/siw/Kconfig
@@ -1,7 +1,6 @@
 config RDMA_SIW
 	tristate "Software RDMA over TCP/IP (iWARP) driver"
 	depends on INET && INFINIBAND && LIBCRC32C
-	select DMA_VIRT_OPS
 	help
 	This driver implements the iWARP RDMA transport over
 	the Linux TCP/IP network stack. It enables a system with a
diff --git a/drivers/infiniband/sw/siw/siw.h b/drivers/infiniband/sw/siw/siw.h
index e9753831ac3f33..adda7899621962 100644
--- a/drivers/infiniband/sw/siw/siw.h
+++ b/drivers/infiniband/sw/siw/siw.h
@@ -69,7 +69,6 @@ struct siw_pd {
 
 struct siw_device {
 	struct ib_device base_dev;
-	struct device_dma_parameters dma_parms;
 	struct net_device *netdev;
 	struct siw_dev_cap attrs;
 
diff --git a/drivers/infiniband/sw/siw/siw_main.c b/drivers/infiniband/sw/siw/siw_main.c
index 0362d57b4db898..c62a7a0d423c0e 100644
--- a/drivers/infiniband/sw/siw/siw_main.c
+++ b/drivers/infiniband/sw/siw/siw_main.c
@@ -382,12 +382,6 @@ static struct siw_device *siw_device_create(struct net_device *netdev)
 	 */
 	base_dev->phys_port_cnt = 1;
 	base_dev->dev.parent = parent;
-	base_dev->dev.dma_ops = &dma_virt_ops;
-	base_dev->dev.dma_parms = &sdev->dma_parms;
-	sdev->dma_parms = (struct device_dma_parameters)
-		{ .max_segment_size = SZ_2G };
-	dma_coerce_mask_and_coherent(&base_dev->dev,
-				     dma_get_required_mask(&base_dev->dev));
 	base_dev->num_comp_vectors = num_possible_cpus();
 
 	xa_init_flags(&sdev->qp_xa, XA_FLAGS_ALLOC1);
diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index bf66b0622b49c8..f2621eaecb7211 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -556,15 +556,6 @@ int pci_p2pdma_distance_many(struct pci_dev *provider, struct device **clients,
 		return -1;
 
 	for (i = 0; i < num_clients; i++) {
-#ifdef CONFIG_DMA_VIRT_OPS
-		if (clients[i]->dma_ops == &dma_virt_ops) {
-			if (verbose)
-				dev_warn(clients[i],
-					 "cannot be used for peer-to-peer DMA because the driver makes use of dma_virt_ops\n");
-			return -1;
-		}
-#endif
-
 		pci_client = find_parent_pci_dev(clients[i]);
 		if (!pci_client) {
 			if (verbose)
@@ -837,17 +828,6 @@ static int __pci_p2pdma_map_sg(struct pci_p2pdma_pagemap *p2p_pgmap,
 	phys_addr_t paddr;
 	int i;
 
-	/*
-	 * p2pdma mappings are not compatible with devices that use
-	 * dma_virt_ops. If the upper layers do the right thing
-	 * this should never happen because it will be prevented
-	 * by the check in pci_p2pdma_distance_many()
-	 */
-#ifdef CONFIG_DMA_VIRT_OPS
-	if (WARN_ON_ONCE(dev->dma_ops == &dma_virt_ops))
-		return 0;
-#endif
-
 	for_each_sg(sg, s, nents, i) {
 		paddr = sg_phys(s);
 
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index bb138ac6f5e63e..6d39b34b75b115 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -133,7 +133,6 @@ struct dma_map_ops {
 
 #define DMA_MAPPING_ERROR		(~(dma_addr_t)0)
 
-extern const struct dma_map_ops dma_virt_ops;
 extern const struct dma_map_ops dma_dummy_ops;
 
 #define DMA_BIT_MASK(n)	(((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index fe32ec9f4754ea..f8a4f88aae298f 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -3967,6 +3967,8 @@ static inline u64 ib_dma_map_single(struct ib_device *dev,
 				    void *cpu_addr, size_t size,
 				    enum dma_data_direction direction)
 {
+	if (!dev->dma_device)
+		return (uintptr_t)cpu_addr;
 	return dma_map_single(dev->dma_device, cpu_addr, size, direction);
 }
 
@@ -3981,6 +3983,8 @@ static inline void ib_dma_unmap_single(struct ib_device *dev,
 				       u64 addr, size_t size,
 				       enum dma_data_direction direction)
 {
+	if (!dev->dma_device)
+		return;
 	dma_unmap_single(dev->dma_device, addr, size, direction);
 }
 
@@ -3998,6 +4002,8 @@ static inline u64 ib_dma_map_page(struct ib_device *dev,
 				  size_t size,
 					 enum dma_data_direction direction)
 {
+	if (!dev->dma_device)
+		return (uintptr_t)(page_address(page) + offset);
 	return dma_map_page(dev->dma_device, page, offset, size, direction);
 }
 
@@ -4012,9 +4018,41 @@ static inline void ib_dma_unmap_page(struct ib_device *dev,
 				     u64 addr, size_t size,
 				     enum dma_data_direction direction)
 {
+	if (!dev->dma_device)
+		return;
 	dma_unmap_page(dev->dma_device, addr, size, direction);
 }
 
+static inline int ib_dma_map_sg_attrs(struct ib_device *dev,
+				      struct scatterlist *sg, int nents,
+				      enum dma_data_direction direction,
+				      unsigned long dma_attrs)
+{
+	if (!dev->dma_device) {
+		struct scatterlist *s;
+		int i;
+
+		for_each_sg(sg, s, nents, i) {
+			sg_dma_address(s) = (uintptr_t)sg_virt(s);
+			sg_dma_len(s) = s->length;
+		}
+
+		return nents;
+	}
+	return dma_map_sg_attrs(dev->dma_device, sg, nents, direction,
+				dma_attrs);
+}
+
+static inline void ib_dma_unmap_sg_attrs(struct ib_device *dev,
+					 struct scatterlist *sg, int nents,
+					 enum dma_data_direction direction,
+					 unsigned long dma_attrs)
+{
+	if (!dev->dma_device)
+		return;
+	dma_unmap_sg_attrs(dev->dma_device, sg, nents, direction, dma_attrs);
+}
+
 /**
  * ib_dma_map_sg - Map a scatter/gather list to DMA addresses
  * @dev: The device for which the DMA addresses are to be created
@@ -4026,7 +4064,7 @@ static inline int ib_dma_map_sg(struct ib_device *dev,
 				struct scatterlist *sg, int nents,
 				enum dma_data_direction direction)
 {
-	return dma_map_sg(dev->dma_device, sg, nents, direction);
+	return ib_dma_map_sg_attrs(dev, sg, nents, direction, 0);
 }
 
 /**
@@ -4040,24 +4078,7 @@ static inline void ib_dma_unmap_sg(struct ib_device *dev,
 				   struct scatterlist *sg, int nents,
 				   enum dma_data_direction direction)
 {
-	dma_unmap_sg(dev->dma_device, sg, nents, direction);
-}
-
-static inline int ib_dma_map_sg_attrs(struct ib_device *dev,
-				      struct scatterlist *sg, int nents,
-				      enum dma_data_direction direction,
-				      unsigned long dma_attrs)
-{
-	return dma_map_sg_attrs(dev->dma_device, sg, nents, direction,
-				dma_attrs);
-}
-
-static inline void ib_dma_unmap_sg_attrs(struct ib_device *dev,
-					 struct scatterlist *sg, int nents,
-					 enum dma_data_direction direction,
-					 unsigned long dma_attrs)
-{
-	dma_unmap_sg_attrs(dev->dma_device, sg, nents, direction, dma_attrs);
+	ib_dma_unmap_sg_attrs(dev, sg, nents, direction, 0);
 }
 
 /**
@@ -4068,6 +4089,8 @@ static inline void ib_dma_unmap_sg_attrs(struct ib_device *dev,
  */
 static inline unsigned int ib_dma_max_seg_size(struct ib_device *dev)
 {
+	if (!dev->dma_device)
+		return SZ_2G;
 	return dma_get_max_seg_size(dev->dma_device);
 }
 
@@ -4083,6 +4106,8 @@ static inline void ib_dma_sync_single_for_cpu(struct ib_device *dev,
 					      size_t size,
 					      enum dma_data_direction dir)
 {
+	if (!dev->dma_device)
+		return;
 	dma_sync_single_for_cpu(dev->dma_device, addr, size, dir);
 }
 
@@ -4098,6 +4123,8 @@ static inline void ib_dma_sync_single_for_device(struct ib_device *dev,
 						 size_t size,
 						 enum dma_data_direction dir)
 {
+	if (!dev->dma_device)
+		return;
 	dma_sync_single_for_device(dev->dma_device, addr, size, dir);
 }
 
@@ -4113,6 +4140,14 @@ static inline void *ib_dma_alloc_coherent(struct ib_device *dev,
 					   dma_addr_t *dma_handle,
 					   gfp_t flag)
 {
+	if (!dev->dma_device) {
+		void *ret = (void *)__get_free_pages(flag | __GFP_ZERO,
+				get_order(size));
+		if (ret)
+			*dma_handle = (uintptr_t)ret;
+		return ret;
+	}
+
 	return dma_alloc_coherent(dev->dma_device, size, dma_handle, flag);
 }
 
@@ -4127,7 +4162,10 @@ static inline void ib_dma_free_coherent(struct ib_device *dev,
 					size_t size, void *cpu_addr,
 					dma_addr_t dma_handle)
 {
-	dma_free_coherent(dev->dma_device, size, cpu_addr, dma_handle);
+	if (!dev->dma_device)
+		free_pages((unsigned long)cpu_addr, get_order(size));
+	else
+		dma_free_coherent(dev->dma_device, size, cpu_addr, dma_handle);
 }
 
 /* ib_reg_user_mr - register a memory region for virtual addresses from kernel
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 281785feb874db..0edf7845b4effa 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -78,11 +78,6 @@ config ARCH_HAS_FORCE_DMA_UNENCRYPTED
 config DMA_NONCOHERENT_CACHE_SYNC
 	bool
 
-config DMA_VIRT_OPS
-	bool
-	depends on HAS_DMA
-	select DMA_OPS
-
 config SWIOTLB
 	bool
 	select NEED_DMA_MAP_STATE
diff --git a/kernel/dma/Makefile b/kernel/dma/Makefile
index dc755ab68aabf9..cd1d86358a7a62 100644
--- a/kernel/dma/Makefile
+++ b/kernel/dma/Makefile
@@ -5,7 +5,6 @@ obj-$(CONFIG_DMA_OPS)			+= ops_helpers.o
 obj-$(CONFIG_DMA_OPS)			+= dummy.o
 obj-$(CONFIG_DMA_CMA)			+= contiguous.o
 obj-$(CONFIG_DMA_DECLARE_COHERENT)	+= coherent.o
-obj-$(CONFIG_DMA_VIRT_OPS)		+= virt.o
 obj-$(CONFIG_DMA_API_DEBUG)		+= debug.o
 obj-$(CONFIG_SWIOTLB)			+= swiotlb.o
 obj-$(CONFIG_DMA_COHERENT_POOL)		+= pool.o
diff --git a/kernel/dma/virt.c b/kernel/dma/virt.c
deleted file mode 100644
index ebe128833af7b5..00000000000000
--- a/kernel/dma/virt.c
+++ /dev/null
@@ -1,59 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * DMA operations that map to virtual addresses without flushing memory.
- */
-#include <linux/export.h>
-#include <linux/mm.h>
-#include <linux/dma-mapping.h>
-#include <linux/scatterlist.h>
-
-static void *dma_virt_alloc(struct device *dev, size_t size,
-			    dma_addr_t *dma_handle, gfp_t gfp,
-			    unsigned long attrs)
-{
-	void *ret;
-
-	ret = (void *)__get_free_pages(gfp | __GFP_ZERO, get_order(size));
-	if (ret)
-		*dma_handle = (uintptr_t)ret;
-	return ret;
-}
-
-static void dma_virt_free(struct device *dev, size_t size,
-			  void *cpu_addr, dma_addr_t dma_addr,
-			  unsigned long attrs)
-{
-	free_pages((unsigned long)cpu_addr, get_order(size));
-}
-
-static dma_addr_t dma_virt_map_page(struct device *dev, struct page *page,
-				    unsigned long offset, size_t size,
-				    enum dma_data_direction dir,
-				    unsigned long attrs)
-{
-	return (uintptr_t)(page_address(page) + offset);
-}
-
-static int dma_virt_map_sg(struct device *dev, struct scatterlist *sgl,
-			   int nents, enum dma_data_direction dir,
-			   unsigned long attrs)
-{
-	int i;
-	struct scatterlist *sg;
-
-	for_each_sg(sgl, sg, nents, i) {
-		BUG_ON(!sg_page(sg));
-		sg_dma_address(sg) = (uintptr_t)sg_virt(sg);
-		sg_dma_len(sg) = sg->length;
-	}
-
-	return nents;
-}
-
-const struct dma_map_ops dma_virt_ops = {
-	.alloc			= dma_virt_alloc,
-	.free			= dma_virt_free,
-	.map_page		= dma_virt_map_page,
-	.map_sg			= dma_virt_map_sg,
-};
-EXPORT_SYMBOL(dma_virt_ops);

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* RE: [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device
  2020-09-24  5:53           ` Christoph Hellwig
  2020-09-24  6:13             ` Christoph Hellwig
@ 2020-09-24  7:31             ` Parav Pandit
  1 sibling, 0 replies; 23+ messages in thread
From: Parav Pandit @ 2020-09-24  7:31 UTC (permalink / raw)
  To: Christoph Hellwig, Jason Gunthorpe
  Cc: Bernard Metzler, Leon Romanovsky, Doug Ledford, Adit Ranadive,
	Ariel Elior, Christian Benvenuti, Dennis Dalessandro,
	Devesh Sharma, Faisal Latif, Gal Pressman, Lijun Ou, linux-rdma,
	Michal Kalderon, Mike Marciniszyn, Naresh Kumar PBS,
	Nelson Escobar, Parvi Kaustubhi, Potnuri Bharat Teja,
	Selvin Xavier, Shiraz Saleem, Somnath Kotur,
	Sriharsha Basavapatna, VMware PV-Drivers, Weihang Li,
	Wei Hu(Xavier),
	Yishai Hadas, Yanjun Zhu

Hi Christoph,

> From: Christoph Hellwig <hch@infradead.org>
> Sent: Thursday, September 24, 2020 11:24 AM
> 
> On Wed, Sep 23, 2020 at 03:35:56PM -0300, Jason Gunthorpe wrote:
> > On Wed, Sep 23, 2020 at 06:39:55AM +0100, Christoph Hellwig wrote:
> > > > I wonder if dma_virt_ops ignores the masking.. Still seems best to
> > > > set it consistently when using dma_virt_ops.
> > >
> > > dma_virt_ops doesn't look at the mask.  But in retrospective
> > > dma_virt_ops has been a really bad idea.  I'd much rather move the
> > > handling of non-physical devices back into the RDMA core in the long
> > > run.
> >
> > Doesn't that mean we need to put all the ugly IB DMA ops wrappers back
> > though?
> 
> All the wrappers still exists, and as far a I can tell are used by the core and
> ULPs properly.
> 
> > Why was dma_virt_ops such a bad idea?
> 
> Because it isn't DMA, and not we need crappy workarounds like the one in
> the PCIe P2P code.  It also enforces the same size for dma_addr_t and a
> pointer, which is not true in various cases.  More importantly it forces a very
> strange design in the IB APIs (actually it seems the other way around - the
> weird IB Verbs APIs forced this decision, but it cements it in).  For example
> most modern Mellanox cards can include immediate data in the command
> capsule (sorry NVMe terms, I'm pretty sure you guys use something else)
> that is just copied into the BAR using MMIO.  But the IB API design still forces
> the ULP to dma map it, which is idiotic.

For 64 B nvme command & for 16 nvme cqe, IB_SEND_INLINE flag can be used while posting send WR.
Here the data is inline in the WQE at ib_sge as VA.

This inline data doesn't require dma mapping.
It is memcpyied to MMIO area and in the SQ ring (for differed access).

Code is located at drivers/infiniband/hw/mlx5/wr.c -> set_data_inl_seg()

To make use of it, ib_create_qp(qp_init_attr->cap.max_inline_data = 64) should be set.

However little more plumbing is needed across vendors for nvme ULP to consume in proper way.
For example,
drivers/infiniband/hw/i40iw/i40iw_verbs.c trims the inline data to its internal limit of 48, while ULP might have asked for 64 during QP creation time.
Post_send() will fail later when command is posted, which is obviously not expected.

In another example ConnectX3 mlx4 driver doesn't support it.
A while back I posted the patch [1] for ConnectX3 to avoid kernel crash and to support IB_SEND_INLINE.

[1] https://patchwork.kernel.org/patch/9619537/#20240807

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device
  2020-09-24  5:49     ` Christoph Hellwig
@ 2020-09-24 11:49       ` Jason Gunthorpe
  2020-10-06 14:29         ` Bart Van Assche
  0 siblings, 1 reply; 23+ messages in thread
From: Jason Gunthorpe @ 2020-09-24 11:49 UTC (permalink / raw)
  To: Christoph Hellwig, Bart Van Assche
  Cc: Leon Romanovsky, Doug Ledford, Adit Ranadive, Ariel Elior,
	Bernard Metzler, Christian Benvenuti, Dennis Dalessandro,
	Devesh Sharma, Faisal Latif, Gal Pressman, Lijun Ou, linux-rdma,
	Michal Kalderon, Mike Marciniszyn, Naresh Kumar PBS,
	Nelson Escobar, Parav Pandit, Parvi Kaustubhi,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem, Somnath Kotur,
	Sriharsha Basavapatna, VMware PV-Drivers, Weihang Li,
	Wei Hu(Xavier),
	Yishai Hadas, Zhu Yanjun

On Thu, Sep 24, 2020 at 06:49:07AM +0100, Christoph Hellwig wrote:
> > > > +	} else {
> > > > +		device->dev.dma_parms = dma_device->dma_parms;
> > > >  		/*
> > > > +		 * Auto setup the segment size if a DMA device was passed in.
> > > > +		 * The PCI core sets the maximum segment size to 64 KB. Increase
> > > > +		 * this parameter to 2 GB.
> > > >  		 */
> > > > +		dma_set_max_seg_size(dma_device, SZ_2G);
> > > 
> > > You can't just inherity DMA properties like this this.  Please
> > > fix all code that looks at the seg size to look at the DMA device.
> > 
> > Inherit? This is overriding the PCI default of 64K to be 2G for RDMA
> > devices.
> 
> With inherit I mean the
> 
> 		device->dev.dma_parms = dma_device->dma_parms;
> 
> line, which is completely bogus.  All DMA mapping is done on the
> dma_device in the RDMA core and ULPs, so it also can't have an effect.

Oh. Yes, no idea why that is there..

commit c9121262d57b8a3be4f08073546436ba0128ca6a
Author: Bart Van Assche <bvanassche@acm.org>
Date:   Fri Oct 25 15:58:30 2019 -0700

    RDMA/core: Set DMA parameters correctly
    
    The dma_set_max_seg_size() call in setup_dma_device() does not have any
    effect since device->dev.dma_parms is NULL. Fix this by initializing
    device->dev.dma_parms first.

Bart?

> > The closest thing RDMA has to segment size is the length of a IB
> > scatter/gather WR element in verbs. This is 32 bits by spec.
> > 
> > Even if a SGL > 32 bits was required the ULP should switch to use RDMA
> > MRs instead of inline IB SG.
> > 
> > So really there is no segment size limitation and the intention here
> > is to just disable segment size at IOMMU layer.
> > 
> > Since this is universal, by spec, not HW specific, it doesn't make
> > much sense to put in the drivers.
> 
> What if your DMA device is shared by non-RDMA functionality such
> as a network or storage device which would like an even larger limit?

This limit should be the largest possible, if we can go higher here,
then lets go higher. UINT_MAX?

Hopefully nobody needs lower in the multi-function case

Jason

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device
  2020-09-24  6:13             ` Christoph Hellwig
@ 2020-09-24 11:55               ` Jason Gunthorpe
  0 siblings, 0 replies; 23+ messages in thread
From: Jason Gunthorpe @ 2020-09-24 11:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bernard Metzler, Leon Romanovsky, Doug Ledford, Adit Ranadive,
	Ariel Elior, Christian Benvenuti, Dennis Dalessandro,
	Devesh Sharma, Faisal Latif, Gal Pressman, Lijun Ou, linux-rdma,
	Michal Kalderon, Mike Marciniszyn, Naresh Kumar PBS,
	Nelson Escobar, Parav Pandit, Parvi Kaustubhi,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem, Somnath Kotur,
	Sriharsha Basavapatna, VMware PV-Drivers, Weihang Li,
	Wei Hu(Xavier),
	Yishai Hadas, Zhu Yanjun

On Thu, Sep 24, 2020 at 07:13:25AM +0100, Christoph Hellwig wrote:
> On Thu, Sep 24, 2020 at 06:53:45AM +0100, Christoph Hellwig wrote:
> > Because it isn't DMA, and not we need crappy workarounds like the one
> > in the PCIe P2P code.  It also enforces the same size for dma_addr_t
> > and a pointer, which is not true in various cases.  More importantly
> > it forces a very strange design in the IB APIs (actually it seems the
> > other way around - the weird IB Verbs APIs forced this decision, but
> > it cements it in).  For example most modern Mellanox cards can include
> > immediate data in the command capsule (sorry NVMe terms, I'm pretty
> > sure you guys use something else) that is just copied into the BAR
> > using MMIO.  But the IB API design still forces the ULP to dma map
> > it, which is idiotic.
> 
> FYI, here is the quick patch to kill of dma_virt_ops on top of the
> patch under discussion here.  I think the diffstat alone speaks for
> itself, never mind the fact that we avoid indirect calls for the
> drivers that don't want the DMA mapping.

Yes looks reasonable, I can't recall why things went in the other
direction.

> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index bf66b0622b49c8..f2621eaecb7211 100644
> +++ b/drivers/pci/p2pdma.c
> @@ -556,15 +556,6 @@ int pci_p2pdma_distance_many(struct pci_dev *provider, struct device **clients,
>  		return -1;
>  
>  	for (i = 0; i < num_clients; i++) {
> -#ifdef CONFIG_DMA_VIRT_OPS
> -		if (clients[i]->dma_ops == &dma_virt_ops) {
> -			if (verbose)
> -				dev_warn(clients[i],
> -					 "cannot be used for peer-to-peer DMA because the driver makes use of dma_virt_ops\n");
> -			return -1;
> -		}
> -#endif

We still need to block p2p for any of the non DMA devices. We can't
pass a p2p sgl into SW devices, they will try to memcpy from the MMIO
__iommem which is not allowed.

Now the check changes to !ib_device->dma_device in the callers

Jason

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device
  2020-09-24 11:49       ` Jason Gunthorpe
@ 2020-10-06 14:29         ` Bart Van Assche
  2020-10-06 16:53           ` Jason Gunthorpe
  0 siblings, 1 reply; 23+ messages in thread
From: Bart Van Assche @ 2020-10-06 14:29 UTC (permalink / raw)
  To: Jason Gunthorpe, Christoph Hellwig
  Cc: Leon Romanovsky, Doug Ledford, Adit Ranadive, Ariel Elior,
	Bernard Metzler, Christian Benvenuti, Dennis Dalessandro,
	Devesh Sharma, Faisal Latif, Gal Pressman, Lijun Ou, linux-rdma,
	Michal Kalderon, Mike Marciniszyn, Naresh Kumar PBS,
	Nelson Escobar, Parav Pandit, Parvi Kaustubhi,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem, Somnath Kotur,
	Sriharsha Basavapatna, VMware PV-Drivers, Weihang Li,
	Wei Hu(Xavier),
	Yishai Hadas, Zhu Yanjun

On 9/24/20 4:49 AM, Jason Gunthorpe wrote:
> On Thu, Sep 24, 2020 at 06:49:07AM +0100, Christoph Hellwig wrote:
>>>>> +	} else {
>>>>> +		device->dev.dma_parms = dma_device->dma_parms;
>>>>>   		/*
>>>>> +		 * Auto setup the segment size if a DMA device was passed in.
>>>>> +		 * The PCI core sets the maximum segment size to 64 KB. Increase
>>>>> +		 * this parameter to 2 GB.
>>>>>   		 */
>>>>> +		dma_set_max_seg_size(dma_device, SZ_2G);
>>>>
>>>> You can't just inherity DMA properties like this this.  Please
>>>> fix all code that looks at the seg size to look at the DMA device.
>>>
>>> Inherit? This is overriding the PCI default of 64K to be 2G for RDMA
>>> devices.
>>
>> With inherit I mean the
>>
>> 		device->dev.dma_parms = dma_device->dma_parms;
>>
>> line, which is completely bogus.  All DMA mapping is done on the
>> dma_device in the RDMA core and ULPs, so it also can't have an effect.
> 
> Oh. Yes, no idea why that is there..
> 
> commit c9121262d57b8a3be4f08073546436ba0128ca6a
> Author: Bart Van Assche <bvanassche@acm.org>
> Date:   Fri Oct 25 15:58:30 2019 -0700
> 
>      RDMA/core: Set DMA parameters correctly
>      
>      The dma_set_max_seg_size() call in setup_dma_device() does not have any
>      effect since device->dev.dma_parms is NULL. Fix this by initializing
>      device->dev.dma_parms first.
> 
> Bart?

(just noticed this email)

Hi Jason,

That code may be a leftover from when the ib_dma_*() functions used &dev->dev as
their first argument instead of dev->dma_device. See also commit 0957c29f78af
("IB/core: Restore I/O MMU, s390 and powerpc support").

Bart.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device
  2020-10-06 14:29         ` Bart Van Assche
@ 2020-10-06 16:53           ` Jason Gunthorpe
  2020-10-06 18:22             ` Bart Van Assche
  0 siblings, 1 reply; 23+ messages in thread
From: Jason Gunthorpe @ 2020-10-06 16:53 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Leon Romanovsky, Doug Ledford, Adit Ranadive,
	Ariel Elior, Bernard Metzler, Christian Benvenuti,
	Dennis Dalessandro, Devesh Sharma, Faisal Latif, Gal Pressman,
	Lijun Ou, linux-rdma, Michal Kalderon, Mike Marciniszyn,
	Naresh Kumar PBS, Nelson Escobar, Parav Pandit, Parvi Kaustubhi,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem, Somnath Kotur,
	Sriharsha Basavapatna, VMware PV-Drivers, Weihang Li,
	Wei Hu(Xavier),
	Yishai Hadas, Zhu Yanjun

On Tue, Oct 06, 2020 at 07:29:16AM -0700, Bart Van Assche wrote:
> On 9/24/20 4:49 AM, Jason Gunthorpe wrote:
> > On Thu, Sep 24, 2020 at 06:49:07AM +0100, Christoph Hellwig wrote:
> > > > > > +	} else {
> > > > > > +		device->dev.dma_parms = dma_device->dma_parms;
> > > > > >   		/*
> > > > > > +		 * Auto setup the segment size if a DMA device was passed in.
> > > > > > +		 * The PCI core sets the maximum segment size to 64 KB. Increase
> > > > > > +		 * this parameter to 2 GB.
> > > > > >   		 */
> > > > > > +		dma_set_max_seg_size(dma_device, SZ_2G);
> > > > > 
> > > > > You can't just inherity DMA properties like this this.  Please
> > > > > fix all code that looks at the seg size to look at the DMA device.
> > > > 
> > > > Inherit? This is overriding the PCI default of 64K to be 2G for RDMA
> > > > devices.
> > > 
> > > With inherit I mean the
> > > 
> > > 		device->dev.dma_parms = dma_device->dma_parms;
> > > 
> > > line, which is completely bogus.  All DMA mapping is done on the
> > > dma_device in the RDMA core and ULPs, so it also can't have an effect.
> > 
> > Oh. Yes, no idea why that is there..
> > 
> > commit c9121262d57b8a3be4f08073546436ba0128ca6a
> > Author: Bart Van Assche <bvanassche@acm.org>
> > Date:   Fri Oct 25 15:58:30 2019 -0700
> > 
> >      RDMA/core: Set DMA parameters correctly
> >      The dma_set_max_seg_size() call in setup_dma_device() does not have any
> >      effect since device->dev.dma_parms is NULL. Fix this by initializing
> >      device->dev.dma_parms first.
> > 
> > Bart?
> 
> (just noticed this email)
> 
> Hi Jason,
>
> That code may be a leftover from when the ib_dma_*() functions used &dev->dev as
> their first argument instead of dev->dma_device. 

Hmm the above was two years after the commit that added dma_device? I
assumed you added this because you were doing testing with rxe?

Jason

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device
  2020-10-06 16:53           ` Jason Gunthorpe
@ 2020-10-06 18:22             ` Bart Van Assche
  2020-10-06 18:39               ` Jason Gunthorpe
  0 siblings, 1 reply; 23+ messages in thread
From: Bart Van Assche @ 2020-10-06 18:22 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Leon Romanovsky, Doug Ledford, Adit Ranadive,
	Ariel Elior, Bernard Metzler, Christian Benvenuti,
	Dennis Dalessandro, Devesh Sharma, Faisal Latif, Gal Pressman,
	Lijun Ou, linux-rdma, Michal Kalderon, Mike Marciniszyn,
	Naresh Kumar PBS, Nelson Escobar, Parav Pandit, Parvi Kaustubhi,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem, Somnath Kotur,
	Sriharsha Basavapatna, VMware PV-Drivers, Weihang Li,
	Wei Hu(Xavier),
	Yishai Hadas, Zhu Yanjun

On 10/6/20 9:53 AM, Jason Gunthorpe wrote:
> On Tue, Oct 06, 2020 at 07:29:16AM -0700, Bart Van Assche wrote:
>> On 9/24/20 4:49 AM, Jason Gunthorpe wrote:
>>> On Thu, Sep 24, 2020 at 06:49:07AM +0100, Christoph Hellwig wrote:
>>>>>>> +	} else {
>>>>>>> +		device->dev.dma_parms = dma_device->dma_parms;
>>>>>>>    		/*
>>>>>>> +		 * Auto setup the segment size if a DMA device was passed in.
>>>>>>> +		 * The PCI core sets the maximum segment size to 64 KB. Increase
>>>>>>> +		 * this parameter to 2 GB.
>>>>>>>    		 */
>>>>>>> +		dma_set_max_seg_size(dma_device, SZ_2G);
>>>>>>
>>>>>> You can't just inherity DMA properties like this this.  Please
>>>>>> fix all code that looks at the seg size to look at the DMA device.
>>>>>
>>>>> Inherit? This is overriding the PCI default of 64K to be 2G for RDMA
>>>>> devices.
>>>>
>>>> With inherit I mean the
>>>>
>>>> 		device->dev.dma_parms = dma_device->dma_parms;
>>>>
>>>> line, which is completely bogus.  All DMA mapping is done on the
>>>> dma_device in the RDMA core and ULPs, so it also can't have an effect.
>>>
>>> Oh. Yes, no idea why that is there..
>>>
>>> commit c9121262d57b8a3be4f08073546436ba0128ca6a
>>> Author: Bart Van Assche <bvanassche@acm.org>
>>> Date:   Fri Oct 25 15:58:30 2019 -0700
>>>
>>>       RDMA/core: Set DMA parameters correctly
>>>       The dma_set_max_seg_size() call in setup_dma_device() does not have any
>>>       effect since device->dev.dma_parms is NULL. Fix this by initializing
>>>       device->dev.dma_parms first.
>>>
>>> Bart?
>>
>> (just noticed this email)
>>
>> Hi Jason,
>>
>> That code may be a leftover from when the ib_dma_*() functions used &dev->dev as
>> their first argument instead of dev->dma_device.
> 
> Hmm the above was two years after the commit that added dma_device? I
> assumed you added this because you were doing testing with rxe?

Hi Jason,

In my previous email I was referring to the code that sets DMA parameters in 'device'.

The patch "RDMA/core: Set DMA parameters correctly" was the result of source reading
while I was chasing an unrelated rdma_rxe bug.

Bart.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device
  2020-10-06 18:22             ` Bart Van Assche
@ 2020-10-06 18:39               ` Jason Gunthorpe
  0 siblings, 0 replies; 23+ messages in thread
From: Jason Gunthorpe @ 2020-10-06 18:39 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Leon Romanovsky, Doug Ledford, Adit Ranadive,
	Ariel Elior, Bernard Metzler, Christian Benvenuti,
	Dennis Dalessandro, Devesh Sharma, Faisal Latif, Gal Pressman,
	Lijun Ou, linux-rdma, Michal Kalderon, Mike Marciniszyn,
	Naresh Kumar PBS, Nelson Escobar, Parav Pandit, Parvi Kaustubhi,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem, Somnath Kotur,
	Sriharsha Basavapatna, VMware PV-Drivers, Weihang Li,
	Wei Hu(Xavier),
	Yishai Hadas, Zhu Yanjun

On Tue, Oct 06, 2020 at 11:22:26AM -0700, Bart Van Assche wrote:

> The patch "RDMA/core: Set DMA parameters correctly" was the result of source reading
> while I was chasing an unrelated rdma_rxe bug.

Okay, then we can drop it off safely thanks

Jason

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2020-10-06 18:39 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-22  8:27 [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device Leon Romanovsky
2020-09-22  8:58 ` Bernard Metzler
2020-09-22 10:14   ` Leon Romanovsky
2020-09-22 14:22   ` Bernard Metzler
2020-09-22 16:22     ` Jason Gunthorpe
2020-09-23  5:39       ` Christoph Hellwig
2020-09-23 18:35         ` Jason Gunthorpe
2020-09-24  5:53           ` Christoph Hellwig
2020-09-24  6:13             ` Christoph Hellwig
2020-09-24 11:55               ` Jason Gunthorpe
2020-09-24  7:31             ` Parav Pandit
2020-09-23  5:38 ` Christoph Hellwig
2020-09-23  6:45   ` Leon Romanovsky
2020-09-23  6:54     ` Christoph Hellwig
2020-09-23  7:10       ` Leon Romanovsky
2020-09-23  7:21         ` Christoph Hellwig
2020-09-23 18:34   ` Jason Gunthorpe
2020-09-24  5:49     ` Christoph Hellwig
2020-09-24 11:49       ` Jason Gunthorpe
2020-10-06 14:29         ` Bart Van Assche
2020-10-06 16:53           ` Jason Gunthorpe
2020-10-06 18:22             ` Bart Van Assche
2020-10-06 18:39               ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).