linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* IB on s390 broken with commit 99db94940 "IB/core: Remove ib_device.dma_device"
@ 2017-02-27 20:17 Sebastian Ott
  2017-02-27 22:04 ` Bart Van Assche
  0 siblings, 1 reply; 6+ messages in thread
From: Sebastian Ott @ 2017-02-27 20:17 UTC (permalink / raw)
  To: Bart Van Assche, Doug Ledford; +Cc: Gerald Schaefer, linux-rdma, linux-kernel

Hi,

commit 99db94940 "IB/core: Remove ib_device.dma_device"
breaks infiniband on s390 (and I think also other archs that do something
like to_pci_dev(dev) in one of their dma_ops callbacks).

With this commit you use the dma_ops of the device that called
ib_register_device but you call e.g. dma_map with ib_device->dev
as an argument.

S390's (pci specific) dma_map uses to_pci_dev(dev) to look into the
pci device (and its arch specific data) and oopses.

Calling dma_map with ib_device->dev.parent would work but then it
wouldn't make sense to copy dma_ops and mask from ib_device->dev.parent
to ib_device->dev..

Regards,
Sebastian

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: IB on s390 broken with commit 99db94940 "IB/core: Remove ib_device.dma_device"
  2017-02-27 20:17 IB on s390 broken with commit 99db94940 "IB/core: Remove ib_device.dma_device" Sebastian Ott
@ 2017-02-27 22:04 ` Bart Van Assche
  2017-02-28  8:53   ` Sebastian Ott
  0 siblings, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2017-02-27 22:04 UTC (permalink / raw)
  To: sebott, dledford; +Cc: gerald.schaefer, linux-kernel, linux-rdma

On Mon, 2017-02-27 at 21:17 +0100, Sebastian Ott wrote:
> commit 99db94940 "IB/core: Remove ib_device.dma_device"
> breaks infiniband on s390 (and I think also other archs that do something
> like to_pci_dev(dev) in one of their dma_ops callbacks).
> 
> With this commit you use the dma_ops of the device that called
> ib_register_device but you call e.g. dma_map with ib_device->dev
> as an argument.
> 
> S390's (pci specific) dma_map uses to_pci_dev(dev) to look into the
> pci device (and its arch specific data) and oopses.
> 
> Calling dma_map with ib_device->dev.parent would work but then it
> wouldn't make sense to copy dma_ops and mask from ib_device->dev.parent
> to ib_device->dev..

How about something like the untested patch below?

---
 drivers/infiniband/core/device.c | 5 ++++-
 drivers/pci/probe.c              | 1 +
 include/linux/device.h           | 5 +++++
 include/linux/pci.h              | 5 ++++-
 4 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index a63e8400ea3b..989077fc6dbb 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -39,6 +39,7 @@
 #include <linux/init.h>
 #include <linux/mutex.h>
 #include <linux/netdevice.h>
+#include <linux/pci.h>
 #include <rdma/rdma_netlink.h>
 #include <rdma/ib_addr.h>
 #include <rdma/ib_cache.h>
@@ -336,8 +337,10 @@ int ib_register_device(struct ib_device *device,
 	struct device *parent = device->dev.parent;
 
 	WARN_ON_ONCE(!parent);
-	if (!device->dev.dma_ops)
+	if (!device->dev.dma_ops) {
 		device->dev.dma_ops = parent->dma_ops;
+		device->dev.pci_dev = to_pci_dev(parent);
+	}
 	if (!device->dev.dma_mask)
 		device->dev.dma_mask = parent->dma_mask;
 	if (!device->dev.coherent_dma_mask)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index dfc9a2794141..60d739b59520 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1736,6 +1736,7 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus)
 
 	INIT_LIST_HEAD(&dev->bus_list);
 	dev->dev.type = &pci_dev_type;
+	dev->dev.pci_dev = dev;
 	dev->bus = pci_bus_get(bus);
 
 	return dev;
diff --git a/include/linux/device.h b/include/linux/device.h
index 30c4570e928d..c18afd376d2a 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -42,6 +42,7 @@ struct fwnode_handle;
 struct iommu_ops;
 struct iommu_group;
 struct iommu_fwspec;
+struct pci_dev;
 
 struct bus_attribute {
 	struct attribute	attr;
@@ -860,6 +861,9 @@ struct dev_links_info {
  * 		segment limitations.
  * @dma_pools:	Dma pools (if dma'ble device).
  * @dma_mem:	Internal for coherent mem override.
+ * @pci_dev:	PCI device associated with this device. Used by DMA mapping
+ *		operations on architectures that need access to PCI device
+ *		structure elements that are not in struct device.
  * @cma_area:	Contiguous memory area for dma allocations
  * @archdata:	For arch-specific additions.
  * @of_node:	Associated device tree node.
@@ -940,6 +944,7 @@ struct device {
 
 	struct dma_coherent_mem	*dma_mem; /* internal for coherent mem
 					     override */
+	struct pci_dev		*pci_dev; /* for DMA mapping operations */
 #ifdef CONFIG_DMA_CMA
 	struct cma *cma_area;		/* contiguous memory area for dma
 					   allocations */
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 282ed32244ce..ba1222f32046 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -409,7 +409,10 @@ static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
 
 struct pci_dev *pci_alloc_dev(struct pci_bus *bus);
 
-#define	to_pci_dev(n) container_of(n, struct pci_dev, dev)
+static inline struct pci_dev *to_pci_dev(const struct device *dev)
+{
+	return dev->pci_dev;
+}
 #define for_each_pci_dev(d) while ((d = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, d)) != NULL)
 
 static inline int pci_channel_offline(struct pci_dev *pdev)
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: IB on s390 broken with commit 99db94940 "IB/core: Remove ib_device.dma_device"
  2017-02-27 22:04 ` Bart Van Assche
@ 2017-02-28  8:53   ` Sebastian Ott
  2017-02-28  9:20     ` Sebastian Ott
  2017-02-28 16:50     ` Bart Van Assche
  0 siblings, 2 replies; 6+ messages in thread
From: Sebastian Ott @ 2017-02-28  8:53 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: dledford, gerald.schaefer, linux-kernel, linux-rdma

On Mon, 27 Feb 2017, Bart Van Assche wrote:

> On Mon, 2017-02-27 at 21:17 +0100, Sebastian Ott wrote:
> > commit 99db94940 "IB/core: Remove ib_device.dma_device"
> > breaks infiniband on s390 (and I think also other archs that do something
> > like to_pci_dev(dev) in one of their dma_ops callbacks).
> > 
> > With this commit you use the dma_ops of the device that called
> > ib_register_device but you call e.g. dma_map with ib_device->dev
> > as an argument.
> > 
> > S390's (pci specific) dma_map uses to_pci_dev(dev) to look into the
> > pci device (and its arch specific data) and oopses.
> > 
> > Calling dma_map with ib_device->dev.parent would work but then it
> > wouldn't make sense to copy dma_ops and mask from ib_device->dev.parent
> > to ib_device->dev..
> 
> How about something like the untested patch below?

It works but it doesn't feel right (why should all pci devices have this
duplicated data).

Frankly I don't get the usecase of infiniband (sometimes) using
device->dev.dma_ops instead of parent->dma_ops. Also that these values are
selectively copied from the parent looks weird (opposed to all or nothing).

What about reintroducing dma_device (as an infiniband internal) and set it
to &ib_device->dev if you have to and to parent in all other cases?

Sebastian

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: IB on s390 broken with commit 99db94940 "IB/core: Remove ib_device.dma_device"
  2017-02-28  8:53   ` Sebastian Ott
@ 2017-02-28  9:20     ` Sebastian Ott
  2017-02-28 16:50     ` Bart Van Assche
  1 sibling, 0 replies; 6+ messages in thread
From: Sebastian Ott @ 2017-02-28  9:20 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: dledford, gerald.schaefer, linux-kernel, linux-rdma

On Tue, 28 Feb 2017, Sebastian Ott wrote:
> On Mon, 27 Feb 2017, Bart Van Assche wrote:
> > On Mon, 2017-02-27 at 21:17 +0100, Sebastian Ott wrote:
> > > commit 99db94940 "IB/core: Remove ib_device.dma_device"
> > > breaks infiniband on s390 (and I think also other archs that do something
> > > like to_pci_dev(dev) in one of their dma_ops callbacks).
> > > 
> > > With this commit you use the dma_ops of the device that called
> > > ib_register_device but you call e.g. dma_map with ib_device->dev
> > > as an argument.
> > > 
> > > S390's (pci specific) dma_map uses to_pci_dev(dev) to look into the
> > > pci device (and its arch specific data) and oopses.
> > > 
> > > Calling dma_map with ib_device->dev.parent would work but then it
> > > wouldn't make sense to copy dma_ops and mask from ib_device->dev.parent
> > > to ib_device->dev..
> > 
> > How about something like the untested patch below?
> 
> It works but it doesn't feel right (why should all pci devices have this
> duplicated data).
> 
> Frankly I don't get the usecase of infiniband (sometimes) using
> device->dev.dma_ops instead of parent->dma_ops.

Ok, some drivers in drivers/infiniband/sw/ set dma_ops to &dma_virt_ops .

So instead of copying the dma_ops you could do

if (!device->dev.dma_ops) /*or use some new flag for ib_register_device*/
	device->dma_device = parent;
else
	device->dma_device = &device->dev;

Sebastian

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: IB on s390 broken with commit 99db94940 "IB/core: Remove ib_device.dma_device"
  2017-02-28  8:53   ` Sebastian Ott
  2017-02-28  9:20     ` Sebastian Ott
@ 2017-02-28 16:50     ` Bart Van Assche
  2017-02-28 19:53       ` Parav Pandit
  1 sibling, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2017-02-28 16:50 UTC (permalink / raw)
  To: sebott; +Cc: gerald.schaefer, linux-kernel, linux-rdma, dledford

On Tue, 2017-02-28 at 09:53 +0100, Sebastian Ott wrote:
> On Mon, 27 Feb 2017, Bart Van Assche wrote:
> 
> > On Mon, 2017-02-27 at 21:17 +0100, Sebastian Ott wrote:
> > > commit 99db94940 "IB/core: Remove ib_device.dma_device"
> > > breaks infiniband on s390 (and I think also other archs that do something
> > > like to_pci_dev(dev) in one of their dma_ops callbacks).
> > > 
> > > With this commit you use the dma_ops of the device that called
> > > ib_register_device but you call e.g. dma_map with ib_device->dev
> > > as an argument.
> > > 
> > > S390's (pci specific) dma_map uses to_pci_dev(dev) to look into the
> > > pci device (and its arch specific data) and oopses.
> > > 
> > > Calling dma_map with ib_device->dev.parent would work but then it
> > > wouldn't make sense to copy dma_ops and mask from ib_device->dev.parent
> > > to ib_device->dev..
> > 
> > How about something like the untested patch below?
> 
> It works but it doesn't feel right (why should all pci devices have this
> duplicated data).
> 
> Frankly I don't get the usecase of infiniband (sometimes) using
> device->dev.dma_ops instead of parent->dma_ops. Also that these values are
> selectively copied from the parent looks weird (opposed to all or nothing).
> 
> What about reintroducing dma_device (as an infiniband internal) and set it
> to &ib_device->dev if you have to and to parent in all other cases?

Hello Sebastian,

There are three kinds of RDMA drivers:
- RDMA drivers that always use DMA for transferring data between memory and
  HCA (e.g. mlx4, mlx5, ...). These drivers make the ULP call the PCI DMA
  mapping functions directly.
- RDMA drivers that never use DMA directly but use another driver for
  transferring data (e.g. rdma_rxe). This driver makes the ULP store virtual
  addresses in .dma_address.
- RDMA drivers that decide whether to use PIO or DMA depending on e.g. the
  QP type and the amount of data to be transferred (qib, hfi1). These drivers
  also make the ULP store virtual addresses in .dma_address and decide
  internally whether or not to invoke the PCI DMA mapping functions.

This is why a custom DMA mapping API was introduced in the RDMA subsystem.
Until recently the Linux RDMA subsystem not only had its own DMA mapping
operations but also its own template for DMA mapping operations (struct
ib_dma_mapping_ops). This is not only confusing but also led to a multitude
of incomplete and RDMA driver DMA mapping operations of which additionally
the behavior is slightly different of other DMA mapping operations. That's
why we want to evolve towards a single DMA mapping API. Reintroducing the
dma_device pointer in struct ib_device would make it impossible to use the
standard DMA mapping API for RDMA devices.

Bart.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: IB on s390 broken with commit 99db94940 "IB/core: Remove ib_device.dma_device"
  2017-02-28 16:50     ` Bart Van Assche
@ 2017-02-28 19:53       ` Parav Pandit
  0 siblings, 0 replies; 6+ messages in thread
From: Parav Pandit @ 2017-02-28 19:53 UTC (permalink / raw)
  To: Bart Van Assche, sebott
  Cc: gerald.schaefer, linux-kernel, linux-rdma, dledford

Hi Bart,

I am using Linux-block tree testing on x86_64.
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git

Commit ac1820fb286b552b6885d40ab34f1e59b815f1f1 introduced dma_ops related change that you made.
With this change I am hitting below error in mlx5_ib driver.
"DMAR: Allocating domain for mlx5_0 failed"

I revert back to commit edccb59429657b09806146339e2b27594c1d1da0.
With revert I do not hit the error.

I do not have cycles to debug/fix this currently. Do you think this might be related to your change?

Parav

> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> owner@vger.kernel.org] On Behalf Of Bart Van Assche
> Sent: Tuesday, February 28, 2017 10:50 AM
> To: sebott@linux.vnet.ibm.com
> Cc: gerald.schaefer@de.ibm.com; linux-kernel@vger.kernel.org; linux-
> rdma@vger.kernel.org; dledford@redhat.com
> Subject: Re: IB on s390 broken with commit 99db94940 "IB/core: Remove
> ib_device.dma_device"
> 
> On Tue, 2017-02-28 at 09:53 +0100, Sebastian Ott wrote:
> > On Mon, 27 Feb 2017, Bart Van Assche wrote:
> >
> > > On Mon, 2017-02-27 at 21:17 +0100, Sebastian Ott wrote:
> > > > commit 99db94940 "IB/core: Remove ib_device.dma_device"
> > > > breaks infiniband on s390 (and I think also other archs that do
> > > > something like to_pci_dev(dev) in one of their dma_ops callbacks).
> > > >
> > > > With this commit you use the dma_ops of the device that called
> > > > ib_register_device but you call e.g. dma_map with ib_device->dev
> > > > as an argument.
> > > >
> > > > S390's (pci specific) dma_map uses to_pci_dev(dev) to look into
> > > > the pci device (and its arch specific data) and oopses.
> > > >
> > > > Calling dma_map with ib_device->dev.parent would work but then it
> > > > wouldn't make sense to copy dma_ops and mask from
> > > > ib_device->dev.parent to ib_device->dev..
> > >
> > > How about something like the untested patch below?
> >
> > It works but it doesn't feel right (why should all pci devices have
> > this duplicated data).
> >
> > Frankly I don't get the usecase of infiniband (sometimes) using
> > device->dev.dma_ops instead of parent->dma_ops. Also that these values
> > device->are
> > selectively copied from the parent looks weird (opposed to all or nothing).
> >
> > What about reintroducing dma_device (as an infiniband internal) and
> > set it to &ib_device->dev if you have to and to parent in all other cases?
> 
> Hello Sebastian,
> 
> There are three kinds of RDMA drivers:
> - RDMA drivers that always use DMA for transferring data between memory
> and
>   HCA (e.g. mlx4, mlx5, ...). These drivers make the ULP call the PCI DMA
>   mapping functions directly.
> - RDMA drivers that never use DMA directly but use another driver for
>   transferring data (e.g. rdma_rxe). This driver makes the ULP store virtual
>   addresses in .dma_address.
> - RDMA drivers that decide whether to use PIO or DMA depending on e.g.
> the
>   QP type and the amount of data to be transferred (qib, hfi1). These drivers
>   also make the ULP store virtual addresses in .dma_address and decide
>   internally whether or not to invoke the PCI DMA mapping functions.
> 
> This is why a custom DMA mapping API was introduced in the RDMA
> subsystem.
> Until recently the Linux RDMA subsystem not only had its own DMA mapping
> operations but also its own template for DMA mapping operations (struct
> ib_dma_mapping_ops). This is not only confusing but also led to a multitude
> of incomplete and RDMA driver DMA mapping operations of which
> additionally the behavior is slightly different of other DMA mapping
> operations. That's why we want to evolve towards a single DMA mapping
> API. Reintroducing the dma_device pointer in struct ib_device would make it
> impossible to use the standard DMA mapping API for RDMA devices.
> 
> Bart.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the
> body of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-02-28 21:35 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-27 20:17 IB on s390 broken with commit 99db94940 "IB/core: Remove ib_device.dma_device" Sebastian Ott
2017-02-27 22:04 ` Bart Van Assche
2017-02-28  8:53   ` Sebastian Ott
2017-02-28  9:20     ` Sebastian Ott
2017-02-28 16:50     ` Bart Van Assche
2017-02-28 19:53       ` Parav Pandit

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).