All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH dpdk 0/5] ppc64/spapr: Attempt to use on POWER8
@ 2017-04-20  7:23 Alexey Kardashevskiy
  2017-04-20  7:23 ` [PATCH dpdk 1/5] vfio/ppc64/spapr: Use correct structures for add/remove windows Alexey Kardashevskiy
                   ` (5 more replies)
  0 siblings, 6 replies; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-04-20  7:23 UTC (permalink / raw)
  To: dev; +Cc: Alexey Kardashevskiy, JPF, Gowrishankar Muthukrishnan

Hi!

This is my first attempt to use DPDK on POWER8 machine and yet
unsuccessful as it turned out DPDK only supports IB-Mellanox
(I only got ethernet-Mellanox, and requires OFED), rmmod on
Intel 40Gb module produces PCI errors (unrelated to DPDK) and
Broadcom bnx2x has few issues (below) and still crashes as
I suspect I got DMA mapping wrong, here is a backtrace:

Configuring Port 0 (socket 0)
PMD: bnx2x_issue_dmae_with_comp(): DMAE timeout!
PANIC in bnx2x_write_dmae():
DMAE failed (-1)22: [/lib/powerpc64le-linux-gnu/libc.so.6(__libc_start_main+0xb8) [0x3fffb7c23298]]
21: [/lib/powerpc64le-linux-gnu/libc.so.6(+0x2309c) [0x3fffb7c2309c]]
20: [/home/aik/pbuild/dpdk_build/app/testpmd(main+0x228) [0x100255d0]]
19: [/home/aik/pbuild/dpdk_build/app/testpmd(start_port+0x5dc) [0x1002341c]]
18: [/home/aik/pbuild/dpdk_build/app/testpmd(rte_eth_dev_start+0xc4) [0x1008b3c0]]
17: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x10117550]]
16: [/home/aik/pbuild/dpdk_build/app/testpmd(bnx2x_init+0x204) [0x100f7210]]
15: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x100f6888]]
14: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x100ee7f4]]
13: [/home/aik/pbuild/dpdk_build/app/testpmd(ecore_func_state_change+0x250) [0x10127794]]
12: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x1012734c]]
11: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x10126830]]
10: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x10126618]]
9: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x10100a98]]
8: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x100ffe00]]
7: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x100de614]]
6: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x100de4cc]]
5: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x101063c0]]
4: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x100e1f6c]]
3: [/home/aik/pbuild/dpdk_build/app/testpmd(bnx2x_write_dmae+0x11c) [0x100e1e40]]
2: [/home/aik/pbuild/dpdk_build/app/testpmd(__rte_panic+0x8c) [0x100b3e58]]
1: [/home/aik/pbuild/dpdk_build/app/testpmd(rte_dump_stack+0x40) [0x100b3cc4]]

Thread 1 "testpmd" received signal SIGABRT, Aborted.
0x00003fffb7c3edb0 in __GI_raise (sig=<optimised out>) at ../sysdeps/unix/sysv/linux/raise.c:54
54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

Still, some fixes are quite obvious and straigtforward.

This is based on sha1
2fc8e0bf0 Olivier Matz "log: fix dump of registered logs when disabled".

Please comment. Thanks.



Alexey Kardashevskiy (5):
  vfio/ppc64/spapr: Use correct structures for add/remove windows
  pci: Initialize common rte driver pointer
  RFC: bnx2x: Update firmware versions
  vfio: Do try setting IOMMU type if already set
  RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map

 lib/librte_eal/linuxapp/eal/eal_vfio.h |  8 +++++
 drivers/net/bnx2x/bnx2x.c              |  4 +--
 lib/librte_eal/common/eal_common_pci.c |  1 +
 lib/librte_eal/linuxapp/eal/eal_vfio.c | 62 +++++++++++++++++++---------------
 4 files changed, 46 insertions(+), 29 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH dpdk 1/5] vfio/ppc64/spapr: Use correct structures for add/remove windows
  2017-04-20  7:23 [PATCH dpdk 0/5] ppc64/spapr: Attempt to use on POWER8 Alexey Kardashevskiy
@ 2017-04-20  7:23 ` Alexey Kardashevskiy
  2017-04-20  7:23 ` [PATCH dpdk 2/5] pci: Initialize common rte driver pointer Alexey Kardashevskiy
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-04-20  7:23 UTC (permalink / raw)
  To: dev; +Cc: Alexey Kardashevskiy, JPF, Gowrishankar Muthukrishnan

This copies structures passed via VFIO_IOMMU_SPAPR_TCE_CREATE and
VFIO_IOMMU_SPAPR_TCE_REMOVE ioctls. The existing ones cannot possible work.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 lib/librte_eal/linuxapp/eal/eal_vfio.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h b/lib/librte_eal/linuxapp/eal/eal_vfio.h
index 239ac4d8d..4a0283cb4 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
@@ -69,13 +69,21 @@ struct vfio_iommu_spapr_register_memory {
 
 struct vfio_iommu_spapr_tce_create {
 	uint32_t argsz;
+	uint32_t flags;
+	/* in */
 	uint32_t page_shift;
+	uint32_t __resv1;
 	uint64_t window_size;
 	uint32_t levels;
+	uint32_t __resv2;
+	/* out */
+	uint64_t start_addr;
 };
 
 struct vfio_iommu_spapr_tce_remove {
 	uint32_t argsz;
+	uint32_t flags;
+	/* in */
 	uint64_t start_addr;
 };
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH dpdk 2/5] pci: Initialize common rte driver pointer
  2017-04-20  7:23 [PATCH dpdk 0/5] ppc64/spapr: Attempt to use on POWER8 Alexey Kardashevskiy
  2017-04-20  7:23 ` [PATCH dpdk 1/5] vfio/ppc64/spapr: Use correct structures for add/remove windows Alexey Kardashevskiy
@ 2017-04-20  7:23 ` Alexey Kardashevskiy
  2017-04-24  9:28   ` Burakov, Anatoly
  2017-04-20  7:24 ` [PATCH dpdk 3/5] RFC: bnx2x: Update firmware versions Alexey Kardashevskiy
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-04-20  7:23 UTC (permalink / raw)
  To: dev; +Cc: Alexey Kardashevskiy, JPF, Gowrishankar Muthukrishnan

The existing code initializes a PCI driver pointer but not the common one.
As the result, ring_dma_zone_reserve() in drivers/net/bnx2x/bnx2x_rxtx.c
crashed as dev->device->driver==NULL.

This adds missing initialization.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 lib/librte_eal/common/eal_common_pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 6f0d4d8e4..b6b41be31 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -244,6 +244,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr,
 
 	/* reference driver structure */
 	dev->driver = dr;
+	dev->device.driver = &dr->driver;
 
 	/* call the driver probe() function */
 	ret = dr->probe(dr, dev);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH dpdk 3/5] RFC: bnx2x: Update firmware versions
  2017-04-20  7:23 [PATCH dpdk 0/5] ppc64/spapr: Attempt to use on POWER8 Alexey Kardashevskiy
  2017-04-20  7:23 ` [PATCH dpdk 1/5] vfio/ppc64/spapr: Use correct structures for add/remove windows Alexey Kardashevskiy
  2017-04-20  7:23 ` [PATCH dpdk 2/5] pci: Initialize common rte driver pointer Alexey Kardashevskiy
@ 2017-04-20  7:24 ` Alexey Kardashevskiy
  2017-04-20  7:24 ` [PATCH dpdk 4/5] vfio: Do try setting IOMMU type if already set Alexey Kardashevskiy
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-04-20  7:24 UTC (permalink / raw)
  To: dev; +Cc: Alexey Kardashevskiy, JPF, Gowrishankar Muthukrishnan

Recent kernels/distros have updated firmware images, use them.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 drivers/net/bnx2x/bnx2x.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bnx2x/bnx2x.c b/drivers/net/bnx2x/bnx2x.c
index 1a7e1c8e1..17e8c6b4c 100644
--- a/drivers/net/bnx2x/bnx2x.c
+++ b/drivers/net/bnx2x/bnx2x.c
@@ -9535,8 +9535,8 @@ static void bnx2x_init_rte(struct bnx2x_softc *sc)
 }
 
 #define FW_HEADER_LEN 104
-#define FW_NAME_57711 "/lib/firmware/bnx2x/bnx2x-e1h-7.2.51.0.fw"
-#define FW_NAME_57810 "/lib/firmware/bnx2x/bnx2x-e2-7.2.51.0.fw"
+#define FW_NAME_57711 "/lib/firmware/bnx2x/bnx2x-e1h-7.13.1.0.fw"
+#define FW_NAME_57810 "/lib/firmware/bnx2x/bnx2x-e2-7.12.30.0.fw"
 
 void bnx2x_load_firmware(struct bnx2x_softc *sc)
 {
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH dpdk 4/5] vfio: Do try setting IOMMU type if already set
  2017-04-20  7:23 [PATCH dpdk 0/5] ppc64/spapr: Attempt to use on POWER8 Alexey Kardashevskiy
                   ` (2 preceding siblings ...)
  2017-04-20  7:24 ` [PATCH dpdk 3/5] RFC: bnx2x: Update firmware versions Alexey Kardashevskiy
@ 2017-04-20  7:24 ` Alexey Kardashevskiy
  2017-04-20 19:31   ` gowrishankar muthukrishnan
  2017-04-21  8:54   ` Andrew Rybchenko
  2017-04-20  7:24 ` [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map Alexey Kardashevskiy
  2017-04-22 21:11 ` [PATCH dpdk 0/5] ppc64/spapr: Attempt to use on POWER8 Olga Shern
  5 siblings, 2 replies; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-04-20  7:24 UTC (permalink / raw)
  To: dev; +Cc: Alexey Kardashevskiy, JPF, Gowrishankar Muthukrishnan

The existing code correctly checks if a container is set to a group and
does not try attaching a group to a container for a second/third/...
device from the same IOMMU group.

However it still tries to set IOMMU type to a container for every device
in a group which produces failure and closes container and group fds.

This moves vfio_set_iommu_type() and DMA mapping code under:
if (!(group_status.flags & VFIO_GROUP_FLAGS_CONTAINER_SET))

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 lib/librte_eal/linuxapp/eal/eal_vfio.c | 50 +++++++++++++++++-----------------
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 6e2e84ca7..46f951f4d 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -298,33 +298,33 @@ vfio_setup_device(const char *sysfs_base, const char *dev_addr,
 			clear_group(vfio_group_fd);
 			return -1;
 		}
-	}
 
-	/*
-	 * pick an IOMMU type and set up DMA mappings for container
-	 *
-	 * needs to be done only once, only when first group is assigned to
-	 * a container and only in primary process. Note this can happen several
-	 * times with the hotplug functionality.
-	 */
-	if (internal_config.process_type == RTE_PROC_PRIMARY &&
-			vfio_cfg.vfio_active_groups == 1) {
-		/* select an IOMMU type which we will be using */
-		const struct vfio_iommu_type *t =
+		/*
+		 * pick an IOMMU type and set up DMA mappings for container
+		 *
+		 * needs to be done only once, only when first group is assigned to
+		 * a container and only in primary process. Note this can happen several
+		 * times with the hotplug functionality.
+		 */
+		if (internal_config.process_type == RTE_PROC_PRIMARY &&
+				vfio_cfg.vfio_active_groups == 1) {
+			/* select an IOMMU type which we will be using */
+			const struct vfio_iommu_type *t =
 				vfio_set_iommu_type(vfio_cfg.vfio_container_fd);
-		if (!t) {
-			RTE_LOG(ERR, EAL, "  %s failed to select IOMMU type\n", dev_addr);
-			close(vfio_group_fd);
-			clear_group(vfio_group_fd);
-			return -1;
-		}
-		ret = t->dma_map_func(vfio_cfg.vfio_container_fd);
-		if (ret) {
-			RTE_LOG(ERR, EAL, "  %s DMA remapping failed, "
-					"error %i (%s)\n", dev_addr, errno, strerror(errno));
-			close(vfio_group_fd);
-			clear_group(vfio_group_fd);
-			return -1;
+			if (!t) {
+				RTE_LOG(ERR, EAL, "  %s failed to select IOMMU type\n", dev_addr);
+				close(vfio_group_fd);
+				clear_group(vfio_group_fd);
+				return -1;
+			}
+			ret = t->dma_map_func(vfio_cfg.vfio_container_fd);
+			if (ret) {
+				RTE_LOG(ERR, EAL, "  %s DMA remapping failed, "
+						"error %i (%s)\n", dev_addr, errno, strerror(errno));
+				close(vfio_group_fd);
+				clear_group(vfio_group_fd);
+				return -1;
+			}
 		}
 	}
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map
  2017-04-20  7:23 [PATCH dpdk 0/5] ppc64/spapr: Attempt to use on POWER8 Alexey Kardashevskiy
                   ` (3 preceding siblings ...)
  2017-04-20  7:24 ` [PATCH dpdk 4/5] vfio: Do try setting IOMMU type if already set Alexey Kardashevskiy
@ 2017-04-20  7:24 ` Alexey Kardashevskiy
  2017-04-20  9:04   ` Jonas Pfefferle1
  2017-04-22 21:11 ` [PATCH dpdk 0/5] ppc64/spapr: Attempt to use on POWER8 Olga Shern
  5 siblings, 1 reply; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-04-20  7:24 UTC (permalink / raw)
  To: dev; +Cc: Alexey Kardashevskiy, JPF, Gowrishankar Muthukrishnan

VFIO_IOMMU_SPAPR_TCE_CREATE ioctl() returns the actual bus address for
just created DMA window. It happens to start from zero because the default
window is removed (leaving no windows) and new window starts from zero.
However this is not guaranteed and the new window may start from another
address, this adds an error check.

Another issue is that IOVA passed to VFIO_IOMMU_MAP_DMA should be a PCI
bus address while in this case a physical address of a user page is used.
This changes IOVA to start from zero in a hope that the rest of DPDK
expects this.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 lib/librte_eal/linuxapp/eal/eal_vfio.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 46f951f4d..8b8e75c4f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -658,7 +658,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
 {
 	const struct rte_memseg *ms = rte_eal_get_physmem_layout();
 	int i, ret;
-
+	phys_addr_t io_offset;
 	struct vfio_iommu_spapr_register_memory reg = {
 		.argsz = sizeof(reg),
 		.flags = 0
@@ -702,6 +702,13 @@ vfio_spapr_dma_map(int vfio_container_fd)
 		return -1;
 	}
 
+	io_offset = create.start_addr;
+	if (io_offset) {
+		RTE_LOG(ERR, EAL, "  DMA offsets other than zero is not supported, "
+				"new window is created at %lx\n", io_offset);
+		return -1;
+	}
+
 	/* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
 	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
 		struct vfio_iommu_type1_dma_map dma_map;
@@ -723,7 +730,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
 		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
 		dma_map.vaddr = ms[i].addr_64;
 		dma_map.size = ms[i].len;
-		dma_map.iova = ms[i].phys_addr;
+		dma_map.iova = io_offset;
 		dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
 				 VFIO_DMA_MAP_FLAG_WRITE;
 
@@ -735,6 +742,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
 			return -1;
 		}
 
+		io_offset += dma_map.size;
 	}
 
 	return 0;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map
  2017-04-20  7:24 ` [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map Alexey Kardashevskiy
@ 2017-04-20  9:04   ` Jonas Pfefferle1
  2017-04-20 13:25     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 28+ messages in thread
From: Jonas Pfefferle1 @ 2017-04-20  9:04 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: dev, Gowrishankar Muthukrishnan, Adrian Schuepbach


Alexey Kardashevskiy <aik@ozlabs.ru> wrote on 20/04/2017 09:24:02:

> From: Alexey Kardashevskiy <aik@ozlabs.ru>
> To: dev@dpdk.org
> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>, JPF@zurich.ibm.com,
> Gowrishankar Muthukrishnan <gowrishankar.m@in.ibm.com>
> Date: 20/04/2017 09:24
> Subject: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus
> addresses for DMA map
>
> VFIO_IOMMU_SPAPR_TCE_CREATE ioctl() returns the actual bus address for
> just created DMA window. It happens to start from zero because the
default
> window is removed (leaving no windows) and new window starts from zero.
> However this is not guaranteed and the new window may start from another
> address, this adds an error check.
>
> Another issue is that IOVA passed to VFIO_IOMMU_MAP_DMA should be a PCI
> bus address while in this case a physical address of a user page is used.
> This changes IOVA to start from zero in a hope that the rest of DPDK
> expects this.

This is not the case. DPDK expects a 1:1 mapping PA==IOVA. It will use the
phys_addr of the memory segment it got from /proc/self/pagemap cf.
librte_eal/linuxapp/eal/eal_memory.c. We could try setting it here to the
actual iova which basically makes the whole virtual to phyiscal mapping
with pagemap unnecessary which I believe should be the case for VFIO
anyway. Pagemap should only be needed when using pci_uio.

>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  lib/librte_eal/linuxapp/eal/eal_vfio.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/
> librte_eal/linuxapp/eal/eal_vfio.c
> index 46f951f4d..8b8e75c4f 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> @@ -658,7 +658,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>  {
>     const struct rte_memseg *ms = rte_eal_get_physmem_layout();
>     int i, ret;
> -
> +   phys_addr_t io_offset;
>     struct vfio_iommu_spapr_register_memory reg = {
>        .argsz = sizeof(reg),
>        .flags = 0
> @@ -702,6 +702,13 @@ vfio_spapr_dma_map(int vfio_container_fd)
>        return -1;
>     }
>
> +   io_offset = create.start_addr;
> +   if (io_offset) {
> +      RTE_LOG(ERR, EAL, "  DMA offsets other than zero is not supported,
"
> +            "new window is created at %lx\n", io_offset);
> +      return -1;
> +   }
> +
>     /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
>     for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>        struct vfio_iommu_type1_dma_map dma_map;
> @@ -723,7 +730,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>        dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>        dma_map.vaddr = ms[i].addr_64;
>        dma_map.size = ms[i].len;
> -      dma_map.iova = ms[i].phys_addr;
> +      dma_map.iova = io_offset;
>        dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
>               VFIO_DMA_MAP_FLAG_WRITE;
>
> @@ -735,6 +742,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>           return -1;
>        }
>
> +      io_offset += dma_map.size;
>     }
>
>     return 0;
> --
> 2.11.0
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map
  2017-04-20  9:04   ` Jonas Pfefferle1
@ 2017-04-20 13:25     ` Alexey Kardashevskiy
  2017-04-20 14:22       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-04-20 13:25 UTC (permalink / raw)
  To: Jonas Pfefferle1; +Cc: dev, Gowrishankar Muthukrishnan, Adrian Schuepbach

On 20/04/17 19:04, Jonas Pfefferle1 wrote:
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote on 20/04/2017 09:24:02:
> 
>> From: Alexey Kardashevskiy <aik@ozlabs.ru>
>> To: dev@dpdk.org
>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>, JPF@zurich.ibm.com,
>> Gowrishankar Muthukrishnan <gowrishankar.m@in.ibm.com>
>> Date: 20/04/2017 09:24
>> Subject: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus
>> addresses for DMA map
>>
>> VFIO_IOMMU_SPAPR_TCE_CREATE ioctl() returns the actual bus address for
>> just created DMA window. It happens to start from zero because the default
>> window is removed (leaving no windows) and new window starts from zero.
>> However this is not guaranteed and the new window may start from another
>> address, this adds an error check.
>>
>> Another issue is that IOVA passed to VFIO_IOMMU_MAP_DMA should be a PCI
>> bus address while in this case a physical address of a user page is used.
>> This changes IOVA to start from zero in a hope that the rest of DPDK
>> expects this.
> 
> This is not the case. DPDK expects a 1:1 mapping PA==IOVA. It will use the
> phys_addr of the memory segment it got from /proc/self/pagemap cf.
> librte_eal/linuxapp/eal/eal_memory.c. We could try setting it here to the
> actual iova which basically makes the whole virtual to phyiscal mapping
> with pagemap unnecessary which I believe should be the case for VFIO
> anyway. Pagemap should only be needed when using pci_uio.


Ah, ok, makes sense now. But it sure needs a big fat comment there as it is
not obvious why host RAM address is used there as DMA window start is not
guaranteed.


> 
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  lib/librte_eal/linuxapp/eal/eal_vfio.c | 12 ++++++++++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/
>> librte_eal/linuxapp/eal/eal_vfio.c
>> index 46f951f4d..8b8e75c4f 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>> @@ -658,7 +658,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>  {
>>     const struct rte_memseg *ms = rte_eal_get_physmem_layout();
>>     int i, ret;
>> -
>> +   phys_addr_t io_offset;
>>     struct vfio_iommu_spapr_register_memory reg = {
>>        .argsz = sizeof(reg),
>>        .flags = 0
>> @@ -702,6 +702,13 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>        return -1;
>>     }
>>  
>> +   io_offset = create.start_addr;
>> +   if (io_offset) {
>> +      RTE_LOG(ERR, EAL, "  DMA offsets other than zero is not supported, "
>> +            "new window is created at %lx\n", io_offset);
>> +      return -1;
>> +   }
>> +
>>     /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
>>     for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>>        struct vfio_iommu_type1_dma_map dma_map;
>> @@ -723,7 +730,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>        dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>>        dma_map.vaddr = ms[i].addr_64;
>>        dma_map.size = ms[i].len;
>> -      dma_map.iova = ms[i].phys_addr;
>> +      dma_map.iova = io_offset;
>>        dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
>>               VFIO_DMA_MAP_FLAG_WRITE;
>>  
>> @@ -735,6 +742,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>           return -1;
>>        }
>>  
>> +      io_offset += dma_map.size;
>>     }
>>  
>>     return 0;
>> --
>> 2.11.0
>>
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map
  2017-04-20 13:25     ` Alexey Kardashevskiy
@ 2017-04-20 14:22       ` Alexey Kardashevskiy
  2017-04-20 15:15         ` Jonas Pfefferle1
  2017-04-20 19:16         ` gowrishankar muthukrishnan
  0 siblings, 2 replies; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-04-20 14:22 UTC (permalink / raw)
  To: Jonas Pfefferle1; +Cc: dev, Gowrishankar Muthukrishnan, Adrian Schuepbach

On 20/04/17 23:25, Alexey Kardashevskiy wrote:
> On 20/04/17 19:04, Jonas Pfefferle1 wrote:
>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote on 20/04/2017 09:24:02:
>>
>>> From: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> To: dev@dpdk.org
>>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>, JPF@zurich.ibm.com,
>>> Gowrishankar Muthukrishnan <gowrishankar.m@in.ibm.com>
>>> Date: 20/04/2017 09:24
>>> Subject: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus
>>> addresses for DMA map
>>>
>>> VFIO_IOMMU_SPAPR_TCE_CREATE ioctl() returns the actual bus address for
>>> just created DMA window. It happens to start from zero because the default
>>> window is removed (leaving no windows) and new window starts from zero.
>>> However this is not guaranteed and the new window may start from another
>>> address, this adds an error check.
>>>
>>> Another issue is that IOVA passed to VFIO_IOMMU_MAP_DMA should be a PCI
>>> bus address while in this case a physical address of a user page is used.
>>> This changes IOVA to start from zero in a hope that the rest of DPDK
>>> expects this.
>>
>> This is not the case. DPDK expects a 1:1 mapping PA==IOVA. It will use the
>> phys_addr of the memory segment it got from /proc/self/pagemap cf.
>> librte_eal/linuxapp/eal/eal_memory.c. We could try setting it here to the
>> actual iova which basically makes the whole virtual to phyiscal mapping
>> with pagemap unnecessary which I believe should be the case for VFIO
>> anyway. Pagemap should only be needed when using pci_uio.
> 
> 
> Ah, ok, makes sense now. But it sure needs a big fat comment there as it is
> not obvious why host RAM address is used there as DMA window start is not
> guaranteed.

Well, either way there is some bug - ms[i].phys_addr and ms[i].addr_64 both
have exact same value, in my setup it is 3fffb33c0000 which is a userspace
address - at least ms[i].phys_addr must be physical address.


> 
> 
>>
>>>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>>  lib/librte_eal/linuxapp/eal/eal_vfio.c | 12 ++++++++++--
>>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/
>>> librte_eal/linuxapp/eal/eal_vfio.c
>>> index 46f951f4d..8b8e75c4f 100644
>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>> @@ -658,7 +658,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>  {
>>>     const struct rte_memseg *ms = rte_eal_get_physmem_layout();
>>>     int i, ret;
>>> -
>>> +   phys_addr_t io_offset;
>>>     struct vfio_iommu_spapr_register_memory reg = {
>>>        .argsz = sizeof(reg),
>>>        .flags = 0
>>> @@ -702,6 +702,13 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>        return -1;
>>>     }
>>>  
>>> +   io_offset = create.start_addr;
>>> +   if (io_offset) {
>>> +      RTE_LOG(ERR, EAL, "  DMA offsets other than zero is not supported, "
>>> +            "new window is created at %lx\n", io_offset);
>>> +      return -1;
>>> +   }
>>> +
>>>     /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
>>>     for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>>>        struct vfio_iommu_type1_dma_map dma_map;
>>> @@ -723,7 +730,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>        dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>>>        dma_map.vaddr = ms[i].addr_64;
>>>        dma_map.size = ms[i].len;
>>> -      dma_map.iova = ms[i].phys_addr;
>>> +      dma_map.iova = io_offset;
>>>        dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
>>>               VFIO_DMA_MAP_FLAG_WRITE;
>>>  
>>> @@ -735,6 +742,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>           return -1;
>>>        }
>>>  
>>> +      io_offset += dma_map.size;
>>>     }
>>>  
>>>     return 0;
>>> --
>>> 2.11.0
>>>
>>
> 
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map
  2017-04-20 14:22       ` Alexey Kardashevskiy
@ 2017-04-20 15:15         ` Jonas Pfefferle1
  2017-04-20 22:01           ` Alexey Kardashevskiy
  2017-04-20 19:16         ` gowrishankar muthukrishnan
  1 sibling, 1 reply; 28+ messages in thread
From: Jonas Pfefferle1 @ 2017-04-20 15:15 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: dev, Adrian Schuepbach, Gowrishankar Muthukrishnan

Alexey Kardashevskiy <aik@ozlabs.ru> wrote on 20/04/2017 16:22:01:

> From: Alexey Kardashevskiy <aik@ozlabs.ru>
> To: Jonas Pfefferle1 <JPF@zurich.ibm.com>
> Cc: dev@dpdk.org, Gowrishankar Muthukrishnan
> <gowrishankar.m@in.ibm.com>, Adrian Schuepbach <DRI@zurich.ibm.com>
> Date: 20/04/2017 16:22
> Subject: Re: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus
> addresses for DMA map
>
> On 20/04/17 23:25, Alexey Kardashevskiy wrote:
> > On 20/04/17 19:04, Jonas Pfefferle1 wrote:
> >> Alexey Kardashevskiy <aik@ozlabs.ru> wrote on 20/04/2017 09:24:02:
> >>
> >>> From: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>> To: dev@dpdk.org
> >>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>, JPF@zurich.ibm.com,
> >>> Gowrishankar Muthukrishnan <gowrishankar.m@in.ibm.com>
> >>> Date: 20/04/2017 09:24
> >>> Subject: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus
> >>> addresses for DMA map
> >>>
> >>> VFIO_IOMMU_SPAPR_TCE_CREATE ioctl() returns the actual bus address
for
> >>> just created DMA window. It happens to start from zero because the
default
> >>> window is removed (leaving no windows) and new window starts from
zero.
> >>> However this is not guaranteed and the new window may start from
another
> >>> address, this adds an error check.
> >>>
> >>> Another issue is that IOVA passed to VFIO_IOMMU_MAP_DMA should be a
PCI
> >>> bus address while in this case a physical address of a user page is
used.
> >>> This changes IOVA to start from zero in a hope that the rest of DPDK
> >>> expects this.
> >>
> >> This is not the case. DPDK expects a 1:1 mapping PA==IOVA. It will use
the
> >> phys_addr of the memory segment it got from /proc/self/pagemap cf.
> >> librte_eal/linuxapp/eal/eal_memory.c. We could try setting it here to
the
> >> actual iova which basically makes the whole virtual to phyiscal
mapping
> >> with pagemap unnecessary which I believe should be the case for VFIO
> >> anyway. Pagemap should only be needed when using pci_uio.
> >
> >
> > Ah, ok, makes sense now. But it sure needs a big fat comment there as
it is
> > not obvious why host RAM address is used there as DMA window start is
not
> > guaranteed.
>
> Well, either way there is some bug - ms[i].phys_addr and ms[i].addr_64
both
> have exact same value, in my setup it is 3fffb33c0000 which is a
userspace
> address - at least ms[i].phys_addr must be physical address.

This might be the case if you are not using hugetlbfs i.e. passing
"--no-huge" cf. eal_memory.c:980

	/* hugetlbfs can be disabled */
	if (internal_config.no_hugetlbfs) {
		addr = mmap(NULL, internal_config.memory, PROT_READ |
PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
		if (addr == MAP_FAILED) {
			RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
					strerror(errno));
			return -1;
		}
		mcfg->memseg[0].phys_addr = (phys_addr_t)(uintptr_t)addr;
		mcfg->memseg[0].addr = addr;
		mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
		mcfg->memseg[0].len = internal_config.memory;
		mcfg->memseg[0].socket_id = 0;
		return 0;
	}

If it fails to get the virt2phys mapping it actually assigns iovas starting
from 0 to the memory segments, cf. set_physaddrs eal_memory.c:263

>
>
> >
> >
> >>
> >>>
> >>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>> ---
> >>>  lib/librte_eal/linuxapp/eal/eal_vfio.c | 12 ++++++++++--
> >>>  1 file changed, 10 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/
> >>> librte_eal/linuxapp/eal/eal_vfio.c
> >>> index 46f951f4d..8b8e75c4f 100644
> >>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> >>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> >>> @@ -658,7 +658,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
> >>>  {
> >>>     const struct rte_memseg *ms = rte_eal_get_physmem_layout();
> >>>     int i, ret;
> >>> -
> >>> +   phys_addr_t io_offset;
> >>>     struct vfio_iommu_spapr_register_memory reg = {
> >>>        .argsz = sizeof(reg),
> >>>        .flags = 0
> >>> @@ -702,6 +702,13 @@ vfio_spapr_dma_map(int vfio_container_fd)
> >>>        return -1;
> >>>     }
> >>>
> >>> +   io_offset = create.start_addr;
> >>> +   if (io_offset) {
> >>> +      RTE_LOG(ERR, EAL, "  DMA offsets other than zero is not
> supported, "
> >>> +            "new window is created at %lx\n", io_offset);
> >>> +      return -1;
> >>> +   }
> >>> +
> >>>     /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
> >>>     for (i = 0; i < RTE_MAX_MEMSEG; i++) {
> >>>        struct vfio_iommu_type1_dma_map dma_map;
> >>> @@ -723,7 +730,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
> >>>        dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
> >>>        dma_map.vaddr = ms[i].addr_64;
> >>>        dma_map.size = ms[i].len;
> >>> -      dma_map.iova = ms[i].phys_addr;
> >>> +      dma_map.iova = io_offset;
> >>>        dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
> >>>               VFIO_DMA_MAP_FLAG_WRITE;
> >>>
> >>> @@ -735,6 +742,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
> >>>           return -1;
> >>>        }
> >>>
> >>> +      io_offset += dma_map.size;
> >>>     }
> >>>
> >>>     return 0;
> >>> --
> >>> 2.11.0
> >>>
> >>
> >
> >
>
>
> --
> Alexey
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map
  2017-04-20 14:22       ` Alexey Kardashevskiy
  2017-04-20 15:15         ` Jonas Pfefferle1
@ 2017-04-20 19:16         ` gowrishankar muthukrishnan
  2017-04-21  3:42           ` Alexey Kardashevskiy
  1 sibling, 1 reply; 28+ messages in thread
From: gowrishankar muthukrishnan @ 2017-04-20 19:16 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Jonas Pfefferle1, Gowrishankar Muthukrishnan, Adrian Schuepbach, dev

On Thursday 20 April 2017 07:52 PM, Alexey Kardashevskiy wrote:
> On 20/04/17 23:25, Alexey Kardashevskiy wrote:
>> On 20/04/17 19:04, Jonas Pfefferle1 wrote:
>>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote on 20/04/2017 09:24:02:
>>>
>>>> From: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>> To: dev@dpdk.org
>>>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>, JPF@zurich.ibm.com,
>>>> Gowrishankar Muthukrishnan <gowrishankar.m@in.ibm.com>
>>>> Date: 20/04/2017 09:24
>>>> Subject: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus
>>>> addresses for DMA map
>>>>
>>>> VFIO_IOMMU_SPAPR_TCE_CREATE ioctl() returns the actual bus address for
>>>> just created DMA window. It happens to start from zero because the default
>>>> window is removed (leaving no windows) and new window starts from zero.
>>>> However this is not guaranteed and the new window may start from another
>>>> address, this adds an error check.
>>>>
>>>> Another issue is that IOVA passed to VFIO_IOMMU_MAP_DMA should be a PCI
>>>> bus address while in this case a physical address of a user page is used.
>>>> This changes IOVA to start from zero in a hope that the rest of DPDK
>>>> expects this.
>>> This is not the case. DPDK expects a 1:1 mapping PA==IOVA. It will use the
>>> phys_addr of the memory segment it got from /proc/self/pagemap cf.
>>> librte_eal/linuxapp/eal/eal_memory.c. We could try setting it here to the
>>> actual iova which basically makes the whole virtual to phyiscal mapping
>>> with pagemap unnecessary which I believe should be the case for VFIO
>>> anyway. Pagemap should only be needed when using pci_uio.
>>
>> Ah, ok, makes sense now. But it sure needs a big fat comment there as it is
>> not obvious why host RAM address is used there as DMA window start is not
>> guaranteed.
> Well, either way there is some bug - ms[i].phys_addr and ms[i].addr_64 both
> have exact same value, in my setup it is 3fffb33c0000 which is a userspace
> address - at least ms[i].phys_addr must be physical address.

This patch breaks i40e_dev_init() in my server.

EAL: PCI device 0004:01:00.0 on NUMA socket 1
EAL:   probe driver: 8086:1583 net_i40e
EAL:   using IOMMU type 7 (sPAPR)
eth_i40e_dev_init(): Failed to init adminq: -32
EAL: Releasing pci mapped resource for 0004:01:00.0
EAL: Calling pci_unmap_resource for 0004:01:00.0 at 0x3fff82aa0000
EAL: Requested device 0004:01:00.0 cannot be used
EAL: PCI device 0004:01:00.1 on NUMA socket 1
EAL:   probe driver: 8086:1583 net_i40e
EAL:   using IOMMU type 7 (sPAPR)
eth_i40e_dev_init(): Failed to init adminq: -32
EAL: Releasing pci mapped resource for 0004:01:00.1
EAL: Calling pci_unmap_resource for 0004:01:00.1 at 0x3fff82aa0000
EAL: Requested device 0004:01:00.1 cannot be used
EAL: No probed ethernet devices

I have two memseg each of 1G size. Their mapped PA and VA are also 
different.

(gdb) p /x ms[0]
$3 = {phys_addr = 0x1e0b000000, {addr = 0x3effaf000000, addr_64 = 
0x3effaf000000},
   len = 0x40000000, hugepage_sz = 0x1000000, socket_id = 0x1, nchannel 
= 0x0, nrank = 0x0}
(gdb) p /x ms[1]
$4 = {phys_addr = 0xf6d000000, {addr = 0x3efbaf000000, addr_64 = 
0x3efbaf000000},
   len = 0x40000000, hugepage_sz = 0x1000000, socket_id = 0x0, nchannel 
= 0x0, nrank = 0x0}

Could you please recheck this. May be, if new DMA window does not start 
from bus address 0,
only then you reset dma_map.iova for this offset ?


Thanks,
Gowrishankar

>
>>
>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>> ---
>>>>   lib/librte_eal/linuxapp/eal/eal_vfio.c | 12 ++++++++++--
>>>>   1 file changed, 10 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/
>>>> librte_eal/linuxapp/eal/eal_vfio.c
>>>> index 46f951f4d..8b8e75c4f 100644
>>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>> @@ -658,7 +658,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>   {
>>>>      const struct rte_memseg *ms = rte_eal_get_physmem_layout();
>>>>      int i, ret;
>>>> -
>>>> +   phys_addr_t io_offset;
>>>>      struct vfio_iommu_spapr_register_memory reg = {
>>>>         .argsz = sizeof(reg),
>>>>         .flags = 0
>>>> @@ -702,6 +702,13 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>         return -1;
>>>>      }
>>>>   
>>>> +   io_offset = create.start_addr;
>>>> +   if (io_offset) {
>>>> +      RTE_LOG(ERR, EAL, "  DMA offsets other than zero is not supported, "
>>>> +            "new window is created at %lx\n", io_offset);
>>>> +      return -1;
>>>> +   }
>>>> +
>>>>      /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
>>>>      for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>>>>         struct vfio_iommu_type1_dma_map dma_map;
>>>> @@ -723,7 +730,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>         dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>>>>         dma_map.vaddr = ms[i].addr_64;
>>>>         dma_map.size = ms[i].len;
>>>> -      dma_map.iova = ms[i].phys_addr;
>>>> +      dma_map.iova = io_offset;
>>>>         dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
>>>>                VFIO_DMA_MAP_FLAG_WRITE;
>>>>   
>>>> @@ -735,6 +742,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>            return -1;
>>>>         }
>>>>   
>>>> +      io_offset += dma_map.size;
>>>>      }
>>>>   
>>>>      return 0;
>>>> --
>>>> 2.11.0
>>>>
>>
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH dpdk 4/5] vfio: Do try setting IOMMU type if already set
  2017-04-20  7:24 ` [PATCH dpdk 4/5] vfio: Do try setting IOMMU type if already set Alexey Kardashevskiy
@ 2017-04-20 19:31   ` gowrishankar muthukrishnan
  2017-04-21  8:54   ` Andrew Rybchenko
  1 sibling, 0 replies; 28+ messages in thread
From: gowrishankar muthukrishnan @ 2017-04-20 19:31 UTC (permalink / raw)
  To: Alexey Kardashevskiy, dev; +Cc: JPF, Gowrishankar Muthukrishnan

On Thursday 20 April 2017 12:54 PM, Alexey Kardashevskiy wrote:
> The existing code correctly checks if a container is set to a group and
> does not try attaching a group to a container for a second/third/...
> device from the same IOMMU group.
>
> However it still tries to set IOMMU type to a container for every device
> in a group which produces failure and closes container and group fds.
>
> This moves vfio_set_iommu_type() and DMA mapping code under:
> if (!(group_status.flags & VFIO_GROUP_FLAGS_CONTAINER_SET))
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Thanks, verified it for regression seen on current master for mentioned 
scenario.

Regards,
Gowrishankar

> ---
>   lib/librte_eal/linuxapp/eal/eal_vfio.c | 50 +++++++++++++++++-----------------
>   1 file changed, 25 insertions(+), 25 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> index 6e2e84ca7..46f951f4d 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> @@ -298,33 +298,33 @@ vfio_setup_device(const char *sysfs_base, const char *dev_addr,
>   			clear_group(vfio_group_fd);
>   			return -1;
>   		}
> -	}
>
> -	/*
> -	 * pick an IOMMU type and set up DMA mappings for container
> -	 *
> -	 * needs to be done only once, only when first group is assigned to
> -	 * a container and only in primary process. Note this can happen several
> -	 * times with the hotplug functionality.
> -	 */
> -	if (internal_config.process_type == RTE_PROC_PRIMARY &&
> -			vfio_cfg.vfio_active_groups == 1) {
> -		/* select an IOMMU type which we will be using */
> -		const struct vfio_iommu_type *t =
> +		/*
> +		 * pick an IOMMU type and set up DMA mappings for container
> +		 *
> +		 * needs to be done only once, only when first group is assigned to
> +		 * a container and only in primary process. Note this can happen several
> +		 * times with the hotplug functionality.
> +		 */
> +		if (internal_config.process_type == RTE_PROC_PRIMARY &&
> +				vfio_cfg.vfio_active_groups == 1) {
> +			/* select an IOMMU type which we will be using */
> +			const struct vfio_iommu_type *t =
>   				vfio_set_iommu_type(vfio_cfg.vfio_container_fd);
> -		if (!t) {
> -			RTE_LOG(ERR, EAL, "  %s failed to select IOMMU type\n", dev_addr);
> -			close(vfio_group_fd);
> -			clear_group(vfio_group_fd);
> -			return -1;
> -		}
> -		ret = t->dma_map_func(vfio_cfg.vfio_container_fd);
> -		if (ret) {
> -			RTE_LOG(ERR, EAL, "  %s DMA remapping failed, "
> -					"error %i (%s)\n", dev_addr, errno, strerror(errno));
> -			close(vfio_group_fd);
> -			clear_group(vfio_group_fd);
> -			return -1;
> +			if (!t) {
> +				RTE_LOG(ERR, EAL, "  %s failed to select IOMMU type\n", dev_addr);
> +				close(vfio_group_fd);
> +				clear_group(vfio_group_fd);
> +				return -1;
> +			}
> +			ret = t->dma_map_func(vfio_cfg.vfio_container_fd);
> +			if (ret) {
> +				RTE_LOG(ERR, EAL, "  %s DMA remapping failed, "
> +						"error %i (%s)\n", dev_addr, errno, strerror(errno));
> +				close(vfio_group_fd);
> +				clear_group(vfio_group_fd);
> +				return -1;
> +			}
>   		}
>   	}
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map
  2017-04-20 15:15         ` Jonas Pfefferle1
@ 2017-04-20 22:01           ` Alexey Kardashevskiy
  0 siblings, 0 replies; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-04-20 22:01 UTC (permalink / raw)
  To: Jonas Pfefferle1; +Cc: dev, Adrian Schuepbach, Gowrishankar Muthukrishnan

On 21/04/17 01:15, Jonas Pfefferle1 wrote:
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote on 20/04/2017 16:22:01:
> 
>> From: Alexey Kardashevskiy <aik@ozlabs.ru>
>> To: Jonas Pfefferle1 <JPF@zurich.ibm.com>
>> Cc: dev@dpdk.org, Gowrishankar Muthukrishnan
>> <gowrishankar.m@in.ibm.com>, Adrian Schuepbach <DRI@zurich.ibm.com>
>> Date: 20/04/2017 16:22
>> Subject: Re: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus
>> addresses for DMA map
>>
>> On 20/04/17 23:25, Alexey Kardashevskiy wrote:
>> > On 20/04/17 19:04, Jonas Pfefferle1 wrote:
>> >> Alexey Kardashevskiy <aik@ozlabs.ru> wrote on 20/04/2017 09:24:02:
>> >>
>> >>> From: Alexey Kardashevskiy <aik@ozlabs.ru>
>> >>> To: dev@dpdk.org
>> >>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>, JPF@zurich.ibm.com,
>> >>> Gowrishankar Muthukrishnan <gowrishankar.m@in.ibm.com>
>> >>> Date: 20/04/2017 09:24
>> >>> Subject: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus
>> >>> addresses for DMA map
>> >>>
>> >>> VFIO_IOMMU_SPAPR_TCE_CREATE ioctl() returns the actual bus address for
>> >>> just created DMA window. It happens to start from zero because the
> default
>> >>> window is removed (leaving no windows) and new window starts from zero.
>> >>> However this is not guaranteed and the new window may start from another
>> >>> address, this adds an error check.
>> >>>
>> >>> Another issue is that IOVA passed to VFIO_IOMMU_MAP_DMA should be a PCI
>> >>> bus address while in this case a physical address of a user page is used.
>> >>> This changes IOVA to start from zero in a hope that the rest of DPDK
>> >>> expects this.
>> >>
>> >> This is not the case. DPDK expects a 1:1 mapping PA==IOVA. It will use the
>> >> phys_addr of the memory segment it got from /proc/self/pagemap cf.
>> >> librte_eal/linuxapp/eal/eal_memory.c. We could try setting it here to the
>> >> actual iova which basically makes the whole virtual to phyiscal mapping
>> >> with pagemap unnecessary which I believe should be the case for VFIO
>> >> anyway. Pagemap should only be needed when using pci_uio.
>> >
>> >
>> > Ah, ok, makes sense now. But it sure needs a big fat comment there as it is
>> > not obvious why host RAM address is used there as DMA window start is not
>> > guaranteed.
>>
>> Well, either way there is some bug - ms[i].phys_addr and ms[i].addr_64 both
>> have exact same value, in my setup it is 3fffb33c0000 which is a userspace
>> address - at least ms[i].phys_addr must be physical address.
> 
> This might be the case if you are not using hugetlbfs i.e. passing
> "--no-huge" cf. eal_memory.c:980
> 
> /* hugetlbfs can be disabled */
> if (internal_config.no_hugetlbfs) {
> addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
> MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
> if (addr == MAP_FAILED) {
> RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
> strerror(errno));
> return -1;
> }
> mcfg->memseg[0].phys_addr = (phys_addr_t)(uintptr_t)addr;
> mcfg->memseg[0].addr = addr;
> mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
> mcfg->memseg[0].len = internal_config.memory;
> mcfg->memseg[0].socket_id = 0;
> return 0;
> }
> 
> If it fails to get the virt2phys mapping it actually assigns iovas starting
> from 0 to the memory segments, cf. set_physaddrs eal_memory.c:263

Right, this is the case here.


> 
>>
>>
>> >
>> >
>> >>
>> >>>
>> >>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> >>> ---
>> >>>  lib/librte_eal/linuxapp/eal/eal_vfio.c | 12 ++++++++++--
>> >>>  1 file changed, 10 insertions(+), 2 deletions(-)
>> >>>
>> >>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/
>> >>> librte_eal/linuxapp/eal/eal_vfio.c
>> >>> index 46f951f4d..8b8e75c4f 100644
>> >>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>> >>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>> >>> @@ -658,7 +658,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>> >>>  {
>> >>>     const struct rte_memseg *ms = rte_eal_get_physmem_layout();
>> >>>     int i, ret;
>> >>> -
>> >>> +   phys_addr_t io_offset;
>> >>>     struct vfio_iommu_spapr_register_memory reg = {
>> >>>        .argsz = sizeof(reg),
>> >>>        .flags = 0
>> >>> @@ -702,6 +702,13 @@ vfio_spapr_dma_map(int vfio_container_fd)
>> >>>        return -1;
>> >>>     }
>> >>>  
>> >>> +   io_offset = create.start_addr;
>> >>> +   if (io_offset) {
>> >>> +      RTE_LOG(ERR, EAL, "  DMA offsets other than zero is not
>> supported, "
>> >>> +            "new window is created at %lx\n", io_offset);
>> >>> +      return -1;
>> >>> +   }
>> >>> +
>> >>>     /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
>> >>>     for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>> >>>        struct vfio_iommu_type1_dma_map dma_map;
>> >>> @@ -723,7 +730,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>> >>>        dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>> >>>        dma_map.vaddr = ms[i].addr_64;
>> >>>        dma_map.size = ms[i].len;
>> >>> -      dma_map.iova = ms[i].phys_addr;
>> >>> +      dma_map.iova = io_offset;
>> >>>        dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
>> >>>               VFIO_DMA_MAP_FLAG_WRITE;
>> >>>  
>> >>> @@ -735,6 +742,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>> >>>           return -1;
>> >>>        }
>> >>>  
>> >>> +      io_offset += dma_map.size;
>> >>>     }
>> >>>  
>> >>>     return 0;
>> >>> --
>> >>> 2.11.0
>> >>>
>> >>
>> >
>> >
>>
>>
>> --
>> Alexey
>>
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map
  2017-04-20 19:16         ` gowrishankar muthukrishnan
@ 2017-04-21  3:42           ` Alexey Kardashevskiy
  2017-04-21  8:43             ` Alexey Kardashevskiy
                               ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-04-21  3:42 UTC (permalink / raw)
  To: gowrishankar muthukrishnan
  Cc: Jonas Pfefferle1, Gowrishankar Muthukrishnan, Adrian Schuepbach, dev

On 21/04/17 05:16, gowrishankar muthukrishnan wrote:
> On Thursday 20 April 2017 07:52 PM, Alexey Kardashevskiy wrote:
>> On 20/04/17 23:25, Alexey Kardashevskiy wrote:
>>> On 20/04/17 19:04, Jonas Pfefferle1 wrote:
>>>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote on 20/04/2017 09:24:02:
>>>>
>>>>> From: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>> To: dev@dpdk.org
>>>>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>, JPF@zurich.ibm.com,
>>>>> Gowrishankar Muthukrishnan <gowrishankar.m@in.ibm.com>
>>>>> Date: 20/04/2017 09:24
>>>>> Subject: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus
>>>>> addresses for DMA map
>>>>>
>>>>> VFIO_IOMMU_SPAPR_TCE_CREATE ioctl() returns the actual bus address for
>>>>> just created DMA window. It happens to start from zero because the
>>>>> default
>>>>> window is removed (leaving no windows) and new window starts from zero.
>>>>> However this is not guaranteed and the new window may start from another
>>>>> address, this adds an error check.
>>>>>
>>>>> Another issue is that IOVA passed to VFIO_IOMMU_MAP_DMA should be a PCI
>>>>> bus address while in this case a physical address of a user page is used.
>>>>> This changes IOVA to start from zero in a hope that the rest of DPDK
>>>>> expects this.
>>>> This is not the case. DPDK expects a 1:1 mapping PA==IOVA. It will use the
>>>> phys_addr of the memory segment it got from /proc/self/pagemap cf.
>>>> librte_eal/linuxapp/eal/eal_memory.c. We could try setting it here to the
>>>> actual iova which basically makes the whole virtual to phyiscal mapping
>>>> with pagemap unnecessary which I believe should be the case for VFIO
>>>> anyway. Pagemap should only be needed when using pci_uio.
>>>
>>> Ah, ok, makes sense now. But it sure needs a big fat comment there as it is
>>> not obvious why host RAM address is used there as DMA window start is not
>>> guaranteed.
>> Well, either way there is some bug - ms[i].phys_addr and ms[i].addr_64 both
>> have exact same value, in my setup it is 3fffb33c0000 which is a userspace
>> address - at least ms[i].phys_addr must be physical address.
> 
> This patch breaks i40e_dev_init() in my server.
> 
> EAL: PCI device 0004:01:00.0 on NUMA socket 1
> EAL:   probe driver: 8086:1583 net_i40e
> EAL:   using IOMMU type 7 (sPAPR)
> eth_i40e_dev_init(): Failed to init adminq: -32
> EAL: Releasing pci mapped resource for 0004:01:00.0
> EAL: Calling pci_unmap_resource for 0004:01:00.0 at 0x3fff82aa0000
> EAL: Requested device 0004:01:00.0 cannot be used
> EAL: PCI device 0004:01:00.1 on NUMA socket 1
> EAL:   probe driver: 8086:1583 net_i40e
> EAL:   using IOMMU type 7 (sPAPR)
> eth_i40e_dev_init(): Failed to init adminq: -32
> EAL: Releasing pci mapped resource for 0004:01:00.1
> EAL: Calling pci_unmap_resource for 0004:01:00.1 at 0x3fff82aa0000
> EAL: Requested device 0004:01:00.1 cannot be used
> EAL: No probed ethernet devices
> 
> I have two memseg each of 1G size. Their mapped PA and VA are also different.
> 
> (gdb) p /x ms[0]
> $3 = {phys_addr = 0x1e0b000000, {addr = 0x3effaf000000, addr_64 =
> 0x3effaf000000},
>   len = 0x40000000, hugepage_sz = 0x1000000, socket_id = 0x1, nchannel =
> 0x0, nrank = 0x0}
> (gdb) p /x ms[1]
> $4 = {phys_addr = 0xf6d000000, {addr = 0x3efbaf000000, addr_64 =
> 0x3efbaf000000},
>   len = 0x40000000, hugepage_sz = 0x1000000, socket_id = 0x0, nchannel =
> 0x0, nrank = 0x0}
> 
> Could you please recheck this. May be, if new DMA window does not start
> from bus address 0,
> only then you reset dma_map.iova for this offset ?

As we figured out, it is --no-huge effect.

Another thing - as I read the code - the window size comes from
rte_eal_get_physmem_size(). On my 512GB machine, DPDK allocates only 16GB
window so it is far away from 1:1 mapping which is believed to be DPDK
expectation. Looking now for a better version of rte_eal_get_physmem_size()...


And another problem - after few unsuccessful starts of app/testpmd, all
huge pages are gone:

aik@stratton2:~$ cat /proc/meminfo
MemTotal:       535527296 kB
MemFree:        516662272 kB
MemAvailable:   515501696 kB
...
HugePages_Total:    1024
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:      16384 kB


How is that possible? What is pinning these pages so testpmd process exit
does not clear that up?




> 
> 
> Thanks,
> Gowrishankar
> 
>>
>>>
>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>> ---
>>>>>   lib/librte_eal/linuxapp/eal/eal_vfio.c | 12 ++++++++++--
>>>>>   1 file changed, 10 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/
>>>>> librte_eal/linuxapp/eal/eal_vfio.c
>>>>> index 46f951f4d..8b8e75c4f 100644
>>>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>> @@ -658,7 +658,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>   {
>>>>>      const struct rte_memseg *ms = rte_eal_get_physmem_layout();
>>>>>      int i, ret;
>>>>> -
>>>>> +   phys_addr_t io_offset;
>>>>>      struct vfio_iommu_spapr_register_memory reg = {
>>>>>         .argsz = sizeof(reg),
>>>>>         .flags = 0
>>>>> @@ -702,6 +702,13 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>         return -1;
>>>>>      }
>>>>>   +   io_offset = create.start_addr;
>>>>> +   if (io_offset) {
>>>>> +      RTE_LOG(ERR, EAL, "  DMA offsets other than zero is not
>>>>> supported, "
>>>>> +            "new window is created at %lx\n", io_offset);
>>>>> +      return -1;
>>>>> +   }
>>>>> +
>>>>>      /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
>>>>>      for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>>>>>         struct vfio_iommu_type1_dma_map dma_map;
>>>>> @@ -723,7 +730,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>         dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>>>>>         dma_map.vaddr = ms[i].addr_64;
>>>>>         dma_map.size = ms[i].len;
>>>>> -      dma_map.iova = ms[i].phys_addr;
>>>>> +      dma_map.iova = io_offset;
>>>>>         dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
>>>>>                VFIO_DMA_MAP_FLAG_WRITE;
>>>>>   @@ -735,6 +742,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>            return -1;
>>>>>         }
>>>>>   +      io_offset += dma_map.size;
>>>>>      }
>>>>>        return 0;
>>>>> -- 
>>>>> 2.11.0
>>>>>
>>>
>>
> 
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map
  2017-04-21  3:42           ` Alexey Kardashevskiy
@ 2017-04-21  8:43             ` Alexey Kardashevskiy
       [not found]               ` <OF6F33ED54.7950E1EF-ONC1258109.003295E3-C1258109.00333E2E@notes.na.collabserv.com>
  2017-04-21  8:51             ` gowrishankar muthukrishnan
       [not found]             ` <OF45247CC5.192F9D29-ONC1258109.002D6497-C1258109.002F2868@notes.na.collabserv.com>
  2 siblings, 1 reply; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-04-21  8:43 UTC (permalink / raw)
  To: gowrishankar muthukrishnan
  Cc: Jonas Pfefferle1, Gowrishankar Muthukrishnan, Adrian Schuepbach, dev

On 21/04/17 13:42, Alexey Kardashevskiy wrote:
> On 21/04/17 05:16, gowrishankar muthukrishnan wrote:
>> On Thursday 20 April 2017 07:52 PM, Alexey Kardashevskiy wrote:
>>> On 20/04/17 23:25, Alexey Kardashevskiy wrote:
>>>> On 20/04/17 19:04, Jonas Pfefferle1 wrote:
>>>>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote on 20/04/2017 09:24:02:
>>>>>
>>>>>> From: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>> To: dev@dpdk.org
>>>>>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>, JPF@zurich.ibm.com,
>>>>>> Gowrishankar Muthukrishnan <gowrishankar.m@in.ibm.com>
>>>>>> Date: 20/04/2017 09:24
>>>>>> Subject: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus
>>>>>> addresses for DMA map
>>>>>>
>>>>>> VFIO_IOMMU_SPAPR_TCE_CREATE ioctl() returns the actual bus address for
>>>>>> just created DMA window. It happens to start from zero because the
>>>>>> default
>>>>>> window is removed (leaving no windows) and new window starts from zero.
>>>>>> However this is not guaranteed and the new window may start from another
>>>>>> address, this adds an error check.
>>>>>>
>>>>>> Another issue is that IOVA passed to VFIO_IOMMU_MAP_DMA should be a PCI
>>>>>> bus address while in this case a physical address of a user page is used.
>>>>>> This changes IOVA to start from zero in a hope that the rest of DPDK
>>>>>> expects this.
>>>>> This is not the case. DPDK expects a 1:1 mapping PA==IOVA. It will use the
>>>>> phys_addr of the memory segment it got from /proc/self/pagemap cf.
>>>>> librte_eal/linuxapp/eal/eal_memory.c. We could try setting it here to the
>>>>> actual iova which basically makes the whole virtual to phyiscal mapping
>>>>> with pagemap unnecessary which I believe should be the case for VFIO
>>>>> anyway. Pagemap should only be needed when using pci_uio.
>>>>
>>>> Ah, ok, makes sense now. But it sure needs a big fat comment there as it is
>>>> not obvious why host RAM address is used there as DMA window start is not
>>>> guaranteed.
>>> Well, either way there is some bug - ms[i].phys_addr and ms[i].addr_64 both
>>> have exact same value, in my setup it is 3fffb33c0000 which is a userspace
>>> address - at least ms[i].phys_addr must be physical address.
>>
>> This patch breaks i40e_dev_init() in my server.
>>
>> EAL: PCI device 0004:01:00.0 on NUMA socket 1
>> EAL:   probe driver: 8086:1583 net_i40e
>> EAL:   using IOMMU type 7 (sPAPR)
>> eth_i40e_dev_init(): Failed to init adminq: -32
>> EAL: Releasing pci mapped resource for 0004:01:00.0
>> EAL: Calling pci_unmap_resource for 0004:01:00.0 at 0x3fff82aa0000
>> EAL: Requested device 0004:01:00.0 cannot be used
>> EAL: PCI device 0004:01:00.1 on NUMA socket 1
>> EAL:   probe driver: 8086:1583 net_i40e
>> EAL:   using IOMMU type 7 (sPAPR)
>> eth_i40e_dev_init(): Failed to init adminq: -32
>> EAL: Releasing pci mapped resource for 0004:01:00.1
>> EAL: Calling pci_unmap_resource for 0004:01:00.1 at 0x3fff82aa0000
>> EAL: Requested device 0004:01:00.1 cannot be used
>> EAL: No probed ethernet devices
>>
>> I have two memseg each of 1G size. Their mapped PA and VA are also different.
>>
>> (gdb) p /x ms[0]
>> $3 = {phys_addr = 0x1e0b000000, {addr = 0x3effaf000000, addr_64 =
>> 0x3effaf000000},
>>   len = 0x40000000, hugepage_sz = 0x1000000, socket_id = 0x1, nchannel =
>> 0x0, nrank = 0x0}
>> (gdb) p /x ms[1]
>> $4 = {phys_addr = 0xf6d000000, {addr = 0x3efbaf000000, addr_64 =
>> 0x3efbaf000000},
>>   len = 0x40000000, hugepage_sz = 0x1000000, socket_id = 0x0, nchannel =
>> 0x0, nrank = 0x0}
>>
>> Could you please recheck this. May be, if new DMA window does not start
>> from bus address 0,
>> only then you reset dma_map.iova for this offset ?
> 
> As we figured out, it is --no-huge effect.
> 
> Another thing - as I read the code - the window size comes from
> rte_eal_get_physmem_size(). On my 512GB machine, DPDK allocates only 16GB
> window so it is far away from 1:1 mapping which is believed to be DPDK
> expectation. Looking now for a better version of rte_eal_get_physmem_size()...


I have not found any helper to get a total RAM size or
round-up-to-power-of-two - I could look through memory segments, find the
one with highest ending physical address, round it up to power of two
(requirement on POWER8 platform for a DMA window size) and use it as a DMA
window size - is there kernel's order_base_2() analog?


> 
> 
> And another problem - after few unsuccessful starts of app/testpmd, all
> huge pages are gone:
> 
> aik@stratton2:~$ cat /proc/meminfo
> MemTotal:       535527296 kB
> MemFree:        516662272 kB
> MemAvailable:   515501696 kB
> ...
> HugePages_Total:    1024
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:      16384 kB
> 
> 
> How is that possible? What is pinning these pages so testpmd process exit
> does not clear that up?

Still not clear, any ideas why might be causing this?



btw what is the correct way of running DPDK with hugepages?

I basically create a folder in ~aik/hugepages and do
sudo mount -t hugetlbfs hugetlbfs ~aik/hugepages
sudo sysctl vm.nr_hugepages=4096

This creates bunch of pages:
aik@stratton2:~$ cat /proc/meminfo | grep HugePage
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:    4096
HugePages_Free:     4096
HugePages_Rsvd:        0
HugePages_Surp:        0


And then I am watching testpmd to detect hugepages (it does see 4096 16MB
pages) to allocate pages:
rte_eal_hugepage_init() calls map_all_hugepages(... orig=1) - here all 4096
pages are allocated, then it calls map_all_hugepages(... orig=0) - and here
I get lots of "EAL: Cannot get a virtual area: Cannot allocate memory" due
to obvious reason - all pages are allocated. Since you folks have this
tested somehow - what am I doing wrong? :) This is all very confusing -
what is that orig=0/1 business is all about?




> 
> 
> 
> 
>>
>>
>> Thanks,
>> Gowrishankar
>>
>>>
>>>>
>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>> ---
>>>>>>   lib/librte_eal/linuxapp/eal/eal_vfio.c | 12 ++++++++++--
>>>>>>   1 file changed, 10 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/
>>>>>> librte_eal/linuxapp/eal/eal_vfio.c
>>>>>> index 46f951f4d..8b8e75c4f 100644
>>>>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>> @@ -658,7 +658,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>>   {
>>>>>>      const struct rte_memseg *ms = rte_eal_get_physmem_layout();
>>>>>>      int i, ret;
>>>>>> -
>>>>>> +   phys_addr_t io_offset;
>>>>>>      struct vfio_iommu_spapr_register_memory reg = {
>>>>>>         .argsz = sizeof(reg),
>>>>>>         .flags = 0
>>>>>> @@ -702,6 +702,13 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>>         return -1;
>>>>>>      }
>>>>>>   +   io_offset = create.start_addr;
>>>>>> +   if (io_offset) {
>>>>>> +      RTE_LOG(ERR, EAL, "  DMA offsets other than zero is not
>>>>>> supported, "
>>>>>> +            "new window is created at %lx\n", io_offset);
>>>>>> +      return -1;
>>>>>> +   }
>>>>>> +
>>>>>>      /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
>>>>>>      for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>>>>>>         struct vfio_iommu_type1_dma_map dma_map;
>>>>>> @@ -723,7 +730,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>>         dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>>>>>>         dma_map.vaddr = ms[i].addr_64;
>>>>>>         dma_map.size = ms[i].len;
>>>>>> -      dma_map.iova = ms[i].phys_addr;
>>>>>> +      dma_map.iova = io_offset;
>>>>>>         dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
>>>>>>                VFIO_DMA_MAP_FLAG_WRITE;
>>>>>>   @@ -735,6 +742,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>>            return -1;
>>>>>>         }
>>>>>>   +      io_offset += dma_map.size;
>>>>>>      }
>>>>>>        return 0;
>>>>>> -- 
>>>>>> 2.11.0
>>>>>>
>>>>
>>>
>>
>>
> 
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map
  2017-04-21  3:42           ` Alexey Kardashevskiy
  2017-04-21  8:43             ` Alexey Kardashevskiy
@ 2017-04-21  8:51             ` gowrishankar muthukrishnan
       [not found]             ` <OF45247CC5.192F9D29-ONC1258109.002D6497-C1258109.002F2868@notes.na.collabserv.com>
  2 siblings, 0 replies; 28+ messages in thread
From: gowrishankar muthukrishnan @ 2017-04-21  8:51 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Jonas Pfefferle1, Gowrishankar Muthukrishnan, Adrian Schuepbach, dev

On Friday 21 April 2017 09:12 AM, Alexey Kardashevskiy wrote:
> On 21/04/17 05:16, gowrishankar muthukrishnan wrote:
>> On Thursday 20 April 2017 07:52 PM, Alexey Kardashevskiy wrote:
>>> On 20/04/17 23:25, Alexey Kardashevskiy wrote:
>>>> On 20/04/17 19:04, Jonas Pfefferle1 wrote:
>>>>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote on 20/04/2017 09:24:02:
>>>>>
>>>>>> From: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>> To: dev@dpdk.org
>>>>>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>, JPF@zurich.ibm.com,
>>>>>> Gowrishankar Muthukrishnan <gowrishankar.m@in.ibm.com>
>>>>>> Date: 20/04/2017 09:24
>>>>>> Subject: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus
>>>>>> addresses for DMA map
>>>>>>
>>>>>> VFIO_IOMMU_SPAPR_TCE_CREATE ioctl() returns the actual bus address for
>>>>>> just created DMA window. It happens to start from zero because the
>>>>>> default
>>>>>> window is removed (leaving no windows) and new window starts from zero.
>>>>>> However this is not guaranteed and the new window may start from another
>>>>>> address, this adds an error check.
>>>>>>
>>>>>> Another issue is that IOVA passed to VFIO_IOMMU_MAP_DMA should be a PCI
>>>>>> bus address while in this case a physical address of a user page is used.
>>>>>> This changes IOVA to start from zero in a hope that the rest of DPDK
>>>>>> expects this.
>>>>> This is not the case. DPDK expects a 1:1 mapping PA==IOVA. It will use the
>>>>> phys_addr of the memory segment it got from /proc/self/pagemap cf.
>>>>> librte_eal/linuxapp/eal/eal_memory.c. We could try setting it here to the
>>>>> actual iova which basically makes the whole virtual to phyiscal mapping
>>>>> with pagemap unnecessary which I believe should be the case for VFIO
>>>>> anyway. Pagemap should only be needed when using pci_uio.
>>>> Ah, ok, makes sense now. But it sure needs a big fat comment there as it is
>>>> not obvious why host RAM address is used there as DMA window start is not
>>>> guaranteed.
>>> Well, either way there is some bug - ms[i].phys_addr and ms[i].addr_64 both
>>> have exact same value, in my setup it is 3fffb33c0000 which is a userspace
>>> address - at least ms[i].phys_addr must be physical address.
>> This patch breaks i40e_dev_init() in my server.
>>
>> EAL: PCI device 0004:01:00.0 on NUMA socket 1
>> EAL:   probe driver: 8086:1583 net_i40e
>> EAL:   using IOMMU type 7 (sPAPR)
>> eth_i40e_dev_init(): Failed to init adminq: -32
>> EAL: Releasing pci mapped resource for 0004:01:00.0
>> EAL: Calling pci_unmap_resource for 0004:01:00.0 at 0x3fff82aa0000
>> EAL: Requested device 0004:01:00.0 cannot be used
>> EAL: PCI device 0004:01:00.1 on NUMA socket 1
>> EAL:   probe driver: 8086:1583 net_i40e
>> EAL:   using IOMMU type 7 (sPAPR)
>> eth_i40e_dev_init(): Failed to init adminq: -32
>> EAL: Releasing pci mapped resource for 0004:01:00.1
>> EAL: Calling pci_unmap_resource for 0004:01:00.1 at 0x3fff82aa0000
>> EAL: Requested device 0004:01:00.1 cannot be used
>> EAL: No probed ethernet devices
>>
>> I have two memseg each of 1G size. Their mapped PA and VA are also different.
>>
>> (gdb) p /x ms[0]
>> $3 = {phys_addr = 0x1e0b000000, {addr = 0x3effaf000000, addr_64 =
>> 0x3effaf000000},
>>    len = 0x40000000, hugepage_sz = 0x1000000, socket_id = 0x1, nchannel =
>> 0x0, nrank = 0x0}
>> (gdb) p /x ms[1]
>> $4 = {phys_addr = 0xf6d000000, {addr = 0x3efbaf000000, addr_64 =
>> 0x3efbaf000000},
>>    len = 0x40000000, hugepage_sz = 0x1000000, socket_id = 0x0, nchannel =
>> 0x0, nrank = 0x0}
>>
>> Could you please recheck this. May be, if new DMA window does not start
>> from bus address 0,
>> only then you reset dma_map.iova for this offset ?
> As we figured out, it is --no-huge effect.
>
> Another thing - as I read the code - the window size comes from
> rte_eal_get_physmem_size(). On my 512GB machine, DPDK allocates only 16GB
> window so it is far away from 1:1 mapping which is believed to be DPDK
> expectation. Looking now for a better version of rte_eal_get_physmem_size()...

If your mem segs are more in count (not contiguous unless reserved in 
boot time),
you could check CONFIG_RTE_MAX_NUMA_NODES and CONFIG_RTE_MAX_MEMSEG ?.

Thanks,
Gowrishankar
> And another problem - after few unsuccessful starts of app/testpmd, all
> huge pages are gone:
>
> aik@stratton2:~$ cat /proc/meminfo
> MemTotal:       535527296 kB
> MemFree:        516662272 kB
> MemAvailable:   515501696 kB
> ...
> HugePages_Total:    1024
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:      16384 kB
>
>
> How is that possible? What is pinning these pages so testpmd process exit
> does not clear that up?
>
>
>
>>
>> Thanks,
>> Gowrishankar
>>
>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>> ---
>>>>>>    lib/librte_eal/linuxapp/eal/eal_vfio.c | 12 ++++++++++--
>>>>>>    1 file changed, 10 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/
>>>>>> librte_eal/linuxapp/eal/eal_vfio.c
>>>>>> index 46f951f4d..8b8e75c4f 100644
>>>>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>> @@ -658,7 +658,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>>    {
>>>>>>       const struct rte_memseg *ms = rte_eal_get_physmem_layout();
>>>>>>       int i, ret;
>>>>>> -
>>>>>> +   phys_addr_t io_offset;
>>>>>>       struct vfio_iommu_spapr_register_memory reg = {
>>>>>>          .argsz = sizeof(reg),
>>>>>>          .flags = 0
>>>>>> @@ -702,6 +702,13 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>>          return -1;
>>>>>>       }
>>>>>>    +   io_offset = create.start_addr;
>>>>>> +   if (io_offset) {
>>>>>> +      RTE_LOG(ERR, EAL, "  DMA offsets other than zero is not
>>>>>> supported, "
>>>>>> +            "new window is created at %lx\n", io_offset);
>>>>>> +      return -1;
>>>>>> +   }
>>>>>> +
>>>>>>       /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
>>>>>>       for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>>>>>>          struct vfio_iommu_type1_dma_map dma_map;
>>>>>> @@ -723,7 +730,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>>          dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>>>>>>          dma_map.vaddr = ms[i].addr_64;
>>>>>>          dma_map.size = ms[i].len;
>>>>>> -      dma_map.iova = ms[i].phys_addr;
>>>>>> +      dma_map.iova = io_offset;
>>>>>>          dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
>>>>>>                 VFIO_DMA_MAP_FLAG_WRITE;
>>>>>>    @@ -735,6 +742,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>>             return -1;
>>>>>>          }
>>>>>>    +      io_offset += dma_map.size;
>>>>>>       }
>>>>>>         return 0;
>>>>>> -- 
>>>>>> 2.11.0
>>>>>>
>>
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH dpdk 4/5] vfio: Do try setting IOMMU type if already set
  2017-04-20  7:24 ` [PATCH dpdk 4/5] vfio: Do try setting IOMMU type if already set Alexey Kardashevskiy
  2017-04-20 19:31   ` gowrishankar muthukrishnan
@ 2017-04-21  8:54   ` Andrew Rybchenko
  2017-04-26  7:50     ` Alexey Kardashevskiy
  1 sibling, 1 reply; 28+ messages in thread
From: Andrew Rybchenko @ 2017-04-21  8:54 UTC (permalink / raw)
  To: Alexey Kardashevskiy, dev; +Cc: JPF, Gowrishankar Muthukrishnan

On 04/20/2017 10:24 AM, Alexey Kardashevskiy wrote:
> The existing code correctly checks if a container is set to a group and
> does not try attaching a group to a container for a second/third/...
> device from the same IOMMU group.
>
> However it still tries to set IOMMU type to a container for every device
> in a group which produces failure and closes container and group fds.
>
> This moves vfio_set_iommu_type() and DMA mapping code under:
> if (!(group_status.flags & VFIO_GROUP_FLAGS_CONTAINER_SET))
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>   lib/librte_eal/linuxapp/eal/eal_vfio.c | 50 +++++++++++++++++-----------------
>   1 file changed, 25 insertions(+), 25 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> index 6e2e84ca7..46f951f4d 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> @@ -298,33 +298,33 @@ vfio_setup_device(const char *sysfs_base, const char *dev_addr,
>   			clear_group(vfio_group_fd);
>   			return -1;
>   		}
> -	}
>   
> -	/*
> -	 * pick an IOMMU type and set up DMA mappings for container
> -	 *
> -	 * needs to be done only once, only when first group is assigned to
> -	 * a container and only in primary process. Note this can happen several
> -	 * times with the hotplug functionality.
> -	 */
> -	if (internal_config.process_type == RTE_PROC_PRIMARY &&
> -			vfio_cfg.vfio_active_groups == 1) {
> -		/* select an IOMMU type which we will be using */
> -		const struct vfio_iommu_type *t =
> +		/*
> +		 * pick an IOMMU type and set up DMA mappings for container
> +		 *
> +		 * needs to be done only once, only when first group is assigned to
> +		 * a container and only in primary process. Note this can happen several
> +		 * times with the hotplug functionality.
> +		 */
> +		if (internal_config.process_type == RTE_PROC_PRIMARY &&
> +				vfio_cfg.vfio_active_groups == 1) {
> +			/* select an IOMMU type which we will be using */
> +			const struct vfio_iommu_type *t =
>   				vfio_set_iommu_type(vfio_cfg.vfio_container_fd);
> -		if (!t) {
> -			RTE_LOG(ERR, EAL, "  %s failed to select IOMMU type\n", dev_addr);
> -			close(vfio_group_fd);
> -			clear_group(vfio_group_fd);
> -			return -1;
> -		}
> -		ret = t->dma_map_func(vfio_cfg.vfio_container_fd);
> -		if (ret) {
> -			RTE_LOG(ERR, EAL, "  %s DMA remapping failed, "
> -					"error %i (%s)\n", dev_addr, errno, strerror(errno));
> -			close(vfio_group_fd);
> -			clear_group(vfio_group_fd);
> -			return -1;
> +			if (!t) {
> +				RTE_LOG(ERR, EAL, "  %s failed to select IOMMU type\n", dev_addr);
> +				close(vfio_group_fd);
> +				clear_group(vfio_group_fd);
> +				return -1;
> +			}
> +			ret = t->dma_map_func(vfio_cfg.vfio_container_fd);
> +			if (ret) {
> +				RTE_LOG(ERR, EAL, "  %s DMA remapping failed, "
> +						"error %i (%s)\n", dev_addr, errno, strerror(errno));
> +				close(vfio_group_fd);
> +				clear_group(vfio_group_fd);
> +				return -1;
> +			}
>   		}
>   	}

It looks like a duplicate of the earlier submitted patch 
http://dpdk.org/ml/archives/dev/2017-April/063077.html

Andrew.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map
       [not found]             ` <OF45247CC5.192F9D29-ONC1258109.002D6497-C1258109.002F2868@notes.na.collabserv.com>
@ 2017-04-21  8:59               ` Alexey Kardashevskiy
  0 siblings, 0 replies; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-04-21  8:59 UTC (permalink / raw)
  To: Jonas Pfefferle1
  Cc: dev, Adrian Schuepbach, Gowrishankar Muthukrishnan,
	gowrishankar muthukrishnan

On 21/04/17 18:35, Jonas Pfefferle1 wrote:
> ----------------------------------------
> Jonas Pfefferle
> Cloud Storage & Analytics
> IBM Zurich Research Laboratory
> Saeumerstrasse 4
> CH-8803 Rueschlikon, Switzerland
> +41 44 724 8539
> 
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote on 21/04/2017 05:42:35:
> 
>> From: Alexey Kardashevskiy <aik@ozlabs.ru>
>> To: gowrishankar muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
>> Cc: Jonas Pfefferle1 <JPF@zurich.ibm.com>, Gowrishankar
>> Muthukrishnan <gowrishankar.m@in.ibm.com>, Adrian Schuepbach
>> <DRI@zurich.ibm.com>, "dev@dpdk.org" <dev@dpdk.org>
>> Date: 21/04/2017 05:42
>> Subject: Re: [dpdk-dev] [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use
>> correct bus addresses for DMA map
>>
>> On 21/04/17 05:16, gowrishankar muthukrishnan wrote:
>> > On Thursday 20 April 2017 07:52 PM, Alexey Kardashevskiy wrote:
>> >> On 20/04/17 23:25, Alexey Kardashevskiy wrote:
>> >>> On 20/04/17 19:04, Jonas Pfefferle1 wrote:
>> >>>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote on 20/04/2017 09:24:02:
>> >>>>
>> >>>>> From: Alexey Kardashevskiy <aik@ozlabs.ru>
>> >>>>> To: dev@dpdk.org
>> >>>>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>, JPF@zurich.ibm.com,
>> >>>>> Gowrishankar Muthukrishnan <gowrishankar.m@in.ibm.com>
>> >>>>> Date: 20/04/2017 09:24
>> >>>>> Subject: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus
>> >>>>> addresses for DMA map
>> >>>>>
>> >>>>> VFIO_IOMMU_SPAPR_TCE_CREATE ioctl() returns the actual bus address for
>> >>>>> just created DMA window. It happens to start from zero because the
>> >>>>> default
>> >>>>> window is removed (leaving no windows) and new window starts from zero.
>> >>>>> However this is not guaranteed and the new window may start from
> another
>> >>>>> address, this adds an error check.
>> >>>>>
>> >>>>> Another issue is that IOVA passed to VFIO_IOMMU_MAP_DMA should be a PCI
>> >>>>> bus address while in this case a physical address of a user
>> page is used.
>> >>>>> This changes IOVA to start from zero in a hope that the rest of DPDK
>> >>>>> expects this.
>> >>>> This is not the case. DPDK expects a 1:1 mapping PA==IOVA. It
>> will use the
>> >>>> phys_addr of the memory segment it got from /proc/self/pagemap cf.
>> >>>> librte_eal/linuxapp/eal/eal_memory.c. We could try setting it here
> to the
>> >>>> actual iova which basically makes the whole virtual to phyiscal mapping
>> >>>> with pagemap unnecessary which I believe should be the case for VFIO
>> >>>> anyway. Pagemap should only be needed when using pci_uio.
>> >>>
>> >>> Ah, ok, makes sense now. But it sure needs a big fat comment
>> there as it is
>> >>> not obvious why host RAM address is used there as DMA window start is not
>> >>> guaranteed.
>> >> Well, either way there is some bug - ms[i].phys_addr and ms[i].addr_64
> both
>> >> have exact same value, in my setup it is 3fffb33c0000 which is a userspace
>> >> address - at least ms[i].phys_addr must be physical address.
>> >
>> > This patch breaks i40e_dev_init() in my server.
>> >
>> > EAL: PCI device 0004:01:00.0 on NUMA socket 1
>> > EAL:   probe driver: 8086:1583 net_i40e
>> > EAL:   using IOMMU type 7 (sPAPR)
>> > eth_i40e_dev_init(): Failed to init adminq: -32
>> > EAL: Releasing pci mapped resource for 0004:01:00.0
>> > EAL: Calling pci_unmap_resource for 0004:01:00.0 at 0x3fff82aa0000
>> > EAL: Requested device 0004:01:00.0 cannot be used
>> > EAL: PCI device 0004:01:00.1 on NUMA socket 1
>> > EAL:   probe driver: 8086:1583 net_i40e
>> > EAL:   using IOMMU type 7 (sPAPR)
>> > eth_i40e_dev_init(): Failed to init adminq: -32
>> > EAL: Releasing pci mapped resource for 0004:01:00.1
>> > EAL: Calling pci_unmap_resource for 0004:01:00.1 at 0x3fff82aa0000
>> > EAL: Requested device 0004:01:00.1 cannot be used
>> > EAL: No probed ethernet devices
>> >
>> > I have two memseg each of 1G size. Their mapped PA and VA are
> alsodifferent.
>> >
>> > (gdb) p /x ms[0]
>> > $3 = {phys_addr = 0x1e0b000000, {addr = 0x3effaf000000, addr_64 =
>> > 0x3effaf000000},
>> >   len = 0x40000000, hugepage_sz = 0x1000000, socket_id = 0x1, nchannel =
>> > 0x0, nrank = 0x0}
>> > (gdb) p /x ms[1]
>> > $4 = {phys_addr = 0xf6d000000, {addr = 0x3efbaf000000, addr_64 =
>> > 0x3efbaf000000},
>> >   len = 0x40000000, hugepage_sz = 0x1000000, socket_id = 0x0, nchannel =
>> > 0x0, nrank = 0x0}
>> >
>> > Could you please recheck this. May be, if new DMA window does not start
>> > from bus address 0,
>> > only then you reset dma_map.iova for this offset ?
>>
>> As we figured out, it is --no-huge effect.
>>
>> Another thing - as I read the code - the window size comes from
>> rte_eal_get_physmem_size(). On my 512GB machine, DPDK allocates only 16GB
>> window so it is far away from 1:1 mapping which is believed to be DPDK
>> expectation. Looking now for a better version of
> rte_eal_get_physmem_size()...
> 
> You can try specifying the size with -m or --socket-mem.


Oh, right. Thanks.


>>
>>
>> And another problem - after few unsuccessful starts of app/testpmd, all
>> huge pages are gone:
>>
>> aik@stratton2:~$ cat /proc/meminfo
>> MemTotal:       535527296 kB
>> MemFree:        516662272 kB
>> MemAvailable:   515501696 kB
>> ...
>> HugePages_Total:    1024
>> HugePages_Free:        0
>> HugePages_Rsvd:        0
>> HugePages_Surp:        0
>> Hugepagesize:      16384 kB
>>
>>
>> How is that possible? What is pinning these pages so testpmd process exit
>> does not clear that up?
> 
> I've also seen this. I think that happens if it does not cleanly shutdown.
> I regularly clean /dev/hugepages ...


Oh, I am learning new things about hugepages as we speak :) I think not
being anonymous mapping has this effect. Anyway, this is a bug - pages stay
allocated after every run of testpmd, even if it does not crash but just
does exit() :-/

I still cannot get it working, with Intel 40G ethernet now, this is how far
I get:

USER1: create a new mbuf pool <mbuf_pool_socket_1>: n=1419456, size=2176,
socket=1
EAL: Error - exiting with code: 1
  Cause: Creation of mbuf pool for socket 1 failed: Cannot allocate memory
aik@stratton2:~$


I have put more details to another email.


> 
>>
>>
>>
>>
>> >
>> >
>> > Thanks,
>> > Gowrishankar
>> >
>> >>
>> >>>
>> >>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> >>>>> ---
>> >>>>>   lib/librte_eal/linuxapp/eal/eal_vfio.c | 12 ++++++++++--
>> >>>>>   1 file changed, 10 insertions(+), 2 deletions(-)
>> >>>>>
>> >>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/
>> >>>>> librte_eal/linuxapp/eal/eal_vfio.c
>> >>>>> index 46f951f4d..8b8e75c4f 100644
>> >>>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>> >>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>> >>>>> @@ -658,7 +658,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>> >>>>>   {
>> >>>>>      const struct rte_memseg *ms = rte_eal_get_physmem_layout();
>> >>>>>      int i, ret;
>> >>>>> -
>> >>>>> +   phys_addr_t io_offset;
>> >>>>>      struct vfio_iommu_spapr_register_memory reg = {
>> >>>>>         .argsz = sizeof(reg),
>> >>>>>         .flags = 0
>> >>>>> @@ -702,6 +702,13 @@ vfio_spapr_dma_map(int vfio_container_fd)
>> >>>>>         return -1;
>> >>>>>      }
>> >>>>>   +   io_offset = create.start_addr;
>> >>>>> +   if (io_offset) {
>> >>>>> +      RTE_LOG(ERR, EAL, "  DMA offsets other than zero is not
>> >>>>> supported, "
>> >>>>> +            "new window is created at %lx\n", io_offset);
>> >>>>> +      return -1;
>> >>>>> +   }
>> >>>>> +
>> >>>>>      /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
>> >>>>>      for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>> >>>>>         struct vfio_iommu_type1_dma_map dma_map;
>> >>>>> @@ -723,7 +730,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>> >>>>>         dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>> >>>>>         dma_map.vaddr = ms[i].addr_64;
>> >>>>>         dma_map.size = ms[i].len;
>> >>>>> -      dma_map.iova = ms[i].phys_addr;
>> >>>>> +      dma_map.iova = io_offset;
>> >>>>>         dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
>> >>>>>                VFIO_DMA_MAP_FLAG_WRITE;
>> >>>>>   @@ -735,6 +742,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>> >>>>>            return -1;
>> >>>>>         }
>> >>>>>   +      io_offset += dma_map.size;
>> >>>>>      }
>> >>>>>        return 0;
>> >>>>> --
>> >>>>> 2.11.0
>> >>>>>
>> >>>
>> >>
>> >
>> >
>>
>>
>> --
>> Alexey
>>
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map
       [not found]               ` <OF6F33ED54.7950E1EF-ONC1258109.003295E3-C1258109.00333E2E@notes.na.collabserv.com>
@ 2017-04-22  0:12                 ` Alexey Kardashevskiy
  2017-04-24  9:40                   ` Burakov, Anatoly
  0 siblings, 1 reply; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-04-22  0:12 UTC (permalink / raw)
  To: Jonas Pfefferle1
  Cc: dev, Adrian Schuepbach, Gowrishankar Muthukrishnan,
	gowrishankar muthukrishnan

On 21/04/17 19:19, Jonas Pfefferle1 wrote:
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote on 21/04/2017 10:43:53:
> 
>> From: Alexey Kardashevskiy <aik@ozlabs.ru>
>> To: gowrishankar muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
>> Cc: Jonas Pfefferle1 <JPF@zurich.ibm.com>, Gowrishankar
>> Muthukrishnan <gowrishankar.m@in.ibm.com>, Adrian Schuepbach
>> <DRI@zurich.ibm.com>, "dev@dpdk.org" <dev@dpdk.org>
>> Date: 21/04/2017 10:44
>> Subject: Re: [dpdk-dev] [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use
>> correct bus addresses for DMA map
>>
>> On 21/04/17 13:42, Alexey Kardashevskiy wrote:
>> > On 21/04/17 05:16, gowrishankar muthukrishnan wrote:
>> >> On Thursday 20 April 2017 07:52 PM, Alexey Kardashevskiy wrote:
>> >>> On 20/04/17 23:25, Alexey Kardashevskiy wrote:
>> >>>> On 20/04/17 19:04, Jonas Pfefferle1 wrote:
>> >>>>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote on 20/04/2017 09:24:02:
>> >>>>>
>> >>>>>> From: Alexey Kardashevskiy <aik@ozlabs.ru>
>> >>>>>> To: dev@dpdk.org
>> >>>>>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>, JPF@zurich.ibm.com,
>> >>>>>> Gowrishankar Muthukrishnan <gowrishankar.m@in.ibm.com>
>> >>>>>> Date: 20/04/2017 09:24
>> >>>>>> Subject: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus
>> >>>>>> addresses for DMA map
>> >>>>>>
>> >>>>>> VFIO_IOMMU_SPAPR_TCE_CREATE ioctl() returns the actual bus address for
>> >>>>>> just created DMA window. It happens to start from zero because the
>> >>>>>> default
>> >>>>>> window is removed (leaving no windows) and new window starts from
> zero.
>> >>>>>> However this is not guaranteed and the new window may start
>> from another
>> >>>>>> address, this adds an error check.
>> >>>>>>
>> >>>>>> Another issue is that IOVA passed to VFIO_IOMMU_MAP_DMA should be
> a PCI
>> >>>>>> bus address while in this case a physical address of a user
>> page is used.
>> >>>>>> This changes IOVA to start from zero in a hope that the rest of DPDK
>> >>>>>> expects this.
>> >>>>> This is not the case. DPDK expects a 1:1 mapping PA==IOVA. It
>> will use the
>> >>>>> phys_addr of the memory segment it got from /proc/self/pagemap cf.
>> >>>>> librte_eal/linuxapp/eal/eal_memory.c. We could try setting it
>> here to the
>> >>>>> actual iova which basically makes the whole virtual to phyiscal mapping
>> >>>>> with pagemap unnecessary which I believe should be the case for VFIO
>> >>>>> anyway. Pagemap should only be needed when using pci_uio.
>> >>>>
>> >>>> Ah, ok, makes sense now. But it sure needs a big fat comment
>> there as it is
>> >>>> not obvious why host RAM address is used there as DMA window start
> is not
>> >>>> guaranteed.
>> >>> Well, either way there is some bug - ms[i].phys_addr and ms
>> [i].addr_64 both
>> >>> have exact same value, in my setup it is 3fffb33c0000 which is auserspace
>> >>> address - at least ms[i].phys_addr must be physical address.
>> >>
>> >> This patch breaks i40e_dev_init() in my server.
>> >>
>> >> EAL: PCI device 0004:01:00.0 on NUMA socket 1
>> >> EAL:   probe driver: 8086:1583 net_i40e
>> >> EAL:   using IOMMU type 7 (sPAPR)
>> >> eth_i40e_dev_init(): Failed to init adminq: -32
>> >> EAL: Releasing pci mapped resource for 0004:01:00.0
>> >> EAL: Calling pci_unmap_resource for 0004:01:00.0 at 0x3fff82aa0000
>> >> EAL: Requested device 0004:01:00.0 cannot be used
>> >> EAL: PCI device 0004:01:00.1 on NUMA socket 1
>> >> EAL:   probe driver: 8086:1583 net_i40e
>> >> EAL:   using IOMMU type 7 (sPAPR)
>> >> eth_i40e_dev_init(): Failed to init adminq: -32
>> >> EAL: Releasing pci mapped resource for 0004:01:00.1
>> >> EAL: Calling pci_unmap_resource for 0004:01:00.1 at 0x3fff82aa0000
>> >> EAL: Requested device 0004:01:00.1 cannot be used
>> >> EAL: No probed ethernet devices
>> >>
>> >> I have two memseg each of 1G size. Their mapped PA and VA are
>> also different.
>> >>
>> >> (gdb) p /x ms[0]
>> >> $3 = {phys_addr = 0x1e0b000000, {addr = 0x3effaf000000, addr_64 =
>> >> 0x3effaf000000},
>> >>   len = 0x40000000, hugepage_sz = 0x1000000, socket_id = 0x1, nchannel =
>> >> 0x0, nrank = 0x0}
>> >> (gdb) p /x ms[1]
>> >> $4 = {phys_addr = 0xf6d000000, {addr = 0x3efbaf000000, addr_64 =
>> >> 0x3efbaf000000},
>> >>   len = 0x40000000, hugepage_sz = 0x1000000, socket_id = 0x0, nchannel =
>> >> 0x0, nrank = 0x0}
>> >>
>> >> Could you please recheck this. May be, if new DMA window does not start
>> >> from bus address 0,
>> >> only then you reset dma_map.iova for this offset ?
>> >
>> > As we figured out, it is --no-huge effect.
>> >
>> > Another thing - as I read the code - the window size comes from
>> > rte_eal_get_physmem_size(). On my 512GB machine, DPDK allocates only 16GB
>> > window so it is far away from 1:1 mapping which is believed to be DPDK
>> > expectation. Looking now for a better version of
>> rte_eal_get_physmem_size()...
>>
>>
>> I have not found any helper to get a total RAM size or
>> round-up-to-power-of-two - I could look through memory segments, find the
>> one with highest ending physical address, round it up to power of two
>> (requirement on POWER8 platform for a DMA window size) and use it as a DMA
>> window size - is there kernel's order_base_2() analog?
> 
> 
> I guess you have to iterate over the memory segments and create multiple
> windows covering each of them if you want to do a 1:1 mapping.


As for today, POWER8 systems can only do 2 windows. And one window is
enough actually, it can be as big as entire RAM and still have only the
mapping DPDK needs. The problem is to know this RAM size which is easy, I
just did not want to create yet another bicycle here, with reading
/proc/meminfo, etc.


> 
>>
>>
>> >
>> >
>> > And another problem - after few unsuccessful starts of app/testpmd, all
>> > huge pages are gone:
>> >
>> > aik@stratton2:~$ cat /proc/meminfo
>> > MemTotal:       535527296 kB
>> > MemFree:        516662272 kB
>> > MemAvailable:   515501696 kB
>> > ...
>> > HugePages_Total:    1024
>> > HugePages_Free:        0
>> > HugePages_Rsvd:        0
>> > HugePages_Surp:        0
>> > Hugepagesize:      16384 kB
>> >
>> >
>> > How is that possible? What is pinning these pages so testpmd process exit
>> > does not clear that up?
>>
>> Still not clear, any ideas why might be causing this?
>>
>>
>>
>> btw what is the correct way of running DPDK with hugepages?
>>
>> I basically create a folder in ~aik/hugepages and do
>> sudo mount -t hugetlbfs hugetlbfs ~aik/hugepages
>> sudo sysctl vm.nr_hugepages=4096
>>
>> This creates bunch of pages:
>> aik@stratton2:~$ cat /proc/meminfo | grep HugePage
>> AnonHugePages:         0 kB
>> ShmemHugePages:        0 kB
>> HugePages_Total:    4096
>> HugePages_Free:     4096
>> HugePages_Rsvd:        0
>> HugePages_Surp:        0
>>
>>
>> And then I am watching testpmd to detect hugepages (it does see 4096 16MB
>> pages) to allocate pages:
>> rte_eal_hugepage_init() calls map_all_hugepages(... orig=1) - here all 4096
>> pages are allocated, then it calls map_all_hugepages(... orig=0) - and here
>> I get lots of "EAL: Cannot get a virtual area: Cannot allocate memory" due
>> to obvious reason - all pages are allocated. Since you folks have this
>> tested somehow - what am I doing wrong? :) This is all very confusing -
>> what is that orig=0/1 business is all about?
>>
> 
> DPDK tries to allocate all hugepages that are available to find the
> smallest amount of physically contiguous memory segments to cover the
> specified memory size. It then releases all those hugepages that it did not
> need not sure how this is related to orig=1/0 though.


No, it never does release a single page :-/

> You can specify your
> hugepage mount with --huge-dir maybe this helps.

No, makes no difference.

How do you run DPDK (full command line) on POWER8 to make any use of it?


> 
>>
>>
>>
>> >
>> >
>> >
>> >
>> >>
>> >>
>> >> Thanks,
>> >> Gowrishankar
>> >>
>> >>>
>> >>>>
>> >>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> >>>>>> ---
>> >>>>>>   lib/librte_eal/linuxapp/eal/eal_vfio.c | 12 ++++++++++--
>> >>>>>>   1 file changed, 10 insertions(+), 2 deletions(-)
>> >>>>>>
>> >>>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/
>> >>>>>> librte_eal/linuxapp/eal/eal_vfio.c
>> >>>>>> index 46f951f4d..8b8e75c4f 100644
>> >>>>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>> >>>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>> >>>>>> @@ -658,7 +658,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>> >>>>>>   {
>> >>>>>>      const struct rte_memseg *ms = rte_eal_get_physmem_layout();
>> >>>>>>      int i, ret;
>> >>>>>> -
>> >>>>>> +   phys_addr_t io_offset;
>> >>>>>>      struct vfio_iommu_spapr_register_memory reg = {
>> >>>>>>         .argsz = sizeof(reg),
>> >>>>>>         .flags = 0
>> >>>>>> @@ -702,6 +702,13 @@ vfio_spapr_dma_map(int vfio_container_fd)
>> >>>>>>         return -1;
>> >>>>>>      }
>> >>>>>>   +   io_offset = create.start_addr;
>> >>>>>> +   if (io_offset) {
>> >>>>>> +      RTE_LOG(ERR, EAL, "  DMA offsets other than zero is not
>> >>>>>> supported, "
>> >>>>>> +            "new window is created at %lx\n", io_offset);
>> >>>>>> +      return -1;
>> >>>>>> +   }
>> >>>>>> +
>> >>>>>>      /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
>> >>>>>>      for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>> >>>>>>         struct vfio_iommu_type1_dma_map dma_map;
>> >>>>>> @@ -723,7 +730,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>> >>>>>>         dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>> >>>>>>         dma_map.vaddr = ms[i].addr_64;
>> >>>>>>         dma_map.size = ms[i].len;
>> >>>>>> -      dma_map.iova = ms[i].phys_addr;
>> >>>>>> +      dma_map.iova = io_offset;
>> >>>>>>         dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
>> >>>>>>                VFIO_DMA_MAP_FLAG_WRITE;
>> >>>>>>   @@ -735,6 +742,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>> >>>>>>            return -1;
>> >>>>>>         }
>> >>>>>>   +      io_offset += dma_map.size;
>> >>>>>>      }
>> >>>>>>        return 0;
>> >>>>>> --
>> >>>>>> 2.11.0
>> >>>>>>
>> >>>>
>> >>>
>> >>
>> >>
>> >
>> >
>>
>>
>> --
>> Alexey
>>
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH dpdk 0/5] ppc64/spapr: Attempt to use on POWER8
  2017-04-20  7:23 [PATCH dpdk 0/5] ppc64/spapr: Attempt to use on POWER8 Alexey Kardashevskiy
                   ` (4 preceding siblings ...)
  2017-04-20  7:24 ` [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map Alexey Kardashevskiy
@ 2017-04-22 21:11 ` Olga Shern
  2017-04-23 13:35   ` Alexey Kardashevskiy
  5 siblings, 1 reply; 28+ messages in thread
From: Olga Shern @ 2017-04-22 21:11 UTC (permalink / raw)
  To: Alexey Kardashevskiy, dev; +Cc: JPF, Gowrishankar Muthukrishnan

Alexey, 

Mellanox support DPDK only on Ethernet, no IB.
And yes, you need to install Mellanox drivers , OFED, to support it.

What NIC do you have? 

Best Regards,
Olga

________________________________________________________________
Olga Shern 
SW Director DPDK 
Mellanox Technologies, Raanana Israel




> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Alexey
> Kardashevskiy
> Sent: Thursday, April 20, 2017 10:24 AM
> To: dev@dpdk.org
> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>; JPF@zurich.ibm.com;
> Gowrishankar Muthukrishnan <gowrishankar.m@in.ibm.com>
> Subject: [dpdk-dev] [PATCH dpdk 0/5] ppc64/spapr: Attempt to use on
> POWER8
> 
> Hi!
> 
> This is my first attempt to use DPDK on POWER8 machine and yet
> unsuccessful as it turned out DPDK only supports IB-Mellanox (I only got
> ethernet-Mellanox, and requires OFED), rmmod on Intel 40Gb module
> produces PCI errors (unrelated to DPDK) and Broadcom bnx2x has few issues
> (below) and still crashes as I suspect I got DMA mapping wrong, here is a
> backtrace:
> 
> Configuring Port 0 (socket 0)
> PMD: bnx2x_issue_dmae_with_comp(): DMAE timeout!
> PANIC in bnx2x_write_dmae():
> DMAE failed (-1)22: [/lib/powerpc64le-linux-
> gnu/libc.so.6(__libc_start_main+0xb8) [0x3fffb7c23298]]
> 21: [/lib/powerpc64le-linux-gnu/libc.so.6(+0x2309c) [0x3fffb7c2309c]]
> 20: [/home/aik/pbuild/dpdk_build/app/testpmd(main+0x228) [0x100255d0]]
> 19: [/home/aik/pbuild/dpdk_build/app/testpmd(start_port+0x5dc)
> [0x1002341c]]
> 18: [/home/aik/pbuild/dpdk_build/app/testpmd(rte_eth_dev_start+0xc4)
> [0x1008b3c0]]
> 17: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x10117550]]
> 16: [/home/aik/pbuild/dpdk_build/app/testpmd(bnx2x_init+0x204)
> [0x100f7210]]
> 15: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x100f6888]]
> 14: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x100ee7f4]]
> 13:
> [/home/aik/pbuild/dpdk_build/app/testpmd(ecore_func_state_change+0x
> 250) [0x10127794]]
> 12: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x1012734c]]
> 11: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x10126830]]
> 10: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x10126618]]
> 9: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x10100a98]]
> 8: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x100ffe00]]
> 7: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x100de614]]
> 6: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x100de4cc]]
> 5: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x101063c0]]
> 4: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x100e1f6c]]
> 3: [/home/aik/pbuild/dpdk_build/app/testpmd(bnx2x_write_dmae+0x11c)
> [0x100e1e40]]
> 2: [/home/aik/pbuild/dpdk_build/app/testpmd(__rte_panic+0x8c)
> [0x100b3e58]]
> 1: [/home/aik/pbuild/dpdk_build/app/testpmd(rte_dump_stack+0x40)
> [0x100b3cc4]]
> 
> Thread 1 "testpmd" received signal SIGABRT, Aborted.
> 0x00003fffb7c3edb0 in __GI_raise (sig=<optimised out>) at
> ../sysdeps/unix/sysv/linux/raise.c:54
> 54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> 
> Still, some fixes are quite obvious and straigtforward.
> 
> This is based on sha1
> 2fc8e0bf0 Olivier Matz "log: fix dump of registered logs when disabled".
> 
> Please comment. Thanks.
> 
> 
> 
> Alexey Kardashevskiy (5):
>   vfio/ppc64/spapr: Use correct structures for add/remove windows
>   pci: Initialize common rte driver pointer
>   RFC: bnx2x: Update firmware versions
>   vfio: Do try setting IOMMU type if already set
>   RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map
> 
>  lib/librte_eal/linuxapp/eal/eal_vfio.h |  8 +++++
>  drivers/net/bnx2x/bnx2x.c              |  4 +--
>  lib/librte_eal/common/eal_common_pci.c |  1 +
> lib/librte_eal/linuxapp/eal/eal_vfio.c | 62 +++++++++++++++++++------------
> ---
>  4 files changed, 46 insertions(+), 29 deletions(-)
> 
> --
> 2.11.0

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH dpdk 0/5] ppc64/spapr: Attempt to use on POWER8
  2017-04-22 21:11 ` [PATCH dpdk 0/5] ppc64/spapr: Attempt to use on POWER8 Olga Shern
@ 2017-04-23 13:35   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-04-23 13:35 UTC (permalink / raw)
  To: Olga Shern, dev; +Cc: JPF, Gowrishankar Muthukrishnan

On 23/04/17 07:11, Olga Shern wrote:
> Alexey, 
> 
> Mellanox support DPDK only on Ethernet, no IB.
> And yes, you need to install Mellanox drivers , OFED, to support it.


ibv_get_device_list(&i) returns no devices, the name suggests "ib" ==
infiniband but it may be just naming issue :)


> What NIC do you have? 

Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
Mellanox Technologies MT27700 Family [ConnectX-4]



> 
> Best Regards,
> Olga
> 
> ________________________________________________________________
> Olga Shern 
> SW Director DPDK 
> Mellanox Technologies, Raanana Israel
> 
> 
> 
> 
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Alexey
>> Kardashevskiy
>> Sent: Thursday, April 20, 2017 10:24 AM
>> To: dev@dpdk.org
>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>; JPF@zurich.ibm.com;
>> Gowrishankar Muthukrishnan <gowrishankar.m@in.ibm.com>
>> Subject: [dpdk-dev] [PATCH dpdk 0/5] ppc64/spapr: Attempt to use on
>> POWER8
>>
>> Hi!
>>
>> This is my first attempt to use DPDK on POWER8 machine and yet
>> unsuccessful as it turned out DPDK only supports IB-Mellanox (I only got
>> ethernet-Mellanox, and requires OFED), rmmod on Intel 40Gb module
>> produces PCI errors (unrelated to DPDK) and Broadcom bnx2x has few issues
>> (below) and still crashes as I suspect I got DMA mapping wrong, here is a
>> backtrace:
>>
>> Configuring Port 0 (socket 0)
>> PMD: bnx2x_issue_dmae_with_comp(): DMAE timeout!
>> PANIC in bnx2x_write_dmae():
>> DMAE failed (-1)22: [/lib/powerpc64le-linux-
>> gnu/libc.so.6(__libc_start_main+0xb8) [0x3fffb7c23298]]
>> 21: [/lib/powerpc64le-linux-gnu/libc.so.6(+0x2309c) [0x3fffb7c2309c]]
>> 20: [/home/aik/pbuild/dpdk_build/app/testpmd(main+0x228) [0x100255d0]]
>> 19: [/home/aik/pbuild/dpdk_build/app/testpmd(start_port+0x5dc)
>> [0x1002341c]]
>> 18: [/home/aik/pbuild/dpdk_build/app/testpmd(rte_eth_dev_start+0xc4)
>> [0x1008b3c0]]
>> 17: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x10117550]]
>> 16: [/home/aik/pbuild/dpdk_build/app/testpmd(bnx2x_init+0x204)
>> [0x100f7210]]
>> 15: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x100f6888]]
>> 14: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x100ee7f4]]
>> 13:
>> [/home/aik/pbuild/dpdk_build/app/testpmd(ecore_func_state_change+0x
>> 250) [0x10127794]]
>> 12: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x1012734c]]
>> 11: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x10126830]]
>> 10: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x10126618]]
>> 9: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x10100a98]]
>> 8: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x100ffe00]]
>> 7: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x100de614]]
>> 6: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x100de4cc]]
>> 5: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x101063c0]]
>> 4: [/home/aik/pbuild/dpdk_build/app/testpmd() [0x100e1f6c]]
>> 3: [/home/aik/pbuild/dpdk_build/app/testpmd(bnx2x_write_dmae+0x11c)
>> [0x100e1e40]]
>> 2: [/home/aik/pbuild/dpdk_build/app/testpmd(__rte_panic+0x8c)
>> [0x100b3e58]]
>> 1: [/home/aik/pbuild/dpdk_build/app/testpmd(rte_dump_stack+0x40)
>> [0x100b3cc4]]
>>
>> Thread 1 "testpmd" received signal SIGABRT, Aborted.
>> 0x00003fffb7c3edb0 in __GI_raise (sig=<optimised out>) at
>> ../sysdeps/unix/sysv/linux/raise.c:54
>> 54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>>
>> Still, some fixes are quite obvious and straigtforward.
>>
>> This is based on sha1
>> 2fc8e0bf0 Olivier Matz "log: fix dump of registered logs when disabled".
>>
>> Please comment. Thanks.
>>
>>
>>
>> Alexey Kardashevskiy (5):
>>   vfio/ppc64/spapr: Use correct structures for add/remove windows
>>   pci: Initialize common rte driver pointer
>>   RFC: bnx2x: Update firmware versions
>>   vfio: Do try setting IOMMU type if already set
>>   RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map
>>
>>  lib/librte_eal/linuxapp/eal/eal_vfio.h |  8 +++++
>>  drivers/net/bnx2x/bnx2x.c              |  4 +--
>>  lib/librte_eal/common/eal_common_pci.c |  1 +
>> lib/librte_eal/linuxapp/eal/eal_vfio.c | 62 +++++++++++++++++++------------
>> ---
>>  4 files changed, 46 insertions(+), 29 deletions(-)
>>
>> --
>> 2.11.0
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH dpdk 2/5] pci: Initialize common rte driver pointer
  2017-04-20  7:23 ` [PATCH dpdk 2/5] pci: Initialize common rte driver pointer Alexey Kardashevskiy
@ 2017-04-24  9:28   ` Burakov, Anatoly
  0 siblings, 0 replies; 28+ messages in thread
From: Burakov, Anatoly @ 2017-04-24  9:28 UTC (permalink / raw)
  To: Alexey Kardashevskiy, dev; +Cc: JPF, Gowrishankar Muthukrishnan

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Alexey
> Kardashevskiy
> Sent: Thursday, April 20, 2017 8:24 AM
> To: dev@dpdk.org
> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>; JPF@zurich.ibm.com;
> Gowrishankar Muthukrishnan <gowrishankar.m@in.ibm.com>
> Subject: [dpdk-dev] [PATCH dpdk 2/5] pci: Initialize common rte driver
> pointer
> 
> The existing code initializes a PCI driver pointer but not the common one.
> As the result, ring_dma_zone_reserve() in drivers/net/bnx2x/bnx2x_rxtx.c
> crashed as dev->device->driver==NULL.
> 
> This adds missing initialization.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

This and 3rd patch - could you please split them so that there are no unrelated patches in the patchset? I.e. a patchset for VFIO changes, a separate patch for PCI, and a patch for bnx2x driver, and then just make a note in the annotation that they are interdependent.

Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map
  2017-04-22  0:12                 ` Alexey Kardashevskiy
@ 2017-04-24  9:40                   ` Burakov, Anatoly
  0 siblings, 0 replies; 28+ messages in thread
From: Burakov, Anatoly @ 2017-04-24  9:40 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Jonas Pfefferle1
  Cc: dev, Adrian Schuepbach, Gowrishankar Muthukrishnan,
	gowrishankar muthukrishnan

Hi Alexey,

> > DPDK tries to allocate all hugepages that are available to find the
> > smallest amount of physically contiguous memory segments to cover the
> > specified memory size. It then releases all those hugepages that it
> > did not need not sure how this is related to orig=1/0 though.
> 
> 
> No, it never does release a single page :-/

That is weird.

As far as I can remember, when EAL initializes the pages, it checks if there are any active locks on hugepage files for a given prefix (which presumably you didn't set, so it uses a default "rte" prefix), and if there aren't, it removes the hugepage files. That way, if the pages are still in use (e.g. by a secondary process), they aren't removed, but if they aren't used, then they are freed, and reserved back.

That is, technically, DPDK never "frees" any pages (unless you don't supply -m/--socket-mem switch, in which case it does free unused pages, but still leaves used pages after exit), so after a DPDK process exit they're not cleaned up. However, whenever a primary DPDK process runs again, it is usually able to clean them up and thus should be able to initialize again. Perhaps something is preventing file removal from your hugetlbfs? Like, maybe a permissions issue or something?

Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH dpdk 4/5] vfio: Do try setting IOMMU type if already set
  2017-04-21  8:54   ` Andrew Rybchenko
@ 2017-04-26  7:50     ` Alexey Kardashevskiy
  2017-04-26  8:27       ` Burakov, Anatoly
  0 siblings, 1 reply; 28+ messages in thread
From: Alexey Kardashevskiy @ 2017-04-26  7:50 UTC (permalink / raw)
  To: Andrew Rybchenko, dev; +Cc: JPF, Gowrishankar Muthukrishnan

On 21/04/17 18:54, Andrew Rybchenko wrote:
> On 04/20/2017 10:24 AM, Alexey Kardashevskiy wrote:
>> The existing code correctly checks if a container is set to a group and
>> does not try attaching a group to a container for a second/third/...
>> device from the same IOMMU group.
>>
>> However it still tries to set IOMMU type to a container for every device
>> in a group which produces failure and closes container and group fds.
>>
>> This moves vfio_set_iommu_type() and DMA mapping code under:
>> if (!(group_status.flags & VFIO_GROUP_FLAGS_CONTAINER_SET))
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>   lib/librte_eal/linuxapp/eal/eal_vfio.c | 50
>> +++++++++++++++++-----------------
>>   1 file changed, 25 insertions(+), 25 deletions(-)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>> b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>> index 6e2e84ca7..46f951f4d 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>> @@ -298,33 +298,33 @@ vfio_setup_device(const char *sysfs_base, const
>> char *dev_addr,
>>               clear_group(vfio_group_fd);
>>               return -1;
>>           }
>> -    }
>>   -    /*
>> -     * pick an IOMMU type and set up DMA mappings for container
>> -     *
>> -     * needs to be done only once, only when first group is assigned to
>> -     * a container and only in primary process. Note this can happen
>> several
>> -     * times with the hotplug functionality.
>> -     */
>> -    if (internal_config.process_type == RTE_PROC_PRIMARY &&
>> -            vfio_cfg.vfio_active_groups == 1) {
>> -        /* select an IOMMU type which we will be using */
>> -        const struct vfio_iommu_type *t =
>> +        /*
>> +         * pick an IOMMU type and set up DMA mappings for container
>> +         *
>> +         * needs to be done only once, only when first group is assigned to
>> +         * a container and only in primary process. Note this can happen
>> several
>> +         * times with the hotplug functionality.
>> +         */
>> +        if (internal_config.process_type == RTE_PROC_PRIMARY &&
>> +                vfio_cfg.vfio_active_groups == 1) {
>> +            /* select an IOMMU type which we will be using */
>> +            const struct vfio_iommu_type *t =
>>                   vfio_set_iommu_type(vfio_cfg.vfio_container_fd);
>> -        if (!t) {
>> -            RTE_LOG(ERR, EAL, "  %s failed to select IOMMU type\n",
>> dev_addr);
>> -            close(vfio_group_fd);
>> -            clear_group(vfio_group_fd);
>> -            return -1;
>> -        }
>> -        ret = t->dma_map_func(vfio_cfg.vfio_container_fd);
>> -        if (ret) {
>> -            RTE_LOG(ERR, EAL, "  %s DMA remapping failed, "
>> -                    "error %i (%s)\n", dev_addr, errno, strerror(errno));
>> -            close(vfio_group_fd);
>> -            clear_group(vfio_group_fd);
>> -            return -1;
>> +            if (!t) {
>> +                RTE_LOG(ERR, EAL, "  %s failed to select IOMMU type\n",
>> dev_addr);
>> +                close(vfio_group_fd);
>> +                clear_group(vfio_group_fd);
>> +                return -1;
>> +            }
>> +            ret = t->dma_map_func(vfio_cfg.vfio_container_fd);
>> +            if (ret) {
>> +                RTE_LOG(ERR, EAL, "  %s DMA remapping failed, "
>> +                        "error %i (%s)\n", dev_addr, errno,
>> strerror(errno));
>> +                close(vfio_group_fd);
>> +                clear_group(vfio_group_fd);
>> +                return -1;
>> +            }
>>           }
>>       }
> 
> It looks like a duplicate of the earlier submitted patch
> http://dpdk.org/ml/archives/dev/2017-April/063077.html


It is a duplicate indeed, what needs to be done to have it accepted in the
mainline tree?



-- 
Alexey

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH dpdk 4/5] vfio: Do try setting IOMMU type if already set
  2017-04-26  7:50     ` Alexey Kardashevskiy
@ 2017-04-26  8:27       ` Burakov, Anatoly
  2017-04-26  8:45         ` Alejandro Lucero
  0 siblings, 1 reply; 28+ messages in thread
From: Burakov, Anatoly @ 2017-04-26  8:27 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Andrew Rybchenko, dev
  Cc: JPF, Gowrishankar Muthukrishnan, Alejandro Lucero

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Alexey
> Kardashevskiy
> Sent: Wednesday, April 26, 2017 8:51 AM
> To: Andrew Rybchenko <arybchenko@solarflare.com>; dev@dpdk.org
> Cc: JPF@zurich.ibm.com; Gowrishankar Muthukrishnan
> <gowrishankar.m@in.ibm.com>
> Subject: Re: [dpdk-dev] [PATCH dpdk 4/5] vfio: Do try setting IOMMU type if
> already set
> 
> On 21/04/17 18:54, Andrew Rybchenko wrote:
> > On 04/20/2017 10:24 AM, Alexey Kardashevskiy wrote:
> >> The existing code correctly checks if a container is set to a group
> >> and does not try attaching a group to a container for a second/third/...
> >> device from the same IOMMU group.
> >>
> >> However it still tries to set IOMMU type to a container for every
> >> device in a group which produces failure and closes container and group
> fds.
> >>
> >> This moves vfio_set_iommu_type() and DMA mapping code under:
> >> if (!(group_status.flags & VFIO_GROUP_FLAGS_CONTAINER_SET))
> >>
> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >> ---
> >>   lib/librte_eal/linuxapp/eal/eal_vfio.c | 50
> >> +++++++++++++++++-----------------
> >>   1 file changed, 25 insertions(+), 25 deletions(-)
> >>
> >> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> >> b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> >> index 6e2e84ca7..46f951f4d 100644
> >> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> >> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> >> @@ -298,33 +298,33 @@ vfio_setup_device(const char *sysfs_base,
> const
> >> char *dev_addr,
> >>               clear_group(vfio_group_fd);
> >>               return -1;
> >>           }
> >> -    }
> >>   -    /*
> >> -     * pick an IOMMU type and set up DMA mappings for container
> >> -     *
> >> -     * needs to be done only once, only when first group is assigned to
> >> -     * a container and only in primary process. Note this can happen
> >> several
> >> -     * times with the hotplug functionality.
> >> -     */
> >> -    if (internal_config.process_type == RTE_PROC_PRIMARY &&
> >> -            vfio_cfg.vfio_active_groups == 1) {
> >> -        /* select an IOMMU type which we will be using */
> >> -        const struct vfio_iommu_type *t =
> >> +        /*
> >> +         * pick an IOMMU type and set up DMA mappings for container
> >> +         *
> >> +         * needs to be done only once, only when first group is assigned to
> >> +         * a container and only in primary process. Note this can
> >> + happen
> >> several
> >> +         * times with the hotplug functionality.
> >> +         */
> >> +        if (internal_config.process_type == RTE_PROC_PRIMARY &&
> >> +                vfio_cfg.vfio_active_groups == 1) {
> >> +            /* select an IOMMU type which we will be using */
> >> +            const struct vfio_iommu_type *t =
> >>                   vfio_set_iommu_type(vfio_cfg.vfio_container_fd);
> >> -        if (!t) {
> >> -            RTE_LOG(ERR, EAL, "  %s failed to select IOMMU type\n",
> >> dev_addr);
> >> -            close(vfio_group_fd);
> >> -            clear_group(vfio_group_fd);
> >> -            return -1;
> >> -        }
> >> -        ret = t->dma_map_func(vfio_cfg.vfio_container_fd);
> >> -        if (ret) {
> >> -            RTE_LOG(ERR, EAL, "  %s DMA remapping failed, "
> >> -                    "error %i (%s)\n", dev_addr, errno, strerror(errno));
> >> -            close(vfio_group_fd);
> >> -            clear_group(vfio_group_fd);
> >> -            return -1;
> >> +            if (!t) {
> >> +                RTE_LOG(ERR, EAL, "  %s failed to select IOMMU
> >> + type\n",
> >> dev_addr);
> >> +                close(vfio_group_fd);
> >> +                clear_group(vfio_group_fd);
> >> +                return -1;
> >> +            }
> >> +            ret = t->dma_map_func(vfio_cfg.vfio_container_fd);
> >> +            if (ret) {
> >> +                RTE_LOG(ERR, EAL, "  %s DMA remapping failed, "
> >> +                        "error %i (%s)\n", dev_addr, errno,
> >> strerror(errno));
> >> +                close(vfio_group_fd);
> >> +                clear_group(vfio_group_fd);
> >> +                return -1;
> >> +            }
> >>           }
> >>       }
> >
> > It looks like a duplicate of the earlier submitted patch
> > http://dpdk.org/ml/archives/dev/2017-April/063077.html
> 
> 
> It is a duplicate indeed, what needs to be done to have it accepted in the
> mainline tree?

I think Alejandro Lucero (CC'd) mentioned that he has another patch for similar issue that he's working on (for ports being
assigned to the same IOMMU group). That patch would presumably fix this issue as well, so I'm inclined to
wait for that one. If that's not happening, then we can accept either this one or the earlier ones, and fix the
remaining issues (I have a setup where I can attempt to reproduce the issue, so that should be quick enough).

Thanks,
Anatoly 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH dpdk 4/5] vfio: Do try setting IOMMU type if already set
  2017-04-26  8:27       ` Burakov, Anatoly
@ 2017-04-26  8:45         ` Alejandro Lucero
  2017-04-26  8:58           ` Burakov, Anatoly
  0 siblings, 1 reply; 28+ messages in thread
From: Alejandro Lucero @ 2017-04-26  8:45 UTC (permalink / raw)
  To: Burakov, Anatoly
  Cc: Alexey Kardashevskiy, Andrew Rybchenko, dev, JPF,
	Gowrishankar Muthukrishnan

On Wed, Apr 26, 2017 at 9:27 AM, Burakov, Anatoly <anatoly.burakov@intel.com
> wrote:

> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Alexey
> > Kardashevskiy
> > Sent: Wednesday, April 26, 2017 8:51 AM
> > To: Andrew Rybchenko <arybchenko@solarflare.com>; dev@dpdk.org
> > Cc: JPF@zurich.ibm.com; Gowrishankar Muthukrishnan
> > <gowrishankar.m@in.ibm.com>
> > Subject: Re: [dpdk-dev] [PATCH dpdk 4/5] vfio: Do try setting IOMMU type
> if
> > already set
> >
> > On 21/04/17 18:54, Andrew Rybchenko wrote:
> > > On 04/20/2017 10:24 AM, Alexey Kardashevskiy wrote:
> > >> The existing code correctly checks if a container is set to a group
> > >> and does not try attaching a group to a container for a
> second/third/...
> > >> device from the same IOMMU group.
> > >>
> > >> However it still tries to set IOMMU type to a container for every
> > >> device in a group which produces failure and closes container and
> group
> > fds.
> > >>
> > >> This moves vfio_set_iommu_type() and DMA mapping code under:
> > >> if (!(group_status.flags & VFIO_GROUP_FLAGS_CONTAINER_SET))
> > >>
> > >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > >> ---
> > >>   lib/librte_eal/linuxapp/eal/eal_vfio.c | 50
> > >> +++++++++++++++++-----------------
> > >>   1 file changed, 25 insertions(+), 25 deletions(-)
> > >>
> > >> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > >> b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > >> index 6e2e84ca7..46f951f4d 100644
> > >> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > >> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > >> @@ -298,33 +298,33 @@ vfio_setup_device(const char *sysfs_base,
> > const
> > >> char *dev_addr,
> > >>               clear_group(vfio_group_fd);
> > >>               return -1;
> > >>           }
> > >> -    }
> > >>   -    /*
> > >> -     * pick an IOMMU type and set up DMA mappings for container
> > >> -     *
> > >> -     * needs to be done only once, only when first group is assigned
> to
> > >> -     * a container and only in primary process. Note this can happen
> > >> several
> > >> -     * times with the hotplug functionality.
> > >> -     */
> > >> -    if (internal_config.process_type == RTE_PROC_PRIMARY &&
> > >> -            vfio_cfg.vfio_active_groups == 1) {
> > >> -        /* select an IOMMU type which we will be using */
> > >> -        const struct vfio_iommu_type *t =
> > >> +        /*
> > >> +         * pick an IOMMU type and set up DMA mappings for container
> > >> +         *
> > >> +         * needs to be done only once, only when first group is
> assigned to
> > >> +         * a container and only in primary process. Note this can
> > >> + happen
> > >> several
> > >> +         * times with the hotplug functionality.
> > >> +         */
> > >> +        if (internal_config.process_type == RTE_PROC_PRIMARY &&
> > >> +                vfio_cfg.vfio_active_groups == 1) {
> > >> +            /* select an IOMMU type which we will be using */
> > >> +            const struct vfio_iommu_type *t =
> > >>                   vfio_set_iommu_type(vfio_cfg.vfio_container_fd);
> > >> -        if (!t) {
> > >> -            RTE_LOG(ERR, EAL, "  %s failed to select IOMMU type\n",
> > >> dev_addr);
> > >> -            close(vfio_group_fd);
> > >> -            clear_group(vfio_group_fd);
> > >> -            return -1;
> > >> -        }
> > >> -        ret = t->dma_map_func(vfio_cfg.vfio_container_fd);
> > >> -        if (ret) {
> > >> -            RTE_LOG(ERR, EAL, "  %s DMA remapping failed, "
> > >> -                    "error %i (%s)\n", dev_addr, errno,
> strerror(errno));
> > >> -            close(vfio_group_fd);
> > >> -            clear_group(vfio_group_fd);
> > >> -            return -1;
> > >> +            if (!t) {
> > >> +                RTE_LOG(ERR, EAL, "  %s failed to select IOMMU
> > >> + type\n",
> > >> dev_addr);
> > >> +                close(vfio_group_fd);
> > >> +                clear_group(vfio_group_fd);
> > >> +                return -1;
> > >> +            }
> > >> +            ret = t->dma_map_func(vfio_cfg.vfio_container_fd);
> > >> +            if (ret) {
> > >> +                RTE_LOG(ERR, EAL, "  %s DMA remapping failed, "
> > >> +                        "error %i (%s)\n", dev_addr, errno,
> > >> strerror(errno));
> > >> +                close(vfio_group_fd);
> > >> +                clear_group(vfio_group_fd);
> > >> +                return -1;
> > >> +            }
> > >>           }
> > >>       }
> > >
> > > It looks like a duplicate of the earlier submitted patch
> > > http://dpdk.org/ml/archives/dev/2017-April/063077.html
> >
> >
> > It is a duplicate indeed, what needs to be done to have it accepted in
> the
> > mainline tree?
>
> I think Alejandro Lucero (CC'd) mentioned that he has another patch for
> similar issue that he's working on (for ports being
> assigned to the same IOMMU group). That patch would presumably fix this
> issue as well, so I'm inclined to
> wait for that one. If that's not happening, then we can accept either this
> one or the earlier ones, and fix the
> remaining issues (I have a setup where I can attempt to reproduce the
> issue, so that should be quick enough).
>
>
I was counting on first submitted patch being accepted.

There is another thing to solve which is to unplug ports when there are
more than one (already plugged) in an IOMMU group. I have a patch for this
but I can not test it for that scenario, just for the "normal" one with one
port/device per IOMMU group. I'm working on having a way for testing the
first case.

Once I have said that, I would like to comment code after the patch fixing
the plug issue is fine. What I mean is, unplugging a device when more than
one device already plugged from the same IOMMU group do not break things.
Of course, I can only say that from a theoretical point of view after
studying the kernel VFIO code. The only problem is after that, none else
device hotplug can happen. I know that is not a good thing because it
requires to restart the app, but better than breaking the app with a crash.

I will submit later today the patch for fixing the unplug issue. Maybe
someone can test that asap while I get some way for doing it.



> Thanks,
> Anatoly
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH dpdk 4/5] vfio: Do try setting IOMMU type if already set
  2017-04-26  8:45         ` Alejandro Lucero
@ 2017-04-26  8:58           ` Burakov, Anatoly
  2017-04-26 10:24             ` Alejandro Lucero
  0 siblings, 1 reply; 28+ messages in thread
From: Burakov, Anatoly @ 2017-04-26  8:58 UTC (permalink / raw)
  To: Alejandro Lucero
  Cc: Alexey Kardashevskiy, Andrew Rybchenko, dev, JPF,
	Gowrishankar Muthukrishnan

Hi Alejandro,

> I was counting on first submitted patch being accepted. 

OK, that's on me then.

> There is another thing to solve which is to unplug ports when there are more than one (already plugged) in an IOMMU group. I have a patch for this but I can not test it for that scenario, just for the "normal" one with one port/device per IOMMU group. > I'm working on having a way for testing the first case.

Have you tried older kernel versions yet? Those seem to work for me.

> The only problem is after that, none else device hotplug can happen. I know that is not a good thing because it requires to restart the app, but better than breaking the app with a crash.

Presumably, you're only talking about VFIO hotplug, and not hotplug in general? Is there anything that can be done to fix this issue in the long run? 
 
Thanks,
Anatoly


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH dpdk 4/5] vfio: Do try setting IOMMU type if already set
  2017-04-26  8:58           ` Burakov, Anatoly
@ 2017-04-26 10:24             ` Alejandro Lucero
  0 siblings, 0 replies; 28+ messages in thread
From: Alejandro Lucero @ 2017-04-26 10:24 UTC (permalink / raw)
  To: Burakov, Anatoly
  Cc: Alexey Kardashevskiy, Andrew Rybchenko, dev, JPF,
	Gowrishankar Muthukrishnan

On Wed, Apr 26, 2017 at 9:58 AM, Burakov, Anatoly <anatoly.burakov@intel.com
> wrote:

> Hi Alejandro,
>
> > I was counting on first submitted patch being accepted.
>
> OK, that's on me then.
>
> > There is another thing to solve which is to unplug ports when there are
> more than one (already plugged) in an IOMMU group. I have a patch for this
> but I can not test it for that scenario, just for the "normal" one with one
> port/device per IOMMU group. > I'm working on having a way for testing the
> first case.
>
> Have you tried older kernel versions yet? Those seem to work for me.
>
>
I'm trying that one but run in some problems. Likely that is the quickest
way although I'm thinking on adding some option to the kernel for people
with such a hardware being able to test that option.



> > The only problem is after that, none else device hotplug can happen. I
> know that is not a good thing because it requires to restart the app, but
> better than breaking the app with a crash.
>
> Presumably, you're only talking about VFIO hotplug, and not hotplug in
> general? Is there anything that can be done to fix this issue in the long
> run?
>
>

Yes, just talking about VFIO. I think that patch I have should solve the
issue. It is just a matter of tracking how many devices are in use on a
vfio group, so just adding a new field to struct vfio_group and checking
number of devices when unplugging should be enough. Just when is the only
device plugged should close the vfio group.


> Thanks,
> Anatoly
>
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2017-04-26 10:24 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-20  7:23 [PATCH dpdk 0/5] ppc64/spapr: Attempt to use on POWER8 Alexey Kardashevskiy
2017-04-20  7:23 ` [PATCH dpdk 1/5] vfio/ppc64/spapr: Use correct structures for add/remove windows Alexey Kardashevskiy
2017-04-20  7:23 ` [PATCH dpdk 2/5] pci: Initialize common rte driver pointer Alexey Kardashevskiy
2017-04-24  9:28   ` Burakov, Anatoly
2017-04-20  7:24 ` [PATCH dpdk 3/5] RFC: bnx2x: Update firmware versions Alexey Kardashevskiy
2017-04-20  7:24 ` [PATCH dpdk 4/5] vfio: Do try setting IOMMU type if already set Alexey Kardashevskiy
2017-04-20 19:31   ` gowrishankar muthukrishnan
2017-04-21  8:54   ` Andrew Rybchenko
2017-04-26  7:50     ` Alexey Kardashevskiy
2017-04-26  8:27       ` Burakov, Anatoly
2017-04-26  8:45         ` Alejandro Lucero
2017-04-26  8:58           ` Burakov, Anatoly
2017-04-26 10:24             ` Alejandro Lucero
2017-04-20  7:24 ` [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map Alexey Kardashevskiy
2017-04-20  9:04   ` Jonas Pfefferle1
2017-04-20 13:25     ` Alexey Kardashevskiy
2017-04-20 14:22       ` Alexey Kardashevskiy
2017-04-20 15:15         ` Jonas Pfefferle1
2017-04-20 22:01           ` Alexey Kardashevskiy
2017-04-20 19:16         ` gowrishankar muthukrishnan
2017-04-21  3:42           ` Alexey Kardashevskiy
2017-04-21  8:43             ` Alexey Kardashevskiy
     [not found]               ` <OF6F33ED54.7950E1EF-ONC1258109.003295E3-C1258109.00333E2E@notes.na.collabserv.com>
2017-04-22  0:12                 ` Alexey Kardashevskiy
2017-04-24  9:40                   ` Burakov, Anatoly
2017-04-21  8:51             ` gowrishankar muthukrishnan
     [not found]             ` <OF45247CC5.192F9D29-ONC1258109.002D6497-C1258109.002F2868@notes.na.collabserv.com>
2017-04-21  8:59               ` Alexey Kardashevskiy
2017-04-22 21:11 ` [PATCH dpdk 0/5] ppc64/spapr: Attempt to use on POWER8 Olga Shern
2017-04-23 13:35   ` Alexey Kardashevskiy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.