* [PATCH 00/10] Infrastructure to detect iova mapping on the bus
@ 2017-06-08 11:05 Santosh Shukla
  2017-06-08 11:05 ` [PATCH 01/10] bsdapp/eal_pci: get iommu class Santosh Shukla
                   ` (13 more replies)
  0 siblings, 14 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-06-08 11:05 UTC (permalink / raw)
  To: thomas, bruce.richardson, dev
  Cc: jerin.jacob, hemant.agrawal, shreyansh.jain, gaetan.rivet,
	Santosh Shukla

Q) Why do we need such infrastructure?

A) Some NPU hardware, such as OCTEONTX, follows a push model to get packets
from the pktio device: packet allocation and freeing are done
by the HW. Since the HW can operate only on IOVA with the help of the
SMMU/IOMMU, a packet received from the Ethernet device carries an IOVA
address (which is the PA in the existing scheme).

Mapping IOVA as PA is expensive on such HW, because every packet address
needs to be converted from PA/IOVA to VA.

This patchset proposes a method to autodetect the preferred
IOVA mode for a device. Summary of the IOVA scheme:
- If all the devices are IOMMU capable and bound to an IOMMU
  capable driver, then select IOVA_VA.
- If any of the devices is non-IOMMU, then use the default IOVA
  scheme, i.e. IOVA_PA.
- If no device is found, then the IOVA scheme is
  IOVA_DC (Don't care).
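
The three rules above can be sketched as a small standalone function. This is
only an illustration of the policy, not code from the series: the enum is a
local copy, and select_iova_mode() and its device_has_iommu[] input are
hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local copy of the proposed modes; names are specific to this sketch. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/*
 * Hypothetical helper: device_has_iommu[i] is true when device i is
 * IOMMU capable and bound to an IOMMU capable driver.
 */
static enum iova_mode
select_iova_mode(const bool *device_has_iommu, size_t n_devices)
{
	size_t i;

	if (n_devices == 0)
		return IOVA_DC;		/* no device found: don't care */

	for (i = 0; i < n_devices; i++)
		if (!device_has_iommu[i])
			return IOVA_PA;	/* any non-IOMMU device forces PA */

	return IOVA_VA;			/* every device sits behind an IOMMU */
}
```

A single non-IOMMU device is enough to fall back to IOVA_PA, since that is the
only mode every device can work with.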

To achieve that, two global APIs are introduced:
- rte_bus_get_iommu_class
- rte_pci_get_iommu_class

Return values for those APIs are:
enum rte_iova_mode {
        RTE_IOVA_DC, /* Don't care */
        RTE_IOVA_PA,
        RTE_IOVA_VA
};

Those are the bus policies for selecting the IOVA mode. In case the user
wants to override the bus IOVA mapping, an EAL option
"--iova-mode=<string>" is added. The user passes a string: 'pa' --> IOVA_PA,
'va' --> IOVA_VA.

To support the new EAL option, a global API is added:
- rte_eal_iova_mode

Patch Summary:
1) 1st - 2nd patch: Add infrastructure in the linuxapp and bsdapp
layers.
2) 3rd patch: Introduces a global bus API named rte_bus_get_iommu_class.
3) 4th patch: Adds a new EAL option called --iova-mode=<mode-string>.
4) 5th - 6th patch: Logic to detect the IOVA scheme.
5) 7th patch: Checks the IOVA mode before programming vfio dma_map.iova.
The default scheme is IOVA_PA.
6) 8th - 10th patch: Check for IOVA_VA mode in the APIs below
        - rte_mem_virt2phy
        - rte_mempool_virt2phy
        - rte_malloc_virt2phy
If set, return paddr=vaddr; otherwise return the value from the default
implementation.
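
The virt2phy behaviour described in the last item can be illustrated as follows.
This is a sketch under assumptions: virt2iova() and fake_lookup_pa() are
hypothetical names, and fake_lookup_pa() merely stands in for the existing
physical-address lookup.

```c
#include <stdint.h>

/* Local copy of the proposed modes; names are specific to this sketch. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/* Stand-in for the default lookup; the fixed offset is made up for the demo. */
static uint64_t
fake_lookup_pa(const void *vaddr)
{
	return (uint64_t)(uintptr_t)vaddr + 0x1000;
}

/*
 * In IOVA_VA mode the address handed to the device is the virtual
 * address itself; otherwise fall back to the default translation.
 */
static uint64_t
virt2iova(const void *vaddr, enum iova_mode mode,
	  uint64_t (*lookup_pa)(const void *))
{
	if (mode == IOVA_VA)
		return (uint64_t)(uintptr_t)vaddr;

	return lookup_pa(vaddr);
}
```

The early return is what makes IOVA_VA cheap: no page-table or memseg walk is
needed per address.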

Test History:
- Tested on x86 with an XL710 40G NIC for both modes (iova_va/pa).
- Tested on arm64/thunderx with the integrated vNIC for both modes.
- Tested on arm64/Octeontx integrated NICs for iova_va mode only
  (it supports only one mode).
- Ran standalone tests like mempool_autotest, mbuf_autotest.
- Verified for Doxygen.

Work History:
Refer to the previous RFC proposal [1].

Noticed false-positive checkpatch warnings:
- WARNING: quoted string split across lines
#60: FILE: lib/librte_eal/common/eal_common_bus.c:164:
+				RTE_LOG(INFO, EAL, "Bus (%s) iommu class of"
+					" devices not found.\n", bus->name);

- WARNING: LINUX_VERSION_CODE should be avoided, code should be for the version to which it is merged
#86: FILE: lib/librte_eal/linuxapp/eal/eal_vfio.c:822:
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 2, 0)

Thanks.

[1] http://dpdk.org/dev/patchwork/patch/24549/


Santosh Shukla (10):
  bsdapp/eal_pci: get iommu class
  linuxapp/eal_pci: get iommu class
  bus: get iommu class
  eal: add eal option to configure iova mode
  linuxapp/eal: detect iova mode
  bsdapp/eal: detect iova mapping mode
  linuxapp/eal_vfio: honor iova mode before mapping
  linuxapp/eal_memory: honor iova mode in virt2phy
  mempool: honor iova mode in virt2phy
  eal/rte_malloc: honor iova mode in virt2phy

 lib/librte_eal/bsdapp/eal/eal.c                 | 24 ++++++++++++----
 lib/librte_eal/bsdapp/eal/eal_pci.c             | 10 +++++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  9 ++++++
 lib/librte_eal/common/eal_common_bus.c          | 23 +++++++++++++++
 lib/librte_eal/common/eal_common_options.c      | 31 ++++++++++++++++++++
 lib/librte_eal/common/eal_common_pci.c          |  1 +
 lib/librte_eal/common/eal_internal_cfg.h        |  1 +
 lib/librte_eal/common/eal_options.h             |  2 ++
 lib/librte_eal/common/include/rte_bus.h         | 31 ++++++++++++++++++++
 lib/librte_eal/common/include/rte_eal.h         | 10 +++++++
 lib/librte_eal/common/include/rte_pci.h         | 11 +++++++
 lib/librte_eal/common/rte_malloc.c              |  9 +++++-
 lib/librte_eal/linuxapp/eal/eal.c               | 24 ++++++++++++----
 lib/librte_eal/linuxapp/eal/eal_memory.c        |  3 ++
 lib/librte_eal/linuxapp/eal/eal_pci.c           | 38 +++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c          | 33 +++++++++++++++++++--
 lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 +++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  9 ++++++
 lib/librte_mempool/rte_mempool.h                | 10 +++++--
 19 files changed, 266 insertions(+), 17 deletions(-)

-- 
2.11.0


* [PATCH 01/10] bsdapp/eal_pci: get iommu class
  2017-06-08 11:05 [PATCH 00/10] Infrastructure to detect iova mapping on the bus Santosh Shukla
@ 2017-06-08 11:05 ` Santosh Shukla
  2017-06-08 11:05 ` [PATCH 02/10] linuxapp/eal_pci: " Santosh Shukla
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-06-08 11:05 UTC (permalink / raw)
  To: thomas, bruce.richardson, dev
  Cc: jerin.jacob, hemant.agrawal, shreyansh.jain, gaetan.rivet,
	Santosh Shukla

Introduce the rte_pci_get_iommu_class API, which gets the iommu class
of the PCI devices on the bus and returns the preferred iova mapping
mode for that bus.

The bsdapp implementation returns the default iova mode (RTE_IOVA_PA).

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/bsdapp/eal/eal_pci.c           | 10 ++++++++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map |  7 +++++++
 lib/librte_eal/common/include/rte_bus.h       | 10 ++++++++++
 lib/librte_eal/common/include/rte_pci.h       | 11 +++++++++++
 4 files changed, 38 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index e321461d8..9c6670964 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -405,6 +405,16 @@ rte_pci_scan(void)
 	return -1;
 }
 
+/*
+ * Get iommu class of pci devices on the bus.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void)
+{
+	/* Has only one drv (RTE_KDRV_NIC_UIO) so ..*/
+	return RTE_IOVA_PA;
+}
+
 int
 pci_update_device(const struct rte_pci_addr *addr)
 {
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 2e48a7366..a9cc3a67e 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -193,3 +193,10 @@ DPDK_17.05 {
 	vfio_get_group_no;
 
 } DPDK_17.02;
+
+DPDK_17.08 {
+	global:
+
+	rte_pci_get_iommu_class;
+
+} DPDK_17.05;
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 7c3696926..56eacd0c9 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -56,6 +56,16 @@ extern "C" {
 /** Double linked list of buses */
 TAILQ_HEAD(rte_bus_list, rte_bus);
 
+
+/**
+ * IOVA mapping mode.
+ */
+enum rte_iova_mode {
+	RTE_IOVA_DC,	/* _DC --> Don't Care mode */
+	RTE_IOVA_PA,
+	RTE_IOVA_VA
+};
+
 /**
  * Bus specific scan for devices attached on the bus.
  * For each bus object, the scan would be reponsible for finding devices and
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index b82ab9e79..bc403123a 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -377,6 +377,16 @@ int
 rte_pci_probe(void);
 
 /**
+ * Get iommu class of PCI devices on the bus.
+ * And return their preferred iova mapping mode.
+ *
+ * @return
+ *   - enum rte_iova_mode.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void);
+
+/**
  * Map the PCI device resources in user space virtual memory address
  *
  * Note that driver should not call this function when flag
@@ -472,6 +482,7 @@ int rte_pci_detach(const struct rte_pci_addr *addr);
  */
 void rte_pci_dump(FILE *f);
 
+
 /**
  * Register a PCI driver.
  *
-- 
2.11.0


* [PATCH 02/10] linuxapp/eal_pci: get iommu class
  2017-06-08 11:05 [PATCH 00/10] Infrastructure to detect iova mapping on the bus Santosh Shukla
  2017-06-08 11:05 ` [PATCH 01/10] bsdapp/eal_pci: get iommu class Santosh Shukla
@ 2017-06-08 11:05 ` Santosh Shukla
  2017-07-05  8:17   ` Maxime Coquelin
  2017-06-08 11:05 ` [PATCH 03/10] bus: " Santosh Shukla
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 248+ messages in thread
From: Santosh Shukla @ 2017-06-08 11:05 UTC (permalink / raw)
  To: thomas, bruce.richardson, dev
  Cc: jerin.jacob, hemant.agrawal, shreyansh.jain, gaetan.rivet,
	Santosh Shukla

Get the iommu class of the PCI devices on the bus and return the preferred
iova mapping mode for that bus.

IOVA mapping scheme for the linuxapp case:
- uio/uio_generic/vfio_noiommu --> default, i.e. RTE_IOVA_PA.
- vfio --> RTE_IOVA_VA.
- In case no device is attached to any driver,
  return RTE_IOVA_DC.
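
As a reading aid, the mapping above can be modelled outside of DPDK as shown
here. The enum kdrv values and pci_iova_mode() are hypothetical stand-ins for
the kernel-driver field and the function added by this patch; the no-IOMMU
state is passed in as a flag instead of being read from sysfs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local copy of the proposed modes; names are specific to this sketch. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/* Hypothetical stand-in for the kernel driver a PCI device is bound to. */
enum kdrv { KDRV_NONE, KDRV_UIO, KDRV_VFIO };

static enum iova_mode
pci_iova_mode(const enum kdrv *drv, size_t n, bool vfio_noiommu)
{
	enum iova_mode mode = IOVA_DC;
	size_t i;

	for (i = 0; i < n; i++) {
		if (drv[i] == KDRV_NONE)
			continue;	/* unbound devices do not vote */
		if (drv[i] != KDRV_VFIO)
			return IOVA_PA;	/* any uio-style driver forces PA */
		mode = IOVA_VA;
	}

	/* vfio in no-IOMMU mode still needs physical addresses */
	if (mode == IOVA_VA && vfio_noiommu)
		mode = IOVA_PA;

	return mode;
}
```

Unbound devices are skipped, so a bus with only unbound devices reports
IOVA_DC and leaves the decision to the other buses.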

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/linuxapp/eal/eal_pci.c           | 38 +++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c          | 23 +++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 +++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  7 +++++
 4 files changed, 72 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 595622b21..2772e883e 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -45,6 +45,7 @@
 #include "eal_filesystem.h"
 #include "eal_private.h"
 #include "eal_pci_init.h"
+#include "eal_vfio.h"
 
 /**
  * @file
@@ -488,6 +489,43 @@ rte_pci_scan(void)
 	return -1;
 }
 
+/*
+ * Get iommu class of PCI devices on the bus.
+ * Check that those devices are attached to iommu driver.
+ * If attached then return iova_va or iova_pa mode, else
+ * return with dont_care(_DC).
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void)
+{
+	struct rte_pci_device *dev = NULL;
+	int ret = RTE_IOVA_DC;
+
+	TAILQ_FOREACH(dev, &rte_pci_bus.device_list, next) {
+
+		if (dev->kdrv == RTE_KDRV_UNKNOWN ||
+		    dev->kdrv == RTE_KDRV_NONE)
+			continue;
+
+		if (dev->kdrv != RTE_KDRV_VFIO) {
+			ret = RTE_IOVA_PA;
+			return ret;
+		}
+
+		ret = RTE_IOVA_VA;
+	}
+
+	/* In case of iova_va, check for vfio_noiommu mode */
+	if (ret == RTE_IOVA_VA) {
+#ifdef VFIO_PRESENT
+		if (vfio_noiommu_is_enabled() == 1)
+#endif
+			ret = RTE_IOVA_PA;
+	}
+
+	return ret;
+}
+
 /* Read PCI config space. */
 int rte_pci_read_config(const struct rte_pci_device *device,
 		void *buf, size_t len, off_t offset)
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 946df7e31..04914406f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -816,4 +816,27 @@ vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
 	return 0;
 }
 
+int
+vfio_noiommu_is_enabled(void)
+{
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 2, 0)
+	return -1;
+#else
+	int fd, ret, cnt __rte_unused;
+	char c;
+
+	ret = -1;
+	fd = open(VFIO_NOIOMMU_MODE, O_RDONLY);
+	if (fd < 0)
+		return -1;
+
+	cnt = read(fd, &c, 1);
+	if (c == 'Y')
+		ret = 1;
+
+	close(fd);
+	return ret;
+#endif
+}
+
 #endif
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h b/lib/librte_eal/linuxapp/eal/eal_vfio.h
index 5ff63e5d7..26ea8e119 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
@@ -150,6 +150,8 @@ struct vfio_config {
 #define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
 #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
 #define VFIO_GET_REGION_IDX(x) (x >> 40)
+#define VFIO_NOIOMMU_MODE      \
+	"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
 
 /* DMA mapping function prototype.
  * Takes VFIO container fd as a parameter.
@@ -210,6 +212,8 @@ int pci_vfio_is_enabled(void);
 
 int vfio_mp_sync_setup(void);
 
+int vfio_noiommu_is_enabled(void);
+
 #define SOCKET_REQ_CONTAINER 0x100
 #define SOCKET_REQ_GROUP 0x200
 #define SOCKET_CLR_GROUP 0x300
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 670bab3a5..2cea7c272 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -198,3 +198,10 @@ DPDK_17.05 {
 	vfio_get_group_no;
 
 } DPDK_17.02;
+
+DPDK_17.08 {
+	global:
+
+	rte_pci_get_iommu_class;
+
+} DPDK_17.05;
-- 
2.11.0


* [PATCH 03/10] bus: get iommu class
  2017-06-08 11:05 [PATCH 00/10] Infrastructure to detect iova mapping on the bus Santosh Shukla
  2017-06-08 11:05 ` [PATCH 01/10] bsdapp/eal_pci: get iommu class Santosh Shukla
  2017-06-08 11:05 ` [PATCH 02/10] linuxapp/eal_pci: " Santosh Shukla
@ 2017-06-08 11:05 ` Santosh Shukla
  2017-06-08 11:05 ` [PATCH 04/10] eal: add eal option to configure iova mode Santosh Shukla
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-06-08 11:05 UTC (permalink / raw)
  To: thomas, bruce.richardson, dev
  Cc: jerin.jacob, hemant.agrawal, shreyansh.jain, gaetan.rivet,
	Santosh Shukla

Currently there is no way to detect the iova address mapping scheme for a
device on the bus.

The rte_bus_get_iommu_class API helps to automatically detect and select
the appropriate iova mapping scheme for iommu capable devices on that bus.
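
The way the per-bus results are combined in the diff below can be modelled in
isolation. The sketch assumes the enum values are DC=0, PA=1, VA=2 (the C
default for an enum declared in that order), so a bitwise OR over all buses
equals VA only when no bus reported PA and at least one reported VA.
combine_bus_modes() is a hypothetical name.

```c
#include <stddef.h>

/* Explicit values make the OR-based combination visible. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1, IOVA_VA = 2 };

static enum iova_mode
combine_bus_modes(const enum iova_mode *per_bus, size_t n_buses)
{
	int mode = IOVA_DC;
	size_t i;

	for (i = 0; i < n_buses; i++)
		mode |= per_bus[i];	/* PA|VA == 3, which is not VA */

	/* DC (no vote) and mixed results both fall back to the default. */
	return mode == IOVA_VA ? IOVA_VA : IOVA_PA;
}
```

Note that with this scheme the caller never sees IOVA_DC: an empty or
undecided system is reported as IOVA_PA.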

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
 lib/librte_eal/common/eal_common_bus.c          | 23 +++++++++++++++++++++++
 lib/librte_eal/common/eal_common_pci.c          |  1 +
 lib/librte_eal/common/include/rte_bus.h         | 21 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 5 files changed, 47 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index a9cc3a67e..0beadacfb 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -198,5 +198,6 @@ DPDK_17.08 {
 	global:
 
 	rte_pci_get_iommu_class;
+	rte_bus_get_iommu_class;
 
 } DPDK_17.05;
diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 8f9baf8b8..04398275d 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -145,3 +145,26 @@ rte_bus_dump(FILE *f)
 		}
 	}
 }
+
+/*
+ * Get iommu class of devices on the bus.
+ */
+enum rte_iova_mode
+rte_bus_get_iommu_class(void)
+{
+	int mode = RTE_IOVA_DC;
+	struct rte_bus *bus;
+
+	TAILQ_FOREACH(bus, &rte_bus_list, next) {
+
+		if (bus->get_iommu_class) {
+			mode |= bus->get_iommu_class();
+		}
+	}
+
+	if (mode != RTE_IOVA_VA) {
+		/* Mapping could be _DC or _PA. Use default IOVA mode */
+		mode = RTE_IOVA_PA;
+	}
+	return mode;
+}
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 5ae520186..9d4ed7927 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -487,6 +487,7 @@ struct rte_pci_bus rte_pci_bus = {
 	.bus = {
 		.scan = rte_pci_scan,
 		.probe = rte_pci_probe,
+		.get_iommu_class = rte_pci_get_iommu_class,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 56eacd0c9..734f6f051 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -91,6 +91,16 @@ typedef int (*rte_bus_scan_t)(void);
  */
 typedef int (*rte_bus_probe_t)(void);
 
+
+/**
+ * Get iommu class of devices on the bus.
+ * Check that those devices are attached to iommu driver.
+ *
+ * @return
+ *      enum rte_iova_mode value.
+ */
+typedef enum rte_iova_mode (*rte_bus_get_iommu_class_t)(void);
+
 /**
  * A structure describing a generic bus.
  */
@@ -99,6 +109,7 @@ struct rte_bus {
 	const char *name;            /**< Name of the bus */
 	rte_bus_scan_t scan;         /**< Scan for devices attached to bus */
 	rte_bus_probe_t probe;       /**< Probe devices on bus */
+	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 };
 
 /**
@@ -150,6 +161,16 @@ int rte_bus_probe(void);
  */
 void rte_bus_dump(FILE *f);
 
+
+/**
+ * Get iommu class of devices on the bus.
+ * Check that those devices are attached to iommu driver.
+ *
+ * @return
+ *	enum rte_iova_mode value.
+ */
+enum rte_iova_mode rte_bus_get_iommu_class(void);
+
 /**
  * Helper for Bus registration.
  * The constructor has higher priority than PMD constructors.
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 2cea7c272..6c016c82e 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -203,5 +203,6 @@ DPDK_17.08 {
 	global:
 
 	rte_pci_get_iommu_class;
+	rte_bus_get_iommu_class;
 
 } DPDK_17.05;
-- 
2.11.0


* [PATCH 04/10] eal: add eal option to configure iova mode
  2017-06-08 11:05 [PATCH 00/10] Infrastructure to detect iova mapping on the bus Santosh Shukla
                   ` (2 preceding siblings ...)
  2017-06-08 11:05 ` [PATCH 03/10] bus: " Santosh Shukla
@ 2017-06-08 11:05 ` Santosh Shukla
  2017-06-08 11:05 ` [PATCH 05/10] linuxapp/eal: detect " Santosh Shukla
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-06-08 11:05 UTC (permalink / raw)
  To: thomas, bruce.richardson, dev
  Cc: jerin.jacob, hemant.agrawal, shreyansh.jain, gaetan.rivet,
	Santosh Shukla

The user may not want to use the iova scheme selected by the bus and may
want to override it.

For that, add the EAL option --iova-mode=<string>, where the valid input
string is 'pa' or 'va'.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/common/eal_common_options.c | 31 ++++++++++++++++++++++++++++++
 lib/librte_eal/common/eal_internal_cfg.h   |  1 +
 lib/librte_eal/common/eal_options.h        |  2 ++
 lib/librte_eal/common/include/rte_eal.h    | 10 ++++++++++
 4 files changed, 44 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index f470195f3..16ce32f00 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -95,6 +95,7 @@ eal_long_options[] = {
 	{OPT_VFIO_INTR,         1, NULL, OPT_VFIO_INTR_NUM        },
 	{OPT_VMWARE_TSC_MAP,    0, NULL, OPT_VMWARE_TSC_MAP_NUM   },
 	{OPT_XEN_DOM0,          0, NULL, OPT_XEN_DOM0_NUM         },
+	{OPT_IOVA_MODE,	        1, NULL, OPT_IOVA_MODE_NUM        },
 	{0,                     0, NULL, 0                        }
 };
 
@@ -161,6 +162,7 @@ eal_reset_internal_config(struct internal_config *internal_cfg)
 #endif
 	internal_cfg->vmware_tsc_map = 0;
 	internal_cfg->create_uio_dev = 0;
+	internal_cfg->iova_mode = RTE_IOVA_PA;
 }
 
 static int
@@ -791,6 +793,25 @@ eal_parse_proc_type(const char *arg)
 	return RTE_PROC_INVALID;
 }
 
+static int
+eal_parse_iova_mode(const char *name)
+{
+	int mode;
+
+	if (name == NULL)
+		return -1;
+
+	if (!strcmp("pa", name))
+		mode = RTE_IOVA_PA;
+	else if (!strcmp("va", name))
+		mode = RTE_IOVA_VA;
+	else
+		return -1;
+
+	internal_config.iova_mode = mode;
+	return 0;
+}
+
 int
 eal_parse_common_option(int opt, const char *optarg,
 			struct internal_config *conf)
@@ -933,6 +954,14 @@ eal_parse_common_option(int opt, const char *optarg,
 		core_parsed = 1;
 		break;
 
+	case OPT_IOVA_MODE_NUM:
+		if (eal_parse_iova_mode(optarg) < 0) {
+			RTE_LOG(ERR, EAL, "invalid parameters for --"
+				OPT_IOVA_MODE "\n");
+			return -1;
+		}
+		break;
+
 	/* don't know what to do, leave this to caller */
 	default:
 		return 1;
@@ -1083,5 +1112,7 @@ eal_common_usage(void)
 	       "  --"OPT_NO_PCI"            Disable PCI\n"
 	       "  --"OPT_NO_HPET"           Disable HPET\n"
 	       "  --"OPT_NO_SHCONF"         No shared config (mmap'd files)\n"
+	       "  --"OPT_IOVA_MODE"         Set iova mode. 'pa' for IOVA_PA\n"
+	       "                            'va' for IOVA_VA\n"
 	       "\n", RTE_MAX_LCORE);
 }
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index 7b7e8c887..4649e3c02 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -84,6 +84,7 @@ struct internal_config {
 	const char *hugepage_dir;         /**< specific hugetlbfs directory to use */
 
 	unsigned num_hugepage_sizes;      /**< how many sizes on this system */
+	enum rte_iova_mode iova_mode ;    /**< Set iova mode on this system  */
 	struct hugepage_info hugepage_info[MAX_HUGEPAGE_SIZES];
 };
 extern struct internal_config internal_config; /**< Global EAL configuration. */
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index a881c62e2..7c5556eda 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -83,6 +83,8 @@ enum {
 	OPT_VMWARE_TSC_MAP_NUM,
 #define OPT_XEN_DOM0          "xen-dom0"
 	OPT_XEN_DOM0_NUM,
+#define OPT_IOVA_MODE          "iova-mode"
+	OPT_IOVA_MODE_NUM,
 	OPT_LONG_MAX_NUM
 };
 
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index abf020bf9..50f881365 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -45,6 +45,7 @@
 
 #include <rte_per_lcore.h>
 #include <rte_config.h>
+#include <rte_bus.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -283,6 +284,15 @@ static inline int rte_gettid(void)
 	return RTE_PER_LCORE(_thread_id);
 }
 
+
+/**
+ * Get the iova mode
+ *
+ * @return
+ *   enum rte_iova_mode value.
+ */
+enum rte_iova_mode rte_eal_iova_mode(void);
+
 #define RTE_INIT(func) \
 static void __attribute__((constructor, used)) func(void)
 
-- 
2.11.0


* [PATCH 05/10] linuxapp/eal: detect iova mode
  2017-06-08 11:05 [PATCH 00/10] Infrastructure to detect iova mapping on the bus Santosh Shukla
                   ` (3 preceding siblings ...)
  2017-06-08 11:05 ` [PATCH 04/10] eal: add eal option to configure iova mode Santosh Shukla
@ 2017-06-08 11:05 ` Santosh Shukla
  2017-07-05 13:17   ` Hemant Agrawal
  2017-06-08 11:05 ` [PATCH 06/10] bsdapp/eal: detect iova mapping mode Santosh Shukla
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 248+ messages in thread
From: Santosh Shukla @ 2017-06-08 11:05 UTC (permalink / raw)
  To: thomas, bruce.richardson, dev
  Cc: jerin.jacob, hemant.agrawal, shreyansh.jain, gaetan.rivet,
	Santosh Shukla

- Move the late bus scan up, to just after EAL option parsing.
- Detect the iova mapping mode based on the user-provided EAL option
  (rte_eal_iova_mode) and the result of rte_bus_get_iommu_class.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/linuxapp/eal/eal.c               | 24 ++++++++++++++++++------
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 7c78f2dc2..54f42d752 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -122,6 +122,13 @@ struct internal_config internal_config;
 /* used by rte_rdtsc() */
 int rte_cycles_vmware_tsc_map;
 
+/* Get the iova mode */
+enum rte_iova_mode
+rte_eal_iova_mode(void)
+{
+	return internal_config.iova_mode;
+}
+
 /* Return a pointer to the configuration structure */
 struct rte_config *
 rte_eal_get_configuration(void)
@@ -793,6 +800,17 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	if (rte_bus_scan()) {
+		rte_eal_init_alert("Cannot scan the buses for devices\n");
+		rte_errno = ENODEV;
+		return -1;
+	}
+
+	if (rte_eal_iova_mode() == RTE_IOVA_VA &&
+	    rte_bus_get_iommu_class() == RTE_IOVA_VA) {
+		internal_config.iova_mode = RTE_IOVA_VA;
+	}
+
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&
 			internal_config.xen_dom0_support == 0 &&
@@ -890,12 +908,6 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
-	if (rte_bus_scan()) {
-		rte_eal_init_alert("Cannot scan the buses for devices\n");
-		rte_errno = ENODEV;
-		return -1;
-	}
-
 	RTE_LCORE_FOREACH_SLAVE(i) {
 
 		/*
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 6c016c82e..79b005036 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -204,5 +204,6 @@ DPDK_17.08 {
 
 	rte_pci_get_iommu_class;
 	rte_bus_get_iommu_class;
+	rte_eal_iova_mode;
 
 } DPDK_17.05;
-- 
2.11.0


* [PATCH 06/10] bsdapp/eal: detect iova mapping mode
  2017-06-08 11:05 [PATCH 00/10] Infrastructure to detect iova mapping on the bus Santosh Shukla
                   ` (4 preceding siblings ...)
  2017-06-08 11:05 ` [PATCH 05/10] linuxapp/eal: detect " Santosh Shukla
@ 2017-06-08 11:05 ` Santosh Shukla
  2017-06-08 11:05 ` [PATCH 07/10] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-06-08 11:05 UTC (permalink / raw)
  To: thomas, bruce.richardson, dev
  Cc: jerin.jacob, hemant.agrawal, shreyansh.jain, gaetan.rivet,
	Santosh Shukla

- Move the late bus scan up, to just after EAL option parsing.
- The mapping mode is the default for bsdapp, which supports
  only one pass-through mode (RTE_KDRV_NIC_UIO).

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/bsdapp/eal/eal.c               | 24 ++++++++++++++++++------
 lib/librte_eal/bsdapp/eal/rte_eal_version.map |  1 +
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 05f0c1f90..d9c6617bf 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -113,6 +113,13 @@ struct internal_config internal_config;
 /* used by rte_rdtsc() */
 int rte_cycles_vmware_tsc_map;
 
+/* Get the iova mode */
+enum rte_iova_mode
+rte_eal_iova_mode(void)
+{
+	return internal_config.iova_mode;
+}
+
 /* Return a pointer to the configuration structure */
 struct rte_config *
 rte_eal_get_configuration(void)
@@ -536,6 +543,17 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	if (rte_bus_scan()) {
+		rte_eal_init_alert("Cannot scan the buses for devices\n");
+		rte_errno = ENODEV;
+		return -1;
+	}
+
+	if (rte_eal_iova_mode() == RTE_IOVA_VA &&
+	    rte_bus_get_iommu_class() == RTE_IOVA_VA) {
+		internal_config.iova_mode = RTE_IOVA_VA;
+	}
+
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&
 			eal_hugepage_info_init() < 0) {
@@ -615,12 +633,6 @@ rte_eal_init(int argc, char **argv)
 		rte_config.master_lcore, thread_id, cpuset,
 		ret == 0 ? "" : "...");
 
-	if (rte_bus_scan()) {
-		rte_eal_init_alert("Cannot scan the buses for devices\n");
-		rte_errno = ENODEV;
-		return -1;
-	}
-
 	RTE_LCORE_FOREACH_SLAVE(i) {
 
 		/*
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 0beadacfb..6900626fe 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -199,5 +199,6 @@ DPDK_17.08 {
 
 	rte_pci_get_iommu_class;
 	rte_bus_get_iommu_class;
+	rte_eal_iova_mode;
 
 } DPDK_17.05;
-- 
2.11.0


* [PATCH 07/10] linuxapp/eal_vfio: honor iova mode before mapping
  2017-06-08 11:05 [PATCH 00/10] Infrastructure to detect iova mapping on the bus Santosh Shukla
                   ` (5 preceding siblings ...)
  2017-06-08 11:05 ` [PATCH 06/10] bsdapp/eal: detect iova mapping mode Santosh Shukla
@ 2017-06-08 11:05 ` Santosh Shukla
  2017-07-05  9:14   ` Maxime Coquelin
  2017-06-08 11:05 ` [PATCH 08/10] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 248+ messages in thread
From: Santosh Shukla @ 2017-06-08 11:05 UTC (permalink / raw)
  To: thomas, bruce.richardson, dev
  Cc: jerin.jacob, hemant.agrawal, shreyansh.jain, gaetan.rivet,
	Santosh Shukla

Check the iova mode and map iova to pa or va accordingly.
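
The choice made for each memory segment can be shown without the VFIO headers.
In this sketch struct dma_map_sketch only mimics the three address fields
involved; it is not the kernel's vfio_iommu_type1_dma_map, and fill_dma_map()
is a hypothetical name.

```c
#include <stdint.h>

/* Local copy of the proposed modes; names are specific to this sketch. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/* Simplified stand-in for the DMA mapping request. */
struct dma_map_sketch {
	uint64_t vaddr;	/* process virtual address of the segment */
	uint64_t iova;	/* address the device will use */
	uint64_t size;	/* segment length in bytes */
};

static struct dma_map_sketch
fill_dma_map(uint64_t vaddr, uint64_t phys_addr, uint64_t len,
	     enum iova_mode mode)
{
	struct dma_map_sketch m;

	m.vaddr = vaddr;
	m.size = len;
	/* IOVA_VA: device sees the VA; otherwise it sees the PA. */
	m.iova = (mode == IOVA_VA) ? vaddr : phys_addr;

	return m;
}
```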

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 04914406f..348b7a7f4 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd)
 		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
 		dma_map.vaddr = ms[i].addr_64;
 		dma_map.size = ms[i].len;
-		dma_map.iova = ms[i].phys_addr;
+		if (rte_eal_iova_mode() == RTE_IOVA_VA)
+			dma_map.iova = dma_map.vaddr;
+		else
+			dma_map.iova = ms[i].phys_addr;
 		dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
 
 		ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
@@ -792,7 +795,10 @@ vfio_spapr_dma_map(int vfio_container_fd)
 		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
 		dma_map.vaddr = ms[i].addr_64;
 		dma_map.size = ms[i].len;
-		dma_map.iova = ms[i].phys_addr;
+		if (rte_eal_iova_mode() == RTE_IOVA_VA)
+			dma_map.iova = dma_map.vaddr;
+		else
+			dma_map.iova = ms[i].phys_addr;
 		dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
 				 VFIO_DMA_MAP_FLAG_WRITE;
 
-- 
2.11.0


* [PATCH 08/10] linuxapp/eal_memory: honor iova mode in virt2phy
  2017-06-08 11:05 [PATCH 00/10] Infrastructure to detect iova mapping on the bus Santosh Shukla
                   ` (6 preceding siblings ...)
  2017-06-08 11:05 ` [PATCH 07/10] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
@ 2017-06-08 11:05 ` Santosh Shukla
  2017-06-08 11:05 ` [PATCH 09/10] mempool: " Santosh Shukla
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-06-08 11:05 UTC (permalink / raw)
  To: thomas, bruce.richardson, dev
  Cc: jerin.jacob, hemant.agrawal, shreyansh.jain, gaetan.rivet,
	Santosh Shukla

Check the iova mode and return the phy addr accordingly.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 9c9baf628..1261efda0 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -140,6 +140,9 @@ rte_mem_virt2phy(const void *virtaddr)
 	int page_size;
 	off_t offset;
 
+	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+		return (uintptr_t)virtaddr;
+
 	/* when using dom0, /proc/self/pagemap always returns 0, check in
 	 * dpdk memory by browsing the memsegs */
 	if (rte_xen_dom0_supported()) {
-- 
2.11.0

* [PATCH 09/10] mempool: honor iova mode in virt2phy
  2017-06-08 11:05 [PATCH 00/10] Infrastructure to detect iova mapping on the bus Santosh Shukla
                   ` (7 preceding siblings ...)
  2017-06-08 11:05 ` [PATCH 08/10] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
@ 2017-06-08 11:05 ` Santosh Shukla
  2017-06-08 11:05 ` [PATCH 10/10] eal/rte_malloc: " Santosh Shukla
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-06-08 11:05 UTC (permalink / raw)
  To: thomas, bruce.richardson, dev
  Cc: jerin.jacob, hemant.agrawal, shreyansh.jain, gaetan.rivet,
	Santosh Shukla

Check the IOVA mode and return the physical address accordingly.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_mempool/rte_mempool.h | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 76b5b3b15..fafa77e3b 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -78,6 +78,7 @@
 #include <rte_ring.h>
 #include <rte_memcpy.h>
 #include <rte_common.h>
+#include <rte_bus.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -1397,9 +1398,14 @@ rte_mempool_empty(const struct rte_mempool *mp)
 static inline phys_addr_t
 rte_mempool_virt2phy(__rte_unused const struct rte_mempool *mp, const void *elt)
 {
-	const struct rte_mempool_objhdr *hdr;
-	hdr = (const struct rte_mempool_objhdr *)RTE_PTR_SUB(elt,
+	struct rte_mempool_objhdr *hdr;
+
+	hdr = (struct rte_mempool_objhdr *)RTE_PTR_SUB(elt,
 		sizeof(*hdr));
+
+	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+		hdr->physaddr = (uintptr_t)elt;
+
 	return hdr->physaddr;
 }
 
-- 
2.11.0

* [PATCH 10/10] eal/rte_malloc: honor iova mode in virt2phy
  2017-06-08 11:05 [PATCH 00/10] Infrastructure to detect iova mapping on the bus Santosh Shukla
                   ` (8 preceding siblings ...)
  2017-06-08 11:05 ` [PATCH 09/10] mempool: " Santosh Shukla
@ 2017-06-08 11:05 ` Santosh Shukla
  2017-07-04  4:41 ` [PATCH 00/10] Infrastructure to detect iova mapping on the bus santosh
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-06-08 11:05 UTC (permalink / raw)
  To: thomas, bruce.richardson, dev
  Cc: jerin.jacob, hemant.agrawal, shreyansh.jain, gaetan.rivet,
	Santosh Shukla

Check the IOVA mode and return the physical address accordingly.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/common/rte_malloc.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index f4a883529..2b00f8a56 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -251,8 +251,15 @@ rte_malloc_set_limit(__rte_unused const char *type,
 phys_addr_t
 rte_malloc_virt2phy(const void *addr)
 {
+	phys_addr_t paddr;
 	const struct malloc_elem *elem = malloc_elem_from_data(addr);
 	if (elem == NULL)
 		return 0;
-	return elem->ms->phys_addr + ((uintptr_t)addr - (uintptr_t)elem->ms->addr);
+
+	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+		paddr = (uintptr_t)addr;
+	else
+		paddr = elem->ms->phys_addr +
+			((uintptr_t)addr - (uintptr_t)elem->ms->addr);
+	return paddr;
 }
-- 
2.11.0

* Re: [PATCH 00/10] Infrastructure to detect iova mapping on the bus
  2017-06-08 11:05 [PATCH 00/10] Infrastructure to detect iova mapping on the bus Santosh Shukla
                   ` (9 preceding siblings ...)
  2017-06-08 11:05 ` [PATCH 10/10] eal/rte_malloc: " Santosh Shukla
@ 2017-07-04  4:41 ` santosh
  2017-07-04  7:19   ` Thomas Monjalon
  2017-07-04 10:10 ` Thomas Monjalon
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 248+ messages in thread
From: santosh @ 2017-07-04  4:41 UTC (permalink / raw)
  To: thomas, bruce.richardson, dev
  Cc: jerin.jacob, hemant.agrawal, shreyansh.jain, gaetan.rivet

Hi,

On Thursday 08 June 2017 04:35 PM, Santosh Shukla wrote:

> Q) Why do we need such infrastructure?
>
> A) Some NPU hardware like OCTEONTX follows push model to get the packet
> from the pktio device. Where packet allocation and freeing done
> by the HW. Since HW can operate only on IOVA with help of SMMU/IOMMU,
> when packet receives from the Ethernet device, it is the IOVA address
> (which is PA in existing scheme).
>
> Mapping IOVA as PA is expensive on those HW, where every packet
> needs to be converted to VA from PA/IOVA.
>
> This patchset proposes the method to autodetect the preferred
> IOVA mode for a device. Summary of IOVA scheme:
> - If all the devices are iommu capable and support IOMMU
>   capable driver then selects IOVA_VA.
> - If any of the devices are non-iommu then use default IOVA
>   scheme ie. IOVA_PA.
> - If no device found then IOVA scheme would be
>   IOVA_DC (Don't care).
>
> To achieve that, two global APIs introduced:
> - rte_bus_get_iommu_class
> - rte_pci_get_iommu_class
>
> Return values for those APIs are:
> enum rte_iova_mod {
>         RTE_IOVA_DC, /* Don't care */
>         RTE_IOVA_PA,
>         RTE_IOVA_VA
> }
>
> Those are the bus policy for selecting IOVA mode. In case user
> want to override bus IOVA mapping then added an EAL option
> "--iova-mode=<string>". User to pass string format 'pa' --> IOVA_PA,
> 'va' --> IOVA_VA.
>
> To support new eal option, adding global API:
> - rte_eal_iova_mode
>
> Patch Summary:
> 2) 1st - 2th patch: Adds infrastructure in linuxapp and bsdapp
> layer.
> 1) 3rd patch: Introduces global bus api named rte_bus_get_iommu_class.
> 3) 4th patch: Add new eal option called --iova-mode=<mode-string>.
> 4) 5th - 6th patch: Logic to detect iova scheme.
> 5) 9th patch: Check IOVA mode before programing vfio dma_map.iova.
> Default scheme is IOVA_PA.
> 6) 10th-12th patch: Check for IOVA_VA mode in below APIs
>         - rte_mem_virt2phy
>         - rte_mempool_virt2phy
>         - rte_malloc_virt2phy
> If set then return paddr=vaddr, else return value from default
> implementation.
>
> Test History:
> - Tested for x86/XL710 40G NIC card for both modes (iova_va/pa).
> - Tested for arm64/thunderx vNIC Integrated NIC for both modes
> - Tested for arm64/Octeontx integrated NICs for only
>   Iova_va mode(It supports only one mode.)
> - Ran standalone tests like mempool_autotest, mbuf_autotest.
> - Verified for Doxygen.
>
> Work History:
> Refer prev RFC proposal[1].
>
> Noticed false positive checkpatch error:
> - WARNING: quoted string split across lines
> #60: FILE: lib/librte_eal/common/eal_common_bus.c:164:
> +				RTE_LOG(INFO, EAL, "Bus (%s) iommu class of"
> +					" devices not found.\n", bus->name);
>
> - WARNING: LINUX_VERSION_CODE should be avoided, code should be for the version to which it is merged
> #86: FILE: lib/librte_eal/linuxapp/eal/eal_vfio.c:822:
> +#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 2, 0)
>
> Thanks.
>
> [1] http://dpdk.org/dev/patchwork/patch/24549/
>
Ping?

Thanks.

* Re: [PATCH 00/10] Infrastructure to detect iova mapping on the bus
  2017-07-04  4:41 ` [PATCH 00/10] Infrastructure to detect iova mapping on the bus santosh
@ 2017-07-04  7:19   ` Thomas Monjalon
  2017-07-04  7:57     ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Thomas Monjalon @ 2017-07-04  7:19 UTC (permalink / raw)
  To: santosh
  Cc: dev, bruce.richardson, jerin.jacob, hemant.agrawal,
	shreyansh.jain, gaetan.rivet, sergio.gonzalez.monroy,
	Anatoly Burakov

04/07/2017 06:41, santosh:
> Ping?

You should try to ping Sergio, memory maintainer,
and Anatoly, VFIO maintainer.

Given that
- there is no review at all,
- it is conflicting with the bus/PCI rework in progress,
it will not be considered for 17.08.

Note: we are also missing some reviews for the bus rework.

* Re: [PATCH 00/10] Infrastructure to detect iova mapping on the bus
  2017-07-04  7:19   ` Thomas Monjalon
@ 2017-07-04  7:57     ` santosh
  2017-07-04  9:03       ` Thomas Monjalon
  0 siblings, 1 reply; 248+ messages in thread
From: santosh @ 2017-07-04  7:57 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, bruce.richardson, jerin.jacob, hemant.agrawal,
	shreyansh.jain, gaetan.rivet, sergio.gonzalez.monroy,
	Anatoly Burakov

Hi Thomas,

On Tuesday 04 July 2017 12:49 PM, Thomas Monjalon wrote:

> 04/07/2017 06:41, santosh:
>> Ping?
> You should try to ping Sergio, memory maintainer,
> and Anatoly, VFIO maintainer.
>
> Given that
> - there is no review at all,

If there is no review, isn't it by default the maintainer's responsibility
to review, or to ask someone to review?

BTW: Who is the bus maintainer? I don't see an entry in the MAINTAINERS file.

> - it is conflicting with the bus/PCI rework in progress,
> it will not be considered for 17.08.

We're adding only two new iommu_class bus APIs in rte_bus, so I'm not sure
where the conflict is. If there is a conflict, shouldn't I see a review
comment about it on my patch set?

This initiative came out of [1], and we put a lot of effort into
breaking the API down from the bus layer to the library layer. This framework
is indeed needed for platforms that care about iova=va, like octeontx, dpaa2
and perhaps many future SoCs. Without this framework, we can't get pktio
working for the octeontx ethdev in dpdk, and can't get the HW pool manager
working for Octeontx offload blocks.

I agree that I missed Sergio and Anatoly, but the crux of the design is the
rte_bus layer. I expect comments on that area, right?

And if we have consensus on the bus approach, then the rest of the changes are trivial.

I didn't ping before because you had picked up my patch set and asked for
review comments in the past.

Can we include it in RC2? Otherwise it will delay the upstreaming of the
octeontx ethdev driver and other dependent drivers for us.

Thanks.

* Re: [PATCH 00/10] Infrastructure to detect iova mapping on the bus
  2017-07-04  7:57     ` santosh
@ 2017-07-04  9:03       ` Thomas Monjalon
  2017-07-04  9:21         ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Thomas Monjalon @ 2017-07-04  9:03 UTC (permalink / raw)
  To: santosh
  Cc: dev, bruce.richardson, jerin.jacob, hemant.agrawal,
	shreyansh.jain, gaetan.rivet, sergio.gonzalez.monroy,
	Anatoly Burakov

04/07/2017 09:57, santosh:
> Hi Thomas,
> 
> On Tuesday 04 July 2017 12:49 PM, Thomas Monjalon wrote:
> 
> > 04/07/2017 06:41, santosh:
> >> Ping?
> > You should try to ping Sergio, memory maintainer,
> > and Anatoly, VFIO maintainer.
> >
> > Given that
> > - there is no review at all,
> 
> By default if no review then its maintainer responsibility to review Or 
> ask someone to review?

Yes, but it is also the responsibility of the author.

> BTW: Who is the bus maintainer? I don't see entry in MAINTAINER file.

Bus code is new and there is no maintainer yet.

> > - it is conflicting with the bus/PCI rework in progress,
> > it will not be considered for 17.08.
> 
> We're adding only two new iommu_class bus api in rte_bus, I'm not sure
> about conflict. If there is conflict then I should see review comment for
> same in my patch set?

It is mostly a time conflict.

> This initiatives came out from [1], and we put lot of effort in

You forgot the [1] reference.

> breaking down api from bus till library layer. This framework indeed
> a need for those platform which cares for iova=va like octeontx, dpaa2 and
> perhaps many future SoCs. W/o this framework, we can't get pktio working for octeontx ethdev 
> in dpdk, can't get HW pool manager working for Octeontx offload blocks.
> 
> I agree that I missed on sergio or Anatoly But crux of design is rte_bus
> layer. I expect comment on those area, right?
> 
> And if we have consent on bus approach then rest changes are trivial.
> 
> I didn't ping before as You had picked my patch set and asked for review comment in past.
> 
> Can we include it in RC2? Because it will delay upstreaming effort of octeontx ethdev driver
> and other dependent driver for us.

This series touches many parts of DPDK.
It really depends on maintainers of malloc, mempool and vfio.

I'm also afraid your cover letter is too difficult to understand,
because most of us do not know the acronyms you are talking about.
I will comment on it.

* Re: [PATCH 00/10] Infrastructure to detect iova mapping on the bus
  2017-07-04  9:03       ` Thomas Monjalon
@ 2017-07-04  9:21         ` santosh
  2017-07-04  9:42           ` Thomas Monjalon
  0 siblings, 1 reply; 248+ messages in thread
From: santosh @ 2017-07-04  9:21 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, bruce.richardson, jerin.jacob, hemant.agrawal,
	shreyansh.jain, gaetan.rivet, sergio.gonzalez.monroy,
	Anatoly Burakov


On Tuesday 04 July 2017 02:33 PM, Thomas Monjalon wrote:

> 04/07/2017 09:57, santosh:
>> Hi Thomas,
>>
>> On Tuesday 04 July 2017 12:49 PM, Thomas Monjalon wrote:
>>
>>> 04/07/2017 06:41, santosh:
>>>> Ping?
>>> You should try to ping Sergio, memory maintainer,
>>> and Anatoly, VFIO maintainer.
>>>
>>> Given that
>>> - there is no review at all,
>> By default if no review then its maintainer responsibility to review Or 
>> ask someone to review?
> Yes, but it is also the responsibility of the author.

To review my own code? Or, if it's about pinging: you had already picked it up
and asked the list for review comments, so why should I spam the list by sending pings?

>> BTW: Who is the bus maintainer? I don't see entry in MAINTAINER file.
> Bus code is new and there is no maintainer yet.

Then what else do you expect from the author?

>
>>> - it is conflicting with the bus/PCI rework in progress,
>>> it will not be considered for 17.08.
>> We're adding only two new iommu_class bus api in rte_bus, I'm not sure
>> about conflict. If there is conflict then I should see review comment for
>> same in my patch set?
> It is mostly a time conflict.

I was not told about that, so how would the author get to know? I could
understand a code conflict. Can you suggest how an author could align with
and address a time conflict?

>
>> This initiatives came out from [1], and we put lot of effort in
> You forgot the [1] reference.

http://dpdk.org/dev/patchwork/patch/24549/

If we were up for taking a shortcut, we would simply have requested to push
-iova-va as an eal arg, but the intent was to address it in a proper way,
i.e. to propose a framework. That takes effort and time!

>
>> breaking down api from bus till library layer. This framework indeed
>> a need for those platform which cares for iova=va like octeontx, dpaa2 and
>> perhaps many future SoCs. W/o this framework, we can't get pktio working for octeontx ethdev 
>> in dpdk, can't get HW pool manager working for Octeontx offload blocks.
>>
>> I agree that I missed on sergio or Anatoly But crux of design is rte_bus
>> layer. I expect comment on those area, right?
>>
>> And if we have consent on bus approach then rest changes are trivial.
>>
>> I didn't ping before as You had picked my patch set and asked for review comment in past.
>>
>> Can we include it in RC2? Because it will delay upstreaming effort of octeontx ethdev driver
>> and other dependent driver for us.
> This series is touching to many parts of DPDK.
> It really depends on maintainers of malloc, mempool and vfio.

You are missing the point, and this is why review feedback on the crux of the
rte_bus design was essential. If we agreed on rte_xx_iova_mode(), then the changes
in malloc/mempool and vfio are trivial, or we could have thought of a simpler way
to address them. And if there is disagreement, then drop this approach, provided
someone has a better solution. For that, review comments on the rte_bus changes are a must.

>
> I'm also afraid your cover letter is too difficult to understand,
> because most of us do not know the acronyms you are talking about.
> I will comment on it.

Which part is not understood? Can you please elaborate? How would the
author come to know about that? Do I need to send patches to some other list
where most folks review?

Thanks.

* Re: [PATCH 00/10] Infrastructure to detect iova mapping on the bus
  2017-07-04  9:21         ` santosh
@ 2017-07-04  9:42           ` Thomas Monjalon
  0 siblings, 0 replies; 248+ messages in thread
From: Thomas Monjalon @ 2017-07-04  9:42 UTC (permalink / raw)
  To: santosh
  Cc: dev, bruce.richardson, jerin.jacob, hemant.agrawal,
	shreyansh.jain, gaetan.rivet, sergio.gonzalez.monroy,
	Anatoly Burakov

04/07/2017 11:21, santosh:
> 
> On Tuesday 04 July 2017 02:33 PM, Thomas Monjalon wrote:
> 
> > 04/07/2017 09:57, santosh:
> >> Hi Thomas,
> >>
> >> On Tuesday 04 July 2017 12:49 PM, Thomas Monjalon wrote:
> >>
> >>> 04/07/2017 06:41, santosh:
> >>>> Ping?
> >>> You should try to ping Sergio, memory maintainer,
> >>> and Anatoly, VFIO maintainer.
> >>>
> >>> Given that
> >>> - there is no review at all,
> >> By default if no review then its maintainer responsibility to review Or 
> >> ask someone to review?
> > Yes, but it is also the responsibility of the author.
> 
> To review my own code? Or if its about pinging then you already picked and
> asked list for review comment then why should I spam list by sending ping?

Not reviewing your own code :)
Yes you're right, I've tried to ping.
People do not make a lot of reviews. That's where we need more help.
There would be no problem if everyone waiting for a review were reviewing
some patches from others.

> >> BTW: Who is the bus maintainer? I don't see entry in MAINTAINER file.
> > Bus code is new and there is no maintainer yet.
> 
> then else you expect from author?

Yes, there are several authors of the bus rework.

> >>> - it is conflicting with the bus/PCI rework in progress,
> >>> it will not be considered for 17.08.
> >> We're adding only two new iommu_class bus api in rte_bus, I'm not sure
> >> about conflict. If there is conflict then I should see review comment for
> >> same in my patch set?
> > It is mostly a time conflict.
> 
> I have not been told about that, so how author get to know. I could understand
> code conflict , Can you suggest how to align and address time conflict, how
> author could address time conflict?

You could help get the other patches touching bus code reviewed,
so that they are integrated faster.

> >> This initiatives came out from [1], and we put lot of effort in
> > You forgot the [1] reference.
> 
> http://dpdk.org/dev/patchwork/patch/24549/
> 
> If we're upto taking short cut then simply requested to push -iova-va as eal arg
> but intent was to address in a proper way .. propose framework. That takes effort
> and time!.

Yes, I understand.
It is a lot of work, and it may take time to be properly reviewed.
It happens that we cannot get others working with us in the right
timeframe. I try to coordinate, but sometimes it fails.
We can still try to speed things up and see what happens.

> >> breaking down api from bus till library layer. This framework indeed
> >> a need for those platform which cares for iova=va like octeontx, dpaa2 and
> >> perhaps many future SoCs. W/o this framework, we can't get pktio working for octeontx ethdev 
> >> in dpdk, can't get HW pool manager working for Octeontx offload blocks.
> >>
> >> I agree that I missed on sergio or Anatoly But crux of design is rte_bus
> >> layer. I expect comment on those area, right?
> >>
> >> And if we have consent on bus approach then rest changes are trivial.
> >>
> >> I didn't ping before as You had picked my patch set and asked for review comment in past.
> >>
> >> Can we include it in RC2? Because it will delay upstreaming effort of octeontx ethdev driver
> >> and other dependent driver for us.
> > This series is touching to many parts of DPDK.
> > It really depends on maintainers of malloc, mempool and vfio.
> 
> You are missing a point and this why a review feedback on crux of rte_bus
> design was essential. if we agreed on rte_xx_iova_mode() then changes
> at malloc/mempool and vfio is trivial or could have thought upon way to address in simpler way. 
> And if there is disagreement then drop this approach. Provided if someone has better solution.
> For that review comment on rte_bus changes a must.
> 
> > I'm also afraid your cover letter is too difficult to understand,
> > because most of us do not know the acronyms you are talking about.
> > I will comment on it.
> 
> Which is part not understood? can you please elaborate on details? How would
> author come to know about that? Do I need to send patches to some other list where
> most of folks review?

Santosh, please do not blame me, I cannot review everything on the
mailing list.
I am going to ask questions in order to make this cover letter
easier to understand.

* Re: [PATCH 00/10] Infrastructure to detect iova mapping on the bus
  2017-06-08 11:05 [PATCH 00/10] Infrastructure to detect iova mapping on the bus Santosh Shukla
                   ` (10 preceding siblings ...)
  2017-07-04  4:41 ` [PATCH 00/10] Infrastructure to detect iova mapping on the bus santosh
@ 2017-07-04 10:10 ` Thomas Monjalon
  2017-07-04 11:20   ` santosh
  2017-07-05  9:30 ` Maxime Coquelin
  2017-07-10 11:42 ` [PATCH v2 00/12] " Santosh Shukla
  13 siblings, 1 reply; 248+ messages in thread
From: Thomas Monjalon @ 2017-07-04 10:10 UTC (permalink / raw)
  To: Santosh Shukla
  Cc: dev, bruce.richardson, jerin.jacob, hemant.agrawal,
	shreyansh.jain, gaetan.rivet, jblunck, olivier.matz, jianbo.liu,
	anatoly.burakov, sergio.gonzalez.monroy

Hi Santosh,
Let's try to make this proposal clearer in order to have some reviews.

08/06/2017 13:05, Santosh Shukla:
> Q) Why do we need such infrastructure?
> 
> A) Some NPU hardware like OCTEONTX follows push model to get the packet
> from the pktio device. Where packet allocation and freeing done
> by the HW. Since HW can operate only on IOVA with help of SMMU/IOMMU,

Some readers may not know IOVA: IO Virtual address.
Some explanations:
	https://www.kernel.org/doc/Documentation/Intel-IOMMU.txt
	http://vfio.blogspot.fr/2014/08/iommu-groups-inside-and-out.html

It must be said that SMMU is equivalent to IOMMU for ARM:
	https://developer.arm.com/products/system-ip/system-controllers/system-memory-management-unit

> when packet receives from the Ethernet device, it is the IOVA address
> (which is PA in existing scheme).

You mean that we are currently using only Physical Address (PA)?

> Mapping IOVA as PA is expensive on those HW, where every packet
> needs to be converted to VA from PA/IOVA.

Please, could you explain how and where addresses are converted currently?

> This patchset proposes the method to autodetect the preferred
> IOVA mode for a device. Summary of IOVA scheme:
> - If all the devices are iommu capable and support IOMMU
>   capable driver then selects IOVA_VA.
> - If any of the devices are non-iommu then use default IOVA
>   scheme ie. IOVA_PA.
> - If no device found then IOVA scheme would be
>   IOVA_DC (Don't care).

I think you should better describe these modes and how they behave.

> To achieve that, two global APIs introduced:
> - rte_bus_get_iommu_class
> - rte_pci_get_iommu_class
> 
> Return values for those APIs are:
> enum rte_iova_mod {
>         RTE_IOVA_DC, /* Don't care */
>         RTE_IOVA_PA,
>         RTE_IOVA_VA
> }
> 
> Those are the bus policy for selecting IOVA mode. In case user
> want to override bus IOVA mapping then added an EAL option
> "--iova-mode=<string>". User to pass string format 'pa' --> IOVA_PA,
> 'va' --> IOVA_VA.
> 
> To support new eal option, adding global API:
> - rte_eal_iova_mode
> 
> Patch Summary:
> 2) 1st - 2th patch: Adds infrastructure in linuxapp and bsdapp
> layer.
> 1) 3rd patch: Introduces global bus api named rte_bus_get_iommu_class.
> 3) 4th patch: Add new eal option called --iova-mode=<mode-string>.
> 4) 5th - 6th patch: Logic to detect iova scheme.
> 5) 9th patch: Check IOVA mode before programing vfio dma_map.iova.
> Default scheme is IOVA_PA.
> 6) 10th-12th patch: Check for IOVA_VA mode in below APIs
>         - rte_mem_virt2phy
>         - rte_mempool_virt2phy
>         - rte_malloc_virt2phy
> If set then return paddr=vaddr, else return value from default
> implementation.

* Re: [PATCH 00/10] Infrastructure to detect iova mapping on the bus
  2017-07-04 10:10 ` Thomas Monjalon
@ 2017-07-04 11:20   ` santosh
  0 siblings, 0 replies; 248+ messages in thread
From: santosh @ 2017-07-04 11:20 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, bruce.richardson, jerin.jacob, hemant.agrawal,
	shreyansh.jain, gaetan.rivet, jblunck, olivier.matz, jianbo.liu,
	anatoly.burakov, sergio.gonzalez.monroy

On Tuesday 04 July 2017 03:40 PM, Thomas Monjalon wrote:

> Hi Santosh,
> Let's try to make this proposal clearer in order to have some reviews.
>
> 08/06/2017 13:05, Santosh Shukla:
>> Q) Why do we need such infrastructure?
>>
>> A) Some NPU hardware like OCTEONTX follows push model to get the packet
>> from the pktio device. Where packet allocation and freeing done
>> by the HW. Since HW can operate only on IOVA with help of SMMU/IOMMU,
> Some readers may not know IOVA: IO Virtual address.
> Some explanations:
> 	https://www.kernel.org/doc/Documentation/Intel-IOMMU.txt
> 	http://vfio.blogspot.fr/2014/08/iommu-groups-inside-and-out.html
>
> It must be said that SMMU is equivalent to IOMMU for ARM:
> 	https://developer.arm.com/products/system-ip/system-controllers/system-memory-management-unit
>
>> when packet receives from the Ethernet device, it is the IOVA address
>> (which is PA in existing scheme).
> You mean that we are currently using only Physical Address (PA)?

Yes. DPDK's default approach is iova=pa. Refer to [1]; for a more recent example, see [2].
[1] http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_vfio.c#n709
[2] http://dpdk.org/browse/dpdk/tree/drivers/bus/fslmc/fslmc_vfio.c#n231

>> Mapping IOVA as PA is expensive on those HW, where every packet
>> needs to be converted to VA from PA/IOVA.
> Please, could you explain how and where addresses are converted currently?

The HW (iommu/smmu) does.
In the VFIO case, for example, the user can program vfio's dma_map.iova as either _pa or _va.

And the APIs below do the address translation in dpdk:
rte_mem_virt2phy
rte_malloc_virt2phy
rte_mempool_virt2phy.
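
For illustration, the translation these APIs perform can be sketched as
below, modelled on the rte_malloc_virt2phy change in patch 10. The
struct memseg and virt2iova() names are stand-ins invented for this sketch,
not DPDK definitions.

```c
#include <stdint.h>

/* Same three values as the enum quoted in the cover letter. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/* Hypothetical stand-in for a memory segment: one virtually and
 * physically contiguous region. */
struct memseg {
	void *addr;          /* start virtual address */
	uint64_t phys_addr;  /* start physical address */
};

/* Return the address a device should use for DMA to `addr`.
 * In IOVA_VA mode the IOMMU translates, so the VA is used as-is;
 * otherwise the offset into the segment is added to its PA. */
static uint64_t
virt2iova(enum iova_mode mode, const struct memseg *ms, const void *addr)
{
	if (mode == IOVA_VA)
		return (uintptr_t)addr;

	return ms->phys_addr + ((uintptr_t)addr - (uintptr_t)ms->addr);
}
```

In iova=va mode no lookup is needed at all, which is where the saving
for hardware that hands back IOVA addresses comes from.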

>
>> This patchset proposes the method to autodetect the preferred
>> IOVA mode for a device. Summary of IOVA scheme:
>> - If all the devices are iommu capable and support IOMMU
>>   capable driver then selects IOVA_VA.
>> - If any of the devices are non-iommu then use default IOVA
>>   scheme ie. IOVA_PA.
>> - If no device found then IOVA scheme would be
>>   IOVA_DC (Don't care).
> I think you should better describe these modes and how they behave.

Aren't they self-explanatory? Meaning:
0) If I program my dma device (an iommu-backed dma device, of course) as iova=va,
then expect the dma address (iova) to be a _va.
1) If I program my dma device (noiommu, e.g. the vfio-noiommu or igb_uio case) as iova=pa,
then expect a _pa.
2) If I program my dma device (iommu-backed) as iova=pa,
then expect the dma address to be a _pa.

The approach described above is tested and works on both x86 and arm64.

The default scheme for iova mapping is iova=pa, and the framework
allows the user to explicitly override any scheme via --iova-mode=<>.
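
To make the policy concrete, here is a minimal sketch of the selection and
override logic described above. The struct dev_info, bus_iova_mode() and
resolve_iova_mode() names are hypothetical and not part of this series.
Falling back to IOVA_PA when the bus reports don't-care, and ignoring an
unrecognised option string, are assumptions based on iova=pa being the default.

```c
#include <stddef.h>
#include <string.h>

/* Same three values as the enum quoted in the cover letter. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/* Hypothetical per-device summary used only by this sketch. */
struct dev_info {
	int bound;  /* nonzero if attached to a kernel driver */
	int iommu;  /* nonzero if that driver maps DMA through an IOMMU */
};

/* Bus policy: any bound non-IOMMU device forces IOVA_PA, all bound
 * devices behind an IOMMU give IOVA_VA, no bound device gives IOVA_DC. */
static enum iova_mode
bus_iova_mode(const struct dev_info *devs, size_t n)
{
	enum iova_mode mode = IOVA_DC;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!devs[i].bound)
			continue;
		if (!devs[i].iommu)
			return IOVA_PA;
		mode = IOVA_VA;
	}
	return mode;
}

/* Apply the --iova-mode=<string> override ("pa" or "va"), if any. */
static enum iova_mode
resolve_iova_mode(enum iova_mode bus_mode, const char *user_opt)
{
	if (user_opt != NULL) {
		if (strcmp(user_opt, "pa") == 0)
			return IOVA_PA;
		if (strcmp(user_opt, "va") == 0)
			return IOVA_VA;
	}
	/* Assumption: don't-care falls back to the iova=pa default. */
	return bus_mode == IOVA_DC ? IOVA_PA : bus_mode;
}
```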

Thanks.

>> To achieve that, two global APIs introduced:
>> - rte_bus_get_iommu_class
>> - rte_pci_get_iommu_class
>>
>> Return values for those APIs are:
>> enum rte_iova_mod {
>>         RTE_IOVA_DC, /* Don't care */
>>         RTE_IOVA_PA,
>>         RTE_IOVA_VA
>> }
>>
>> Those are the bus policy for selecting IOVA mode. In case user
>> want to override bus IOVA mapping then added an EAL option
>> "--iova-mode=<string>". User to pass string format 'pa' --> IOVA_PA,
>> 'va' --> IOVA_VA.
>>
>> To support new eal option, adding global API:
>> - rte_eal_iova_mode
>>
>> Patch Summary:
>> 2) 1st - 2th patch: Adds infrastructure in linuxapp and bsdapp
>> layer.
>> 1) 3rd patch: Introduces global bus api named rte_bus_get_iommu_class.
>> 3) 4th patch: Add new eal option called --iova-mode=<mode-string>.
>> 4) 5th - 6th patch: Logic to detect iova scheme.
>> 5) 9th patch: Check IOVA mode before programing vfio dma_map.iova.
>> Default scheme is IOVA_PA.
>> 6) 10th-12th patch: Check for IOVA_VA mode in below APIs
>>         - rte_mem_virt2phy
>>         - rte_mempool_virt2phy
>>         - rte_malloc_virt2phy
>> If set then return paddr=vaddr, else return value from default
>> implementation.

* Re: [PATCH 02/10] linuxapp/eal_pci: get iommu class
  2017-06-08 11:05 ` [PATCH 02/10] linuxapp/eal_pci: " Santosh Shukla
@ 2017-07-05  8:17   ` Maxime Coquelin
  2017-07-05 10:05     ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Maxime Coquelin @ 2017-07-05  8:17 UTC (permalink / raw)
  To: Santosh Shukla, thomas, bruce.richardson, dev
  Cc: jerin.jacob, hemant.agrawal, shreyansh.jain, gaetan.rivet



On 06/08/2017 01:05 PM, Santosh Shukla wrote:
> Get iommu class of PCI device on the bus and returns preferred iova
> mapping mode for that bus.
> 
> IOVA mapping scheme for linuxapp case:
> - uio/uio_generic/vfio_noiommu --> default i.e.. (RTE_IOVA_PA)
> - vfio --> RTE_IOVA_VA.
> - In case of no device attached to any driver,
>    return RTE_IOVA_DC.
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
>   lib/librte_eal/linuxapp/eal/eal_pci.c           | 38 +++++++++++++++++++++++++
>   lib/librte_eal/linuxapp/eal/eal_vfio.c          | 23 +++++++++++++++
>   lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 +++
>   lib/librte_eal/linuxapp/eal/rte_eal_version.map |  7 +++++
>   4 files changed, 72 insertions(+)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
> index 595622b21..2772e883e 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
> @@ -45,6 +45,7 @@
>   #include "eal_filesystem.h"
>   #include "eal_private.h"
>   #include "eal_pci_init.h"
> +#include "eal_vfio.h"
>   
>   /**
>    * @file
> @@ -488,6 +489,43 @@ rte_pci_scan(void)
>   	return -1;
>   }
>   
> +/*
> + * Get iommu class of PCI devices on the bus.
> + * Check that those devices are attached to iommu driver.
> + * If attached then return iova_va or iova_pa mode, else
> + * return with dont_care(_DC).
> + */
> +enum rte_iova_mode
> +rte_pci_get_iommu_class(void)
> +{
> +	struct rte_pci_device *dev = NULL;
> +	int ret = RTE_IOVA_DC;
> +
> +	TAILQ_FOREACH(dev, &rte_pci_bus.device_list, next) {
> +
> +		if (dev->kdrv == RTE_KDRV_UNKNOWN ||
> +		    dev->kdrv == RTE_KDRV_NONE)
> +			continue;
> +
> +		if (dev->kdrv != RTE_KDRV_VFIO) {
> +			ret = RTE_IOVA_PA;
> +			return ret;
> +		}
> +
> +		ret = RTE_IOVA_VA;
> +	}
> +
> +	/* In case of iova_va, check for vfio_noiommu mode */
> +	if (ret == RTE_IOVA_VA) {
> +#ifdef VFIO_PRESENT
> +		if (vfio_noiommu_is_enabled() == 1)
> +#endif
> +			ret = RTE_IOVA_PA;
> +	}


> +
> +	return ret;
> +}
> +
>   /* Read PCI config space. */
>   int rte_pci_read_config(const struct rte_pci_device *device,
>   		void *buf, size_t len, off_t offset)
> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> index 946df7e31..04914406f 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> @@ -816,4 +816,27 @@ vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
>   	return 0;
>   }
>   
> +int
> +vfio_noiommu_is_enabled(void)
> +{
> +#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 2, 0)

Please don't check against the kernel version, as distros may
have backported the feature to older kernel versions (RHEL,
for example).

Also, it doesn't look necessary, since open() should fail below
on older kernels.
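
The suggestion above can be sketched as follows. This is only a minimal
illustration, not the v2 code: read_bool_param() is a hypothetical helper
name, and returning 0 for a value other than 'Y' is an assumption (the
posted patch returns -1 in that case). The caller in eal_pci.c only tests
for "== 1", so either convention would work there.

```c
#include <fcntl.h>
#include <unistd.h>

/* Path taken from the patch under review. */
#define VFIO_NOIOMMU_MODE \
	"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"

/*
 * Hypothetical helper (not part of the patch): read a one-character
 * Y/N module parameter. Returns 1 for 'Y', 0 for any other character,
 * -1 if the file cannot be opened or read.
 */
static int
read_bool_param(const char *path)
{
	char c = 0;
	ssize_t cnt;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;	/* e.g. older kernel, or vfio not loaded */

	cnt = read(fd, &c, 1);
	close(fd);
	if (cnt != 1)
		return -1;	/* do not look at c if nothing was read */

	return c == 'Y';
}

/* No LINUX_VERSION_CODE check: a missing sysfs file is handled by open(). */
int
vfio_noiommu_is_enabled(void)
{
	return read_bool_param(VFIO_NOIOMMU_MODE);
}
```

Checking the return value of read() also avoids testing an uninitialized
character when the read fails.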

> +	return -1;
> +#else
> +	int fd, ret, cnt __rte_unused;
> +	char c;
> +
> +	ret = -1;
> +	fd = open(VFIO_NOIOMMU_MODE, O_RDONLY);
> +	if (fd < 0)
> +		return -1;
> +
> +	cnt = read(fd, &c, 1);
> +	if (c == 'Y')
> +		ret = 1;
> +
> +	close(fd);
> +	return ret;
> +#endif
> +}
> +
>   #endif
> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h b/lib/librte_eal/linuxapp/eal/eal_vfio.h
> index 5ff63e5d7..26ea8e119 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
> @@ -150,6 +150,8 @@ struct vfio_config {
>   #define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
>   #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
>   #define VFIO_GET_REGION_IDX(x) (x >> 40)
> +#define VFIO_NOIOMMU_MODE      \
> +	"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
>   
>   /* DMA mapping function prototype.
>    * Takes VFIO container fd as a parameter.
> @@ -210,6 +212,8 @@ int pci_vfio_is_enabled(void);
>   
>   int vfio_mp_sync_setup(void);
>   
> +int vfio_noiommu_is_enabled(void);
> +
>   #define SOCKET_REQ_CONTAINER 0x100
>   #define SOCKET_REQ_GROUP 0x200
>   #define SOCKET_CLR_GROUP 0x300
> diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> index 670bab3a5..2cea7c272 100644
> --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> @@ -198,3 +198,10 @@ DPDK_17.05 {
>   	vfio_get_group_no;
>   
>   } DPDK_17.02;
> +
> +DPDK_17.08 {
> +	global:
> +
> +	rte_pci_get_iommu_class;
> +
> +} DPDK_17.05;
> 


* Re: [PATCH 07/10] linuxapp/eal_vfio: honor iova mode before mapping
  2017-06-08 11:05 ` [PATCH 07/10] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
@ 2017-07-05  9:14   ` Maxime Coquelin
  2017-07-05 15:43     ` Jerin Jacob
  0 siblings, 1 reply; 248+ messages in thread
From: Maxime Coquelin @ 2017-07-05  9:14 UTC (permalink / raw)
  To: Santosh Shukla, thomas, bruce.richardson, dev
  Cc: jerin.jacob, hemant.agrawal, shreyansh.jain, gaetan.rivet



On 06/08/2017 01:05 PM, Santosh Shukla wrote:
> Check iova mode and accordingly map iova to pa or va.
> 
> Signed-off-by: Santosh Shukla<santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob<jerin.jacob@caviumnetworks.com>
> ---
>   lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
>   1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> index 04914406f..348b7a7f4 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> @@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd)
>   		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>   		dma_map.vaddr = ms[i].addr_64;
>   		dma_map.size = ms[i].len;
> -		dma_map.iova = ms[i].phys_addr;
> +		if (rte_eal_iova_mode() == RTE_IOVA_VA)
> +			dma_map.iova = dma_map.vaddr;
> +		else
> +			dma_map.iova = ms[i].phys_addr;
>   		dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
>   

IIUC, it is changing default behavior for VFIO devices.

I see a possible problem, but I'm not sure the case is valid.

Imagine you have two devices in the same iommu group, and the two devices are
used in separate processes. Each process could try to map a different
physical address at the same virtual address, and so the second map
would fail.

By using physical addresses, you are safe against this problem.
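
To make the concern concrete, here is a toy model only. It assumes both
processes end up mapping into one shared IOMMU domain, which is exactly
the part I'm not sure about. struct toy_domain and toy_dma_map() are
made-up names and do not correspond to any VFIO API; the model just
rejects a mapping whose IOVA range overlaps an existing one.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_MAPS 8

/* Toy model of one IOMMU domain shared by every device in a group. */
struct toy_domain {
	struct { uint64_t iova, pa, len; } maps[TOY_MAX_MAPS];
	size_t nb_maps;
};

/* Map [iova, iova + len) to pa; reject any overlap with an existing map. */
static int
toy_dma_map(struct toy_domain *d, uint64_t iova, uint64_t pa, uint64_t len)
{
	size_t i;

	for (i = 0; i < d->nb_maps; i++) {
		if (iova < d->maps[i].iova + d->maps[i].len &&
		    d->maps[i].iova < iova + len)
			return -EEXIST;
	}
	if (d->nb_maps == TOY_MAX_MAPS)
		return -ENOSPC;

	d->maps[d->nb_maps].iova = iova;
	d->maps[d->nb_maps].pa = pa;
	d->maps[d->nb_maps].len = len;
	d->nb_maps++;
	return 0;
}
```

With IOVA = VA, two processes that both have a hugepage at the same
virtual address (but backed by different physical pages) would pass the
same iova, and the second call returns -EEXIST. With IOVA = PA the two
iova values differ, so both calls succeed.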

Any thoughts?

Cheers,
Maxime


* Re: [PATCH 00/10] Infrastructure to detect iova mapping on the bus
  2017-06-08 11:05 [PATCH 00/10] Infrastructure to detect iova mapping on the bus Santosh Shukla
                   ` (11 preceding siblings ...)
  2017-07-04 10:10 ` Thomas Monjalon
@ 2017-07-05  9:30 ` Maxime Coquelin
  2017-07-05  9:47   ` santosh
  2017-07-10 11:42 ` [PATCH v2 00/12] " Santosh Shukla
  13 siblings, 1 reply; 248+ messages in thread
From: Maxime Coquelin @ 2017-07-05  9:30 UTC (permalink / raw)
  To: Santosh Shukla, thomas, bruce.richardson, dev
  Cc: jerin.jacob, hemant.agrawal, shreyansh.jain, gaetan.rivet



On 06/08/2017 01:05 PM, Santosh Shukla wrote:
> Q) Why do we need such infrastructure?
> 
> A) Some NPU hardware like OCTEONTX follows push model to get the packet
> from the pktio device. Where packet allocation and freeing done
> by the HW. Since HW can operate only on IOVA with help of SMMU/IOMMU,
> when packet receives from the Ethernet device, it is the IOVA address
> (which is PA in existing scheme).
> 
> Mapping IOVA as PA is expensive on those HW, where every packet
> needs to be converted to VA from PA/IOVA.
> 
> This patchset proposes the method to autodetect the preferred
> IOVA mode for a device. Summary of IOVA scheme:
> - If all the devices are iommu capable and support IOMMU
>    capable driver then selects IOVA_VA.
> - If any of the devices are non-iommu then use default IOVA
>    scheme ie. IOVA_PA.
> - If no device found then IOVA scheme would be
>    IOVA_DC (Don't care).

Isn't it possible to have a per-device granularity?
For example, with virt case, having a physical NIC using VFIO with
iommu, and virtio devices with noiommu.

If the physical NIC prefers working with VAs, why force it to use
PAs? Maybe I missed a limitation though.

Cheers,
Maxime


* Re: [PATCH 00/10] Infrastructure to detect iova mapping on the bus
  2017-07-05  9:30 ` Maxime Coquelin
@ 2017-07-05  9:47   ` santosh
  0 siblings, 0 replies; 248+ messages in thread
From: santosh @ 2017-07-05  9:47 UTC (permalink / raw)
  To: Maxime Coquelin, thomas, bruce.richardson, dev
  Cc: jerin.jacob, hemant.agrawal, shreyansh.jain, gaetan.rivet

Hi Maxime,

On Wednesday 05 July 2017 03:00 PM, Maxime Coquelin wrote:

>
> On 06/08/2017 01:05 PM, Santosh Shukla wrote:
>> Q) Why do we need such infrastructure?
>>
>> A) Some NPU hardware like OCTEONTX follows push model to get the packet
>> from the pktio device. Where packet allocation and freeing done
>> by the HW. Since HW can operate only on IOVA with help of SMMU/IOMMU,
>> when packet receives from the Ethernet device, it is the IOVA address
>> (which is PA in existing scheme).
>>
>> Mapping IOVA as PA is expensive on those HW, where every packet
>> needs to be converted to VA from PA/IOVA.
>>
>> This patchset proposes the method to autodetect the preferred
>> IOVA mode for a device. Summary of IOVA scheme:
>> - If all the devices are iommu capable and support IOMMU
>>    capable driver then selects IOVA_VA.
>> - If any of the devices are non-iommu then use default IOVA
>>    scheme ie. IOVA_PA.
>> - If no device found then IOVA scheme would be
>>    IOVA_DC (Don't care).
>
> Isn't it possible to have a per-device granularity?
> For example, with virt case, having a physical NIC using VFIO with
> iommu, and virtio devices with noiommu.
>
At device-level granularity, each device is classified as either with iommu or
without iommu.

In your example, the physical NIC falls in the with-iommu category and the virtio
devices in the without-iommu category. The best place to detect with or without
iommu is the bus layer; that way the bus can decide the iova mapping mode. Note
that the iova mapping rules are not enforced: the user can always override the
iova mode.
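
A rough sketch of that bus-level policy, for illustration only. The enum
mirrors the one in the cover letter; enum dev_iommu_class,
bus_iova_mode() and effective_iova_mode() are hypothetical names, and
falling back to PA when nothing is detected is an assumption based on PA
being the default scheme. The way the posted patch combines the user
option with the bus result is different; this only shows the intent.

```c
#include <stddef.h>

/* Mirrors the enum from the cover letter. */
enum rte_iova_mode { RTE_IOVA_DC, RTE_IOVA_PA, RTE_IOVA_VA };

/* Hypothetical per-device classification, for illustration only. */
enum dev_iommu_class { DEV_UNBOUND, DEV_NO_IOMMU, DEV_WITH_IOMMU };

/* Bus policy: any non-iommu device forces PA, all-iommu gives VA. */
static enum rte_iova_mode
bus_iova_mode(const enum dev_iommu_class *devs, size_t n)
{
	enum rte_iova_mode mode = RTE_IOVA_DC;
	size_t i;

	for (i = 0; i < n; i++) {
		if (devs[i] == DEV_UNBOUND)
			continue;	/* not bound to any driver */
		if (devs[i] == DEV_NO_IOMMU)
			return RTE_IOVA_PA;
		mode = RTE_IOVA_VA;
	}
	return mode;
}

/*
 * user_mode == RTE_IOVA_DC stands for "no --iova-mode given".
 * Assumption: an explicit user choice wins; otherwise use the bus
 * result, and fall back to PA when the bus does not care.
 */
static enum rte_iova_mode
effective_iova_mode(enum rte_iova_mode user_mode, enum rte_iova_mode bus_mode)
{
	if (user_mode != RTE_IOVA_DC)
		return user_mode;
	return bus_mode == RTE_IOVA_DC ? RTE_IOVA_PA : bus_mode;
}
```

For the mixed case (physical NIC with iommu plus virtio without iommu),
bus_iova_mode() gives RTE_IOVA_PA unless the user overrides it.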

> If the physical NIC prefers working with VAs, why forcing it to use
> PAs? Maybe I missed a limitation though.
>
With this patch set, if the physical NIC is bound to vfio with an iommu, then the
mapping mode is iova=va, which is what you mentioned.

Thanks.

> Cheers,
> Maxime


* Re: [PATCH 02/10] linuxapp/eal_pci: get iommu class
  2017-07-05  8:17   ` Maxime Coquelin
@ 2017-07-05 10:05     ` santosh
  0 siblings, 0 replies; 248+ messages in thread
From: santosh @ 2017-07-05 10:05 UTC (permalink / raw)
  To: Maxime Coquelin, thomas, bruce.richardson, dev
  Cc: jerin.jacob, hemant.agrawal, shreyansh.jain, gaetan.rivet

On Wednesday 05 July 2017 01:47 PM, Maxime Coquelin wrote:

>
> On 06/08/2017 01:05 PM, Santosh Shukla wrote:
>> Get iommu class of PCI device on the bus and returns preferred iova
>> mapping mode for that bus.
>>
>> IOVA mapping scheme for linuxapp case:
>> - uio/uio_generic/vfio_noiommu --> default i.e.. (RTE_IOVA_PA)
>> - vfio --> RTE_IOVA_VA.
>> - In case of no device attached to any driver,
>>    return RTE_IOVA_DC.
>>
>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> ---
>>   lib/librte_eal/linuxapp/eal/eal_pci.c           | 38 +++++++++++++++++++++++++
>>   lib/librte_eal/linuxapp/eal/eal_vfio.c          | 23 +++++++++++++++
>>   lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 +++
>>   lib/librte_eal/linuxapp/eal/rte_eal_version.map |  7 +++++
>>   4 files changed, 72 insertions(+)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
>> index 595622b21..2772e883e 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
>> @@ -45,6 +45,7 @@
>>   #include "eal_filesystem.h"
>>   #include "eal_private.h"
>>   #include "eal_pci_init.h"
>> +#include "eal_vfio.h"
>>     /**
>>    * @file
>> @@ -488,6 +489,43 @@ rte_pci_scan(void)
>>       return -1;
>>   }
>>   +/*
>> + * Get iommu class of PCI devices on the bus.
>> + * Check that those devices are attached to iommu driver.
>> + * If attached then return iova_va or iova_pa mode, else
>> + * return with dont_care(_DC).
>> + */
>> +enum rte_iova_mode
>> +rte_pci_get_iommu_class(void)
>> +{
>> +    struct rte_pci_device *dev = NULL;
>> +    int ret = RTE_IOVA_DC;
>> +
>> +    TAILQ_FOREACH(dev, &rte_pci_bus.device_list, next) {
>> +
>> +        if (dev->kdrv == RTE_KDRV_UNKNOWN ||
>> +            dev->kdrv == RTE_KDRV_NONE)
>> +            continue;
>> +
>> +        if (dev->kdrv != RTE_KDRV_VFIO) {
>> +            ret = RTE_IOVA_PA;
>> +            return ret;
>> +        }
>> +
>> +        ret = RTE_IOVA_VA;
>> +    }
>> +
>> +    /* In case of iova_va, check for vfio_noiommu mode */
>> +    if (ret == RTE_IOVA_VA) {
>> +#ifdef VFIO_PRESENT
>> +        if (vfio_noiommu_is_enabled() == 1)
>> +#endif
>> +            ret = RTE_IOVA_PA;
>> +    }
>
>
>> +
>> +    return ret;
>> +}
>> +
>>   /* Read PCI config space. */
>>   int rte_pci_read_config(const struct rte_pci_device *device,
>>           void *buf, size_t len, off_t offset)
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>> index 946df7e31..04914406f 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>> @@ -816,4 +816,27 @@ vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
>>       return 0;
>>   }
>>   +int
>> +vfio_noiommu_is_enabled(void)
>> +{
>> +#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 2, 0)
>
> Please don't check against Kernel version, as distros may
> have backported the feature on older Kernels versions (RHEL
> for exmaple).
>
> Also, it doesn't look necessary since open should fail below
> on older kernels.
>
Valid point. Will fix in v2. Thanks.

>> +    return -1;
>> +#else
>> +    int fd, ret, cnt __rte_unused;
>> +    char c;
>> +
>> +    ret = -1;
>> +    fd = open(VFIO_NOIOMMU_MODE, O_RDONLY);
>> +    if (fd < 0)
>> +        return -1;
>> +
>> +    cnt = read(fd, &c, 1);
>> +    if (c == 'Y')
>> +        ret = 1;
>> +
>> +    close(fd);
>> +    return ret;
>> +#endif
>> +}
>> +
>>   #endif
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h b/lib/librte_eal/linuxapp/eal/eal_vfio.h
>> index 5ff63e5d7..26ea8e119 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
>> @@ -150,6 +150,8 @@ struct vfio_config {
>>   #define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
>>   #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
>>   #define VFIO_GET_REGION_IDX(x) (x >> 40)
>> +#define VFIO_NOIOMMU_MODE      \
>> +    "/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
>>     /* DMA mapping function prototype.
>>    * Takes VFIO container fd as a parameter.
>> @@ -210,6 +212,8 @@ int pci_vfio_is_enabled(void);
>>     int vfio_mp_sync_setup(void);
>>   +int vfio_noiommu_is_enabled(void);
>> +
>>   #define SOCKET_REQ_CONTAINER 0x100
>>   #define SOCKET_REQ_GROUP 0x200
>>   #define SOCKET_CLR_GROUP 0x300
>> diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>> index 670bab3a5..2cea7c272 100644
>> --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>> +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>> @@ -198,3 +198,10 @@ DPDK_17.05 {
>>       vfio_get_group_no;
>>     } DPDK_17.02;
>> +
>> +DPDK_17.08 {
>> +    global:
>> +
>> +    rte_pci_get_iommu_class;
>> +
>> +} DPDK_17.05;
>>


* Re: [PATCH 05/10] linuxapp/eal: detect iova mode
  2017-06-08 11:05 ` [PATCH 05/10] linuxapp/eal: detect " Santosh Shukla
@ 2017-07-05 13:17   ` Hemant Agrawal
  2017-07-05 13:49     ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Hemant Agrawal @ 2017-07-05 13:17 UTC (permalink / raw)
  To: Santosh Shukla, thomas, bruce.richardson, dev
  Cc: jerin.jacob, shreyansh.jain, gaetan.rivet

On 6/8/2017 4:35 PM, Santosh Shukla wrote:
> - Moving late bus scanning to up..just after eal_parsing.
> - Detect iova mapping mode based on user provided eal option
>   (rte_eal_iova_mode) and result of rte_bus_scan_iommu_class.
>
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
>  lib/librte_eal/linuxapp/eal/eal.c               | 24 ++++++++++++++++++------
>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>  2 files changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
> index 7c78f2dc2..54f42d752 100644
> --- a/lib/librte_eal/linuxapp/eal/eal.c
> +++ b/lib/librte_eal/linuxapp/eal/eal.c
> @@ -122,6 +122,13 @@ struct internal_config internal_config;
>  /* used by rte_rdtsc() */
>  int rte_cycles_vmware_tsc_map;
>
> +/* Get the iova mode */
> +enum rte_iova_mode
> +rte_eal_iova_mode(void)
> +{
> +	return internal_config.iova_mode;
> +}
> +
>  /* Return a pointer to the configuration structure */
>  struct rte_config *
>  rte_eal_get_configuration(void)
> @@ -793,6 +800,17 @@ rte_eal_init(int argc, char **argv)
>  		return -1;
>  	}
>
> +	if (rte_bus_scan()) {
> +		rte_eal_init_alert("Cannot scan the buses for devices\n");
> +		rte_errno = ENODEV;
> +		return -1;
> +	}
> +

The bus scanning includes allocating memory for the devices. It can not 
be moved so early.

I am still testing it out.

> +	if (rte_eal_iova_mode() == RTE_IOVA_VA &&
> +	    rte_bus_get_iommu_class() == RTE_IOVA_VA) {
> +		internal_config.iova_mode = RTE_IOVA_VA;
> +	}
> +
>  	if (internal_config.no_hugetlbfs == 0 &&
>  			internal_config.process_type != RTE_PROC_SECONDARY &&
>  			internal_config.xen_dom0_support == 0 &&
> @@ -890,12 +908,6 @@ rte_eal_init(int argc, char **argv)
>  		return -1;
>  	}
>
> -	if (rte_bus_scan()) {
> -		rte_eal_init_alert("Cannot scan the buses for devices\n");
> -		rte_errno = ENODEV;
> -		return -1;
> -	}
> -
>  	RTE_LCORE_FOREACH_SLAVE(i) {
>
>  		/*
> diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> index 6c016c82e..79b005036 100644
> --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> @@ -204,5 +204,6 @@ DPDK_17.08 {
>
>  	rte_pci_get_iommu_class;
>  	rte_bus_get_iommu_class;
> +	rte_eal_iova_mode;
>
>  } DPDK_17.05;
>


* Re: [PATCH 05/10] linuxapp/eal: detect iova mode
  2017-07-05 13:17   ` Hemant Agrawal
@ 2017-07-05 13:49     ` santosh
  0 siblings, 0 replies; 248+ messages in thread
From: santosh @ 2017-07-05 13:49 UTC (permalink / raw)
  To: Hemant Agrawal, thomas, bruce.richardson, dev
  Cc: jerin.jacob, shreyansh.jain, gaetan.rivet

Hi Hemant,

On Wednesday 05 July 2017 06:47 PM, Hemant Agrawal wrote:

> On 6/8/2017 4:35 PM, Santosh Shukla wrote:
>> - Moving late bus scanning to up..just after eal_parsing.
>> - Detect iova mapping mode based on user provided eal option
>>   (rte_eal_iova_mode) and result of rte_bus_scan_iommu_class.
>>
>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> ---
>>  lib/librte_eal/linuxapp/eal/eal.c               | 24 ++++++++++++++++++------
>>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>>  2 files changed, 19 insertions(+), 6 deletions(-)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
>> index 7c78f2dc2..54f42d752 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal.c
>> @@ -122,6 +122,13 @@ struct internal_config internal_config;
>>  /* used by rte_rdtsc() */
>>  int rte_cycles_vmware_tsc_map;
>>
>> +/* Get the iova mode */
>> +enum rte_iova_mode
>> +rte_eal_iova_mode(void)
>> +{
>> +    return internal_config.iova_mode;
>> +}
>> +
>>  /* Return a pointer to the configuration structure */
>>  struct rte_config *
>>  rte_eal_get_configuration(void)
>> @@ -793,6 +800,17 @@ rte_eal_init(int argc, char **argv)
>>          return -1;
>>      }
>>
>> +    if (rte_bus_scan()) {
>> +        rte_eal_init_alert("Cannot scan the buses for devices\n");
>> +        rte_errno = ENODEV;
>> +        return -1;
>> +    }
>> +
>
> The bus scanning includes allocating memory for the devices. It can not be moved so early.
>
Right, and that memory allocation is malloc based. I verified the same for the pci_scan_one case.
Also, looking at drivers/bus/fslmc/*, IIUC you're not calling rte_mem* APIs at the time of
bus scanning, right? And you do the dma_mapping at ethdev initialization time, which refers
to memseg, so AFAICT moving bus_scan this early won't affect functionality.
 

> I am still testing it out.

Thanks for testing. Please share your feedback.


* Re: [PATCH 07/10] linuxapp/eal_vfio: honor iova mode before mapping
  2017-07-05  9:14   ` Maxime Coquelin
@ 2017-07-05 15:43     ` Jerin Jacob
  2017-07-06  7:58       ` Maxime Coquelin
  0 siblings, 1 reply; 248+ messages in thread
From: Jerin Jacob @ 2017-07-05 15:43 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: Santosh Shukla, thomas, bruce.richardson, dev, hemant.agrawal,
	shreyansh.jain, gaetan.rivet

-----Original Message-----
> Date: Wed, 5 Jul 2017 11:14:01 +0200
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> To: Santosh Shukla <santosh.shukla@caviumnetworks.com>,
>  thomas@monjalon.net, bruce.richardson@intel.com, dev@dpdk.org
> CC: jerin.jacob@caviumnetworks.com, hemant.agrawal@nxp.com,
>  shreyansh.jain@nxp.com, gaetan.rivet@6wind.com
> Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode
>  before mapping
> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>  Thunderbird/52.1.0
> 
> 
> 
> On 06/08/2017 01:05 PM, Santosh Shukla wrote:
> > Check iova mode and accordingly map iova to pa or va.
> > 
> > Signed-off-by: Santosh Shukla<santosh.shukla@caviumnetworks.com>
> > Signed-off-by: Jerin Jacob<jerin.jacob@caviumnetworks.com>
> > ---
> >   lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
> >   1 file changed, 8 insertions(+), 2 deletions(-)
> > 
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > index 04914406f..348b7a7f4 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > @@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd)
> >   		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
> >   		dma_map.vaddr = ms[i].addr_64;
> >   		dma_map.size = ms[i].len;
> > -		dma_map.iova = ms[i].phys_addr;
> > +		if (rte_eal_iova_mode() == RTE_IOVA_VA)
> > +			dma_map.iova = dma_map.vaddr;
> > +		else
> > +			dma_map.iova = ms[i].phys_addr;
> >   		dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
> 
> IIUC, it is changing default behavior for VFIO devices.
> 
> I see a possible problem, but I'm not sure the case is valid.
> 
> Imagine you have two devices in the iommu group, and the two devices are
> used in separate processes. Each process could try two different
> physical addresses at the same virtual address, and so the second map
> would fail.

IMO, this doesn't look like a problem. Here is the data flow:

1) The vfio DMA map function (vfio_type1_dma_map()) will be called only
in the primary process.
http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_vfio.c#n359

2) In the secondary process, DPDK's rte_eal_huge_page_attach() makes sure
that the secondary process has the _same_ virtual addresses as the primary, or
exits on attach failure.
http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_memory.c#n1452

3) Since the secondary process adds the virtual addresses mapped in step (2)
to the page table in the OS, on an SMMU entry miss (when the device
issues an I/O transaction) the OS will load the mapping and update the SMMU
"context" with page tables from the MMU.

Let me add the background for why this feature is required in DPDK to
enable NPU-style co-processors.

With traditional NICs, the Rx path code looks like this:
1) On the control path, fill the mempool with buffers.
2) On rx_burst(), alloc the mbuf from the mempool.
3) SW has the mbuf in hand (which is a virtual address) and programs the
HW with mbuf->buf_physaddr.
4) Return the last pushed mbuf (it will have been updated by HW by now).


On NPU-style co-processors, the situation is different, as buffer recycling
is done in HW, unlike the SW model. Here is the data flow:
1) On the control path, fill the HW mempool with buffers (obviously with the
IOVA address, which is PA in the existing model).
2) On rx_burst(), HW gives you the IOVA address (the same address as in step 1).
3) As the application expects a VA to operate on, rx_burst() needs to
convert from PA to VA, which is very costly.
With this IOVA as VA scheme, we can instead avoid the cost of conversion
with the help of the IOMMU/SMMU.
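
The cost difference in step 3 can be sketched like this. It is a
simplified illustration, not driver code: struct toy_memseg is a
stand-in for a memseg entry (not the real struct rte_memseg), and
iova_pa_to_va() / iova_va_to_va() are made-up helper names.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a memseg entry; not the real struct rte_memseg. */
struct toy_memseg {
	uint64_t phys_addr;
	uintptr_t virt_addr;
	uint64_t len;
};

/* IOVA as PA: every buffer address returned by HW needs a table walk. */
static void *
iova_pa_to_va(const struct toy_memseg *ms, size_t n, uint64_t iova)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (iova >= ms[i].phys_addr &&
		    iova - ms[i].phys_addr < ms[i].len)
			return (void *)(ms[i].virt_addr +
					(uintptr_t)(iova - ms[i].phys_addr));
	}
	return NULL;	/* address not backed by any segment */
}

/* IOVA as VA: the address handed back by HW is already usable. */
static inline void *
iova_va_to_va(uint64_t iova)
{
	return (void *)(uintptr_t)iova;
}
```

In the PA case the lookup runs per packet in rx_burst(); in the VA case
it reduces to a cast.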

This patch set auto-detects the mode based on the type of devices available on
the bus and provides an option to override the mode via an EAL argument, so we
don't foresee any issue with this approach and welcome any alternative
approaches.

A similar problem exists in another part of the DPDK code:
http://dpdk.org/browse/dpdk/tree/drivers/bus/fslmc/fslmc_vfio.c#n231
It's a conditional-compilation-based approach that duplicates the vfio
code, and we are trying to fix the problem in a generic way so that
everyone can benefit from it.

Comments are welcome.

/Jerin

> 
> By using physical addresses, you are safe against this problem.
> 
> Any thoughts?
> 
> Cheers,
> Maxime


* Re: [PATCH 07/10] linuxapp/eal_vfio: honor iova mode before mapping
  2017-07-05 15:43     ` Jerin Jacob
@ 2017-07-06  7:58       ` Maxime Coquelin
  2017-07-06  9:49         ` Jerin Jacob
  0 siblings, 1 reply; 248+ messages in thread
From: Maxime Coquelin @ 2017-07-06  7:58 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Santosh Shukla, thomas, bruce.richardson, dev, hemant.agrawal,
	shreyansh.jain, gaetan.rivet



On 07/05/2017 05:43 PM, Jerin Jacob wrote:
> -----Original Message-----
>> Date: Wed, 5 Jul 2017 11:14:01 +0200
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> To: Santosh Shukla <santosh.shukla@caviumnetworks.com>,
>>   thomas@monjalon.net, bruce.richardson@intel.com, dev@dpdk.org
>> CC: jerin.jacob@caviumnetworks.com, hemant.agrawal@nxp.com,
>>   shreyansh.jain@nxp.com, gaetan.rivet@6wind.com
>> Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode
>>   before mapping
>> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>>   Thunderbird/52.1.0
>>
>>
>>
>> On 06/08/2017 01:05 PM, Santosh Shukla wrote:
>>> Check iova mode and accordingly map iova to pa or va.
>>>
>>> Signed-off-by: Santosh Shukla<santosh.shukla@caviumnetworks.com>
>>> Signed-off-by: Jerin Jacob<jerin.jacob@caviumnetworks.com>
>>> ---
>>>    lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
>>>    1 file changed, 8 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>> index 04914406f..348b7a7f4 100644
>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>> @@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd)
>>>    		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>>>    		dma_map.vaddr = ms[i].addr_64;
>>>    		dma_map.size = ms[i].len;
>>> -		dma_map.iova = ms[i].phys_addr;
>>> +		if (rte_eal_iova_mode() == RTE_IOVA_VA)
>>> +			dma_map.iova = dma_map.vaddr;
>>> +		else
>>> +			dma_map.iova = ms[i].phys_addr;
>>>    		dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
>>
>> IIUC, it is changing default behavior for VFIO devices.
>>
>> I see a possible problem, but I'm not sure the case is valid.
>>
>> Imagine you have two devices in the iommu group, and the two devices are
>> used in separate processes. Each process could try two different
>> physical addresses at the same virtual address, and so the second map
>> would fail.
> 
> IMO, Doesn't look like a problem. Here is the data flow
> 
> 1) The vfio DMA map function(vfio_type1_dma_map()) will be called only
> on primary process
> http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_vfio.c#n359
> 
> 2) On secondary process, DPDK rte_eal_huge_page_attach() will make sure
> that, the Secondary process has the _same_ virtual address as primary or
> exit from on attach.
> http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_memory.c#n1452
> 
> 3) Since secondary process adds the mapped the virtual address in step (2).
> in the page table in OS. On SMMU entry miss(When device
> request from I/O transaction), OS will load the mapping and update the SMMU
> "context" with page tables from MMU.

Ok thanks for the detailed info, but what about the case where the same
iommu group is used by two primary processes?

I don't know how frequent it is, but if ACS is not supported by either
the endpoint or the root port, then you would have to share the same
IOMMU group for all the ports of your card. Right?

> Let me add the background for why this feature is required in DPDK to
> enable NPU style co-processors.
> 
> The traditional NICs the Rx path code look like this:
> 1) On control path, Fill the mempool with buffers
> 2) on rx_burst(), alloc the mbuf from mempool
> 3) SW has the mbuf in hand(which is a virtual address) and program the
> HW with mbuf->buf_physaddr)
> 4) Return the last pushed mbuf(will be updated by HW by now)
> 
> 
> On NPU style co-processors, situation is different as the buffer recycling
> has been done in HW unlike SW model. Here is the data flow:
> 1) On control path, Fill the HW mempool with buffers(Obviously the IOVA
> address, which is PA in existing model)
> 2) on rx_burst, HW gives you IOVA address(as address as step 1)
> 3) As application expects VA to operate on it, rx_burst() needs to
> convert to VA from PAA. Which is very costly.
> Instead with this IOVA as VA scheme, We can avoid the cost of converting
> with help of IOMMU/SMMU.
> 
> This patch set auto detects the mode based available of type devices in
> bus and provides an option to override mode based on eal argument, so we
> don't foresee any issue with this approach and welcome any alternative
> approaches.

I don't question the need for the feature for this kind of
co-processor; using VA as IOVA in your case seems very valid.

What concerns me is that we change the default behavior for all other
devices. Having an option to override is fine to me, but the default
mode should remain the same IMHO.

Wouldn't it be possible to default to VA as IOVA only when an HW mempool
is in use?

> Similar problem exists in another part of the code in DPDK,
> http://dpdk.org/browse/dpdk/tree/drivers/bus/fslmc/fslmc_vfio.c#n231
> Its a conditional compilation based approach with duplicating the vfio
> code and we are trying to fix the problem in a generic way so that
> everyone can get benefited out of it.
> 
> Comments are welcome.

Thanks,
Maxime

> /Jerin
> 
>>
>> By using physical addresses, you are safe against this problem.
>>
>> Any thoughts?
>>
>> Cheers,
>> Maxime


* Re: [PATCH 07/10] linuxapp/eal_vfio: honor iova mode before mapping
  2017-07-06  7:58       ` Maxime Coquelin
@ 2017-07-06  9:49         ` Jerin Jacob
  2017-07-06 10:59           ` Maxime Coquelin
  0 siblings, 1 reply; 248+ messages in thread
From: Jerin Jacob @ 2017-07-06  9:49 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: Santosh Shukla, thomas, bruce.richardson, dev, hemant.agrawal,
	shreyansh.jain, gaetan.rivet

-----Original Message-----
> Date: Thu, 6 Jul 2017 09:58:41 +0200
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> CC: Santosh Shukla <santosh.shukla@caviumnetworks.com>,
>  thomas@monjalon.net, bruce.richardson@intel.com, dev@dpdk.org,
>  hemant.agrawal@nxp.com, shreyansh.jain@nxp.com, gaetan.rivet@6wind.com
> Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode
>  before mapping
> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>  Thunderbird/52.1.0
> 
> 
> 
> On 07/05/2017 05:43 PM, Jerin Jacob wrote:
> > -----Original Message-----
> > > Date: Wed, 5 Jul 2017 11:14:01 +0200
> > > From: Maxime Coquelin <maxime.coquelin@redhat.com>
> > > To: Santosh Shukla <santosh.shukla@caviumnetworks.com>,
> > >   thomas@monjalon.net, bruce.richardson@intel.com, dev@dpdk.org
> > > CC: jerin.jacob@caviumnetworks.com, hemant.agrawal@nxp.com,
> > >   shreyansh.jain@nxp.com, gaetan.rivet@6wind.com
> > > Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode
> > >   before mapping
> > > User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
> > >   Thunderbird/52.1.0
> > > 
> > > 
> > > 
> > > On 06/08/2017 01:05 PM, Santosh Shukla wrote:
> > > > Check iova mode and accordingly map iova to pa or va.
> > > > 
> > > > Signed-off-by: Santosh Shukla<santosh.shukla@caviumnetworks.com>
> > > > Signed-off-by: Jerin Jacob<jerin.jacob@caviumnetworks.com>
> > > > ---
> > > >    lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
> > > >    1 file changed, 8 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > > > index 04914406f..348b7a7f4 100644
> > > > --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > > > +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > > > @@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd)
> > > >    		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
> > > >    		dma_map.vaddr = ms[i].addr_64;
> > > >    		dma_map.size = ms[i].len;
> > > > -		dma_map.iova = ms[i].phys_addr;
> > > > +		if (rte_eal_iova_mode() == RTE_IOVA_VA)
> > > > +			dma_map.iova = dma_map.vaddr;
> > > > +		else
> > > > +			dma_map.iova = ms[i].phys_addr;
> > > >    		dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
> > > 
> > > IIUC, it is changing default behavior for VFIO devices.
> > > 
> > > I see a possible problem, but I'm not sure the case is valid.
> > > 
> > > Imagine you have two devices in the iommu group, and the two devices are
> > > used in separate processes. Each process could try two different
> > > physical addresses at the same virtual address, and so the second map
> > > would fail.
> > 
> > IMO, Doesn't look like a problem. Here is the data flow
> > 
> > 1) The vfio DMA map function(vfio_type1_dma_map()) will be called only
> > on primary process
> > http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_vfio.c#n359
> > 
> > 2) On secondary process, DPDK rte_eal_huge_page_attach() will make sure
> > that, the Secondary process has the _same_ virtual address as primary or
> > exit from on attach.
> > http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_memory.c#n1452
> > 
> > 3) Since secondary process adds the mapped the virtual address in step (2).
> > in the page table in OS. On SMMU entry miss(When device
> > request from I/O transaction), OS will load the mapping and update the SMMU
> > "context" with page tables from MMU.
> 
> Ok thanks for the detailed info, but what about the case where the same
> iommu group is used by two primary processes?

Does that case exist with DPDK? We always need to blacklist the same BDF in
the secondary process to make things work with the existing DPDK setup, which
makes sense as well: only the primary process configures the HW blocks.

> 
> I don't know how frequent it is, but if ACS is not supported by either the
> endpoint or the the root port, then you would have to share the same IOMMU
> group for all the ports of your card. Right?

ACS is supported on our card (it is not in bypass mode), and one mempool PCI BDF
comes up as one IOMMU group.

If it is in bypass mode, you would use vfio-noiommu mode anyway, as
there is no protection in that case.

> 
> > Let me add the background for why this feature is required in DPDK to
> > enable NPU style co-processors.
> > 
> > The traditional NICs the Rx path code look like this:
> > 1) On control path, Fill the mempool with buffers
> > 2) on rx_burst(), alloc the mbuf from mempool
> > 3) SW has the mbuf in hand(which is a virtual address) and program the
> > HW with mbuf->buf_physaddr)
> > 4) Return the last pushed mbuf(will be updated by HW by now)
> > 
> > 
> > On NPU style co-processors, situation is different as the buffer recycling
> > has been done in HW unlike SW model. Here is the data flow:
> > 1) On control path, Fill the HW mempool with buffers(Obviously the IOVA
> > address, which is PA in existing model)
> > 2) on rx_burst, HW gives you IOVA address(as address as step 1)
> > 3) As application expects VA to operate on it, rx_burst() needs to
> > convert to VA from PAA. Which is very costly.
> > Instead with this IOVA as VA scheme, We can avoid the cost of converting
> > with help of IOMMU/SMMU.
> > 
> > This patch set auto detects the mode based available of type devices in
> > bus and provides an option to override mode based on eal argument, so we
> > don't foresee any issue with this approach and welcome any alternative
> > approaches.
> 
> I don't question the need of the feature for these kind of
> co-processors, using VA as IOVA in your case seems very valid.
> 
> What concerns me is that we change the default behavior for all other
> devices. Having an option to override is fine to me, but the default
> mode should remain the same IMHO.

That doesn't seem to be a technical point, but I agree with your concern
and we will address it.
I think we have two ways to address it.

option 1:
- In the existing patch,
a) we currently set internal_cfg->iova_mode = RTE_IOVA_PA
  http://dpdk.org/dev/patchwork/patch/25192
b) the final mode will be RTE_IOVA_VA only when the eal argument is set
to RTE_IOVA_VA and the bus-probed value == RTE_IOVA_VA
http://dpdk.org/dev/patchwork/patch/25193/
(check the code after rte_bus_scan())

option 2:
In rte_pci_get_iommu_class() in http://dpdk.org/dev/patchwork/patch/25190/
we can check rte_pci_device.id.vendor_id == CAVIUM to select the
mode, so other types of devices stay safe.

I think option 2 makes sense, as it gives a foolproof auto-detection scheme
without affecting any other devices that are not interested in this scheme.

Does that address your concern about the patchset?
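
A minimal sketch of how the two options would resolve the final mode.
The enum only mirrors the one in the cover letter; resolve_option1(),
resolve_option2(), CAVIUM_VENDOR_ID and the "all devices must match"
rule in option 2 are assumptions for illustration, not code from the
patches:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the enum proposed in the cover letter. */
enum iova_mode {
	IOVA_DC,	/* Don't care */
	IOVA_PA,
	IOVA_VA
};

/* Assumed vendor ID value, used here only for illustration. */
#define CAVIUM_VENDOR_ID 0x177d

/* Option 1: the default stays PA; VA is chosen only when the user asked
 * for it with the eal argument AND the bus probe also reports VA. */
enum iova_mode
resolve_option1(enum iova_mode eal_arg, enum iova_mode bus_mode)
{
	if (eal_arg == IOVA_VA && bus_mode == IOVA_VA)
		return IOVA_VA;
	return IOVA_PA;
}

/* Option 2: the bus reports VA only when every scanned device carries
 * the vendor ID that wants the scheme; any other device forces PA. */
enum iova_mode
resolve_option2(const uint16_t *vendor_ids, size_t n)
{
	size_t i;

	assert(n == 0 || vendor_ids != NULL);
	if (n == 0)
		return IOVA_DC;	/* no device found */
	for (i = 0; i < n; i++)
		if (vendor_ids[i] != CAVIUM_VENDOR_ID)
			return IOVA_PA;
	return IOVA_VA;
}
```

With option 1 nothing changes unless the user opts in; with option 2
the default changes only on systems where all devices match.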

> Wouldn't it be possible to default to VA as IOVA only when an HW mempool
> is in use?

It would be too late, as in the normal scheme of things the application
creates the pool.

> 
> > Similar problem exists in another part of the code in DPDK,
> > http://dpdk.org/browse/dpdk/tree/drivers/bus/fslmc/fslmc_vfio.c#n231
> > Its a conditional compilation based approach with duplicating the vfio
> > code and we are trying to fix the problem in a generic way so that
> > everyone can get benefited out of it.
> > 
> > Comments are welcome.
> 
> Thanks,
> Maxime
> 
> > /Jerin
> > 
> > > 
> > > By using physical addresses, you are safe against this problem.
> > > 
> > > Any thoughts?
> > > 
> > > Cheers,
> > > Maxime

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH 07/10] linuxapp/eal_vfio: honor iova mode before mapping
  2017-07-06  9:49         ` Jerin Jacob
@ 2017-07-06 10:59           ` Maxime Coquelin
  2017-07-06 11:12             ` Jerin Jacob
  2017-07-06 11:19             ` santosh
  0 siblings, 2 replies; 248+ messages in thread
From: Maxime Coquelin @ 2017-07-06 10:59 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Santosh Shukla, thomas, bruce.richardson, dev, hemant.agrawal,
	shreyansh.jain, gaetan.rivet



On 07/06/2017 11:49 AM, Jerin Jacob wrote:
> -----Original Message-----
>> Date: Thu, 6 Jul 2017 09:58:41 +0200
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> CC: Santosh Shukla <santosh.shukla@caviumnetworks.com>,
>>   thomas@monjalon.net, bruce.richardson@intel.com, dev@dpdk.org,
>>   hemant.agrawal@nxp.com, shreyansh.jain@nxp.com, gaetan.rivet@6wind.com
>> Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode
>>   before mapping
>> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>>   Thunderbird/52.1.0
>>
>>
>>
>> On 07/05/2017 05:43 PM, Jerin Jacob wrote:
>>> -----Original Message-----
>>>> Date: Wed, 5 Jul 2017 11:14:01 +0200
>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>> To: Santosh Shukla <santosh.shukla@caviumnetworks.com>,
>>>>    thomas@monjalon.net, bruce.richardson@intel.com, dev@dpdk.org
>>>> CC: jerin.jacob@caviumnetworks.com, hemant.agrawal@nxp.com,
>>>>    shreyansh.jain@nxp.com, gaetan.rivet@6wind.com
>>>> Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode
>>>>    before mapping
>>>> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>>>>    Thunderbird/52.1.0
>>>>
>>>>
>>>>
>>>> On 06/08/2017 01:05 PM, Santosh Shukla wrote:
>>>>> Check iova mode and accordingly map iova to pa or va.
>>>>>
>>>>> Signed-off-by: Santosh Shukla<santosh.shukla@caviumnetworks.com>
>>>>> Signed-off-by: Jerin Jacob<jerin.jacob@caviumnetworks.com>
>>>>> ---
>>>>>     lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
>>>>>     1 file changed, 8 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>> index 04914406f..348b7a7f4 100644
>>>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>> @@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd)
>>>>>     		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>>>>>     		dma_map.vaddr = ms[i].addr_64;
>>>>>     		dma_map.size = ms[i].len;
>>>>> -		dma_map.iova = ms[i].phys_addr;
>>>>> +		if (rte_eal_iova_mode() == RTE_IOVA_VA)
>>>>> +			dma_map.iova = dma_map.vaddr;
>>>>> +		else
>>>>> +			dma_map.iova = ms[i].phys_addr;
>>>>>     		dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
>>>>
>>>> IIUC, it is changing default behavior for VFIO devices.
>>>>
>>>> I see a possible problem, but I'm not sure the case is valid.
>>>>
>>>> Imagine you have two devices in the iommu group, and the two devices are
>>>> used in separate processes. Each process could try two different
>>>> physical addresses at the same virtual address, and so the second map
>>>> would fail.
>>>
>>> IMO, Doesn't look like a problem. Here is the data flow
>>>
>>> 1) The vfio DMA map function(vfio_type1_dma_map()) will be called only
>>> on primary process
>>> http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_vfio.c#n359
>>>
>>> 2) On secondary process, DPDK rte_eal_huge_page_attach() will make sure
>>> that, the Secondary process has the _same_ virtual address as primary or
>>> exit from on attach.
>>> http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_memory.c#n1452
>>>
>>> 3) Since secondary process adds the mapped the virtual address in step (2).
>>> in the page table in OS. On SMMU entry miss(When device
>>> request from I/O transaction), OS will load the mapping and update the SMMU
>>> "context" with page tables from MMU.
>>
>> Ok thanks for the detailed info, but what about the case where the same
>> iommu group is used by two primary processes?
> 
> Does that case exist with DPDK? We always need to blacklist same BDF in
> the secondary process to make things work with existing DPDK setup. Which
> make sense as well. Only primary process configures the HW blocks.

I meant the case where two BDFs are in the same IOMMU group (if ACS is not
supported at some point in the hierarchy), with two primary processes
running, for example two containers each running a DPDK application.

Maybe this is not a valid use-case (it is not secure, as it would break
isolation between the two containers), but it seems that it is something
DPDK allows today, if I'm not mistaken.

>>
>> I don't know how frequent it is, but if ACS is not supported by either the
>> endpoint or the the root port, then you would have to share the same IOMMU
>> group for all the ports of your card. Right?
> 
> ACS is supported in our card(it not in bypass mode) and one mempool PCI BDF
> comes as a IOMMU group.
> 
> If it in bypass mode anyway you use in vfio-noiommu mode as
> there is no protection anyway.
Yes.

>>
>>> Let me add the background for why this feature is required in DPDK to
>>> enable NPU style co-processors.
>>>
>>> The traditional NICs the Rx path code look like this:
>>> 1) On control path, Fill the mempool with buffers
>>> 2) on rx_burst(), alloc the mbuf from mempool
>>> 3) SW has the mbuf in hand(which is a virtual address) and program the
>>> HW with mbuf->buf_physaddr)
>>> 4) Return the last pushed mbuf(will be updated by HW by now)
>>>
>>>
>>> On NPU style co-processors, situation is different as the buffer recycling
>>> has been done in HW unlike SW model. Here is the data flow:
>>> 1) On control path, Fill the HW mempool with buffers(Obviously the IOVA
>>> address, which is PA in existing model)
>>> 2) on rx_burst, HW gives you IOVA address(as address as step 1)
>>> 3) As application expects VA to operate on it, rx_burst() needs to
>>> convert to VA from PAA. Which is very costly.
>>> Instead with this IOVA as VA scheme, We can avoid the cost of converting
>>> with help of IOMMU/SMMU.
>>>
>>> This patch set auto detects the mode based available of type devices in
>>> bus and provides an option to override mode based on eal argument, so we
>>> don't foresee any issue with this approach and welcome any alternative
>>> approaches.
>>
>> I don't question the need of the feature for these kind of
>> co-processors, using VA as IOVA in your case seems very valid.
>>
>> What concerns me is that we change the default behavior for all other
>> devices. Having an option to override is fine to me, but the default
>> mode should remain the same IMHO.
> 
> Doesn't seems to be a technical point. But I agree with your concern.
> we will address it.
> I think, we have two ways to address it.
> 
> option 1:
> - In existing patch,
> a) we are currently setting(internal_cfg->iova_mode = RTE_IOVA_PA)
>    http://dpdk.org/dev/patchwork/patch/25192
> b) only when with eal argument sets to RTE_IOVA_VA and then bus probed
> value == RTE_IOVA_VA the final mode will be RTE_IOVA_VA
> http://dpdk.org/dev/patchwork/patch/25193/
> check the code after rte_bus_scan()
> 
> option 2:
> On rte_pci_get_iommu_class() in http://dpdk.org/dev/patchwork/patch/25190/
> we can check the rte_pci_device.id.vendor_id == CAVIUM to select the
> mode so other type of devices safe.
> 
> I think, option 2 makes sense, as it gives foolproof auto detection scheme and
> without effecting any other devices that not interested in this scheme
> 
> Does that address your concern about the patchset?

Yes, it does. Or maybe create a new flag in struct rte_pci_driver's
drv_flags to provide a hint that the driver prefers to use VA as IOVA?

It, of course, would just be a hint, and should be set only when other
conditions are met.
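
To make the idea concrete, here is a rough sketch. The flag name
DRV_HINT_IOVA_AS_VA, its value, and the simplified driver/device
structures are all hypothetical stand-ins, not existing DPDK
definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/* Hypothetical hint bit; the real drv_flags layout may differ. */
#define DRV_HINT_IOVA_AS_VA 0x0010

/* Simplified stand-ins for rte_pci_driver / rte_pci_device. */
struct pci_driver_stub {
	uint32_t drv_flags;
};

struct pci_device_stub {
	const struct pci_driver_stub *driver;	/* NULL if no driver matched */
	int behind_iommu;			/* non-zero if an IOMMU is in use */
};

/* The flag is only a hint: VA is reported when every device both sits
 * behind an IOMMU and has a driver that asked for VA as IOVA. */
enum iova_mode
bus_iova_mode(const struct pci_device_stub *devs, size_t n)
{
	size_t i;

	assert(n == 0 || devs != NULL);
	if (n == 0)
		return IOVA_DC;
	for (i = 0; i < n; i++) {
		const struct pci_driver_stub *drv = devs[i].driver;

		if (!devs[i].behind_iommu || drv == NULL ||
		    !(drv->drv_flags & DRV_HINT_IOVA_AS_VA))
			return IOVA_PA;
	}
	return IOVA_VA;
}
```

A driver that does not set the flag keeps the current PA behavior for
the whole bus, so existing devices are not affected.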

>> Wouldn't it be possible to default to VA as IOVA only when an HW mempool
>> is in use?
> 
> It will be too late as in the normal scheme of things, application
> creates the pool.

OK, makes sense.

Thanks,
Maxime

>>
>>> Similar problem exists in another part of the code in DPDK,
>>> http://dpdk.org/browse/dpdk/tree/drivers/bus/fslmc/fslmc_vfio.c#n231
>>> Its a conditional compilation based approach with duplicating the vfio
>>> code and we are trying to fix the problem in a generic way so that
>>> everyone can get benefited out of it.
>>>
>>> Comments are welcome.
>>
>> Thanks,
>> Maxime
>>
>>> /Jerin
>>>
>>>>
>>>> By using physical addresses, you are safe against this problem.
>>>>
>>>> Any thoughts?
>>>>
>>>> Cheers,
>>>> Maxime

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH 07/10] linuxapp/eal_vfio: honor iova mode before mapping
  2017-07-06 10:59           ` Maxime Coquelin
@ 2017-07-06 11:12             ` Jerin Jacob
  2017-07-06 11:19             ` santosh
  1 sibling, 0 replies; 248+ messages in thread
From: Jerin Jacob @ 2017-07-06 11:12 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: Santosh Shukla, thomas, bruce.richardson, dev, hemant.agrawal,
	shreyansh.jain, gaetan.rivet

-----Original Message-----
> Date: Thu, 6 Jul 2017 12:59:04 +0200
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> CC: Santosh Shukla <santosh.shukla@caviumnetworks.com>,
>  thomas@monjalon.net, bruce.richardson@intel.com, dev@dpdk.org,
>  hemant.agrawal@nxp.com, shreyansh.jain@nxp.com, gaetan.rivet@6wind.com
> Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode
>  before mapping
> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>  Thunderbird/52.1.0
> 
> 
> 
> On 07/06/2017 11:49 AM, Jerin Jacob wrote:
> > -----Original Message-----
> > > Date: Thu, 6 Jul 2017 09:58:41 +0200
> > > From: Maxime Coquelin <maxime.coquelin@redhat.com>
> > > To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > CC: Santosh Shukla <santosh.shukla@caviumnetworks.com>,
> > >   thomas@monjalon.net, bruce.richardson@intel.com, dev@dpdk.org,
> > >   hemant.agrawal@nxp.com, shreyansh.jain@nxp.com, gaetan.rivet@6wind.com
> > > Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode
> > >   before mapping
> > > User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
> > >   Thunderbird/52.1.0
> > > 
> > > 
> > > 
> > > > > 
> > > > > 
> > > > > On 06/08/2017 01:05 PM, Santosh Shukla wrote:
> > > > > > Check iova mode and accordingly map iova to pa or va.
> > > > > > 
> > > > > > Signed-off-by: Santosh Shukla<santosh.shukla@caviumnetworks.com>
> > > > > > Signed-off-by: Jerin Jacob<jerin.jacob@caviumnetworks.com>
> > > > > > ---
> > > > > >     lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
> > > > > >     1 file changed, 8 insertions(+), 2 deletions(-)
> > > > > > 
> > > > > > diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > > > > > index 04914406f..348b7a7f4 100644
> > > > > > --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > > > > > +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > > > > > @@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd)
> > > > > >     		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
> > > > > >     		dma_map.vaddr = ms[i].addr_64;
> > > > > >     		dma_map.size = ms[i].len;
> > > > > > -		dma_map.iova = ms[i].phys_addr;
> > > > > > +		if (rte_eal_iova_mode() == RTE_IOVA_VA)
> > > > > > +			dma_map.iova = dma_map.vaddr;
> > > > > > +		else
> > > > > > +			dma_map.iova = ms[i].phys_addr;
> > > > > >     		dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
> > > > > 
> > > > > IIUC, it is changing default behavior for VFIO devices.
> > > > > 
> > > > > I see a possible problem, but I'm not sure the case is valid.
> > > > > 
> > > > > Imagine you have two devices in the iommu group, and the two devices are
> > > > > used in separate processes. Each process could try two different
> > > > > physical addresses at the same virtual address, and so the second map
> > > > > would fail.
> > > > 
> > > > IMO, Doesn't look like a problem. Here is the data flow
> > > > 
> > > > 1) The vfio DMA map function(vfio_type1_dma_map()) will be called only
> > > > on primary process
> > > > http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_vfio.c#n359
> > > > 
> > > > 2) On secondary process, DPDK rte_eal_huge_page_attach() will make sure
> > > > that, the Secondary process has the _same_ virtual address as primary or
> > > > exit from on attach.
> > > > http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_memory.c#n1452
> > > > 
> > > > 3) Since secondary process adds the mapped the virtual address in step (2).
> > > > in the page table in OS. On SMMU entry miss(When device
> > > > request from I/O transaction), OS will load the mapping and update the SMMU
> > > > "context" with page tables from MMU.
> > > 
> > > Ok thanks for the detailed info, but what about the case where the same
> > > iommu group is used by two primary processes?
> > 
> > Does that case exist with DPDK? We always need to blacklist same BDF in
> > the secondary process to make things work with existing DPDK setup. Which
> > make sense as well. Only primary process configures the HW blocks.
> 
> I meant the case when two BDF are in the same IOMMU group (if ACS is not
> supported at some point in the hierarchy). And I meant two primary
> processes running, like for example two containers running each a DPDK
> application.
> 
> Maybe this is not a valid use-case (it is not secure, as it would break
> isolation between the two containers), but it seems that it is something
> DPDK allows today, if I'm not mistaken.

Not sure. It doesn't seem to be a valid case with VFIO, as without ACS it
would break security anyway (the whole point of VFIO is IOMMU protection).

> 
> > > 
> > > I don't know how frequent it is, but if ACS is not supported by either the
> > > endpoint or the the root port, then you would have to share the same IOMMU
> > > group for all the ports of your card. Right?
> > 
> > ACS is supported in our card(it not in bypass mode) and one mempool PCI BDF
> > comes as a IOMMU group.
> > 
> > If it in bypass mode anyway you use in vfio-noiommu mode as
> > there is no protection anyway.
> > > What concerns me is that we change the default behavior for all other
> > > devices. Having an option to override is fine to me, but the default
> > > mode should remain the same IMHO.
> > 
> > Doesn't seems to be a technical point. But I agree with your concern.
> > we will address it.
> > I think, we have two ways to address it.
> > 
> > option 1:
> > - In existing patch,
> > a) we are currently setting(internal_cfg->iova_mode = RTE_IOVA_PA)
> >    http://dpdk.org/dev/patchwork/patch/25192
> > b) only when with eal argument sets to RTE_IOVA_VA and then bus probed
> > value == RTE_IOVA_VA the final mode will be RTE_IOVA_VA
> > http://dpdk.org/dev/patchwork/patch/25193/
> > check the code after rte_bus_scan()
> > 
> > option 2:
> > On rte_pci_get_iommu_class() in http://dpdk.org/dev/patchwork/patch/25190/
> > we can check the rte_pci_device.id.vendor_id == CAVIUM to select the
> > mode so other type of devices safe.
> > 
> > I think, option 2 makes sense, as it gives foolproof auto detection scheme and
> > without effecting any other devices that not interested in this scheme
> > 
> > Does that address your concern about the patchset?
> 
> Yes it does, or maybe create a new flag in struct rte_pci_driver's
> drv_flags to provide a hint it prefers to use VA as IOVA?
> 
> It, of course, would just be a hint, and should be set only when other
> conditions are met.

Yes. Makes sense. We will roll out v2 with option 2 + your rte_pci_driver
suggestion.

Thanks a lot for the review. Appreciated.

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH 07/10] linuxapp/eal_vfio: honor iova mode before mapping
  2017-07-06 10:59           ` Maxime Coquelin
  2017-07-06 11:12             ` Jerin Jacob
@ 2017-07-06 11:19             ` santosh
  2017-07-06 13:08               ` Maxime Coquelin
  1 sibling, 1 reply; 248+ messages in thread
From: santosh @ 2017-07-06 11:19 UTC (permalink / raw)
  To: Maxime Coquelin, Jerin Jacob
  Cc: thomas, bruce.richardson, dev, hemant.agrawal, shreyansh.jain,
	gaetan.rivet

On Thursday 06 July 2017 04:29 PM, Maxime Coquelin wrote:

>
> On 07/06/2017 11:49 AM, Jerin Jacob wrote:
>> -----Original Message-----
>>> Date: Thu, 6 Jul 2017 09:58:41 +0200
>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>> CC: Santosh Shukla <santosh.shukla@caviumnetworks.com>,
>>>   thomas@monjalon.net, bruce.richardson@intel.com, dev@dpdk.org,
>>>   hemant.agrawal@nxp.com, shreyansh.jain@nxp.com, gaetan.rivet@6wind.com
>>> Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode
>>>   before mapping
>>> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>>>   Thunderbird/52.1.0
>>>
>>>
>>>
>>> On 07/05/2017 05:43 PM, Jerin Jacob wrote:
>>>> -----Original Message-----
>>>>> Date: Wed, 5 Jul 2017 11:14:01 +0200
>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>> To: Santosh Shukla <santosh.shukla@caviumnetworks.com>,
>>>>>    thomas@monjalon.net, bruce.richardson@intel.com, dev@dpdk.org
>>>>> CC: jerin.jacob@caviumnetworks.com, hemant.agrawal@nxp.com,
>>>>>    shreyansh.jain@nxp.com, gaetan.rivet@6wind.com
>>>>> Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode
>>>>>    before mapping
>>>>> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>>>>>    Thunderbird/52.1.0
>>>>>
>>>>>
>>>>>
>>>>> On 06/08/2017 01:05 PM, Santosh Shukla wrote:
>>>>>> Check iova mode and accordingly map iova to pa or va.
>>>>>>
>>>>>> Signed-off-by: Santosh Shukla<santosh.shukla@caviumnetworks.com>
>>>>>> Signed-off-by: Jerin Jacob<jerin.jacob@caviumnetworks.com>
>>>>>> ---
>>>>>>     lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
>>>>>>     1 file changed, 8 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>> index 04914406f..348b7a7f4 100644
>>>>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>> @@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd)
>>>>>>             dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>>>>>>             dma_map.vaddr = ms[i].addr_64;
>>>>>>             dma_map.size = ms[i].len;
>>>>>> -        dma_map.iova = ms[i].phys_addr;
>>>>>> +        if (rte_eal_iova_mode() == RTE_IOVA_VA)
>>>>>> +            dma_map.iova = dma_map.vaddr;
>>>>>> +        else
>>>>>> +            dma_map.iova = ms[i].phys_addr;
>>>>>>             dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
>>>>>
>>>>> IIUC, it is changing default behavior for VFIO devices.
>>>>>
>>>>> I see a possible problem, but I'm not sure the case is valid.
>>>>>
>>>>> Imagine you have two devices in the iommu group, and the two devices are
>>>>> used in separate processes. Each process could try two different
>>>>> physical addresses at the same virtual address, and so the second map
>>>>> would fail.
>>>>
>>>> IMO, Doesn't look like a problem. Here is the data flow
>>>>
>>>> 1) The vfio DMA map function(vfio_type1_dma_map()) will be called only
>>>> on primary process
>>>> http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_vfio.c#n359
>>>>
>>>> 2) On secondary process, DPDK rte_eal_huge_page_attach() will make sure
>>>> that, the Secondary process has the _same_ virtual address as primary or
>>>> exit from on attach.
>>>> http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_memory.c#n1452
>>>>
>>>> 3) Since secondary process adds the mapped the virtual address in step (2).
>>>> in the page table in OS. On SMMU entry miss(When device
>>>> request from I/O transaction), OS will load the mapping and update the SMMU
>>>> "context" with page tables from MMU.
>>>
>>> Ok thanks for the detailed info, but what about the case where the same
>>> iommu group is used by two primary processes?
>>
>> Does that case exist with DPDK? We always need to blacklist same BDF in
>> the secondary process to make things work with existing DPDK setup. Which
>> make sense as well. Only primary process configures the HW blocks.
>
> I meant the case when two BDF are in the same IOMMU group (if ACS is not
> supported at some point in the hierarchy). And I meant two primary
> processes running, like for example two containers running each a DPDK
> application.
>
> Maybe this is not a valid use-case (it is not secure, as it would break
> isolation between the two containers), but it seems that it is something
> DPDK allows today, if I'm not mistaken.
>
I'm not sure how two primary processes could run, because the latter primary
process would try accessing /var/run/.rte_config and would fail at this point [1].

It's not a valid use-case for DPDK (IMO).
[1] http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal.c#n204
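
A minimal sketch of that kind of check, assuming the failure at [1]
comes from a write lock taken on the config file. claim_primary() is a
hypothetical helper name, not the EAL function, and the path is supplied
by the caller:

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Returns an open fd holding an exclusive write lock on path, or -1 if
 * the file cannot be opened or another process already holds the lock. */
int
claim_primary(const char *path)
{
	struct flock wr_lock = {
		.l_type = F_WRLCK,
		.l_whence = SEEK_SET,
		.l_start = 0,
		.l_len = 0,	/* 0 means "lock to end of file" */
	};
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDWR | O_CREAT, 0660);
	if (fd < 0)
		return -1;
	/* F_SETLK does not block: it fails at once if the lock is taken. */
	if (fcntl(fd, F_SETLK, &wr_lock) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A second process calling claim_primary() on the same path gets -1 while
the first one keeps its fd open. The lock is per process, so a repeated
call from the same process still succeeds.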

>>>
>>> I don't know how frequent it is, but if ACS is not supported by either the
>>> endpoint or the the root port, then you would have to share the same IOMMU
>>> group for all the ports of your card. Right?
>>
>> ACS is supported in our card(it not in bypass mode) and one mempool PCI BDF
>> comes as a IOMMU group.
>>
>> If it in bypass mode anyway you use in vfio-noiommu mode as
>> there is no protection anyway.
> Yes.
>
>>>
>>>> Let me add the background for why this feature is required in DPDK to
>>>> enable NPU style co-processors.
>>>>
>>>> The traditional NICs the Rx path code look like this:
>>>> 1) On control path, Fill the mempool with buffers
>>>> 2) on rx_burst(), alloc the mbuf from mempool
>>>> 3) SW has the mbuf in hand(which is a virtual address) and program the
>>>> HW with mbuf->buf_physaddr)
>>>> 4) Return the last pushed mbuf(will be updated by HW by now)
>>>>
>>>>
>>>> On NPU style co-processors, situation is different as the buffer recycling
>>>> has been done in HW unlike SW model. Here is the data flow:
>>>> 1) On control path, Fill the HW mempool with buffers(Obviously the IOVA
>>>> address, which is PA in existing model)
>>>> 2) on rx_burst, HW gives you IOVA address(as address as step 1)
>>>> 3) As application expects VA to operate on it, rx_burst() needs to
>>>> convert to VA from PAA. Which is very costly.
>>>> Instead with this IOVA as VA scheme, We can avoid the cost of converting
>>>> with help of IOMMU/SMMU.
>>>>
>>>> This patch set auto detects the mode based available of type devices in
>>>> bus and provides an option to override mode based on eal argument, so we
>>>> don't foresee any issue with this approach and welcome any alternative
>>>> approaches.
>>>
>>> I don't question the need of the feature for these kind of
>>> co-processors, using VA as IOVA in your case seems very valid.
>>>
>>> What concerns me is that we change the default behavior for all other
>>> devices. Having an option to override is fine to me, but the default
>>> mode should remain the same IMHO.
>>
>> Doesn't seems to be a technical point. But I agree with your concern.
>> we will address it.
>> I think, we have two ways to address it.
>>
>> option 1:
>> - In existing patch,
>> a) we are currently setting(internal_cfg->iova_mode = RTE_IOVA_PA)
>>    http://dpdk.org/dev/patchwork/patch/25192
>> b) only when with eal argument sets to RTE_IOVA_VA and then bus probed
>> value == RTE_IOVA_VA the final mode will be RTE_IOVA_VA
>> http://dpdk.org/dev/patchwork/patch/25193/
>> check the code after rte_bus_scan()
>>
>> option 2:
>> On rte_pci_get_iommu_class() in http://dpdk.org/dev/patchwork/patch/25190/
>> we can check the rte_pci_device.id.vendor_id == CAVIUM to select the
>> mode so other type of devices safe.
>>
>> I think, option 2 makes sense, as it gives foolproof auto detection scheme and
>> without effecting any other devices that not interested in this scheme
>>
>> Does that address your concern about the patchset?
>
> Yes it does, or maybe create a new flag in struct rte_pci_driver's
> drv_flags to provide a hint it prefers to use VA as IOVA?
>
> It, of course, would just be a hint, and should be set only when other
> conditions are met.
>
>>> Wouldn't it be possible to default to VA as IOVA only when an HW mempool
>>> is in use?
>>
>> It will be too late as in the normal scheme of things, application
>> creates the pool.
>
> OK, makes sense.
>
> Thanks,
> Maxime
>
>>>
>>>> Similar problem exists in another part of the code in DPDK,
>>>> http://dpdk.org/browse/dpdk/tree/drivers/bus/fslmc/fslmc_vfio.c#n231
>>>> Its a conditional compilation based approach with duplicating the vfio
>>>> code and we are trying to fix the problem in a generic way so that
>>>> everyone can get benefited out of it.
>>>>
>>>> Comments are welcome.
>>>
>>> Thanks,
>>> Maxime
>>>
>>>> /Jerin
>>>>
>>>>>
>>>>> By using physical addresses, you are safe against this problem.
>>>>>
>>>>> Any thoughts?
>>>>>
>>>>> Cheers,
>>>>> Maxime

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH 07/10] linuxapp/eal_vfio: honor iova mode before mapping
  2017-07-06 11:19             ` santosh
@ 2017-07-06 13:08               ` Maxime Coquelin
  2017-07-06 13:11                 ` Maxime Coquelin
  0 siblings, 1 reply; 248+ messages in thread
From: Maxime Coquelin @ 2017-07-06 13:08 UTC (permalink / raw)
  To: santosh, Jerin Jacob
  Cc: thomas, bruce.richardson, dev, hemant.agrawal, shreyansh.jain,
	gaetan.rivet



On 07/06/2017 01:19 PM, santosh wrote:
> On Thursday 06 July 2017 04:29 PM, Maxime Coquelin wrote:
> 
>>
>> On 07/06/2017 11:49 AM, Jerin Jacob wrote:
>>> -----Original Message-----
>>>> Date: Thu, 6 Jul 2017 09:58:41 +0200
>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>>> CC: Santosh Shukla <santosh.shukla@caviumnetworks.com>,
>>>>    thomas@monjalon.net, bruce.richardson@intel.com, dev@dpdk.org,
>>>>    hemant.agrawal@nxp.com, shreyansh.jain@nxp.com, gaetan.rivet@6wind.com
>>>> Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode
>>>>    before mapping
>>>> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>>>>    Thunderbird/52.1.0
>>>>
>>>>
>>>>
>>>> On 07/05/2017 05:43 PM, Jerin Jacob wrote:
>>>>> -----Original Message-----
>>>>>> Date: Wed, 5 Jul 2017 11:14:01 +0200
>>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>> To: Santosh Shukla <santosh.shukla@caviumnetworks.com>,
>>>>>>     thomas@monjalon.net, bruce.richardson@intel.com, dev@dpdk.org
>>>>>> CC: jerin.jacob@caviumnetworks.com, hemant.agrawal@nxp.com,
>>>>>>     shreyansh.jain@nxp.com, gaetan.rivet@6wind.com
>>>>>> Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode
>>>>>>     before mapping
>>>>>> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>>>>>>     Thunderbird/52.1.0
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 06/08/2017 01:05 PM, Santosh Shukla wrote:
>>>>>>> Check iova mode and accordingly map iova to pa or va.
>>>>>>>
>>>>>>> Signed-off-by: Santosh Shukla<santosh.shukla@caviumnetworks.com>
>>>>>>> Signed-off-by: Jerin Jacob<jerin.jacob@caviumnetworks.com>
>>>>>>> ---
>>>>>>>      lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
>>>>>>>      1 file changed, 8 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>>> index 04914406f..348b7a7f4 100644
>>>>>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>>> @@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd)
>>>>>>>              dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>>>>>>>              dma_map.vaddr = ms[i].addr_64;
>>>>>>>              dma_map.size = ms[i].len;
>>>>>>> -        dma_map.iova = ms[i].phys_addr;
>>>>>>> +        if (rte_eal_iova_mode() == RTE_IOVA_VA)
>>>>>>> +            dma_map.iova = dma_map.vaddr;
>>>>>>> +        else
>>>>>>> +            dma_map.iova = ms[i].phys_addr;
>>>>>>>              dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
>>>>>>
>>>>>> IIUC, it is changing default behavior for VFIO devices.
>>>>>>
>>>>>> I see a possible problem, but I'm not sure the case is valid.
>>>>>>
>>>>>> Imagine you have two devices in the iommu group, and the two devices are
>>>>>> used in separate processes. Each process could try two different
>>>>>> physical addresses at the same virtual address, and so the second map
>>>>>> would fail.
>>>>>
>>>>> IMO, Doesn't look like a problem. Here is the data flow
>>>>>
>>>>> 1) The vfio DMA map function(vfio_type1_dma_map()) will be called only
>>>>> on primary process
>>>>> http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_vfio.c#n359
>>>>>
>>>>> 2) On secondary process, DPDK rte_eal_huge_page_attach() will make sure
>>>>> that, the Secondary process has the _same_ virtual address as primary or
>>>>> exit from on attach.
>>>>> http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_memory.c#n1452
>>>>>
>>>>> 3) Since secondary process adds the mapped the virtual address in step (2).
>>>>> in the page table in OS. On SMMU entry miss(When device
>>>>> request from I/O transaction), OS will load the mapping and update the SMMU
>>>>> "context" with page tables from MMU.
>>>>
>>>> Ok thanks for the detailed info, but what about the case where the same
>>>> iommu group is used by two primary processes?
>>>
>>> Does that case exist with DPDK? We always need to blacklist same BDF in
>>> the secondary process to make things work with existing DPDK setup. Which
>>> make sense as well. Only primary process configures the HW blocks.
>>
>> I meant the case when two BDF are in the same IOMMU group (if ACS is not
>> supported at some point in the hierarchy). And I meant two primary
>> processes running, like for example two containers running each a DPDK
>> application.
>>
>> Maybe this is not a valid use-case (it is not secure, as it would break
>> isolation between the two containers), but it seems that it is something
>> DPDK allows today, if I'm not mistaken.
>>
> I'm not sure how two primary process could run, as because latter primary process
> would try accessing /var/run/.rte_config and would fail at this [1] point.
> 
> It's not valid use-case for dpdk (imo).
> [1] http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal.c#n204

Yes, this is possible. I had never used it before, but Thomas told me it
is supported by setting the --file-prefix option. I gave it a try, and I
confirm it works:
session 1> ./install/bin/testpmd -l 0,2 --socket-mem=1024 -w 
0000:05:00.0 --proc-type=primary --file-prefix=app1 -- --disable-hw-vlan 
-i --rxq=1 --txq=1         --nb-cores=1 --forward-mode=io
session 2> ./install/bin/testpmd -l 0,3 --socket-mem=1024 -w 
0000:05:00.1 --proc-type=primary --file-prefix=app2 -- --disable-hw-vlan 
-i --rxq=1 --txq=1         --nb-cores=1 --forward-mode=io

In the above example, two ports of the same card are used by two
processes. Note that in this case, ACS is supported and both ports have
their own iommu group.

Maxime

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH 07/10] linuxapp/eal_vfio: honor iova mode before mapping
  2017-07-06 13:08               ` Maxime Coquelin
@ 2017-07-06 13:11                 ` Maxime Coquelin
  2017-07-06 14:13                   ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Maxime Coquelin @ 2017-07-06 13:11 UTC (permalink / raw)
  To: santosh, Jerin Jacob
  Cc: thomas, bruce.richardson, dev, hemant.agrawal, shreyansh.jain,
	gaetan.rivet



On 07/06/2017 03:08 PM, Maxime Coquelin wrote:
> 
> 
> On 07/06/2017 01:19 PM, santosh wrote:
>> On Thursday 06 July 2017 04:29 PM, Maxime Coquelin wrote:
>>
>>>
>>> On 07/06/2017 11:49 AM, Jerin Jacob wrote:
>>>> -----Original Message-----
>>>>> Date: Thu, 6 Jul 2017 09:58:41 +0200
>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>>>> CC: Santosh Shukla <santosh.shukla@caviumnetworks.com>,
>>>>>    thomas@monjalon.net, bruce.richardson@intel.com, dev@dpdk.org,
>>>>>    hemant.agrawal@nxp.com, shreyansh.jain@nxp.com, 
>>>>> gaetan.rivet@6wind.com
>>>>> Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova 
>>>>> mode
>>>>>    before mapping
>>>>> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>>>>>    Thunderbird/52.1.0
>>>>>
>>>>>
>>>>>
>>>>> On 07/05/2017 05:43 PM, Jerin Jacob wrote:
>>>>>> -----Original Message-----
>>>>>>> Date: Wed, 5 Jul 2017 11:14:01 +0200
>>>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>>> To: Santosh Shukla <santosh.shukla@caviumnetworks.com>,
>>>>>>>     thomas@monjalon.net, bruce.richardson@intel.com, dev@dpdk.org
>>>>>>> CC: jerin.jacob@caviumnetworks.com, hemant.agrawal@nxp.com,
>>>>>>>     shreyansh.jain@nxp.com, gaetan.rivet@6wind.com
>>>>>>> Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor 
>>>>>>> iova mode
>>>>>>>     before mapping
>>>>>>> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>>>>>>>     Thunderbird/52.1.0
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 06/08/2017 01:05 PM, Santosh Shukla wrote:
>>>>>>>> Check iova mode and accordingly map iova to pa or va.
>>>>>>>>
>>>>>>>> Signed-off-by: Santosh Shukla<santosh.shukla@caviumnetworks.com>
>>>>>>>> Signed-off-by: Jerin Jacob<jerin.jacob@caviumnetworks.com>
>>>>>>>> ---
>>>>>>>>      lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
>>>>>>>>      1 file changed, 8 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c 
>>>>>>>> b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>>>> index 04914406f..348b7a7f4 100644
>>>>>>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>>>> @@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd)
>>>>>>>>              dma_map.argsz = sizeof(struct 
>>>>>>>> vfio_iommu_type1_dma_map);
>>>>>>>>              dma_map.vaddr = ms[i].addr_64;
>>>>>>>>              dma_map.size = ms[i].len;
>>>>>>>> -        dma_map.iova = ms[i].phys_addr;
>>>>>>>> +        if (rte_eal_iova_mode() == RTE_IOVA_VA)
>>>>>>>> +            dma_map.iova = dma_map.vaddr;
>>>>>>>> +        else
>>>>>>>> +            dma_map.iova = ms[i].phys_addr;
>>>>>>>>              dma_map.flags = VFIO_DMA_MAP_FLAG_READ | 
>>>>>>>> VFIO_DMA_MAP_FLAG_WRITE;
>>>>>>>
>>>>>>> IIUC, it is changing default behavior for VFIO devices.
>>>>>>>
>>>>>>> I see a possible problem, but I'm not sure the case is valid.
>>>>>>>
>>>>>>> Imagine you have two devices in the iommu group, and the two 
>>>>>>> devices are
>>>>>>> used in separate processes. Each process could try two different
>>>>>>> physical addresses at the same virtual address, and so the second 
>>>>>>> map
>>>>>>> would fail.
>>>>>>
>>>>>> IMO, Doesn't look like a problem. Here is the data flow
>>>>>>
>>>>>> 1) The vfio DMA map function(vfio_type1_dma_map()) will be called 
>>>>>> only
>>>>>> on primary process
>>>>>> http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_vfio.c#n359 
>>>>>>
>>>>>>
>>>>>> 2) On secondary process, DPDK rte_eal_huge_page_attach() will make 
>>>>>> sure
>>>>>> that, the Secondary process has the _same_ virtual address as 
>>>>>> primary or
>>>>>> exit from on attach.
>>>>>> http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_memory.c#n1452 
>>>>>>
>>>>>>
>>>>>> 3) Since secondary process adds the mapped the virtual address in 
>>>>>> step (2).
>>>>>> in the page table in OS. On SMMU entry miss(When device
>>>>>> request from I/O transaction), OS will load the mapping and update 
>>>>>> the SMMU
>>>>>> "context" with page tables from MMU.
>>>>>
>>>>> Ok thanks for the detailed info, but what about the case where the 
>>>>> same
>>>>> iommu group is used by two primary processes?
>>>>
>>>> Does that case exist with DPDK? We always need to blacklist same BDF in
>>>> the secondary process to make things work with existing DPDK setup. 
>>>> Which
>>>> make sense as well. Only primary process configures the HW blocks.
>>>
>>> I meant the case when two BDF are in the same IOMMU group (if ACS is not
>>> supported at some point in the hierarchy). And I meant two primary
>>> processes running, like for example two containers running each a DPDK
>>> application.
>>>
>>> Maybe this is not a valid use-case (it is not secure, as it would break
>>> isolation between the two containers), but it seems that it is something
>>> DPDK allows today, if I'm not mistaken.
>>>
>> I'm not sure how two primary process could run, as because latter 
>> primary process
>> would try accessing /var/run/.rte_config and would fail at this [1] 
>> point.
>>
>> It's not valid use-case for dpdk (imo).
>> [1] 
>> http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal.c#n204
> 
> Yes this is possible. I had never used it before, but Thomas told me it
> is supported by setting--file-prefix option. I had a trial, and I
> confirm it works:
> session 1> ./install/bin/testpmd -l 0,2 --socket-mem=1024 -w 
> 0000:05:00.0 --proc-type=primary --file-prefix=app1 -- --disable-hw-vlan 
> -i --rxq=1 --txq=1         --nb-cores=1 --forward-mode=io
> session 2> ./install/bin/testpmd -l 0,3 --socket-mem=1024 -w 
> 0000:05:00.1 --proc-type=primary --file-prefix=app2 -- --disable-hw-vlan 
> -i --rxq=1 --txq=1         --nb-cores=1 --forward-mode=io
> 
> In the above example, two ports of the same card is used by two
> processes. Note that in this case, ACS is supproted and both ports have
> their own iommu group.

# ls -al /var/run/.app*
-rw-r-----. 1 root root 208420 Jul  6 09:08 /var/run/.app1_config
-rw-r--r--. 1 root root  49728 Jul  6 09:08 /var/run/.app1_hugepage_info
srwxr-xr-x. 1 root root      0 Jul  6 09:08 /var/run/.app1_mp_socket
-rw-r-----. 1 root root 208420 Jul  6 09:08 /var/run/.app2_config
-rw-r--r--. 1 root root  45584 Jul  6 09:08 /var/run/.app2_hugepage_info
srwxr-xr-x. 1 root root      0 Jul  6 09:08 /var/run/.app2_mp_socket

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH 07/10] linuxapp/eal_vfio: honor iova mode before mapping
  2017-07-06 13:11                 ` Maxime Coquelin
@ 2017-07-06 14:13                   ` santosh
  2017-07-06 14:39                     ` Maxime Coquelin
  0 siblings, 1 reply; 248+ messages in thread
From: santosh @ 2017-07-06 14:13 UTC (permalink / raw)
  To: Maxime Coquelin, Jerin Jacob
  Cc: thomas, bruce.richardson, dev, hemant.agrawal, shreyansh.jain,
	gaetan.rivet

On Thursday 06 July 2017 06:41 PM, Maxime Coquelin wrote:

>
> On 07/06/2017 03:08 PM, Maxime Coquelin wrote:
>>
>>
>> On 07/06/2017 01:19 PM, santosh wrote:
>>> On Thursday 06 July 2017 04:29 PM, Maxime Coquelin wrote:
>>>
>>>>
>>>> On 07/06/2017 11:49 AM, Jerin Jacob wrote:
>>>>> -----Original Message-----
>>>>>> Date: Thu, 6 Jul 2017 09:58:41 +0200
>>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>>>>> CC: Santosh Shukla <santosh.shukla@caviumnetworks.com>,
>>>>>>    thomas@monjalon.net, bruce.richardson@intel.com, dev@dpdk.org,
>>>>>>    hemant.agrawal@nxp.com, shreyansh.jain@nxp.com, gaetan.rivet@6wind.com
>>>>>> Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode
>>>>>>    before mapping
>>>>>> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>>>>>>    Thunderbird/52.1.0
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 07/05/2017 05:43 PM, Jerin Jacob wrote:
>>>>>>> -----Original Message-----
>>>>>>>> Date: Wed, 5 Jul 2017 11:14:01 +0200
>>>>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>>>> To: Santosh Shukla <santosh.shukla@caviumnetworks.com>,
>>>>>>>>     thomas@monjalon.net, bruce.richardson@intel.com, dev@dpdk.org
>>>>>>>> CC: jerin.jacob@caviumnetworks.com, hemant.agrawal@nxp.com,
>>>>>>>>     shreyansh.jain@nxp.com, gaetan.rivet@6wind.com
>>>>>>>> Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode
>>>>>>>>     before mapping
>>>>>>>> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>>>>>>>>     Thunderbird/52.1.0
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 06/08/2017 01:05 PM, Santosh Shukla wrote:
>>>>>>>>> Check iova mode and accordingly map iova to pa or va.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Santosh Shukla<santosh.shukla@caviumnetworks.com>
>>>>>>>>> Signed-off-by: Jerin Jacob<jerin.jacob@caviumnetworks.com>
>>>>>>>>> ---
>>>>>>>>>      lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
>>>>>>>>>      1 file changed, 8 insertions(+), 2 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>>>>> index 04914406f..348b7a7f4 100644
>>>>>>>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>>>>> @@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd)
>>>>>>>>>              dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>>>>>>>>>              dma_map.vaddr = ms[i].addr_64;
>>>>>>>>>              dma_map.size = ms[i].len;
>>>>>>>>> -        dma_map.iova = ms[i].phys_addr;
>>>>>>>>> +        if (rte_eal_iova_mode() == RTE_IOVA_VA)
>>>>>>>>> +            dma_map.iova = dma_map.vaddr;
>>>>>>>>> +        else
>>>>>>>>> +            dma_map.iova = ms[i].phys_addr;
>>>>>>>>>              dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
>>>>>>>>
>>>>>>>> IIUC, it is changing default behavior for VFIO devices.
>>>>>>>>
>>>>>>>> I see a possible problem, but I'm not sure the case is valid.
>>>>>>>>
>>>>>>>> Imagine you have two devices in the iommu group, and the two devices are
>>>>>>>> used in separate processes. Each process could try two different
>>>>>>>> physical addresses at the same virtual address, and so the second map
>>>>>>>> would fail.
>>>>>>>
>>>>>>> IMO, Doesn't look like a problem. Here is the data flow
>>>>>>>
>>>>>>> 1) The vfio DMA map function(vfio_type1_dma_map()) will be called only
>>>>>>> on primary process
>>>>>>> http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_vfio.c#n359
>>>>>>>
>>>>>>> 2) On secondary process, DPDK rte_eal_huge_page_attach() will make sure
>>>>>>> that, the Secondary process has the _same_ virtual address as primary or
>>>>>>> exit from on attach.
>>>>>>> http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_memory.c#n1452
>>>>>>>
>>>>>>> 3) Since secondary process adds the mapped the virtual address in step (2).
>>>>>>> in the page table in OS. On SMMU entry miss(When device
>>>>>>> request from I/O transaction), OS will load the mapping and update the SMMU
>>>>>>> "context" with page tables from MMU.
>>>>>>
>>>>>> Ok thanks for the detailed info, but what about the case where the same
>>>>>> iommu group is used by two primary processes?
>>>>>
>>>>> Does that case exist with DPDK? We always need to blacklist same BDF in
>>>>> the secondary process to make things work with existing DPDK setup. Which
>>>>> make sense as well. Only primary process configures the HW blocks.
>>>>
>>>> I meant the case when two BDF are in the same IOMMU group (if ACS is not
>>>> supported at some point in the hierarchy). And I meant two primary
>>>> processes running, like for example two containers running each a DPDK
>>>> application.
>>>>
>>>> Maybe this is not a valid use-case (it is not secure, as it would break
>>>> isolation between the two containers), but it seems that it is something
>>>> DPDK allows today, if I'm not mistaken.
>>>>
>>> I'm not sure how two primary process could run, as because latter primary process
>>> would try accessing /var/run/.rte_config and would fail at this [1] point.
>>>
>>> It's not valid use-case for dpdk (imo).
>>> [1] http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal.c#n204
>>
>> Yes this is possible. I had never used it before, but Thomas told me it
>> is supported by setting--file-prefix option. I had a trial, and I
>> confirm it works:
>> session 1> ./install/bin/testpmd -l 0,2 --socket-mem=1024 -w 0000:05:00.0 --proc-type=primary --file-prefix=app1 -- --disable-hw-vlan -i --rxq=1 --txq=1         --nb-cores=1 --forward-mode=io
>> session 2> ./install/bin/testpmd -l 0,3 --socket-mem=1024 -w 0000:05:00.1 --proc-type=primary --file-prefix=app2 -- --disable-hw-vlan -i --rxq=1 --txq=1         --nb-cores=1 --forward-mode=io
>>
>> In the above example, two ports of the same card is used by two
>> processes. Note that in this case, ACS is supproted and both ports have
>> their own iommu group.
>
> # ls -al /var/run/.app*
> -rw-r-----. 1 root root 208420 Jul  6 09:08 /var/run/.app1_config
> -rw-r--r--. 1 root root  49728 Jul  6 09:08 /var/run/.app1_hugepage_info
> srwxr-xr-x. 1 root root      0 Jul  6 09:08 /var/run/.app1_mp_socket
> -rw-r-----. 1 root root 208420 Jul  6 09:08 /var/run/.app2_config
> -rw-r--r--. 1 root root  45584 Jul  6 09:08 /var/run/.app2_hugepage_info
> srwxr-xr-x. 1 root root      0 Jul  6 09:08 /var/run/.app2_mp_socket
>
Yes, you're right: you can start two primary processes; I missed that point.
The use-case you mentioned is OK, because the two ports are in two different
iommu groups, so the proposed scheme will work. It may not work when ACS is
not present, as that is bypass mode, which falls under the vfio-noiommu category.

Having said that: per the discussion in [1], the proposed scheme, where the
bus makes the decision based on pci_id and/or pci_drv, will be a foolproof
solution, and that way other types of devices will not be impacted. Right?

[1] https://www.mail-archive.com/dev@dpdk.org/msg70283.html

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH 07/10] linuxapp/eal_vfio: honor iova mode before mapping
  2017-07-06 14:13                   ` santosh
@ 2017-07-06 14:39                     ` Maxime Coquelin
  0 siblings, 0 replies; 248+ messages in thread
From: Maxime Coquelin @ 2017-07-06 14:39 UTC (permalink / raw)
  To: santosh, Jerin Jacob
  Cc: thomas, bruce.richardson, dev, hemant.agrawal, shreyansh.jain,
	gaetan.rivet



On 07/06/2017 04:13 PM, santosh wrote:
> On Thursday 06 July 2017 06:41 PM, Maxime Coquelin wrote:
> 
>>
>> On 07/06/2017 03:08 PM, Maxime Coquelin wrote:
>>>
>>>
>>> On 07/06/2017 01:19 PM, santosh wrote:
>>>> On Thursday 06 July 2017 04:29 PM, Maxime Coquelin wrote:
>>>>
>>>>>
>>>>> On 07/06/2017 11:49 AM, Jerin Jacob wrote:
>>>>>> -----Original Message-----
>>>>>>> Date: Thu, 6 Jul 2017 09:58:41 +0200
>>>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>>> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>>>>>> CC: Santosh Shukla <santosh.shukla@caviumnetworks.com>,
>>>>>>>     thomas@monjalon.net, bruce.richardson@intel.com, dev@dpdk.org,
>>>>>>>     hemant.agrawal@nxp.com, shreyansh.jain@nxp.com, gaetan.rivet@6wind.com
>>>>>>> Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode
>>>>>>>     before mapping
>>>>>>> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>>>>>>>     Thunderbird/52.1.0
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 07/05/2017 05:43 PM, Jerin Jacob wrote:
>>>>>>>> -----Original Message-----
>>>>>>>>> Date: Wed, 5 Jul 2017 11:14:01 +0200
>>>>>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>>>>> To: Santosh Shukla <santosh.shukla@caviumnetworks.com>,
>>>>>>>>>      thomas@monjalon.net, bruce.richardson@intel.com, dev@dpdk.org
>>>>>>>>> CC: jerin.jacob@caviumnetworks.com, hemant.agrawal@nxp.com,
>>>>>>>>>      shreyansh.jain@nxp.com, gaetan.rivet@6wind.com
>>>>>>>>> Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode
>>>>>>>>>      before mapping
>>>>>>>>> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>>>>>>>>>      Thunderbird/52.1.0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 06/08/2017 01:05 PM, Santosh Shukla wrote:
>>>>>>>>>> Check iova mode and accordingly map iova to pa or va.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Santosh Shukla<santosh.shukla@caviumnetworks.com>
>>>>>>>>>> Signed-off-by: Jerin Jacob<jerin.jacob@caviumnetworks.com>
>>>>>>>>>> ---
>>>>>>>>>>       lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
>>>>>>>>>>       1 file changed, 8 insertions(+), 2 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>>>>>> index 04914406f..348b7a7f4 100644
>>>>>>>>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>>>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>>>>>> @@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd)
>>>>>>>>>>               dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>>>>>>>>>>               dma_map.vaddr = ms[i].addr_64;
>>>>>>>>>>               dma_map.size = ms[i].len;
>>>>>>>>>> -        dma_map.iova = ms[i].phys_addr;
>>>>>>>>>> +        if (rte_eal_iova_mode() == RTE_IOVA_VA)
>>>>>>>>>> +            dma_map.iova = dma_map.vaddr;
>>>>>>>>>> +        else
>>>>>>>>>> +            dma_map.iova = ms[i].phys_addr;
>>>>>>>>>>               dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
>>>>>>>>>
>>>>>>>>> IIUC, it is changing default behavior for VFIO devices.
>>>>>>>>>
>>>>>>>>> I see a possible problem, but I'm not sure the case is valid.
>>>>>>>>>
>>>>>>>>> Imagine you have two devices in the iommu group, and the two devices are
>>>>>>>>> used in separate processes. Each process could try two different
>>>>>>>>> physical addresses at the same virtual address, and so the second map
>>>>>>>>> would fail.
>>>>>>>>
>>>>>>>> IMO, Doesn't look like a problem. Here is the data flow
>>>>>>>>
>>>>>>>> 1) The vfio DMA map function(vfio_type1_dma_map()) will be called only
>>>>>>>> on primary process
>>>>>>>> http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_vfio.c#n359
>>>>>>>>
>>>>>>>> 2) On secondary process, DPDK rte_eal_huge_page_attach() will make sure
>>>>>>>> that, the Secondary process has the _same_ virtual address as primary or
>>>>>>>> exit from on attach.
>>>>>>>> http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_memory.c#n1452
>>>>>>>>
>>>>>>>> 3) Since secondary process adds the mapped the virtual address in step (2).
>>>>>>>> in the page table in OS. On SMMU entry miss(When device
>>>>>>>> request from I/O transaction), OS will load the mapping and update the SMMU
>>>>>>>> "context" with page tables from MMU.
>>>>>>>
>>>>>>> Ok thanks for the detailed info, but what about the case where the same
>>>>>>> iommu group is used by two primary processes?
>>>>>>
>>>>>> Does that case exist with DPDK? We always need to blacklist same BDF in
>>>>>> the secondary process to make things work with existing DPDK setup. Which
>>>>>> make sense as well. Only primary process configures the HW blocks.
>>>>>
>>>>> I meant the case when two BDF are in the same IOMMU group (if ACS is not
>>>>> supported at some point in the hierarchy). And I meant two primary
>>>>> processes running, like for example two containers running each a DPDK
>>>>> application.
>>>>>
>>>>> Maybe this is not a valid use-case (it is not secure, as it would break
>>>>> isolation between the two containers), but it seems that it is something
>>>>> DPDK allows today, if I'm not mistaken.
>>>>>
>>>> I'm not sure how two primary process could run, as because latter primary process
>>>> would try accessing /var/run/.rte_config and would fail at this [1] point.
>>>>
>>>> It's not valid use-case for dpdk (imo).
>>>> [1] http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal.c#n204
>>>
>>> Yes this is possible. I had never used it before, but Thomas told me it
>>> is supported by setting--file-prefix option. I had a trial, and I
>>> confirm it works:
>>> session 1> ./install/bin/testpmd -l 0,2 --socket-mem=1024 -w 0000:05:00.0 --proc-type=primary --file-prefix=app1 -- --disable-hw-vlan -i --rxq=1 --txq=1         --nb-cores=1 --forward-mode=io
>>> session 2> ./install/bin/testpmd -l 0,3 --socket-mem=1024 -w 0000:05:00.1 --proc-type=primary --file-prefix=app2 -- --disable-hw-vlan -i --rxq=1 --txq=1         --nb-cores=1 --forward-mode=io
>>>
>>> In the above example, two ports of the same card is used by two
>>> processes. Note that in this case, ACS is supproted and both ports have
>>> their own iommu group.
>>
>> # ls -al /var/run/.app*
>> -rw-r-----. 1 root root 208420 Jul  6 09:08 /var/run/.app1_config
>> -rw-r--r--. 1 root root  49728 Jul  6 09:08 /var/run/.app1_hugepage_info
>> srwxr-xr-x. 1 root root      0 Jul  6 09:08 /var/run/.app1_mp_socket
>> -rw-r-----. 1 root root 208420 Jul  6 09:08 /var/run/.app2_config
>> -rw-r--r--. 1 root root  45584 Jul  6 09:08 /var/run/.app2_hugepage_info
>> srwxr-xr-x. 1 root root      0 Jul  6 09:08 /var/run/.app2_mp_socket
>>
> Yes, You're right, you can start two primary process, I missed that point.
> Use-case which you mentioned is ok, because they are under two different iommu
> group so proposed scheme will work. It may not work for the case when ACS not present,
> so its bypass mode which falls under vfio-noiommu category.
> 
> Having said that: Per discussion on [1]. The proposed scheme where
> bus makes decision based on pci_id and/or pci_drv will be a full proof
> solution, and that way other types of devices will not be impacted. Right?


Right!

Thanks,
Maxime
> [1] https://www.mail-archive.com/dev@dpdk.org/msg70283.html
> 
> 

^ permalink raw reply	[flat|nested] 248+ messages in thread

* [PATCH v2 00/12] Infrastructure to detect iova mapping on the bus
  2017-06-08 11:05 [PATCH 00/10] Infrastructure to detect iova mapping on the bus Santosh Shukla
                   ` (12 preceding siblings ...)
  2017-07-05  9:30 ` Maxime Coquelin
@ 2017-07-10 11:42 ` Santosh Shukla
  2017-07-10 11:42   ` [PATCH v2 01/12] eal/pci: introduce PCI driver iova as va flag Santosh Shukla
                     ` (12 more replies)
  13 siblings, 13 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-10 11:42 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

v2:
Based on the discussion in thread [2], this version introduces the
RTE_PCI_DRV_NEED_IOVA_VA flag for autodetection of the iova=va mapping.
If a PCI driver requires the IOVA as VA scheme, the driver can set the flag
in its PCI driver registration.

Algorithm to select IOVA as VA for the PCI bus case:
    0. Look for a device attached to a vfio kdrv that has .drv_flag set
    to RTE_PCI_DRV_NEED_IOVA_VA.
    1. Look for any device attached to a UIO class of driver.
    2. Check whether vfio-noiommu mode is enabled.

    If 1) and 2) are false and 0) is true, then select
    iova=va as the mapping scheme. Otherwise use the default
    mapping scheme (iova_pa).

That way, the bus can truly autodetect the iova mapping mode for
a device or a set of devices.
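
For illustration only, the selection rule above can be sketched as a small
standalone C function. This is not the code from the series: the struct, enum
and function names prefixed with sketch_/SKETCH_ are hypothetical stand-ins,
the device list is passed in as a plain array, and the vfio-noiommu state is
passed in as a boolean instead of being probed from the system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the IOVA modes discussed in this series. */
enum sketch_iova_mode { SKETCH_IOVA_DC, SKETCH_IOVA_PA, SKETCH_IOVA_VA };

/* Hypothetical mirror of RTE_PCI_DRV_NEED_IOVA_VA (value taken from patch 01). */
#define SKETCH_DRV_NEED_IOVA_VA 0x0040

/* Hypothetical, simplified kernel-driver classes. */
enum sketch_kdrv { SKETCH_KDRV_NONE, SKETCH_KDRV_UIO, SKETCH_KDRV_VFIO };

/* Hypothetical, simplified view of one probed PCI device. */
struct sketch_pci_dev {
	enum sketch_kdrv kdrv;   /* kernel driver the device is bound to */
	unsigned int drv_flags;  /* flags of the matching PMD */
};

/*
 * Apply the rule from the cover letter:
 *   0) some vfio-bound device has a driver that needs iova=va,
 *   1) no device is bound to a UIO class of driver,
 *   2) vfio-noiommu mode is not enabled.
 * Only if all three hold is iova=va selected; otherwise fall back to iova_pa.
 */
enum sketch_iova_mode
sketch_select_iova_mode(const struct sketch_pci_dev *devs, size_t n,
			bool vfio_noiommu)
{
	bool need_va = false;  /* condition 0 */
	bool has_uio = false;  /* condition 1 */
	size_t i;

	assert(devs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (devs[i].kdrv == SKETCH_KDRV_VFIO &&
		    (devs[i].drv_flags & SKETCH_DRV_NEED_IOVA_VA))
			need_va = true;
		if (devs[i].kdrv == SKETCH_KDRV_UIO)
			has_uio = true;
	}

	if (need_va && !has_uio && !vfio_noiommu)
		return SKETCH_IOVA_VA;
	return SKETCH_IOVA_PA;
}
```

Note that a single UIO-bound device, or vfio-noiommu being enabled, is enough
to force the default iova_pa scheme in this sketch, even if another driver
asked for iova=va.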

v1 --> v2 change history:
- Removed the override EAL option (--iova-mode=<>), because we now have a
  means to truly autodetect the iova mode.
- Introduced the RTE_PCI_DRV_NEED_IOVA_VA drv_flag.
- Used the NEED_IOVA_VA drv_flag in the autodetection logic.
- Removed the Linux version check macro in the vfio code, as per Maxime's
  feedback.
- Moved the rte_pci_match API from local to global.


Patch Summary:
0) 1st: Introduces a new flag in rte_pci_drv.
1) 2nd: Declares the rte_pci_match API in the pci header. Required for
autodetection in follow-up patches.
2) 3rd - 4th: Autodetection mapping infrastructure for Linux/bsdapp.
3) 5th: Introduces a global bus API named rte_bus_get_iommu_class.
4) 6th: iova mode helper API.
5) 7th - 8th: Call the rte_bus_get_iommu_class API for Linux/bsdapp and return
their iova mode.
6) 9th: Checks the iova mode and accordingly maps vfio.dma_map to _pa or _va.
7) 10th - 12th: Check for IOVA_VA mode in the APIs below:
        - rte_mem_virt2phy
        - rte_mempool_virt2phy
        - rte_malloc_virt2phy
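
As a rough illustration of what "check for IOVA_VA mode" means for the
virt2phy helpers: in iova=va mode the address handed to the device is the
virtual address itself, so no lookup is needed. The sketch below assumes this
behaviour; every name prefixed with sketch_ is hypothetical, and the existing
virtual-to-physical lookup is represented by a caller-supplied function
pointer, not by the real DPDK implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the IOVA modes discussed in this series. */
enum sketch_iova_mode { SKETCH_IOVA_DC, SKETCH_IOVA_PA, SKETCH_IOVA_VA };

typedef uint64_t sketch_phys_addr_t;

/* Hypothetical stand-in for the existing virtual-to-physical lookup. */
typedef sketch_phys_addr_t (*sketch_lookup_fn)(const void *virt);

/* Placeholder lookup used for demonstration; always returns a fixed address. */
sketch_phys_addr_t
sketch_fake_lookup(const void *virt)
{
	(void)virt;
	return 0x1000;
}

/*
 * In iova=va mode, return the virtual address unchanged.
 * In any other mode, defer to the regular lookup.
 */
sketch_phys_addr_t
sketch_virt2phy(enum sketch_iova_mode mode, const void *virt,
		sketch_lookup_fn lookup)
{
	if (mode == SKETCH_IOVA_VA)
		return (sketch_phys_addr_t)(uintptr_t)virt;

	assert(lookup != NULL);
	return lookup(virt);
}
```

The point of the early return is that the per-packet address conversion
becomes an identity operation when the bus has selected iova=va.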

Test History:
- Tested on x86 with an XL710 40G NIC in both modes (iova_va/pa).
- Tested on arm64/thunderx with the integrated vNIC in both modes.
- Tested on arm64/Octeontx with integrated NICs in iova_va mode only
  (it supports only that mode).
- Ran standalone tests such as mempool_autotest and mbuf_autotest.
- Verified for Doxygen.

Work History:
For v1, refer to [1].

Checkpatch result:
- No error/warning noticed.

[1] https://www.mail-archive.com/dev@dpdk.org/msg67438.html
[2] https://www.mail-archive.com/dev@dpdk.org/msg70279.html


Santosh Shukla (12):
  eal/pci: introduce PCI driver iova as va flag
  eal/pci: export match function
  bsdapp/eal_pci: get iommu class
  linuxapp/eal_pci: get iommu class
  bus: get iommu class
  eal: introduce iova mode helper api
  linuxapp/eal: auto detect iova mode
  bsdapp/eal: auto detect iova mapping mode
  linuxapp/eal_vfio: honor iova mode before mapping
  linuxapp/eal_memory: honor iova mode in virt2phy
  mempool: honor iova mode in virt2phy
  eal/rte_malloc: honor iova mode in virt2phy

 lib/librte_eal/bsdapp/eal/eal.c                 | 22 ++++++---
 lib/librte_eal/bsdapp/eal/eal_pci.c             | 10 ++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  4 ++
 lib/librte_eal/common/eal_common_bus.c          | 23 +++++++++
 lib/librte_eal/common/eal_common_pci.c          | 11 +----
 lib/librte_eal/common/include/rte_bus.h         | 31 ++++++++++++
 lib/librte_eal/common/include/rte_eal.h         | 12 +++++
 lib/librte_eal/common/include/rte_pci.h         | 28 +++++++++++
 lib/librte_eal/common/rte_malloc.c              |  9 +++-
 lib/librte_eal/linuxapp/eal/eal.c               | 22 ++++++---
 lib/librte_eal/linuxapp/eal/eal_memory.c        |  3 ++
 lib/librte_eal/linuxapp/eal/eal_pci.c           | 66 +++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c          | 29 ++++++++++-
 lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  4 ++
 lib/librte_mempool/rte_mempool.h                | 10 +++-
 16 files changed, 262 insertions(+), 26 deletions(-)

-- 
2.13.0

^ permalink raw reply	[flat|nested] 248+ messages in thread

* [PATCH v2 01/12] eal/pci: introduce PCI driver iova as va flag
  2017-07-10 11:42 ` [PATCH v2 00/12] " Santosh Shukla
@ 2017-07-10 11:42   ` Santosh Shukla
  2017-07-10 11:42   ` [PATCH v2 02/12] eal/pci: export match function Santosh Shukla
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-10 11:42 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Introduce the RTE_PCI_DRV_NEED_IOVA_VA flag. A driver sets this flag when
it needs to operate in iova=va mode.

Why does a driver need iova=va mapping?

On NPU-style co-processors like Octeontx, buffer recycling is done in HW,
unlike in the SW model. Here is the data flow:
1) On the control path, fill the HW mempool with buffers (iova as pa
address).
2) On rx_burst, HW returns an IOVA address (iova as pa address).
3) As the application expects a VA to operate on, rx_burst() needs to
convert from _pa to _va, which is very expensive.
If iova is instead mapped as va, the IOMMU/SMMU performs the translation
and the conversion cost is avoided.
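
To make the cost in step 3 concrete, here is a minimal sketch, not code from this series. The `mem_seg` layout and both helper names are assumptions chosen for illustration. In iova=pa mode each address handed back by the HW needs a segment lookup, while in iova=va mode the address is already usable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory segment: a physically contiguous region that is
 * also mapped contiguously into the process address space. */
struct mem_seg {
	uint64_t phys_addr; /* segment start as seen by HW in iova=pa mode */
	void *virt_addr;    /* segment start in the process VA space */
	size_t len;         /* segment length in bytes */
};

/* iova=pa: every received buffer address needs a lookup over the
 * segment table before the application can dereference it. */
void *
iova_pa_to_virt(const struct mem_seg *segs, size_t nb_segs, uint64_t iova)
{
	size_t i;

	for (i = 0; i < nb_segs; i++) {
		if (iova >= segs[i].phys_addr &&
		    iova - segs[i].phys_addr < segs[i].len)
			return (char *)segs[i].virt_addr +
			       (iova - segs[i].phys_addr);
	}
	return NULL; /* address not backed by any known segment */
}

/* iova=va: the IOMMU/SMMU is programmed with iova == va, so the
 * address returned by the HW is the application pointer itself. */
void *
iova_va_to_virt(uint64_t iova)
{
	return (void *)(uintptr_t)iova;
}
```

The lookup in `iova_pa_to_virt()` runs once per packet on the fast path, which is the overhead the flag is meant to remove.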

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/common/include/rte_pci.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 8b123391c..ac79040dd 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -202,6 +202,8 @@ struct rte_pci_bus {
 #define RTE_PCI_DRV_INTR_RMV 0x0010
 /** Device driver needs to keep mapped resources if unsupported dev detected */
 #define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020
+/** Device driver needs iova as va */
+#define RTE_PCI_DRV_NEED_IOVA_VA 0x0040
 
 /**
  * A structure describing a PCI mapping.
-- 
2.13.0


* [PATCH v2 02/12] eal/pci: export match function
  2017-07-10 11:42 ` [PATCH v2 00/12] " Santosh Shukla
  2017-07-10 11:42   ` [PATCH v2 01/12] eal/pci: introduce PCI driver iova as va flag Santosh Shukla
@ 2017-07-10 11:42   ` Santosh Shukla
  2017-07-10 11:42   ` [PATCH v2 03/12] bsdapp/eal_pci: get iommu class Santosh Shukla
                     ` (10 subsequent siblings)
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-10 11:42 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Export the rte_pci_match() function, as it is needed in a follow-up patch.
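
For readers unfamiliar with ID-table matching, the idea can be sketched as below. This is a simplified illustration under assumptions, not the body of rte_pci_match(): the `pci_id` struct, the `SKETCH_PCI_ANY_ID` wildcard and the zero-vendor terminator are hypothetical, and only vendor and device IDs are compared.

```c
#include <stdint.h>

#define SKETCH_PCI_ANY_ID 0xffff /* hypothetical wildcard value */

/* Hypothetical, reduced ID entry: vendor and device ID only. */
struct pci_id {
	uint16_t vendor_id;
	uint16_t device_id;
};

/* Return 1 if dev matches an entry of the table, 0 otherwise.
 * The table is terminated by an entry whose vendor_id is 0. */
int
pci_id_match(const struct pci_id *table, const struct pci_id *dev)
{
	const struct pci_id *id;

	for (id = table; id->vendor_id != 0; id++) {
		if (id->vendor_id != dev->vendor_id &&
		    id->vendor_id != SKETCH_PCI_ANY_ID)
			continue;
		if (id->device_id != dev->device_id &&
		    id->device_id != SKETCH_PCI_ANY_ID)
			continue;
		return 1;
	}
	return 0;
}
```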

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
 lib/librte_eal/common/eal_common_pci.c          | 10 +---------
 lib/librte_eal/common/include/rte_pci.h         | 15 +++++++++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 4 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 381f895cd..8d43df0bb 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -200,6 +200,7 @@ DPDK_17.08 {
 	rte_bus_find;
 	rte_bus_find_by_device;
 	rte_bus_find_by_name;
+	rte_pci_match;
 
 } DPDK_17.05;
 
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 76bbcc853..8b6ecebd6 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -128,16 +128,8 @@ pci_unmap_resource(void *requested_addr, size_t size)
 
 /*
  * Match the PCI Driver and Device using the ID Table
- *
- * @param pci_drv
- *	PCI driver from which ID table would be extracted
- * @param pci_dev
- *	PCI device to match against the driver
- * @return
- *	1 for successful match
- *	0 for unsuccessful match
  */
-static int
+int
 rte_pci_match(const struct rte_pci_driver *pci_drv,
 	      const struct rte_pci_device *pci_dev)
 {
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index ac79040dd..4a485674e 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -368,6 +368,21 @@ int rte_pci_scan(void);
 int
 rte_pci_probe(void);
 
+/**
+ * Match the PCI Driver and Device using the ID Table
+ *
+ * @param pci_drv
+ *      PCI driver from which ID table would be extracted
+ * @param pci_dev
+ *      PCI device to match against the driver
+ * @return
+ *      1 for successful match
+ *      0 for unsuccessful match
+ */
+int
+rte_pci_match(const struct rte_pci_driver *pci_drv,
+	      const struct rte_pci_device *pci_dev);
+
 /**
  * Map the PCI device resources in user space virtual memory address
  *
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 0f9e009b6..c91dd44c4 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -205,6 +205,7 @@ DPDK_17.08 {
 	rte_bus_find;
 	rte_bus_find_by_device;
 	rte_bus_find_by_name;
+	rte_pci_match;
 
 } DPDK_17.05;
 
-- 
2.13.0


* [PATCH v2 03/12] bsdapp/eal_pci: get iommu class
  2017-07-10 11:42 ` [PATCH v2 00/12] " Santosh Shukla
  2017-07-10 11:42   ` [PATCH v2 01/12] eal/pci: introduce PCI driver iova as va flag Santosh Shukla
  2017-07-10 11:42   ` [PATCH v2 02/12] eal/pci: export match function Santosh Shukla
@ 2017-07-10 11:42   ` Santosh Shukla
  2017-07-10 11:42   ` [PATCH v2 04/12] linuxapp/eal_pci: " Santosh Shukla
                     ` (9 subsequent siblings)
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-10 11:42 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Introduce the rte_pci_get_iommu_class API, which gets the iommu class of
the PCI devices on the bus and returns the preferred iova mapping mode for
the PCI bus.

The bsdapp implementation returns the default iova mode (iova_pa).

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/bsdapp/eal/eal_pci.c           | 10 ++++++++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map |  1 +
 lib/librte_eal/common/include/rte_bus.h       |  9 +++++++++
 lib/librte_eal/common/include/rte_pci.h       | 11 +++++++++++
 4 files changed, 31 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index e321461d8..40a951e31 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -405,6 +405,16 @@ rte_pci_scan(void)
 	return -1;
 }
 
+/*
+ * Get iommu class of pci devices on the bus.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void)
+{
+	/* Supports only RTE_KDRV_NIC_UIO */
+	return RTE_IOVA_PA;
+}
+
 int
 pci_update_device(const struct rte_pci_addr *addr)
 {
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 8d43df0bb..33c2c32c0 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -201,6 +201,7 @@ DPDK_17.08 {
 	rte_bus_find_by_device;
 	rte_bus_find_by_name;
 	rte_pci_match;
+	rte_pci_get_iommu_class;
 
 } DPDK_17.05;
 
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 37cc230ad..deced4f28 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -56,6 +56,15 @@ extern "C" {
 /** Double linked list of buses */
 TAILQ_HEAD(rte_bus_list, rte_bus);
 
+
+/**
+ * IOVA mapping mode.
+ */
+enum rte_iova_mode {
+	RTE_IOVA_PA = 1,
+	RTE_IOVA_VA
+};
+
 /**
  * Bus specific scan for devices attached on the bus.
  * For each bus object, the scan would be responsible for finding devices and
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 4a485674e..c58361132 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -383,6 +383,17 @@ int
 rte_pci_match(const struct rte_pci_driver *pci_drv,
 	      const struct rte_pci_device *pci_dev);
 
+
+/**
+ * Get iommu class of PCI devices on the bus.
+ * And return their preferred iova mapping mode.
+ *
+ * @return
+ *   - enum rte_iova_mode.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void);
+
 /**
  * Map the PCI device resources in user space virtual memory address
  *
-- 
2.13.0


* [PATCH v2 04/12] linuxapp/eal_pci: get iommu class
  2017-07-10 11:42 ` [PATCH v2 00/12] " Santosh Shukla
                     ` (2 preceding siblings ...)
  2017-07-10 11:42   ` [PATCH v2 03/12] bsdapp/eal_pci: get iommu class Santosh Shukla
@ 2017-07-10 11:42   ` Santosh Shukla
  2017-07-10 11:42   ` [PATCH v2 05/12] bus: " Santosh Shukla
                     ` (8 subsequent siblings)
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-10 11:42 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Get the iommu class of the PCI devices on the bus and return the
preferred iova mapping mode for that bus.

Algorithm for iova scheme selection for the PCI bus:
0. Look for a device that is attached to the vfio kdrv and whose driver
has RTE_PCI_DRV_NEED_IOVA_VA set in .drv_flags.
1. Look for any device attached to a UIO class of driver.
2. Check whether vfio-noiommu mode is enabled.

If 1) and 2) are false and 0) is true, select iova=va as the mapping
scheme. Otherwise use the default mapping scheme (iova_pa).
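
The selection rule above reduces to a single predicate over the three checks. A minimal sketch, assuming the three results are already available as booleans; the enum and function names here are hypothetical stand-ins, not the names used in the patch:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the iova mode values used in this series. */
enum iova_mode {
	IOVA_PA = 1,
	IOVA_VA = 2
};

/* has_iova_va:     check 0), a vfio-bound device needs iova as va
 * is_bound_uio:    check 1), some device is bound to a UIO driver
 * noiommu_enabled: check 2), vfio-noiommu mode is enabled */
enum iova_mode
pci_select_iova_mode(bool has_iova_va, bool is_bound_uio,
		     bool noiommu_enabled)
{
	if (has_iova_va && !is_bound_uio && !noiommu_enabled)
		return IOVA_VA;

	return IOVA_PA; /* default mapping scheme */
}
```

Any UIO-bound device or an enabled noiommu mode forces the default, because neither provides an IOMMU translation.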

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
v1 --> v2:
- Removed Linux version check in vfio_noiommu func. Refer [1].
- Extended autodetection logic for _iommu_class.
Refer [2].

[1] https://www.mail-archive.com/dev@dpdk.org/msg70108.html
[2] https://www.mail-archive.com/dev@dpdk.org/msg70279.html
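
The vfio-noiommu check reads a module parameter file and looks at its first character. A minimal sketch of that idea using stdio follows. The function name and the path argument are assumptions made so the logic can be exercised on any file, and unlike the function in this patch the sketch returns 0 for a readable file that does not start with 'Y'.

```c
#include <stdio.h>

/* Returns 1 if the file at 'path' starts with 'Y', 0 if it is readable
 * but starts with anything else, and -1 if it cannot be opened or read.
 * On Linux the path would be the parameter file named in this patch:
 * /sys/module/vfio/parameters/enable_unsafe_noiommu_mode */
int
noiommu_param_is_enabled(const char *path)
{
	FILE *f;
	int c;

	f = fopen(path, "r");
	if (f == NULL)
		return -1;

	c = fgetc(f);
	fclose(f);

	if (c == EOF)
		return -1; /* empty file or read error */

	return c == 'Y' ? 1 : 0;
}
```

Checking the read result before looking at the character avoids acting on an uninitialized value when the read fails.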

 lib/librte_eal/linuxapp/eal/eal_pci.c           | 66 +++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c          | 19 +++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 4 files changed, 90 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 7d9e1a99b..573caa000 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -45,6 +45,7 @@
 #include "eal_filesystem.h"
 #include "eal_private.h"
 #include "eal_pci_init.h"
+#include "eal_vfio.h"
 
 /**
  * @file
@@ -488,6 +489,71 @@ rte_pci_scan(void)
 	return -1;
 }
 
+/*
+ * Check whether any device is bound to uio
+ */
+static inline int
+pci_device_bound_uio(void)
+{
+	struct rte_pci_device *dev = NULL;
+
+	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		if (dev->kdrv == RTE_KDRV_IGB_UIO ||
+		   dev->kdrv == RTE_KDRV_UIO_GENERIC) {
+			return 1;
+		}
+	}
+	return 0;
+}
+
+/*
+ * Check whether any vfio-bound device has a driver that needs iova as va
+ */
+static inline int
+pci_device_has_iova_va(void)
+{
+	struct rte_pci_device *dev = NULL;
+	struct rte_pci_driver *drv = NULL;
+
+	FOREACH_DRIVER_ON_PCIBUS(drv) {
+		if (drv && drv->drv_flags & RTE_PCI_DRV_NEED_IOVA_VA) {
+			FOREACH_DEVICE_ON_PCIBUS(dev) {
+				if (dev->kdrv == RTE_KDRV_VFIO &&
+				    rte_pci_match(drv, dev))
+					return 1;
+			}
+		}
+	}
+	return 0;
+}
+
+/*
+ * Get iommu class of PCI devices on the bus.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void)
+{
+	bool is_vfio_noiommu_enabled;
+	bool has_iova_va;
+	bool is_bound_uio;
+
+	has_iova_va = pci_device_has_iova_va();
+	is_bound_uio = pci_device_bound_uio();
+	is_vfio_noiommu_enabled = vfio_noiommu_is_enabled() == 1 ? 1 : 0;
+
+	if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled)
+		return RTE_IOVA_VA;
+
+	if (has_iova_va) {
+		if (is_vfio_noiommu_enabled)
+			RTE_LOG(WARNING, EAL, "vfio-noiommu mode configured\n");
+		if (is_bound_uio)
+			RTE_LOG(WARNING, EAL, "Some device attached to UIO\n");
+	}
+
+	return RTE_IOVA_PA;
+}
+
 /* Read PCI config space. */
 int rte_pci_read_config(const struct rte_pci_device *device,
 		void *buf, size_t len, off_t offset)
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 946df7e31..c8a97b7e7 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -816,4 +816,23 @@ vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
 	return 0;
 }
 
+int
+vfio_noiommu_is_enabled(void)
+{
+	int fd, ret, cnt;
+	char c;
+
+	ret = -1;
+	fd = open(VFIO_NOIOMMU_MODE, O_RDONLY);
+	if (fd < 0)
+		return -1;
+
+	cnt = read(fd, &c, 1);
+	if (cnt == 1 && c == 'Y')
+		ret = 1;
+
+	close(fd);
+	return ret;
+}
+
 #endif
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h b/lib/librte_eal/linuxapp/eal/eal_vfio.h
index 5ff63e5d7..26ea8e119 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
@@ -150,6 +150,8 @@ struct vfio_config {
 #define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
 #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
 #define VFIO_GET_REGION_IDX(x) (x >> 40)
+#define VFIO_NOIOMMU_MODE      \
+	"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
 
 /* DMA mapping function prototype.
  * Takes VFIO container fd as a parameter.
@@ -210,6 +212,8 @@ int pci_vfio_is_enabled(void);
 
 int vfio_mp_sync_setup(void);
 
+int vfio_noiommu_is_enabled(void);
+
 #define SOCKET_REQ_CONTAINER 0x100
 #define SOCKET_REQ_GROUP 0x200
 #define SOCKET_CLR_GROUP 0x300
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index c91dd44c4..044f89c7c 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -206,6 +206,7 @@ DPDK_17.08 {
 	rte_bus_find_by_device;
 	rte_bus_find_by_name;
 	rte_pci_match;
+	rte_pci_get_iommu_class;
 
 } DPDK_17.05;
 
-- 
2.13.0


* [PATCH v2 05/12] bus: get iommu class
  2017-07-10 11:42 ` [PATCH v2 00/12] " Santosh Shukla
                     ` (3 preceding siblings ...)
  2017-07-10 11:42   ` [PATCH v2 04/12] linuxapp/eal_pci: " Santosh Shukla
@ 2017-07-10 11:42   ` Santosh Shukla
  2017-07-10 11:42   ` [PATCH v2 06/12] eal: introduce iova mode helper api Santosh Shukla
                     ` (7 subsequent siblings)
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-10 11:42 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

The rte_bus_get_iommu_class API automatically detects and selects the
appropriate iova mapping scheme for the iommu-capable devices on the
buses.

Algorithm for iova scheme selection across buses:
0. Iterate through bus_list.
1. Collect each bus's iova mode value and OR it into the 'mode' variable.
2. Value '1' is _pa mode and value '2' is _va mode, so the selection
scheme is:
if mode == 2 then iova mode is _va.
if mode == 1 then iova mode is _pa.
if mode == 3 then iova mode is _pa.

So any mode != 2 selects the default iova mode (_pa).
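
The OR-based combination can be sketched in isolation as follows. This is an illustration under assumptions: the buses are modelled as a plain array of callbacks instead of the bus list, and the enum, typedef and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for the iova mode values: PA = 1, VA = 2. */
enum iova_mode {
	IOVA_PA = 1,
	IOVA_VA = 2
};

typedef enum iova_mode (*get_iommu_class_t)(void);

/* Two example per-bus callbacks, used only to exercise the sketch. */
enum iova_mode bus_wants_pa(void) { return IOVA_PA; }
enum iova_mode bus_wants_va(void) { return IOVA_VA; }

/* OR the per-bus results together. Only the value 2 (every reporting
 * bus asked for VA) selects IOVA_VA; 0, 1 and 3 fall back to IOVA_PA. */
enum iova_mode
bus_select_iova_mode(const get_iommu_class_t *cbs, size_t nb_cbs)
{
	int mode = 0;
	size_t i;

	for (i = 0; i < nb_cbs; i++) {
		if (cbs[i] != NULL) /* a bus may not implement the callback */
			mode |= cbs[i]();
	}

	if (mode != IOVA_VA)
		mode = IOVA_PA; /* default iova mode */

	return (enum iova_mode)mode;
}
```

With the values 1 and 2 the OR acts as a two-bit set, so a single bus reporting _pa is enough to pull the result to 3 and therefore back to the default.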

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
 lib/librte_eal/common/eal_common_bus.c          | 23 +++++++++++++++++++++++
 lib/librte_eal/common/eal_common_pci.c          |  1 +
 lib/librte_eal/common/include/rte_bus.h         | 22 ++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 5 files changed, 48 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 33c2c32c0..a2dd65a33 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -202,6 +202,7 @@ DPDK_17.08 {
 	rte_bus_find_by_name;
 	rte_pci_match;
 	rte_pci_get_iommu_class;
+	rte_bus_get_iommu_class;
 
 } DPDK_17.05;
 
diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 1d3635c50..0a4d953ee 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -223,3 +223,26 @@ rte_bus_find_by_device_name(const char *str)
 		c[0] = '\0';
 	return rte_bus_find(NULL, bus_can_parse, name);
 }
+
+
+/*
+ * Get iommu class of devices on the bus.
+ */
+enum rte_iova_mode
+rte_bus_get_iommu_class(void)
+{
+	int mode = 0;
+	struct rte_bus *bus;
+
+	TAILQ_FOREACH(bus, &rte_bus_list, next) {
+
+		if (bus->get_iommu_class)
+			mode |= bus->get_iommu_class();
+	}
+
+	if (mode != RTE_IOVA_VA) {
+		/* Use default IOVA mode */
+		mode = RTE_IOVA_PA;
+	}
+	return mode;
+}
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 8b6ecebd6..bdf2e7c3a 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -552,6 +552,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.plug = pci_plug,
 		.unplug = pci_unplug,
 		.parse = pci_parse,
+		.get_iommu_class = rte_pci_get_iommu_class,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index deced4f28..5336fa18f 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -182,6 +182,17 @@ struct rte_bus_conf {
 	enum rte_bus_scan_mode scan_mode; /**< Scan policy. */
 };
 
+
+/**
+ * Get iommu class of devices on the bus.
+ * Check that those devices are attached to iommu driver.
+ *
+ * @return
+ *      enum rte_iova_mode value.
+ */
+typedef enum rte_iova_mode (*rte_bus_get_iommu_class_t)(void);
+
+
 /**
  * A structure describing a generic bus.
  */
@@ -195,6 +206,7 @@ struct rte_bus {
 	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
 	rte_bus_parse_t parse;       /**< Parse a device name */
 	struct rte_bus_conf conf;    /**< Bus configuration */
+	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 };
 
 /**
@@ -298,6 +310,16 @@ struct rte_bus *rte_bus_find_by_device(const struct rte_device *dev);
  */
 struct rte_bus *rte_bus_find_by_name(const char *busname);
 
+
+/**
+ * Get iommu class of devices on the bus.
+ * Check that those devices are attached to iommu driver.
+ *
+ * @return
+ *     enum rte_iova_mode value.
+ */
+enum rte_iova_mode rte_bus_get_iommu_class(void);
+
 /**
  * Helper for Bus registration.
  * The constructor has higher priority than PMD constructors.
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 044f89c7c..186c7b0fd 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -207,6 +207,7 @@ DPDK_17.08 {
 	rte_bus_find_by_name;
 	rte_pci_match;
 	rte_pci_get_iommu_class;
+	rte_bus_get_iommu_class;
 
 } DPDK_17.05;
 
-- 
2.13.0


* [PATCH v2 06/12] eal: introduce iova mode helper api
  2017-07-10 11:42 ` [PATCH v2 00/12] " Santosh Shukla
                     ` (4 preceding siblings ...)
  2017-07-10 11:42   ` [PATCH v2 05/12] bus: " Santosh Shukla
@ 2017-07-10 11:42   ` Santosh Shukla
  2017-07-10 11:42   ` [PATCH v2 07/12] linuxapp/eal: auto detect iova mode Santosh Shukla
                     ` (6 subsequent siblings)
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-10 11:42 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Introduce the rte_eal_iova_mode() helper API. Non-EAL libraries use
this API to detect the iova mode.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
---
 lib/librte_eal/bsdapp/eal/eal.c                 |  6 ++++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
 lib/librte_eal/common/include/rte_eal.h         | 12 ++++++++++++
 lib/librte_eal/linuxapp/eal/eal.c               |  6 ++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 5 files changed, 26 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 05f0c1f90..e1aee8c3e 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -120,6 +120,12 @@ rte_eal_get_configuration(void)
 	return &rte_config;
 }
 
+enum rte_iova_mode
+rte_eal_iova_mode(void)
+{
+	return rte_eal_get_configuration()->iova_mode;
+}
+
 /* parse a sysfs (or other) file containing one integer value */
 int
 eal_parse_sysfs_value(const char *filename, unsigned long *val)
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index a2dd65a33..43cb11d7b 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -203,6 +203,7 @@ DPDK_17.08 {
 	rte_pci_match;
 	rte_pci_get_iommu_class;
 	rte_bus_get_iommu_class;
+	rte_eal_iova_mode;
 
 } DPDK_17.05;
 
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 6b7c5ca92..849f5f050 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -45,6 +45,7 @@
 
 #include <rte_per_lcore.h>
 #include <rte_config.h>
+#include <rte_bus.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -85,6 +86,9 @@ struct rte_config {
 	/** Primary or secondary configuration */
 	enum rte_proc_type_t process_type;
 
+	/** PA or VA mapping mode */
+	enum rte_iova_mode iova_mode;
+
 	/**
 	 * Pointer to memory configuration, which may be shared across multiple
 	 * DPDK instances
@@ -283,6 +287,14 @@ static inline int rte_gettid(void)
 	return RTE_PER_LCORE(_thread_id);
 }
 
+/**
+ * Get the iova mode
+ *
+ * @return
+ *   enum rte_iova_mode value.
+ */
+enum rte_iova_mode rte_eal_iova_mode(void);
+
 #define RTE_INIT(func) \
 static void __attribute__((constructor, used)) func(void)
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 7c78f2dc2..2546b55e4 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -129,6 +129,12 @@ rte_eal_get_configuration(void)
 	return &rte_config;
 }
 
+enum rte_iova_mode
+rte_eal_iova_mode(void)
+{
+	return rte_eal_get_configuration()->iova_mode;
+}
+
 /* parse a sysfs (or other) file containing one integer value */
 int
 eal_parse_sysfs_value(const char *filename, unsigned long *val)
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 186c7b0fd..0de876c26 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -208,6 +208,7 @@ DPDK_17.08 {
 	rte_pci_match;
 	rte_pci_get_iommu_class;
 	rte_bus_get_iommu_class;
+	rte_eal_iova_mode;
 
 } DPDK_17.05;
 
-- 
2.13.0


* [PATCH v2 07/12] linuxapp/eal: auto detect iova mode
  2017-07-10 11:42 ` [PATCH v2 00/12] " Santosh Shukla
                     ` (5 preceding siblings ...)
  2017-07-10 11:42   ` [PATCH v2 06/12] eal: introduce iova mode helper api Santosh Shukla
@ 2017-07-10 11:42   ` Santosh Shukla
  2017-07-10 11:42   ` [PATCH v2 08/12] bsdapp/eal: auto detect iova mapping mode Santosh Shukla
                     ` (5 subsequent siblings)
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-10 11:42 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

- Move the late bus scan up, to just after eal_parsing.
- Auto-detect the iova mapping mode based on the result of
  rte_bus_get_iommu_class.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/linuxapp/eal/eal.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 2546b55e4..7b4dd70de 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -799,6 +799,16 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	if (rte_bus_scan()) {
+		rte_eal_init_alert("Cannot scan the buses for devices\n");
+		rte_errno = ENODEV;
+		return -1;
+	}
+
+	/* autodetect the iova mapping mode (default is iova_pa) */
+	if (rte_bus_get_iommu_class() == RTE_IOVA_VA)
+		rte_eal_get_configuration()->iova_mode = RTE_IOVA_VA;
+
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&
 			internal_config.xen_dom0_support == 0 &&
@@ -896,12 +906,6 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
-	if (rte_bus_scan()) {
-		rte_eal_init_alert("Cannot scan the buses for devices\n");
-		rte_errno = ENODEV;
-		return -1;
-	}
-
 	RTE_LCORE_FOREACH_SLAVE(i) {
 
 		/*
-- 
2.13.0


* [PATCH v2 08/12] bsdapp/eal: auto detect iova mapping mode
  2017-07-10 11:42 ` [PATCH v2 00/12] " Santosh Shukla
                     ` (6 preceding siblings ...)
  2017-07-10 11:42   ` [PATCH v2 07/12] linuxapp/eal: auto detect iova mode Santosh Shukla
@ 2017-07-10 11:42   ` Santosh Shukla
  2017-07-10 11:42   ` [PATCH v2 09/12] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
                     ` (4 subsequent siblings)
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-10 11:42 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

- Move the late bus scan up, to just after eal_parsing.
- The mapping mode stays at the default for bsdapp, which supports
  only one pass-through mode (RTE_KDRV_NIC_UIO).

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/bsdapp/eal/eal.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index e1aee8c3e..7c63b2fa7 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -542,6 +542,16 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	if (rte_bus_scan()) {
+		rte_eal_init_alert("Cannot scan the buses for devices\n");
+		rte_errno = ENODEV;
+		return -1;
+	}
+
+	/* autodetect the iova mapping mode (default is iova_pa) */
+	if (rte_bus_get_iommu_class() == RTE_IOVA_VA)
+		rte_eal_get_configuration()->iova_mode = RTE_IOVA_VA;
+
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&
 			eal_hugepage_info_init() < 0) {
@@ -621,12 +631,6 @@ rte_eal_init(int argc, char **argv)
 		rte_config.master_lcore, thread_id, cpuset,
 		ret == 0 ? "" : "...");
 
-	if (rte_bus_scan()) {
-		rte_eal_init_alert("Cannot scan the buses for devices\n");
-		rte_errno = ENODEV;
-		return -1;
-	}
-
 	RTE_LCORE_FOREACH_SLAVE(i) {
 
 		/*
-- 
2.13.0


* [PATCH v2 09/12] linuxapp/eal_vfio: honor iova mode before mapping
  2017-07-10 11:42 ` [PATCH v2 00/12] " Santosh Shukla
                     ` (7 preceding siblings ...)
  2017-07-10 11:42   ` [PATCH v2 08/12] bsdapp/eal: auto detect iova mapping mode Santosh Shukla
@ 2017-07-10 11:42   ` Santosh Shukla
  2017-07-10 11:42   ` [PATCH v2 10/12] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
                     ` (3 subsequent siblings)
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-10 11:42 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Check the iova mode and map iova to pa or va accordingly.
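
The choice of the iova field can be sketched without the VFIO headers. The `dma_map_req` struct below is a hypothetical, reduced stand-in for the real DMA map request, and the enum and function names are assumptions for illustration only.

```c
#include <stdint.h>

/* Hypothetical stand-in for the iova mode values used in this series. */
enum iova_mode {
	IOVA_PA = 1,
	IOVA_VA = 2
};

/* Hypothetical subset of the fields of a DMA map request. */
struct dma_map_req {
	uint64_t vaddr; /* process virtual address of the segment */
	uint64_t iova;  /* address the device will use for DMA */
	uint64_t size;  /* segment length in bytes */
};

/* Build a request for one memory segment: in iova=va mode the device
 * address equals the virtual address, otherwise the physical address. */
struct dma_map_req
build_dma_map(enum iova_mode mode, uint64_t vaddr, uint64_t phys_addr,
	      uint64_t len)
{
	struct dma_map_req req;

	req.vaddr = vaddr;
	req.size = len;
	req.iova = (mode == IOVA_VA) ? vaddr : phys_addr;

	return req;
}
```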

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c8a97b7e7..b32cd09a2 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd)
 		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
 		dma_map.vaddr = ms[i].addr_64;
 		dma_map.size = ms[i].len;
-		dma_map.iova = ms[i].phys_addr;
+		if (rte_eal_iova_mode() == RTE_IOVA_VA)
+			dma_map.iova = dma_map.vaddr;
+		else
+			dma_map.iova = ms[i].phys_addr;
 		dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
 
 		ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
@@ -792,7 +795,10 @@ vfio_spapr_dma_map(int vfio_container_fd)
 		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
 		dma_map.vaddr = ms[i].addr_64;
 		dma_map.size = ms[i].len;
-		dma_map.iova = ms[i].phys_addr;
+		if (rte_eal_iova_mode() == RTE_IOVA_VA)
+			dma_map.iova = dma_map.vaddr;
+		else
+			dma_map.iova = ms[i].phys_addr;
 		dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
 				 VFIO_DMA_MAP_FLAG_WRITE;
 
-- 
2.13.0


* [PATCH v2 10/12] linuxapp/eal_memory: honor iova mode in virt2phy
  2017-07-10 11:42 ` [PATCH v2 00/12] " Santosh Shukla
                     ` (8 preceding siblings ...)
  2017-07-10 11:42   ` [PATCH v2 09/12] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
@ 2017-07-10 11:42   ` Santosh Shukla
  2017-07-10 11:42   ` [PATCH v2 11/12] mempool: " Santosh Shukla
                     ` (2 subsequent siblings)
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-10 11:42 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Check the iova mode and, in iova=va mode, return the virtual address as
the phy addr.
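
The early return can be sketched as a wrapper around whatever physical-address lookup is in use. All names below are hypothetical, and `fake_pa_lookup()` is only a stand-in that pretends each VA maps to VA + 0x1000 so the sketch can be exercised.

```c
#include <stdint.h>

/* Hypothetical stand-in for the iova mode values used in this series. */
enum iova_mode {
	IOVA_PA = 1,
	IOVA_VA = 2
};

typedef uint64_t (*pa_lookup_t)(const void *virtaddr);

/* In iova=va mode the virtual address is returned unchanged and the
 * expensive lookup is skipped; otherwise the lookup is used. */
uint64_t
virt2iova(enum iova_mode mode, const void *virtaddr, pa_lookup_t lookup)
{
	if (mode == IOVA_VA)
		return (uint64_t)(uintptr_t)virtaddr;

	return lookup(virtaddr);
}

/* Stand-in lookup for demonstration only: PA = VA + 0x1000. */
uint64_t
fake_pa_lookup(const void *virtaddr)
{
	return (uint64_t)(uintptr_t)virtaddr + 0x1000;
}
```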

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 040f24a43..7bff815b6 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -141,6 +141,9 @@ rte_mem_virt2phy(const void *virtaddr)
 	int page_size;
 	off_t offset;
 
+	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+		return (uintptr_t)virtaddr;
+
 	/* when using dom0, /proc/self/pagemap always returns 0, check in
 	 * dpdk memory by browsing the memsegs */
 	if (rte_xen_dom0_supported()) {
-- 
2.13.0

* [PATCH v2 11/12] mempool: honor iova mode in virt2phy
  2017-07-10 11:42 ` [PATCH v2 00/12] " Santosh Shukla
                     ` (9 preceding siblings ...)
  2017-07-10 11:42   ` [PATCH v2 10/12] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
@ 2017-07-10 11:42   ` Santosh Shukla
  2017-07-10 12:27     ` Olivier Matz
  2017-07-10 11:42   ` [PATCH v2 12/12] eal/rte_malloc: " Santosh Shukla
  2017-07-11  6:16   ` [PATCH v3 00/11] Infrastructure to detect iova mapping on the bus Santosh Shukla
  12 siblings, 1 reply; 248+ messages in thread
From: Santosh Shukla @ 2017-07-10 11:42 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Check iova mode and accordingly return phy addr.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_mempool/rte_mempool.h | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 76b5b3b15..fafa77e3b 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -78,6 +78,7 @@
 #include <rte_ring.h>
 #include <rte_memcpy.h>
 #include <rte_common.h>
+#include <rte_bus.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -1397,9 +1398,14 @@ rte_mempool_empty(const struct rte_mempool *mp)
 static inline phys_addr_t
 rte_mempool_virt2phy(__rte_unused const struct rte_mempool *mp, const void *elt)
 {
-	const struct rte_mempool_objhdr *hdr;
-	hdr = (const struct rte_mempool_objhdr *)RTE_PTR_SUB(elt,
+	struct rte_mempool_objhdr *hdr;
+
+	hdr = (struct rte_mempool_objhdr *)RTE_PTR_SUB(elt,
 		sizeof(*hdr));
+
+	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+		hdr->physaddr = (uintptr_t)elt;
+
 	return hdr->physaddr;
 }
 
-- 
2.13.0

* [PATCH v2 12/12] eal/rte_malloc: honor iova mode in virt2phy
  2017-07-10 11:42 ` [PATCH v2 00/12] " Santosh Shukla
                     ` (10 preceding siblings ...)
  2017-07-10 11:42   ` [PATCH v2 11/12] mempool: " Santosh Shukla
@ 2017-07-10 11:42   ` Santosh Shukla
  2017-07-11  6:16   ` [PATCH v3 00/11] Infrastructure to detect iova mapping on the bus Santosh Shukla
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-10 11:42 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Check iova mode and accordingly return phy addr.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/common/rte_malloc.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 5c0627bf4..d65c05a4d 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -251,10 +251,17 @@ rte_malloc_set_limit(__rte_unused const char *type,
 phys_addr_t
 rte_malloc_virt2phy(const void *addr)
 {
+	phys_addr_t paddr;
 	const struct malloc_elem *elem = malloc_elem_from_data(addr);
 	if (elem == NULL)
 		return RTE_BAD_PHYS_ADDR;
 	if (elem->ms->phys_addr == RTE_BAD_PHYS_ADDR)
 		return RTE_BAD_PHYS_ADDR;
-	return elem->ms->phys_addr + ((uintptr_t)addr - (uintptr_t)elem->ms->addr);
+
+	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+		paddr = (uintptr_t)addr;
+	else
+		paddr = elem->ms->phys_addr +
+			((uintptr_t)addr - (uintptr_t)elem->ms->addr);
+	return paddr;
 }
-- 
2.13.0

* Re: [PATCH v2 11/12] mempool: honor iova mode in virt2phy
  2017-07-10 11:42   ` [PATCH v2 11/12] mempool: " Santosh Shukla
@ 2017-07-10 12:27     ` Olivier Matz
  2017-07-10 13:30       ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Olivier Matz @ 2017-07-10 12:27 UTC (permalink / raw)
  To: Santosh Shukla
  Cc: thomas, dev, bruce.richardson, jerin.jacob, hemant.agrawal,
	shreyansh.jain, gaetan.rivet, sergio.gonzalez.monroy,
	anatoly.burakov, stephen, maxime.coquelin

On Mon, 10 Jul 2017 11:42:34 +0000, Santosh Shukla <santosh.shukla@caviumnetworks.com> wrote:
> Check iova mode and accordingly return phy addr.
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
>  lib/librte_mempool/rte_mempool.h | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> index 76b5b3b15..fafa77e3b 100644
> --- a/lib/librte_mempool/rte_mempool.h
> +++ b/lib/librte_mempool/rte_mempool.h
> @@ -78,6 +78,7 @@
>  #include <rte_ring.h>
>  #include <rte_memcpy.h>
>  #include <rte_common.h>
> +#include <rte_bus.h>
>  
>  #ifdef __cplusplus
>  extern "C" {
> @@ -1397,9 +1398,14 @@ rte_mempool_empty(const struct rte_mempool *mp)
>  static inline phys_addr_t
>  rte_mempool_virt2phy(__rte_unused const struct rte_mempool *mp, const void *elt)
>  {
> -	const struct rte_mempool_objhdr *hdr;
> -	hdr = (const struct rte_mempool_objhdr *)RTE_PTR_SUB(elt,
> +	struct rte_mempool_objhdr *hdr;
> +
> +	hdr = (struct rte_mempool_objhdr *)RTE_PTR_SUB(elt,
>  		sizeof(*hdr));
> +
> +	if (rte_eal_iova_mode() == RTE_IOVA_VA)
> +		hdr->physaddr = (uintptr_t)elt;
> +
>  	return hdr->physaddr;
>  }
>  

This overwrites the physaddr field in the object hdr, which is
surely not what you want (note that hdr was const).

This change could at least take place in mempool_add_elem().
There may even be no need to change rte_mempool at all: if
rte_memzone_reserve() already returns the proper address in
memzone->phys_addr, it should be transparent.
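
A minimal sketch of the concern about the const header, not the actual
rte_mempool code: the lookup can honor iova=va mode without writing to
the object header. The names obj_hdr, iova_mode, IOVA_PA, IOVA_VA and
obj_virt2iova are hypothetical stand-ins used only for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the DPDK types discussed in this thread. */
enum iova_mode { IOVA_PA = 1, IOVA_VA };

struct obj_hdr {
	uint64_t physaddr;	/* bus address recorded when the object was added */
};

/*
 * Return the bus address of an element whose header sits immediately
 * before it. The header stays const: in VA mode the element's virtual
 * address is returned directly and the stored field is never modified.
 */
static inline uint64_t
obj_virt2iova(enum iova_mode mode, const void *elt)
{
	const struct obj_hdr *hdr =
		(const struct obj_hdr *)((const char *)elt - sizeof(*hdr));

	if (mode == IOVA_VA)
		return (uint64_t)(uintptr_t)elt;
	return hdr->physaddr;
}
```

Because the stored field is left untouched, the same object still
reports its recorded address if it is queried in PA mode afterwards.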

I didn't check the patchset in detail, but in my understanding,
what we call physaddr in dpdk is actually a bus address. Shouldn't
we start to rename some of these fields and functions to avoid
confusion?


Thanks,
Olivier

* Re: [PATCH v2 11/12] mempool: honor iova mode in virt2phy
  2017-07-10 12:27     ` Olivier Matz
@ 2017-07-10 13:30       ` santosh
  2017-07-10 13:51         ` Thomas Monjalon
  0 siblings, 1 reply; 248+ messages in thread
From: santosh @ 2017-07-10 13:30 UTC (permalink / raw)
  To: Olivier Matz
  Cc: thomas, dev, bruce.richardson, jerin.jacob, hemant.agrawal,
	shreyansh.jain, gaetan.rivet, sergio.gonzalez.monroy,
	anatoly.burakov, stephen, maxime.coquelin

Hi Olivier,

On Monday 10 July 2017 05:57 PM, Olivier Matz wrote:

> On Mon, 10 Jul 2017 11:42:34 +0000, Santosh Shukla <santosh.shukla@caviumnetworks.com> wrote:
>> Check iova mode and accordingly return phy addr.
>>
>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> ---
>>  lib/librte_mempool/rte_mempool.h | 10 ++++++++--
>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
>> index 76b5b3b15..fafa77e3b 100644
>> --- a/lib/librte_mempool/rte_mempool.h
>> +++ b/lib/librte_mempool/rte_mempool.h
>> @@ -78,6 +78,7 @@
>>  #include <rte_ring.h>
>>  #include <rte_memcpy.h>
>>  #include <rte_common.h>
>> +#include <rte_bus.h>
>>  
>>  #ifdef __cplusplus
>>  extern "C" {
>> @@ -1397,9 +1398,14 @@ rte_mempool_empty(const struct rte_mempool *mp)
>>  static inline phys_addr_t
>>  rte_mempool_virt2phy(__rte_unused const struct rte_mempool *mp, const void *elt)
>>  {
>> -	const struct rte_mempool_objhdr *hdr;
>> -	hdr = (const struct rte_mempool_objhdr *)RTE_PTR_SUB(elt,
>> +	struct rte_mempool_objhdr *hdr;
>> +
>> +	hdr = (struct rte_mempool_objhdr *)RTE_PTR_SUB(elt,
>>  		sizeof(*hdr));
>> +
>> +	if (rte_eal_iova_mode() == RTE_IOVA_VA)
>> +		hdr->physaddr = (uintptr_t)elt;
>> +
>>  	return hdr->physaddr;
>>  }
>>  
> This overrides the physaddr field in the object hdr, this is
> surely not what you want (note that hdr was const).
>
> This change could at least take place in mempool_add_elem().
> There is even maybe no need to change rte_mempool at all: if
> rte_memzone_reserve() already returns the proper address in
> memzone->phys_addr, it should be transparent.

Indeed. The change in rte_malloc_virt2phy() makes sure
that mz->phys_addr holds the VA in iova=va mapping mode,
so it is transparent for both modes.

A virt2phy translation is not required for derived APIs
such as mempool, provided that the rte_mem/malloc_virt2phy()
APIs are iova mode aware.

> I didn't check the patchset in detail, but in my understanding,
> what we call physaddr in dpdk is actually a bus address. Shouldn't
> we start to rename some of these fields and functions to avoid
> confusion?

Agree.
Reading these virt2phy APIs while working on the iova mode
confused me further. They should really be named iova2va, va2iova, pa2iova or iova2pa,
where an iova address is nothing but a bus address. Alternatively, we should follow
Linux semantics.

We planned to address the semantics after this series; it is not a priority IMO.

>
> Thanks,
> Olivier

* Re: [PATCH v2 11/12] mempool: honor iova mode in virt2phy
  2017-07-10 13:30       ` santosh
@ 2017-07-10 13:51         ` Thomas Monjalon
  2017-07-10 13:56           ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Thomas Monjalon @ 2017-07-10 13:51 UTC (permalink / raw)
  To: santosh
  Cc: Olivier Matz, dev, bruce.richardson, jerin.jacob, hemant.agrawal,
	shreyansh.jain, gaetan.rivet, sergio.gonzalez.monroy,
	anatoly.burakov, stephen, maxime.coquelin

10/07/2017 15:30, santosh:
> Hi Olivier,
> 
> On Monday 10 July 2017 05:57 PM, Olivier Matz wrote:
> > I didn't check the patchset in detail, but in my understanding,
> > what we call physaddr in dpdk is actually a bus address. Shouldn't
> > we start to rename some of these fields and functions to avoid
> > confusion?
> 
> Agree.
> While working on iova mode thing and reading these vir2phy api -
> confused me more. Actually it should be iova2va, va2iova or pa2iova,iova2pa..
> where iova address is nothing but bus address Or we should refer to linux
> semantics.
> 
> We thought of addressing semantics after this series, Not a priority in IMO.

I think it is a priority to start with semantics.
The work is too hard with wrong semantic otherwise.

* Re: [PATCH v2 11/12] mempool: honor iova mode in virt2phy
  2017-07-10 13:51         ` Thomas Monjalon
@ 2017-07-10 13:56           ` santosh
  2017-07-10 14:09             ` Thomas Monjalon
  0 siblings, 1 reply; 248+ messages in thread
From: santosh @ 2017-07-10 13:56 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Olivier Matz, dev, bruce.richardson, jerin.jacob, hemant.agrawal,
	shreyansh.jain, gaetan.rivet, sergio.gonzalez.monroy,
	anatoly.burakov, stephen, maxime.coquelin

On Monday 10 July 2017 07:21 PM, Thomas Monjalon wrote:

> 10/07/2017 15:30, santosh:
>> Hi Olivier,
>>
>> On Monday 10 July 2017 05:57 PM, Olivier Matz wrote:
>>> I didn't check the patchset in detail, but in my understanding,
>>> what we call physaddr in dpdk is actually a bus address. Shouldn't
>>> we start to rename some of these fields and functions to avoid
>>> confusion?
>> Agree.
>> While working on iova mode thing and reading these vir2phy api -
>> confused me more. Actually it should be iova2va, va2iova or pa2iova,iova2pa..
>> where iova address is nothing but bus address Or we should refer to linux
>> semantics.
>>
>> We thought of addressing semantics after this series, Not a priority in IMO.
> I think it is a priority to start with semantics.
> The work is too hard with wrong semantic otherwise.

Sorry, I don't agree with you. Semantics shouldn't lower the iova priority.
The missing iova framework is blocking SoCs: without it, one has to live with
a hackish solution for one's SoC.

A semantic change could in any case be pipelined. The semantics change
shouldn't get priority and thereby block other SoCs.

* Re: [PATCH v2 11/12] mempool: honor iova mode in virt2phy
  2017-07-10 13:56           ` santosh
@ 2017-07-10 14:09             ` Thomas Monjalon
  2017-07-10 14:22               ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Thomas Monjalon @ 2017-07-10 14:09 UTC (permalink / raw)
  To: santosh
  Cc: Olivier Matz, dev, bruce.richardson, jerin.jacob, hemant.agrawal,
	shreyansh.jain, gaetan.rivet, sergio.gonzalez.monroy,
	anatoly.burakov, stephen, maxime.coquelin

10/07/2017 15:56, santosh:
> On Monday 10 July 2017 07:21 PM, Thomas Monjalon wrote:
> 
> > 10/07/2017 15:30, santosh:
> >> Hi Olivier,
> >>
> >> On Monday 10 July 2017 05:57 PM, Olivier Matz wrote:
> >>> I didn't check the patchset in detail, but in my understanding,
> >>> what we call physaddr in dpdk is actually a bus address. Shouldn't
> >>> we start to rename some of these fields and functions to avoid
> >>> confusion?
> >> Agree.
> >> While working on iova mode thing and reading these vir2phy api -
> >> confused me more. Actually it should be iova2va, va2iova or pa2iova,iova2pa..
> >> where iova address is nothing but bus address Or we should refer to linux
> >> semantics.
> >>
> >> We thought of addressing semantics after this series, Not a priority in IMO.
> > I think it is a priority to start with semantics.
> > The work is too hard with wrong semantic otherwise.
> 
> Sorry, I don;t agree with you. Semantic shouldn't lower the iova priority.
> iova framework is blocking SoC's. w/o iova framework : One has to live with
> hackish solution for their SoC.
> 
> Semantic change in any-case could be pipelined. It shouldn't be like
> Semantics change gets priority and therefore it blocks other SoCs.

I am not saying it is blocking.
I am just saying that you did not start your work at the beginning,
and now it makes reviews difficult (from what I understand).
You must make every effort to make your patches easier to
understand and accept.

* Re: [PATCH v2 11/12] mempool: honor iova mode in virt2phy
  2017-07-10 14:09             ` Thomas Monjalon
@ 2017-07-10 14:22               ` santosh
  2017-07-10 14:37                 ` Thomas Monjalon
  0 siblings, 1 reply; 248+ messages in thread
From: santosh @ 2017-07-10 14:22 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Olivier Matz, dev, bruce.richardson, jerin.jacob, hemant.agrawal,
	shreyansh.jain, gaetan.rivet, sergio.gonzalez.monroy,
	anatoly.burakov, stephen, maxime.coquelin

On Monday 10 July 2017 07:39 PM, Thomas Monjalon wrote:

> 10/07/2017 15:56, santosh:
>> On Monday 10 July 2017 07:21 PM, Thomas Monjalon wrote:
>>
>>> 10/07/2017 15:30, santosh:
>>>> Hi Olivier,
>>>>
>>>> On Monday 10 July 2017 05:57 PM, Olivier Matz wrote:
>>>>> I didn't check the patchset in detail, but in my understanding,
>>>>> what we call physaddr in dpdk is actually a bus address. Shouldn't
>>>>> we start to rename some of these fields and functions to avoid
>>>>> confusion?
>>>> Agree.
>>>> While working on iova mode thing and reading these vir2phy api -
>>>> confused me more. Actually it should be iova2va, va2iova or pa2iova,iova2pa..
>>>> where iova address is nothing but bus address Or we should refer to linux
>>>> semantics.
>>>>
>>>> We thought of addressing semantics after this series, Not a priority in IMO.
>>> I think it is a priority to start with semantics.
>>> The work is too hard with wrong semantic otherwise.
>> Sorry, I don;t agree with you. Semantic shouldn't lower the iova priority.
>> iova framework is blocking SoC's. w/o iova framework : One has to live with
>> hackish solution for their SoC.
>>
>> Semantic change in any-case could be pipelined. It shouldn't be like
>> Semantics change gets priority and therefore it blocks other SoCs.
> I am not saying it is blocking.
> I just say that you have not started your work by the beginning,
> and now it make reviews difficult (from what I understand).
> You must make all the efforts to make your patches easier to
> understand and accept.

It's just about renaming the virt2phy APIs. But changing those function
names requires a deprecation notice. Once the iova patchset is merged, I'll
take responsibility for sending the deprecation notice and changing those API
names in the next release.
 

* Re: [PATCH v2 11/12] mempool: honor iova mode in virt2phy
  2017-07-10 14:22               ` santosh
@ 2017-07-10 14:37                 ` Thomas Monjalon
  2017-08-04  4:00                   ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Thomas Monjalon @ 2017-07-10 14:37 UTC (permalink / raw)
  To: santosh
  Cc: Olivier Matz, dev, bruce.richardson, jerin.jacob, hemant.agrawal,
	shreyansh.jain, gaetan.rivet, sergio.gonzalez.monroy,
	anatoly.burakov, stephen, maxime.coquelin

10/07/2017 16:22, santosh:
> On Monday 10 July 2017 07:39 PM, Thomas Monjalon wrote:
> 
> > 10/07/2017 15:56, santosh:
> >> On Monday 10 July 2017 07:21 PM, Thomas Monjalon wrote:
> >>
> >>> 10/07/2017 15:30, santosh:
> >>>> Hi Olivier,
> >>>>
> >>>> On Monday 10 July 2017 05:57 PM, Olivier Matz wrote:
> >>>>> I didn't check the patchset in detail, but in my understanding,
> >>>>> what we call physaddr in dpdk is actually a bus address. Shouldn't
> >>>>> we start to rename some of these fields and functions to avoid
> >>>>> confusion?
> >>>> Agree.
> >>>> While working on iova mode thing and reading these vir2phy api -
> >>>> confused me more. Actually it should be iova2va, va2iova or pa2iova,iova2pa..
> >>>> where iova address is nothing but bus address Or we should refer to linux
> >>>> semantics.
> >>>>
> >>>> We thought of addressing semantics after this series, Not a priority in IMO.
> >>> I think it is a priority to start with semantics.
> >>> The work is too hard with wrong semantic otherwise.
> >> Sorry, I don;t agree with you. Semantic shouldn't lower the iova priority.
> >> iova framework is blocking SoC's. w/o iova framework : One has to live with
> >> hackish solution for their SoC.
> >>
> >> Semantic change in any-case could be pipelined. It shouldn't be like
> >> Semantics change gets priority and therefore it blocks other SoCs.
> > I am not saying it is blocking.
> > I just say that you have not started your work by the beginning,
> > and now it make reviews difficult (from what I understand).
> > You must make all the efforts to make your patches easier to
> > understand and accept.
> 
> It's just about changing name for virt2phy api's.. But changing those function
> names require deprecation notice, Once iova patchset is merged then I'll
> take up responsibility for sending deprecation notice and change those api
> name in the next release.

This series is not going to be integrated in 17.08.
Anyway, you should probably send the deprecation notice now,
in order to change the semantic in 17.11.
Olivier was also talking about physaddr wording in EAL code.

* [PATCH v3 00/11] Infrastructure to detect iova mapping on the bus
  2017-07-10 11:42 ` [PATCH v2 00/12] " Santosh Shukla
                     ` (11 preceding siblings ...)
  2017-07-10 11:42   ` [PATCH v2 12/12] eal/rte_malloc: " Santosh Shukla
@ 2017-07-11  6:16   ` Santosh Shukla
  2017-07-11  6:16     ` [PATCH v3 01/11] eal/pci: introduce PCI driver iova as va flag Santosh Shukla
                       ` (11 more replies)
  12 siblings, 12 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-11  6:16 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

v3:
Removed virt2phy translation for mempool (suggested by Olivier [4]).
Patch series rebased on 'a6e3149d0c0fac39a6fc970bdadfae14f875c9c6'.

v2:
Based on the discussion on the thread [3].
Introduce the RTE_PCI_DRV_NEED_IOVA_VA flag for autodetection of iova=va mapping.
If a PCI driver requires the IOVA as VA scheme, the driver can set the flag at
PCI driver registration.

Algorithm to select IOVA as VA for the PCI bus case:
    0. Look for a device attached to the vfio kdrv that has .drv_flag set
    to RTE_PCI_DRV_NEED_IOVA_VA.
    1. Look for any device attached to a UIO class of driver.
    2. Check whether vfio-noiommu mode is enabled.

    If 1) & 2) are false and 0) is true then select
    mapping scheme as iova=va. Otherwise use default
    mapping scheme (iova_pa).

That way, the bus can truly autodetect the iova mapping mode for
a device or a set of devices.
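
The selection rule above can be sketched as a single predicate. This is
only an illustration under stated assumptions, not the code in this
series: struct pci_scan_summary, its fields and select_iova_mode() are
hypothetical names, and the enum merely mirrors the proposed
enum rte_iova_mode.

```c
#include <stdbool.h>

/* Hypothetical mirror of the series' enum rte_iova_mode. */
enum iova_mode { IOVA_PA = 1, IOVA_VA };

/* Hypothetical summary of what a PCI bus scan found. */
struct pci_scan_summary {
	bool vfio_dev_needs_iova_va;	/* step 0: vfio device with NEED_IOVA_VA */
	bool uio_dev_present;		/* step 1: any device bound to a UIO driver */
	bool vfio_noiommu_enabled;	/* step 2: vfio-noiommu mode is on */
};

/* iova=va only if step 0 holds and neither step 1 nor step 2 does. */
static enum iova_mode
select_iova_mode(const struct pci_scan_summary *s)
{
	if (s->vfio_dev_needs_iova_va &&
	    !s->uio_dev_present &&
	    !s->vfio_noiommu_enabled)
		return IOVA_VA;
	return IOVA_PA;
}
```

Any single UIO-bound device, or vfio-noiommu mode, is enough to fall
back to the default iova_pa scheme in this sketch.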

v1 --> v2:
- Removed the override EAL option (--iova-mode=<>), because we now have the
  means to truly autodetect the iova mode.
- Introduced RTE_PCI_DRV_NEED_IOVA_VA drv_flag (suggested by Maxime).
- Using NEED_IOVA_VA drv_flag in autodetection logic.
- Removed Linux version check macro in vfio code, as per Maxime's feedback.
- Moved rte_pci_match API from local to global.

v2 --> v3:
- Removed rte_mempool_virt2phy (suggested by Olivier)

Patch Summary:
0) 1st: Introducing a new flag in rte_pci_drv
1) 2nd: declare rte_pci_match api in pci header. Required for autodetection in
follow up patches.
2) 3rd - 4th: autodetection mapping infrastructure for Linux/bsdapp.
3) 5th: Introduces global bus API named rte_bus_get_iommu_class.
4) 6th: iova mode helper API.
5) 7th - 8th: Calls rte_bus_get_iommu_class API for Linux/bsdapp and returns
their iova mode.
6) 9th: Check iova mode and accordingly map vfio.dma_map to _pa or _va.
7) 10th - 11th: Check for IOVA_VA mode in below APIs
        - rte_mem_virt2phy
        - rte_malloc_virt2phy

Test History:
- Tested for x86/XL710 40G NIC card for both modes (iova_va/pa).
- Tested for arm64/thunderx vNIC integrated NIC for both modes.
- Tested for arm64/Octeontx integrated NICs for
  iova_va mode only (it supports only one mode).
- Ran standalone tests like mempool_autotest, mbuf_autotest.
- Verified for Doxygen.

Work History:
For v1, Refer [1].
For v2, Refer [2].

Checkpatch result:
- No error/warning noticed.

[1] https://www.mail-archive.com/dev@dpdk.org/msg67438.html
[2] https://www.mail-archive.com/dev@dpdk.org/msg70674.html
[3] https://www.mail-archive.com/dev@dpdk.org/msg70279.html
[4] https://www.mail-archive.com/dev@dpdk.org/msg70692.html

Santosh Shukla (11):
  eal/pci: introduce PCI driver iova as va flag
  eal/pci: export match function
  bsdapp/eal_pci: get iommu class
  linuxapp/eal_pci: get iommu class
  bus: get iommu class
  eal: introduce iova mode helper api
  linuxapp/eal: auto detect iova mode
  bsdapp/eal: auto detect iova mapping mode
  linuxapp/eal_vfio: honor iova mode before mapping
  linuxapp/eal_memory: honor iova mode in virt2phy
  eal/rte_malloc: honor iova mode in virt2phy

 lib/librte_eal/bsdapp/eal/eal.c                 | 22 ++++++---
 lib/librte_eal/bsdapp/eal/eal_pci.c             | 10 ++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  4 ++
 lib/librte_eal/common/eal_common_bus.c          | 23 +++++++++
 lib/librte_eal/common/eal_common_pci.c          | 11 +----
 lib/librte_eal/common/include/rte_bus.h         | 31 ++++++++++++
 lib/librte_eal/common/include/rte_eal.h         | 12 +++++
 lib/librte_eal/common/include/rte_pci.h         | 28 +++++++++++
 lib/librte_eal/common/rte_malloc.c              |  9 +++-
 lib/librte_eal/linuxapp/eal/eal.c               | 22 ++++++---
 lib/librte_eal/linuxapp/eal/eal_memory.c        |  3 ++
 lib/librte_eal/linuxapp/eal/eal_pci.c           | 66 +++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c          | 29 ++++++++++-
 lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  4 ++
 15 files changed, 254 insertions(+), 24 deletions(-)

-- 
2.13.0

* [PATCH v3 01/11] eal/pci: introduce PCI driver iova as va flag
  2017-07-11  6:16   ` [PATCH v3 00/11] Infrastructure to detect iova mapping on the bus Santosh Shukla
@ 2017-07-11  6:16     ` Santosh Shukla
  2017-07-11  9:09       ` Maxime Coquelin
  2017-07-11  6:16     ` [PATCH v3 02/11] eal/pci: export match function Santosh Shukla
                       ` (10 subsequent siblings)
  11 siblings, 1 reply; 248+ messages in thread
From: Santosh Shukla @ 2017-07-11  6:16 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Introduce the RTE_PCI_DRV_NEED_IOVA_VA flag. The flag is used when a driver
needs to operate in iova=va mode.

Why does a driver need iova=va mapping?

On NPU style co-processors like Octeontx, buffer recycling is
done in HW, unlike the SW model. Here is the data flow:
1) On the control path, fill the HW mempool with buffers (iova as pa address).
2) On rx_burst, HW gives you an IOVA address (iova as pa address).
3) As the application expects a VA to operate on, rx_burst() needs to
convert from _pa to _va, which is very expensive.
With iova as va mapping instead, we can avoid the cost of
that conversion with the help of the IOMMU/SMMU.
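
A rough sketch of the cost difference described above, under stated
assumptions: struct seg, pa_mode_iova2va() and va_mode_iova2va() are
hypothetical names, and the linear segment scan only stands in for
whatever lookup a driver would otherwise perform per packet.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory segment: a physically contiguous, mapped range. */
struct seg {
	uint64_t pa;	/* start of the range as seen by the device */
	void *va;	/* start of the range as seen by the application */
	size_t len;	/* length of the range in bytes */
};

/* iova as pa: every received buffer address needs a segment lookup. */
static void *
pa_mode_iova2va(const struct seg *segs, size_t nb_segs, uint64_t iova)
{
	size_t i;

	for (i = 0; i < nb_segs; i++) {
		if (iova >= segs[i].pa && iova - segs[i].pa < segs[i].len)
			return (char *)segs[i].va + (iova - segs[i].pa);
	}
	return NULL;	/* address not covered by any segment */
}

/* iova as va: the IOMMU/SMMU translates, so the conversion is a cast. */
static void *
va_mode_iova2va(uint64_t iova)
{
	return (void *)(uintptr_t)iova;
}
```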

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/common/include/rte_pci.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 8b123391c..ac79040dd 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -202,6 +202,8 @@ struct rte_pci_bus {
 #define RTE_PCI_DRV_INTR_RMV 0x0010
 /** Device driver needs to keep mapped resources if unsupported dev detected */
 #define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020
+/** Device driver needs iova as va */
+#define RTE_PCI_DRV_NEED_IOVA_VA 0x0040
 
 /**
  * A structure describing a PCI mapping.
-- 
2.13.0

* [PATCH v3 02/11] eal/pci: export match function
  2017-07-11  6:16   ` [PATCH v3 00/11] Infrastructure to detect iova mapping on the bus Santosh Shukla
  2017-07-11  6:16     ` [PATCH v3 01/11] eal/pci: introduce PCI driver iova as va flag Santosh Shukla
@ 2017-07-11  6:16     ` Santosh Shukla
  2017-07-11  9:11       ` Maxime Coquelin
  2017-07-11  6:16     ` [PATCH v3 03/11] bsdapp/eal_pci: get iommu class Santosh Shukla
                       ` (9 subsequent siblings)
  11 siblings, 1 reply; 248+ messages in thread
From: Santosh Shukla @ 2017-07-11  6:16 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Export the rte_pci_match() function, as it is needed in the follow-up patch.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
 lib/librte_eal/common/eal_common_pci.c          | 10 +---------
 lib/librte_eal/common/include/rte_pci.h         | 15 +++++++++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 4 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 381f895cd..8d43df0bb 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -200,6 +200,7 @@ DPDK_17.08 {
 	rte_bus_find;
 	rte_bus_find_by_device;
 	rte_bus_find_by_name;
+	rte_pci_match;
 
 } DPDK_17.05;
 
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 76bbcc853..8b6ecebd6 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -128,16 +128,8 @@ pci_unmap_resource(void *requested_addr, size_t size)
 
 /*
  * Match the PCI Driver and Device using the ID Table
- *
- * @param pci_drv
- *	PCI driver from which ID table would be extracted
- * @param pci_dev
- *	PCI device to match against the driver
- * @return
- *	1 for successful match
- *	0 for unsuccessful match
  */
-static int
+int
 rte_pci_match(const struct rte_pci_driver *pci_drv,
 	      const struct rte_pci_device *pci_dev)
 {
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index ac79040dd..4a485674e 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -368,6 +368,21 @@ int rte_pci_scan(void);
 int
 rte_pci_probe(void);
 
+/*
+ * Match the PCI Driver and Device using the ID Table
+ *
+ * @param pci_drv
+ *      PCI driver from which ID table would be extracted
+ * @param pci_dev
+ *      PCI device to match against the driver
+ * @return
+ *      1 for successful match
+ *      0 for unsuccessful match
+ */
+int
+rte_pci_match(const struct rte_pci_driver *pci_drv,
+	      const struct rte_pci_device *pci_dev);
+
 /**
  * Map the PCI device resources in user space virtual memory address
  *
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 0f9e009b6..c91dd44c4 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -205,6 +205,7 @@ DPDK_17.08 {
 	rte_bus_find;
 	rte_bus_find_by_device;
 	rte_bus_find_by_name;
+	rte_pci_match;
 
 } DPDK_17.05;
 
-- 
2.13.0

* [PATCH v3 03/11] bsdapp/eal_pci: get iommu class
  2017-07-11  6:16   ` [PATCH v3 00/11] Infrastructure to detect iova mapping on the bus Santosh Shukla
  2017-07-11  6:16     ` [PATCH v3 01/11] eal/pci: introduce PCI driver iova as va flag Santosh Shukla
  2017-07-11  6:16     ` [PATCH v3 02/11] eal/pci: export match function Santosh Shukla
@ 2017-07-11  6:16     ` Santosh Shukla
  2017-07-11  9:15       ` Maxime Coquelin
  2017-07-11  6:16     ` [PATCH v3 04/11] linuxapp/eal_pci: " Santosh Shukla
                       ` (8 subsequent siblings)
  11 siblings, 1 reply; 248+ messages in thread
From: Santosh Shukla @ 2017-07-11  6:16 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Introduce the rte_pci_get_iommu_class API, which gets the iommu class
of the PCI devices on the bus and returns the preferred iova mapping mode
for the PCI bus.

The bsdapp implementation returns the default iova mode.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/bsdapp/eal/eal_pci.c           | 10 ++++++++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map |  1 +
 lib/librte_eal/common/include/rte_bus.h       |  9 +++++++++
 lib/librte_eal/common/include/rte_pci.h       | 11 +++++++++++
 4 files changed, 31 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index e321461d8..40a951e31 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -405,6 +405,16 @@ rte_pci_scan(void)
 	return -1;
 }
 
+/*
+ * Get iommu class of pci devices on the bus.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void)
+{
+	/* Supports only RTE_KDRV_NIC_UIO */
+	return RTE_IOVA_PA;
+}
+
 int
 pci_update_device(const struct rte_pci_addr *addr)
 {
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 8d43df0bb..33c2c32c0 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -201,6 +201,7 @@ DPDK_17.08 {
 	rte_bus_find_by_device;
 	rte_bus_find_by_name;
 	rte_pci_match;
+	rte_pci_get_iommu_class;
 
 } DPDK_17.05;
 
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index af9f0e13f..7a0cfb165 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -55,6 +55,15 @@ extern "C" {
 /** Double linked list of buses */
 TAILQ_HEAD(rte_bus_list, rte_bus);
 
+
+/**
+ * IOVA mapping mode.
+ */
+enum rte_iova_mode {
+	RTE_IOVA_PA = 1,
+	RTE_IOVA_VA
+};
+
 /**
  * Bus specific scan for devices attached on the bus.
  * For each bus object, the scan would be responsible for finding devices and
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 4a485674e..c58361132 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -383,6 +383,17 @@ int
 rte_pci_match(const struct rte_pci_driver *pci_drv,
 	      const struct rte_pci_device *pci_dev);
 
+
+/**
+ * Get iommu class of PCI devices on the bus.
+ * And return their preferred iova mapping mode.
+ *
+ * @return
+ *   - enum rte_iova_mode.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void);
+
 /**
  * Map the PCI device resources in user space virtual memory address
  *
-- 
2.13.0


* [PATCH v3 04/11] linuxapp/eal_pci: get iommu class
  2017-07-11  6:16   ` [PATCH v3 00/11] Infrastructure to detect iova mapping on the bus Santosh Shukla
                       ` (2 preceding siblings ...)
  2017-07-11  6:16     ` [PATCH v3 03/11] bsdapp/eal_pci: get iommu class Santosh Shukla
@ 2017-07-11  6:16     ` Santosh Shukla
  2017-07-11  9:23       ` Maxime Coquelin
                         ` (2 more replies)
  2017-07-11  6:16     ` [PATCH v3 05/11] bus: " Santosh Shukla
                       ` (7 subsequent siblings)
  11 siblings, 3 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-11  6:16 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Get the iommu class of the PCI devices on the bus and return the
preferred iova mapping mode for that bus.

Algorithm for iova scheme selection for the PCI bus:
0. Look for a device that is attached to the vfio kdrv and whose driver
has RTE_PCI_DRV_NEED_IOVA_VA set in .drv_flags.
1. Look for any device attached to a UIO class of driver.
2. Check whether vfio-noiommu mode is enabled.

If 1) and 2) are false and 0) is true, select iova=va as the
mapping scheme. Otherwise use the default
mapping scheme (iova_pa).
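
The selection rule described above can be sketched as a small pure
function. This is only an illustration, not the code in this patch: the
names iova_mode, IOVA_PA, IOVA_VA and select_pci_iova_mode are
hypothetical, and the three inputs stand in for the results of steps 0,
1 and 2.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two modes discussed in this series. */
enum iova_mode { IOVA_PA = 1, IOVA_VA = 2 };

/*
 * has_iova_va:  step 0, a vfio-bound device whose driver sets the
 *               NEED_IOVA_VA flag was found.
 * bound_uio:    step 1, at least one device is bound to a UIO driver.
 * vfio_noiommu: step 2, vfio runs in no-iommu mode.
 */
static enum iova_mode
select_pci_iova_mode(bool has_iova_va, bool bound_uio, bool vfio_noiommu)
{
	/* iova=va only when nothing on the bus forces physical addresses. */
	if (has_iova_va && !bound_uio && !vfio_noiommu)
		return IOVA_VA;

	/* Every other combination falls back to the default mode. */
	return IOVA_PA;
}
```

With these assumptions, a single UIO-bound device or an enabled
no-iommu mode is enough to keep the whole bus on iova_pa, even if a
driver asks for iova=va.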

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
v1 --> v2:
- Removed Linux version check in vfio_noiommu func. Refer [1].
- Extended the autodetection logic for _iommu_class.
Refer [2].

[1] https://www.mail-archive.com/dev@dpdk.org/msg70108.html
[2] https://www.mail-archive.com/dev@dpdk.org/msg70279.html

 lib/librte_eal/linuxapp/eal/eal_pci.c           | 66 +++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c          | 19 +++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 4 files changed, 90 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 7d9e1a99b..573caa000 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -45,6 +45,7 @@
 #include "eal_filesystem.h"
 #include "eal_private.h"
 #include "eal_pci_init.h"
+#include "eal_vfio.h"
 
 /**
  * @file
@@ -488,6 +489,71 @@ rte_pci_scan(void)
 	return -1;
 }
 
+/*
+ * Any one of the device bound to uio
+ */
+static inline int
+pci_device_bound_uio(void)
+{
+	struct rte_pci_device *dev = NULL;
+
+	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		if (dev->kdrv == RTE_KDRV_IGB_UIO ||
+		   dev->kdrv == RTE_KDRV_UIO_GENERIC) {
+			return 1;
+		}
+	}
+	return 0;
+}
+
+/*
+ * Any one of the device has iova as va
+ */
+static inline int
+pci_device_has_iova_va(void)
+{
+	struct rte_pci_device *dev = NULL;
+	struct rte_pci_driver *drv = NULL;
+
+	FOREACH_DRIVER_ON_PCIBUS(drv) {
+		if (drv && drv->drv_flags & RTE_PCI_DRV_NEED_IOVA_VA) {
+			FOREACH_DEVICE_ON_PCIBUS(dev) {
+				if (dev->kdrv == RTE_KDRV_VFIO &&
+				    rte_pci_match(drv, dev))
+					return 1;
+			}
+		}
+	}
+	return 0;
+}
+
+/*
+ * Get iommu class of PCI devices on the bus.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void)
+{
+	bool is_vfio_noiommu_enabled;
+	bool has_iova_va;
+	bool is_bound_uio;
+
+	has_iova_va = pci_device_has_iova_va();
+	is_bound_uio = pci_device_bound_uio();
+	is_vfio_noiommu_enabled = vfio_noiommu_is_enabled() == 1 ? 1 : 0;
+
+	if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled)
+		return RTE_IOVA_VA;
+
+	if (has_iova_va) {
+		if (is_vfio_noiommu_enabled)
+			RTE_LOG(WARNING, EAL, "vfio-noiommu mode configured\n");
+		if (is_bound_uio)
+			RTE_LOG(WARNING, EAL, "Some device attached to UIO\n");
+	}
+
+	return RTE_IOVA_PA;
+}
+
 /* Read PCI config space. */
 int rte_pci_read_config(const struct rte_pci_device *device,
 		void *buf, size_t len, off_t offset)
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 946df7e31..c8a97b7e7 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -816,4 +816,23 @@ vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
 	return 0;
 }
 
+int
+vfio_noiommu_is_enabled(void)
+{
+	int fd, ret, cnt;
+	char c;
+
+	ret = -1;
+	fd = open(VFIO_NOIOMMU_MODE, O_RDONLY);
+	if (fd < 0)
+		return -1;
+
+	cnt = read(fd, &c, 1);
+	if (cnt == 1 && c == 'Y')
+		ret = 1;
+
+	close(fd);
+	return ret;
+}
+
 #endif
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h b/lib/librte_eal/linuxapp/eal/eal_vfio.h
index 5ff63e5d7..26ea8e119 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
@@ -150,6 +150,8 @@ struct vfio_config {
 #define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
 #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
 #define VFIO_GET_REGION_IDX(x) (x >> 40)
+#define VFIO_NOIOMMU_MODE      \
+	"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
 
 /* DMA mapping function prototype.
  * Takes VFIO container fd as a parameter.
@@ -210,6 +212,8 @@ int pci_vfio_is_enabled(void);
 
 int vfio_mp_sync_setup(void);
 
+int vfio_noiommu_is_enabled(void);
+
 #define SOCKET_REQ_CONTAINER 0x100
 #define SOCKET_REQ_GROUP 0x200
 #define SOCKET_CLR_GROUP 0x300
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index c91dd44c4..044f89c7c 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -206,6 +206,7 @@ DPDK_17.08 {
 	rte_bus_find_by_device;
 	rte_bus_find_by_name;
 	rte_pci_match;
+	rte_pci_get_iommu_class;
 
 } DPDK_17.05;
 
-- 
2.13.0


* [PATCH v3 05/11] bus: get iommu class
  2017-07-11  6:16   ` [PATCH v3 00/11] Infrastructure to detect iova mapping on the bus Santosh Shukla
                       ` (3 preceding siblings ...)
  2017-07-11  6:16     ` [PATCH v3 04/11] linuxapp/eal_pci: " Santosh Shukla
@ 2017-07-11  6:16     ` Santosh Shukla
  2017-07-14  8:07       ` Hemant Agrawal
  2017-07-11  6:16     ` [PATCH v3 06/11] eal: introduce iova mode helper api Santosh Shukla
                       ` (6 subsequent siblings)
  11 siblings, 1 reply; 248+ messages in thread
From: Santosh Shukla @ 2017-07-11  6:16 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

The rte_bus_get_iommu_class() API automatically detects and selects the
appropriate iova mapping scheme for the iommu capable devices on the
buses.

Algorithm for iova scheme selection across buses:
0. Iterate through bus_list.
1. Collect each bus's iova mode value and OR it into the 'mode' var.
2. Here value '1' is _pa and value '2' is _va mode.
So the mode selection scheme is:
if mode == 2 then iova mode is _va.
if mode == 1 then iova mode is _pa.
if mode == 3 then iova mode is _pa.

So any mode != 2 results in the default iova mode (_pa).
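
The OR-based combination above can be sketched as follows. This is a
minimal illustration under assumptions: iova_mode, IOVA_PA, IOVA_VA and
combine_bus_modes are hypothetical names, and the per-bus results are
passed in as an array instead of being collected by walking a bus list.

```c
#include <stddef.h>

/* Hypothetical stand-in: PA is bit 0 (value 1), VA is bit 1 (value 2). */
enum iova_mode { IOVA_PA = 1, IOVA_VA = 2 };

static enum iova_mode
combine_bus_modes(const enum iova_mode *modes, size_t n)
{
	int mask = 0;
	size_t i;

	/* OR together what every bus reported. */
	for (i = 0; i < n; i++)
		mask |= modes[i];

	/*
	 * mask == 2: every reporting bus asked for VA.
	 * mask == 1 or 3: at least one bus needs PA.
	 * mask == 0: no bus reported anything, use the default.
	 */
	return mask == IOVA_VA ? IOVA_VA : IOVA_PA;
}
```

Because the two modes occupy separate bits, a mixed set of buses yields
3, which is why only an exact match on 2 selects iova=va.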

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
 lib/librte_eal/common/eal_common_bus.c          | 23 +++++++++++++++++++++++
 lib/librte_eal/common/eal_common_pci.c          |  1 +
 lib/librte_eal/common/include/rte_bus.h         | 22 ++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 5 files changed, 48 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 33c2c32c0..a2dd65a33 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -202,6 +202,7 @@ DPDK_17.08 {
 	rte_bus_find_by_name;
 	rte_pci_match;
 	rte_pci_get_iommu_class;
+	rte_bus_get_iommu_class;
 
 } DPDK_17.05;
 
diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 08bec2d93..5d5753ac9 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -222,3 +222,26 @@ rte_bus_find_by_device_name(const char *str)
 		c[0] = '\0';
 	return rte_bus_find(NULL, bus_can_parse, name);
 }
+
+
+/*
+ * Get iommu class of devices on the bus.
+ */
+enum rte_iova_mode
+rte_bus_get_iommu_class(void)
+{
+	int mode = 0;
+	struct rte_bus *bus;
+
+	TAILQ_FOREACH(bus, &rte_bus_list, next) {
+
+		if (bus->get_iommu_class)
+			mode |= bus->get_iommu_class();
+	}
+
+	if (mode != RTE_IOVA_VA) {
+		/* Use default IOVA mode */
+		mode = RTE_IOVA_PA;
+	}
+	return mode;
+}
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 8b6ecebd6..bdf2e7c3a 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -552,6 +552,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.plug = pci_plug,
 		.unplug = pci_unplug,
 		.parse = pci_parse,
+		.get_iommu_class = rte_pci_get_iommu_class,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 7a0cfb165..8b2805b7f 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -181,6 +181,17 @@ struct rte_bus_conf {
 	enum rte_bus_scan_mode scan_mode; /**< Scan policy. */
 };
 
+
+/**
+ * Get iommu class of devices on the bus.
+ * Check that those devices are attached to iommu driver.
+ *
+ * @return
+ *      enum rte_iova_mode value.
+ */
+typedef enum rte_iova_mode (*rte_bus_get_iommu_class_t)(void);
+
+
 /**
  * A structure describing a generic bus.
  */
@@ -194,6 +205,7 @@ struct rte_bus {
 	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
 	rte_bus_parse_t parse;       /**< Parse a device name */
 	struct rte_bus_conf conf;    /**< Bus configuration */
+	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 };
 
 /**
@@ -293,6 +305,16 @@ struct rte_bus *rte_bus_find_by_device(const struct rte_device *dev);
  */
 struct rte_bus *rte_bus_find_by_name(const char *busname);
 
+
+/**
+ * Get iommu class of devices on the bus.
+ * Check that those devices are attached to iommu driver.
+ *
+ * @return
+ *     enum rte_iova_mode value.
+ */
+enum rte_iova_mode rte_bus_get_iommu_class(void);
+
 /**
  * Helper for Bus registration.
  * The constructor has higher priority than PMD constructors.
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 044f89c7c..186c7b0fd 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -207,6 +207,7 @@ DPDK_17.08 {
 	rte_bus_find_by_name;
 	rte_pci_match;
 	rte_pci_get_iommu_class;
+	rte_bus_get_iommu_class;
 
 } DPDK_17.05;
 
-- 
2.13.0


* [PATCH v3 06/11] eal: introduce iova mode helper api
  2017-07-11  6:16   ` [PATCH v3 00/11] Infrastructure to detect iova mapping on the bus Santosh Shukla
                       ` (4 preceding siblings ...)
  2017-07-11  6:16     ` [PATCH v3 05/11] bus: " Santosh Shukla
@ 2017-07-11  6:16     ` Santosh Shukla
  2017-07-11  6:16     ` [PATCH v3 07/11] linuxapp/eal: auto detect iova mode Santosh Shukla
                       ` (5 subsequent siblings)
  11 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-11  6:16 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Introduce the rte_eal_iova_mode() helper API. Non-EAL libraries
use this API to query the iova mode.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
---
 lib/librte_eal/bsdapp/eal/eal.c                 |  6 ++++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
 lib/librte_eal/common/include/rte_eal.h         | 12 ++++++++++++
 lib/librte_eal/linuxapp/eal/eal.c               |  6 ++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 5 files changed, 26 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 05f0c1f90..e1aee8c3e 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -120,6 +120,12 @@ rte_eal_get_configuration(void)
 	return &rte_config;
 }
 
+enum rte_iova_mode
+rte_eal_iova_mode(void)
+{
+	return rte_eal_get_configuration()->iova_mode;
+}
+
 /* parse a sysfs (or other) file containing one integer value */
 int
 eal_parse_sysfs_value(const char *filename, unsigned long *val)
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index a2dd65a33..43cb11d7b 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -203,6 +203,7 @@ DPDK_17.08 {
 	rte_pci_match;
 	rte_pci_get_iommu_class;
 	rte_bus_get_iommu_class;
+	rte_eal_iova_mode;
 
 } DPDK_17.05;
 
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 6b7c5ca92..849f5f050 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -45,6 +45,7 @@
 
 #include <rte_per_lcore.h>
 #include <rte_config.h>
+#include <rte_bus.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -85,6 +86,9 @@ struct rte_config {
 	/** Primary or secondary configuration */
 	enum rte_proc_type_t process_type;
 
+	/** PA or VA mapping mode */
+	enum rte_iova_mode iova_mode;
+
 	/**
 	 * Pointer to memory configuration, which may be shared across multiple
 	 * DPDK instances
@@ -283,6 +287,14 @@ static inline int rte_gettid(void)
 	return RTE_PER_LCORE(_thread_id);
 }
 
+/**
+ * Get the iova mode
+ *
+ * @return
+ *   enum rte_iova_mode value.
+ */
+enum rte_iova_mode rte_eal_iova_mode(void);
+
 #define RTE_INIT(func) \
 static void __attribute__((constructor, used)) func(void)
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 7c78f2dc2..2546b55e4 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -129,6 +129,12 @@ rte_eal_get_configuration(void)
 	return &rte_config;
 }
 
+enum rte_iova_mode
+rte_eal_iova_mode(void)
+{
+	return rte_eal_get_configuration()->iova_mode;
+}
+
 /* parse a sysfs (or other) file containing one integer value */
 int
 eal_parse_sysfs_value(const char *filename, unsigned long *val)
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 186c7b0fd..0de876c26 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -208,6 +208,7 @@ DPDK_17.08 {
 	rte_pci_match;
 	rte_pci_get_iommu_class;
 	rte_bus_get_iommu_class;
+	rte_eal_iova_mode;
 
 } DPDK_17.05;
 
-- 
2.13.0


* [PATCH v3 07/11] linuxapp/eal: auto detect iova mode
  2017-07-11  6:16   ` [PATCH v3 00/11] Infrastructure to detect iova mapping on the bus Santosh Shukla
                       ` (5 preceding siblings ...)
  2017-07-11  6:16     ` [PATCH v3 06/11] eal: introduce iova mode helper api Santosh Shukla
@ 2017-07-11  6:16     ` Santosh Shukla
  2017-07-13 11:29       ` Hemant Agrawal
  2017-07-11  6:16     ` [PATCH v3 08/11] bsdapp/eal: auto detect iova mapping mode Santosh Shukla
                       ` (4 subsequent siblings)
  11 siblings, 1 reply; 248+ messages in thread
From: Santosh Shukla @ 2017-07-11  6:16 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

- Move the late bus scan earlier, to just after eal_parsing.
- Auto detect the iova mapping mode, based on the result of
  rte_bus_get_iommu_class.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/linuxapp/eal/eal.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 2546b55e4..7b4dd70de 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -799,6 +799,16 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	if (rte_bus_scan()) {
+		rte_eal_init_alert("Cannot scan the buses for devices\n");
+		rte_errno = ENODEV;
+		return -1;
+	}
+
+	/* autodetect the iova mapping mode (default is iova_pa) */
+	if (rte_bus_get_iommu_class() == RTE_IOVA_VA)
+		rte_eal_get_configuration()->iova_mode = RTE_IOVA_VA;
+
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&
 			internal_config.xen_dom0_support == 0 &&
@@ -896,12 +906,6 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
-	if (rte_bus_scan()) {
-		rte_eal_init_alert("Cannot scan the buses for devices\n");
-		rte_errno = ENODEV;
-		return -1;
-	}
-
 	RTE_LCORE_FOREACH_SLAVE(i) {
 
 		/*
-- 
2.13.0


* [PATCH v3 08/11] bsdapp/eal: auto detect iova mapping mode
  2017-07-11  6:16   ` [PATCH v3 00/11] Infrastructure to detect iova mapping on the bus Santosh Shukla
                       ` (6 preceding siblings ...)
  2017-07-11  6:16     ` [PATCH v3 07/11] linuxapp/eal: auto detect iova mode Santosh Shukla
@ 2017-07-11  6:16     ` Santosh Shukla
  2017-07-11  6:16     ` [PATCH v3 09/11] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
                       ` (3 subsequent siblings)
  11 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-11  6:16 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

- Move the late bus scan earlier, to just after eal_parsing.
- The mapping mode stays at the default for bsdapp, which supports
  only one pass-through mode (RTE_KDRV_NIC_UIO).

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/bsdapp/eal/eal.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index e1aee8c3e..7c63b2fa7 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -542,6 +542,16 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	if (rte_bus_scan()) {
+		rte_eal_init_alert("Cannot scan the buses for devices\n");
+		rte_errno = ENODEV;
+		return -1;
+	}
+
+	/* autodetect the iova mapping mode (default is iova_pa) */
+	if (rte_bus_get_iommu_class() == RTE_IOVA_VA)
+		rte_eal_get_configuration()->iova_mode = RTE_IOVA_VA;
+
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&
 			eal_hugepage_info_init() < 0) {
@@ -621,12 +631,6 @@ rte_eal_init(int argc, char **argv)
 		rte_config.master_lcore, thread_id, cpuset,
 		ret == 0 ? "" : "...");
 
-	if (rte_bus_scan()) {
-		rte_eal_init_alert("Cannot scan the buses for devices\n");
-		rte_errno = ENODEV;
-		return -1;
-	}
-
 	RTE_LCORE_FOREACH_SLAVE(i) {
 
 		/*
-- 
2.13.0


* [PATCH v3 09/11] linuxapp/eal_vfio: honor iova mode before mapping
  2017-07-11  6:16   ` [PATCH v3 00/11] Infrastructure to detect iova mapping on the bus Santosh Shukla
                       ` (7 preceding siblings ...)
  2017-07-11  6:16     ` [PATCH v3 08/11] bsdapp/eal: auto detect iova mapping mode Santosh Shukla
@ 2017-07-11  6:16     ` Santosh Shukla
  2017-07-11  6:16     ` [PATCH v3 10/11] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
                       ` (2 subsequent siblings)
  11 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-11  6:16 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Check the iova mode and map the iova to pa or va accordingly.
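
As a rough sketch of that choice, the iova programmed into the IOMMU
for a memory segment depends only on the mode. The names below
(mem_segment, segment_iova, iova_mode) are hypothetical and do not
come from the EAL; the struct only carries the fields the example needs.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two modes discussed in this series. */
enum iova_mode { IOVA_PA = 1, IOVA_VA = 2 };

/* Hypothetical, reduced view of one memory segment. */
struct mem_segment {
	uint64_t vaddr;     /* process virtual address of the segment */
	uint64_t phys_addr; /* physical address of the segment */
	uint64_t len;       /* segment length in bytes */
};

/* Pick the device-visible address to map for this segment. */
static uint64_t
segment_iova(const struct mem_segment *ms, enum iova_mode mode)
{
	/* iova=va: the device uses the same address as the process. */
	if (mode == IOVA_VA)
		return ms->vaddr;

	/* Default: the device sees the physical address. */
	return ms->phys_addr;
}
```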

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c8a97b7e7..b32cd09a2 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd)
 		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
 		dma_map.vaddr = ms[i].addr_64;
 		dma_map.size = ms[i].len;
-		dma_map.iova = ms[i].phys_addr;
+		if (rte_eal_iova_mode() == RTE_IOVA_VA)
+			dma_map.iova = dma_map.vaddr;
+		else
+			dma_map.iova = ms[i].phys_addr;
 		dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
 
 		ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
@@ -792,7 +795,10 @@ vfio_spapr_dma_map(int vfio_container_fd)
 		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
 		dma_map.vaddr = ms[i].addr_64;
 		dma_map.size = ms[i].len;
-		dma_map.iova = ms[i].phys_addr;
+		if (rte_eal_iova_mode() == RTE_IOVA_VA)
+			dma_map.iova = dma_map.vaddr;
+		else
+			dma_map.iova = ms[i].phys_addr;
 		dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
 				 VFIO_DMA_MAP_FLAG_WRITE;
 
-- 
2.13.0


* [PATCH v3 10/11] linuxapp/eal_memory: honor iova mode in virt2phy
  2017-07-11  6:16   ` [PATCH v3 00/11] Infrastructure to detect iova mapping on the bus Santosh Shukla
                       ` (8 preceding siblings ...)
  2017-07-11  6:16     ` [PATCH v3 09/11] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
@ 2017-07-11  6:16     ` Santosh Shukla
  2017-07-11  6:16     ` [PATCH v3 11/11] eal/rte_malloc: " Santosh Shukla
  2017-07-18  5:59     ` [PATCH v4 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
  11 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-11  6:16 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Check the iova mode and return the phy addr accordingly.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 040f24a43..7bff815b6 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -141,6 +141,9 @@ rte_mem_virt2phy(const void *virtaddr)
 	int page_size;
 	off_t offset;
 
+	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+		return (uintptr_t)virtaddr;
+
 	/* when using dom0, /proc/self/pagemap always returns 0, check in
 	 * dpdk memory by browsing the memsegs */
 	if (rte_xen_dom0_supported()) {
-- 
2.13.0


* [PATCH v3 11/11] eal/rte_malloc: honor iova mode in virt2phy
  2017-07-11  6:16   ` [PATCH v3 00/11] Infrastructure to detect iova mapping on the bus Santosh Shukla
                       ` (9 preceding siblings ...)
  2017-07-11  6:16     ` [PATCH v3 10/11] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
@ 2017-07-11  6:16     ` Santosh Shukla
  2017-07-18  5:59     ` [PATCH v4 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
  11 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-11  6:16 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Check the iova mode and return the phy addr accordingly.
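
The translation can be sketched for a single segment as below. This is
a simplified illustration, not the rte_malloc code: mem_segment,
virt_to_iova and BAD_ADDR are hypothetical names, and the malloc element
lookup is replaced by a plain range check on one segment.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two modes discussed in this series. */
enum iova_mode { IOVA_PA = 1, IOVA_VA = 2 };

/* Hypothetical marker for "no valid address". */
#define BAD_ADDR UINT64_MAX

/* Hypothetical, reduced view of one memory segment. */
struct mem_segment {
	uint64_t vaddr;     /* process virtual address of the segment */
	uint64_t phys_addr; /* physical address of the segment */
	uint64_t len;       /* segment length in bytes */
};

static uint64_t
virt_to_iova(const struct mem_segment *ms, uint64_t vaddr,
	     enum iova_mode mode)
{
	/* Reject addresses that do not belong to this segment. */
	if (vaddr < ms->vaddr || vaddr - ms->vaddr >= ms->len)
		return BAD_ADDR;

	/* iova=va: no translation is needed. */
	if (mode == IOVA_VA)
		return vaddr;

	/* iova=pa: segment base plus the offset inside the segment. */
	return ms->phys_addr + (vaddr - ms->vaddr);
}
```

In iova=va mode the function reduces to a bounds check, which is the
saving this series is after on the fast path.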

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/common/rte_malloc.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 5c0627bf4..d65c05a4d 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -251,10 +251,17 @@ rte_malloc_set_limit(__rte_unused const char *type,
 phys_addr_t
 rte_malloc_virt2phy(const void *addr)
 {
+	phys_addr_t paddr;
 	const struct malloc_elem *elem = malloc_elem_from_data(addr);
 	if (elem == NULL)
 		return RTE_BAD_PHYS_ADDR;
 	if (elem->ms->phys_addr == RTE_BAD_PHYS_ADDR)
 		return RTE_BAD_PHYS_ADDR;
-	return elem->ms->phys_addr + ((uintptr_t)addr - (uintptr_t)elem->ms->addr);
+
+	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+		paddr = (uintptr_t)addr;
+	else
+		paddr = elem->ms->phys_addr +
+			((uintptr_t)addr - (uintptr_t)elem->ms->addr);
+	return paddr;
 }
-- 
2.13.0


* Re: [PATCH v3 01/11] eal/pci: introduce PCI driver iova as va flag
  2017-07-11  6:16     ` [PATCH v3 01/11] eal/pci: introduce PCI driver iova as va flag Santosh Shukla
@ 2017-07-11  9:09       ` Maxime Coquelin
  2017-07-11 10:35         ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Maxime Coquelin @ 2017-07-11  9:09 UTC (permalink / raw)
  To: Santosh Shukla, thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	olivier.matz



On 07/11/2017 08:16 AM, Santosh Shukla wrote:
> Introducing RTE_PCI_DRV_NEED_IOVA_VA flag. Flag used when driver needs
> to operate in iova=va mode.
> 
> Why driver need iova=va mapping?
> 
> On NPU style co-processors like Octeontx, the buffer recycling has been
> done in HW, unlike SW model. Here is the data flow:
> 1) On control path, Fill the HW mempool with buffers(iova as pa address)
> 2) on rx_burst, HW gives you IOVA address(iova as pa address)
> 3) As application expects VA to operate on it, rx_burst() needs to
> convert to _va from _pa. Which is very expensive.
> Instead of that if iova as va mapping, we can avoid the cost of
> converting with help of IOMMU/SMMU.
> 
> Signed-off-by: Santosh Shukla<santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob<jerin.jacob@caviumnetworks.com>
> ---
>   lib/librte_eal/common/include/rte_pci.h | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
> index 8b123391c..ac79040dd 100644
> --- a/lib/librte_eal/common/include/rte_pci.h
> +++ b/lib/librte_eal/common/include/rte_pci.h
> @@ -202,6 +202,8 @@ struct rte_pci_bus {
>   #define RTE_PCI_DRV_INTR_RMV 0x0010
>   /** Device driver needs to keep mapped resources if unsupported dev detected */
>   #define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020
> +/** Device driver needs iova as va */
> +#define RTE_PCI_DRV_NEED_IOVA_VA 0X0040
>   

Maybe not a big deal, but using NEED tends to say that the driver cannot
work if not using VA as IOVA. If my understanding is correct, this is
not the case, the performance will be poor but the device will be
functional.

Maxime


* Re: [PATCH v3 02/11] eal/pci: export match function
  2017-07-11  6:16     ` [PATCH v3 02/11] eal/pci: export match function Santosh Shukla
@ 2017-07-11  9:11       ` Maxime Coquelin
  2017-07-11  9:12         ` Maxime Coquelin
  0 siblings, 1 reply; 248+ messages in thread
From: Maxime Coquelin @ 2017-07-11  9:11 UTC (permalink / raw)
  To: Santosh Shukla, thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	olivier.matz



On 07/11/2017 08:16 AM, Santosh Shukla wrote:
> Export rte_pci_match() function as it needed in the followup patch.
> 
> Signed-off-by: Santosh Shukla<santosh.shukla@caviumnetworks.com>
> ---
>   lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
>   lib/librte_eal/common/eal_common_pci.c          | 10 +---------
>   lib/librte_eal/common/include/rte_pci.h         | 15 +++++++++++++++
>   lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>   4 files changed, 18 insertions(+), 9 deletions(-)
> 
> diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> index 381f895cd..8d43df0bb 100644
> --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> @@ -200,6 +200,7 @@ DPDK_17.08 {
>   	rte_bus_find;
>   	rte_bus_find_by_device;
>   	rte_bus_find_by_name;
> +	rte_pci_match;
>   
>   } DPDK_17.05;
>   

Shouldn't be DPDK_17.08?

Maxime


* Re: [PATCH v3 02/11] eal/pci: export match function
  2017-07-11  9:11       ` Maxime Coquelin
@ 2017-07-11  9:12         ` Maxime Coquelin
  0 siblings, 0 replies; 248+ messages in thread
From: Maxime Coquelin @ 2017-07-11  9:12 UTC (permalink / raw)
  To: Santosh Shukla, thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	olivier.matz



On 07/11/2017 11:11 AM, Maxime Coquelin wrote:
> 
> 
> On 07/11/2017 08:16 AM, Santosh Shukla wrote:
>> Export rte_pci_match() function as it needed in the followup patch.
>>
>> Signed-off-by: Santosh Shukla<santosh.shukla@caviumnetworks.com>
>> ---
>>   lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
>>   lib/librte_eal/common/eal_common_pci.c          | 10 +---------
>>   lib/librte_eal/common/include/rte_pci.h         | 15 +++++++++++++++
>>   lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>>   4 files changed, 18 insertions(+), 9 deletions(-)
>>
>> diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map 
>> b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>> index 381f895cd..8d43df0bb 100644
>> --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>> +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>> @@ -200,6 +200,7 @@ DPDK_17.08 {
>>       rte_bus_find;
>>       rte_bus_find_by_device;
>>       rte_bus_find_by_name;
>> +    rte_pci_match;
>>   } DPDK_17.05;
> 
> Shouldn't be DPDK_17.08?

Never mind, I misread. It looks good to me.

Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>

> Maxime

* Re: [PATCH v3 03/11] bsdapp/eal_pci: get iommu class
  2017-07-11  6:16     ` [PATCH v3 03/11] bsdapp/eal_pci: get iommu class Santosh Shukla
@ 2017-07-11  9:15       ` Maxime Coquelin
  2017-07-11 10:41         ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Maxime Coquelin @ 2017-07-11  9:15 UTC (permalink / raw)
  To: Santosh Shukla, thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	olivier.matz



On 07/11/2017 08:16 AM, Santosh Shukla wrote:
>   
> diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
> index af9f0e13f..7a0cfb165 100644
> --- a/lib/librte_eal/common/include/rte_bus.h
> +++ b/lib/librte_eal/common/include/rte_bus.h
> @@ -55,6 +55,15 @@ extern "C" {
>   /** Double linked list of buses */
>   TAILQ_HEAD(rte_bus_list, rte_bus);
>   
> +
> +/**
> + * IOVA mapping mode.
> + */
> +enum rte_iova_mode {
> +	RTE_IOVA_PA = 1,
> +	RTE_IOVA_VA
> +};
> +
>   /**
>    * Bus specific scan for devices attached on the bus.
>    * For each bus object, the scan would be responsible for finding devices and
> diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
> index 4a485674e..c58361132 100644
> --- a/lib/librte_eal/common/include/rte_pci.h
> +++ b/lib/librte_eal/common/include/rte_pci.h
> @@ -383,6 +383,17 @@ int
>   rte_pci_match(const struct rte_pci_driver *pci_drv,
>   	      const struct rte_pci_device *pci_dev);
>   
> +
> +/**
> + * Get iommu class of PCI devices on the bus.
> + * And return their preferred iova mapping mode.
> + *
> + * @return
> + *   - enum rte_iova_mode.
> + */
> +enum rte_iova_mode
> +rte_pci_get_iommu_class(void);
> +
>   /**
>    * Map the PCI device resources in user space virtual memory address
>    *

I would have put this in a separate patch, as it is not BSD-specific.

Maxime

* Re: [PATCH v3 04/11] linuxapp/eal_pci: get iommu class
  2017-07-11  6:16     ` [PATCH v3 04/11] linuxapp/eal_pci: " Santosh Shukla
@ 2017-07-11  9:23       ` Maxime Coquelin
  2017-07-11 10:43         ` santosh
  2017-07-12  8:20       ` Sergio Gonzalez Monroy
  2017-07-14  7:39       ` Hemant Agrawal
  2 siblings, 1 reply; 248+ messages in thread
From: Maxime Coquelin @ 2017-07-11  9:23 UTC (permalink / raw)
  To: Santosh Shukla, thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	olivier.matz



On 07/11/2017 08:16 AM, Santosh Shukla wrote:
> Get iommu class of PCI device on the bus and returns preferred iova
> mapping mode for that bus.
> 
> Algorithm for iova scheme selection for PCI bus:
> 0. Look for device attached to vfio kdrv and has .drv_flag set
> to RTE_PCI_DRV_NEED_IOVA_VA.
> 1. Look for any device attached to UIO class of driver.
> 2. Check for vfio-noiommu mode enabled.
> 
> If 1) & 2) is false and 0) is true then select
> mapping scheme as iova=va. Otherwise use default
> mapping scheme (iova_pa).
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
> v1 --> v2:
> - Removed Linux version check in vfio_noiommu func. Refer [1].
> - Extending autodetection logic for _iommu_class.
> Refer [2].
> 
> [1] https://www.mail-archive.com/dev@dpdk.org/msg70108.html
> [2] https://www.mail-archive.com/dev@dpdk.org/msg70279.html
> 
>   lib/librte_eal/linuxapp/eal/eal_pci.c           | 66 +++++++++++++++++++++++++
>   lib/librte_eal/linuxapp/eal/eal_vfio.c          | 19 +++++++
>   lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
>   lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>   4 files changed, 90 insertions(+)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
> index 7d9e1a99b..573caa000 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
> @@ -45,6 +45,7 @@
>   #include "eal_filesystem.h"
>   #include "eal_private.h"
>   #include "eal_pci_init.h"
> +#include "eal_vfio.h"
>   
>   /**
>    * @file
> @@ -488,6 +489,71 @@ rte_pci_scan(void)
>   	return -1;
>   }
>   
> +/*
> + * Any one of the device bound to uio
> + */
> +static inline int
> +pci_device_bound_uio(void)
> +{
> +	struct rte_pci_device *dev = NULL;
> +
> +	FOREACH_DEVICE_ON_PCIBUS(dev) {
> +		if (dev->kdrv == RTE_KDRV_IGB_UIO ||
> +		   dev->kdrv == RTE_KDRV_UIO_GENERIC) {
> +			return 1;
> +		}
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Any one of the device has iova as va
> + */
> +static inline int
> +pci_device_has_iova_va(void)
> +{
> +	struct rte_pci_device *dev = NULL;
> +	struct rte_pci_driver *drv = NULL;
> +
> +	FOREACH_DRIVER_ON_PCIBUS(drv) {
> +		if (drv && drv->drv_flags & RTE_PCI_DRV_NEED_IOVA_VA) {
> +			FOREACH_DEVICE_ON_PCIBUS(dev) {
> +				if (dev->kdrv == RTE_KDRV_VFIO &&
> +				    rte_pci_match(drv, dev))
> +					return 1;
> +			}
> +		}
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Get iommu class of PCI devices on the bus.
> + */
> +enum rte_iova_mode
> +rte_pci_get_iommu_class(void)
> +{
> +	bool is_vfio_noiommu_enabled;
> +	bool has_iova_va;
> +	bool is_bound_uio;
> +
> +	has_iova_va = pci_device_has_iova_va();
> +	is_bound_uio = pci_device_bound_uio();
> +	is_vfio_noiommu_enabled = vfio_noiommu_is_enabled() == 1 ? 1 : 0;
> +
> +	if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled)
> +		return RTE_IOVA_VA;
> +
> +	if (has_iova_va) {
> +		if (is_vfio_noiommu_enabled)
> +			RTE_LOG(WARNING, EAL, "vfio-noiommu mode configured\n");
> +		if (is_bound_uio)
> +			RTE_LOG(WARNING, EAL, "Some device attached to UIO\n");

It may be worth making the warning more verbose for users who are not
familiar with the feature, for example by stating that some devices want
VA but PA will be used because of vfio-noiommu or UIO.
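
To illustrate the kind of message meant here, a minimal standalone sketch
follows. It is not part of the patch: the helper names
pci_pick_iova_mode() and pci_iova_fallback_warning() are hypothetical,
the three probe results are passed in as plain booleans instead of being
read from the bus, and the message wording is only a suggestion.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the enum introduced by the patch under review. */
enum rte_iova_mode {
	RTE_IOVA_PA = 1,
	RTE_IOVA_VA
};

/*
 * Hypothetical helper: same decision as rte_pci_get_iommu_class() in the
 * patch, with the probe results supplied by the caller.
 */
static enum rte_iova_mode
pci_pick_iova_mode(bool has_iova_va, bool is_bound_uio,
		   bool is_vfio_noiommu_enabled)
{
	if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled)
		return RTE_IOVA_VA;

	return RTE_IOVA_PA;
}

/*
 * Hypothetical helper: returns a verbose explanation when some device
 * wants IOVA as VA but PA is selected anyway, or NULL when no warning
 * is needed.
 */
static const char *
pci_iova_fallback_warning(bool has_iova_va, bool is_bound_uio,
			  bool is_vfio_noiommu_enabled)
{
	if (!has_iova_va)
		return NULL; /* nobody asked for VA, nothing to explain */

	if (is_vfio_noiommu_enabled && is_bound_uio)
		return "Some devices want IOVA as VA, but IOVA as PA will be "
		       "used because vfio-noiommu mode is enabled and some "
		       "devices are bound to UIO\n";
	if (is_vfio_noiommu_enabled)
		return "Some devices want IOVA as VA, but IOVA as PA will be "
		       "used because vfio-noiommu mode is enabled\n";
	if (is_bound_uio)
		return "Some devices want IOVA as VA, but IOVA as PA will be "
		       "used because some devices are bound to UIO\n";

	return NULL; /* VA is selected, no fallback happened */
}
```

In the patch itself the returned string would presumably go through
RTE_LOG(WARNING, EAL, "%s", ...) rather than being returned to a caller.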

Maxime

* Re: [PATCH v3 01/11] eal/pci: introduce PCI driver iova as va flag
  2017-07-11  9:09       ` Maxime Coquelin
@ 2017-07-11 10:35         ` santosh
  2017-07-11 12:07           ` Maxime Coquelin
  0 siblings, 1 reply; 248+ messages in thread
From: santosh @ 2017-07-11 10:35 UTC (permalink / raw)
  To: Maxime Coquelin, thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	olivier.matz

Hi Maxime,

On Tuesday 11 July 2017 02:39 PM, Maxime Coquelin wrote:

>
>
> On 07/11/2017 08:16 AM, Santosh Shukla wrote:
>> Introducing RTE_PCI_DRV_NEED_IOVA_VA flag. Flag used when driver needs
>> to operate in iova=va mode.
>>
>> Why driver need iova=va mapping?
>>
>> On NPU style co-processors like Octeontx, the buffer recycling has been
>> done in HW, unlike SW model. Here is the data flow:
>> 1) On control path, Fill the HW mempool with buffers(iova as pa address)
>> 2) on rx_burst, HW gives you IOVA address(iova as pa address)
>> 3) As application expects VA to operate on it, rx_burst() needs to
>> convert to _va from _pa. Which is very expensive.
>> Instead of that if iova as va mapping, we can avoid the cost of
>> converting with help of IOMMU/SMMU.
>>
>> Signed-off-by: Santosh Shukla<santosh.shukla@caviumnetworks.com>
>> Signed-off-by: Jerin Jacob<jerin.jacob@caviumnetworks.com>
>> ---
>>   lib/librte_eal/common/include/rte_pci.h | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
>> index 8b123391c..ac79040dd 100644
>> --- a/lib/librte_eal/common/include/rte_pci.h
>> +++ b/lib/librte_eal/common/include/rte_pci.h
>> @@ -202,6 +202,8 @@ struct rte_pci_bus {
>>   #define RTE_PCI_DRV_INTR_RMV 0x0010
>>   /** Device driver needs to keep mapped resources if unsupported dev detected */
>>   #define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020
>> +/** Device driver needs iova as va */
>> +#define RTE_PCI_DRV_NEED_IOVA_VA 0X0040
>>   
>
> Maybe not a big deal, but using NEED tends to say that the driver cannot
> work if not using VA as IOVA. If my understanding is correct, this is
> not the case, the performance will be poor but the device will be
> functional.
>
Agreed. How about renaming it to RTE_PCI_DRV_IOVA_AS_VA? Does that make sense?

Thanks.

> Maxime

* Re: [PATCH v3 03/11] bsdapp/eal_pci: get iommu class
  2017-07-11  9:15       ` Maxime Coquelin
@ 2017-07-11 10:41         ` santosh
  2017-07-11 12:09           ` Maxime Coquelin
  0 siblings, 1 reply; 248+ messages in thread
From: santosh @ 2017-07-11 10:41 UTC (permalink / raw)
  To: Maxime Coquelin, thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	olivier.matz

On Tuesday 11 July 2017 02:45 PM, Maxime Coquelin wrote:

>
> On 07/11/2017 08:16 AM, Santosh Shukla wrote:
>>   diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
>> index af9f0e13f..7a0cfb165 100644
>> --- a/lib/librte_eal/common/include/rte_bus.h
>> +++ b/lib/librte_eal/common/include/rte_bus.h
>> @@ -55,6 +55,15 @@ extern "C" {
>>   /** Double linked list of buses */
>>   TAILQ_HEAD(rte_bus_list, rte_bus);
>>   +
>> +/**
>> + * IOVA mapping mode.
>> + */
>> +enum rte_iova_mode {
>> +    RTE_IOVA_PA = 1,
>> +    RTE_IOVA_VA
>> +};
>> +
>>   /**
>>    * Bus specific scan for devices attached on the bus.
>>    * For each bus object, the scan would be responsible for finding devices and
>> diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
>> index 4a485674e..c58361132 100644
>> --- a/lib/librte_eal/common/include/rte_pci.h
>> +++ b/lib/librte_eal/common/include/rte_pci.h
>> @@ -383,6 +383,17 @@ int
>>   rte_pci_match(const struct rte_pci_driver *pci_drv,
>>             const struct rte_pci_device *pci_dev);
>>   +
>> +/**
>> + * Get iommu class of PCI devices on the bus.
>> + * And return their preferred iova mapping mode.
>> + *
>> + * @return
>> + *   - enum rte_iova_mode.
>> + */
>> +enum rte_iova_mode
>> +rte_pci_get_iommu_class(void);
>> +
>>   /**
>>    * Map the PCI device resources in user space virtual memory address
>>    *
>
> I would have put this in a separate patch, as not bsd specifics.
>
I'll pull that out in v4, and perhaps squash it into [01/11], as both changes (RTE_PCI_DRV_ and this one)
are in the same rte_pci.h file. Is that OK with you, or would you prefer a separate patch for each
(RTE_PCI_DRV_ and this one)?

> Maxime

* Re: [PATCH v3 04/11] linuxapp/eal_pci: get iommu class
  2017-07-11  9:23       ` Maxime Coquelin
@ 2017-07-11 10:43         ` santosh
  0 siblings, 0 replies; 248+ messages in thread
From: santosh @ 2017-07-11 10:43 UTC (permalink / raw)
  To: Maxime Coquelin, thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	olivier.matz

On Tuesday 11 July 2017 02:53 PM, Maxime Coquelin wrote:

>
> On 07/11/2017 08:16 AM, Santosh Shukla wrote:
>> Get iommu class of PCI device on the bus and returns preferred iova
>> mapping mode for that bus.
>>
>> Algorithm for iova scheme selection for PCI bus:
>> 0. Look for device attached to vfio kdrv and has .drv_flag set
>> to RTE_PCI_DRV_NEED_IOVA_VA.
>> 1. Look for any device attached to UIO class of driver.
>> 2. Check for vfio-noiommu mode enabled.
>>
>> If 1) & 2) is false and 0) is true then select
>> mapping scheme as iova=va. Otherwise use default
>> mapping scheme (iova_pa).
>>
>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> ---
>> v1 --> v2:
>> - Removed Linux version check in vfio_noiommu func. Refer [1].
>> - Extending autodetection logic for _iommu_class.
>> Refer [2].
>>
>> [1] https://www.mail-archive.com/dev@dpdk.org/msg70108.html
>> [2] https://www.mail-archive.com/dev@dpdk.org/msg70279.html
>>
>>   lib/librte_eal/linuxapp/eal/eal_pci.c           | 66 +++++++++++++++++++++++++
>>   lib/librte_eal/linuxapp/eal/eal_vfio.c          | 19 +++++++
>>   lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
>>   lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>>   4 files changed, 90 insertions(+)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
>> index 7d9e1a99b..573caa000 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
>> @@ -45,6 +45,7 @@
>>   #include "eal_filesystem.h"
>>   #include "eal_private.h"
>>   #include "eal_pci_init.h"
>> +#include "eal_vfio.h"
>>     /**
>>    * @file
>> @@ -488,6 +489,71 @@ rte_pci_scan(void)
>>       return -1;
>>   }
>>   +/*
>> + * Any one of the device bound to uio
>> + */
>> +static inline int
>> +pci_device_bound_uio(void)
>> +{
>> +    struct rte_pci_device *dev = NULL;
>> +
>> +    FOREACH_DEVICE_ON_PCIBUS(dev) {
>> +        if (dev->kdrv == RTE_KDRV_IGB_UIO ||
>> +           dev->kdrv == RTE_KDRV_UIO_GENERIC) {
>> +            return 1;
>> +        }
>> +    }
>> +    return 0;
>> +}
>> +
>> +/*
>> + * Any one of the device has iova as va
>> + */
>> +static inline int
>> +pci_device_has_iova_va(void)
>> +{
>> +    struct rte_pci_device *dev = NULL;
>> +    struct rte_pci_driver *drv = NULL;
>> +
>> +    FOREACH_DRIVER_ON_PCIBUS(drv) {
>> +        if (drv && drv->drv_flags & RTE_PCI_DRV_NEED_IOVA_VA) {
>> +            FOREACH_DEVICE_ON_PCIBUS(dev) {
>> +                if (dev->kdrv == RTE_KDRV_VFIO &&
>> +                    rte_pci_match(drv, dev))
>> +                    return 1;
>> +            }
>> +        }
>> +    }
>> +    return 0;
>> +}
>> +
>> +/*
>> + * Get iommu class of PCI devices on the bus.
>> + */
>> +enum rte_iova_mode
>> +rte_pci_get_iommu_class(void)
>> +{
>> +    bool is_vfio_noiommu_enabled;
>> +    bool has_iova_va;
>> +    bool is_bound_uio;
>> +
>> +    has_iova_va = pci_device_has_iova_va();
>> +    is_bound_uio = pci_device_bound_uio();
>> +    is_vfio_noiommu_enabled = vfio_noiommu_is_enabled() == 1 ? 1 : 0;
>> +
>> +    if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled)
>> +        return RTE_IOVA_VA;
>> +
>> +    if (has_iova_va) {
>> +        if (is_vfio_noiommu_enabled)
>> +            RTE_LOG(WARNING, EAL, "vfio-noiommu mode configured\n");
>> +        if (is_bound_uio)
>> +            RTE_LOG(WARNING, EAL, "Some device attached to UIO\n");
>
> Maybe worth having more verbose warning for user not familiar with the
> feature. Like stating that some devices want VA but PA will be used
> because of vfio-noiommu or UIO.
>
Yes, will do in v4.
Thanks.

> Maxime

* Re: [PATCH v3 01/11] eal/pci: introduce PCI driver iova as va flag
  2017-07-11 10:35         ` santosh
@ 2017-07-11 12:07           ` Maxime Coquelin
  0 siblings, 0 replies; 248+ messages in thread
From: Maxime Coquelin @ 2017-07-11 12:07 UTC (permalink / raw)
  To: santosh, thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	olivier.matz



On 07/11/2017 12:35 PM, santosh wrote:
> Hi Maxime,
> 
> On Tuesday 11 July 2017 02:39 PM, Maxime Coquelin wrote:
> 
>>
>>
>> On 07/11/2017 08:16 AM, Santosh Shukla wrote:
>>> Introducing RTE_PCI_DRV_NEED_IOVA_VA flag. Flag used when driver needs
>>> to operate in iova=va mode.
>>>
>>> Why driver need iova=va mapping?
>>>
>>> On NPU style co-processors like Octeontx, the buffer recycling has been
>>> done in HW, unlike SW model. Here is the data flow:
>>> 1) On control path, Fill the HW mempool with buffers(iova as pa address)
>>> 2) on rx_burst, HW gives you IOVA address(iova as pa address)
>>> 3) As application expects VA to operate on it, rx_burst() needs to
>>> convert to _va from _pa. Which is very expensive.
>>> Instead of that if iova as va mapping, we can avoid the cost of
>>> converting with help of IOMMU/SMMU.
>>>
>>> Signed-off-by: Santosh Shukla<santosh.shukla@caviumnetworks.com>
>>> Signed-off-by: Jerin Jacob<jerin.jacob@caviumnetworks.com>
>>> ---
>>>    lib/librte_eal/common/include/rte_pci.h | 2 ++
>>>    1 file changed, 2 insertions(+)
>>>
>>> diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
>>> index 8b123391c..ac79040dd 100644
>>> --- a/lib/librte_eal/common/include/rte_pci.h
>>> +++ b/lib/librte_eal/common/include/rte_pci.h
>>> @@ -202,6 +202,8 @@ struct rte_pci_bus {
>>>    #define RTE_PCI_DRV_INTR_RMV 0x0010
>>>    /** Device driver needs to keep mapped resources if unsupported dev detected */
>>>    #define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020
>>> +/** Device driver needs iova as va */
>>> +#define RTE_PCI_DRV_NEED_IOVA_VA 0X0040
>>>    
>>
>> Maybe not a big deal, but using NEED tends to say that the driver cannot
>> work if not using VA as IOVA. If my understanding is correct, this is
>> not the case, the performance will be poor but the device will be
>> functional.
>>
> Agree, How about renaming to RTE_PCI_DRV_IOVA_AS_VA, make sense?
Yes, and if one day we have some HW that only supports VA, then we could
introduce RTE_PCI_DRV_IOVA_AS_VA_ONLY.
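
As a rough sketch of how the two flags could differ, assuming the rename
goes ahead: the 0x0040 value is the one from the patch, while the 0x0080
value for RTE_PCI_DRV_IOVA_AS_VA_ONLY and both helper functions are
assumptions made up for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit value taken from the patch (flag renamed as proposed above). */
#define RTE_PCI_DRV_IOVA_AS_VA      0x0040
/* Hypothetical flag; the bit value is an assumption for this sketch. */
#define RTE_PCI_DRV_IOVA_AS_VA_ONLY 0x0080

/* Mirrors the enum introduced by the series. */
enum rte_iova_mode {
	RTE_IOVA_PA = 1,
	RTE_IOVA_VA
};

/* Hypothetical helper: does the driver ask for IOVA as VA at all? */
static bool
drv_prefers_iova_va(uint32_t drv_flags)
{
	return (drv_flags & (RTE_PCI_DRV_IOVA_AS_VA |
			     RTE_PCI_DRV_IOVA_AS_VA_ONLY)) != 0;
}

/*
 * Hypothetical helper: can the driver operate in the given mode?
 * A driver with only IOVA_AS_VA set stays functional in PA mode (with
 * poorer performance); one with IOVA_AS_VA_ONLY set does not.
 */
static bool
drv_supports_iova_mode(uint32_t drv_flags, enum rte_iova_mode mode)
{
	if (drv_flags & RTE_PCI_DRV_IOVA_AS_VA_ONLY)
		return mode == RTE_IOVA_VA;

	return true;
}
```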

Thanks
> 
> Thanks.
> 
>> Maxime
> 

* Re: [PATCH v3 03/11] bsdapp/eal_pci: get iommu class
  2017-07-11 10:41         ` santosh
@ 2017-07-11 12:09           ` Maxime Coquelin
  0 siblings, 0 replies; 248+ messages in thread
From: Maxime Coquelin @ 2017-07-11 12:09 UTC (permalink / raw)
  To: santosh, thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	olivier.matz



On 07/11/2017 12:41 PM, santosh wrote:
> On Tuesday 11 July 2017 02:45 PM, Maxime Coquelin wrote:
> 
>>
>> On 07/11/2017 08:16 AM, Santosh Shukla wrote:
>>>    diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
>>> index af9f0e13f..7a0cfb165 100644
>>> --- a/lib/librte_eal/common/include/rte_bus.h
>>> +++ b/lib/librte_eal/common/include/rte_bus.h
>>> @@ -55,6 +55,15 @@ extern "C" {
>>>    /** Double linked list of buses */
>>>    TAILQ_HEAD(rte_bus_list, rte_bus);
>>>    +
>>> +/**
>>> + * IOVA mapping mode.
>>> + */
>>> +enum rte_iova_mode {
>>> +    RTE_IOVA_PA = 1,
>>> +    RTE_IOVA_VA
>>> +};
>>> +
>>>    /**
>>>     * Bus specific scan for devices attached on the bus.
>>>     * For each bus object, the scan would be responsible for finding devices and
>>> diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
>>> index 4a485674e..c58361132 100644
>>> --- a/lib/librte_eal/common/include/rte_pci.h
>>> +++ b/lib/librte_eal/common/include/rte_pci.h
>>> @@ -383,6 +383,17 @@ int
>>>    rte_pci_match(const struct rte_pci_driver *pci_drv,
>>>              const struct rte_pci_device *pci_dev);
>>>    +
>>> +/**
>>> + * Get iommu class of PCI devices on the bus.
>>> + * And return their preferred iova mapping mode.
>>> + *
>>> + * @return
>>> + *   - enum rte_iova_mode.
>>> + */
>>> +enum rte_iova_mode
>>> +rte_pci_get_iommu_class(void);
>>> +
>>>    /**
>>>     * Map the PCI device resources in user space virtual memory address
>>>     *
>>
>> I would have put this in a separate patch, as not bsd specifics.
>>
> I'll pull that out in v4, and perhaps squash into [01/11], as both changes (RTE_PCI_DRV_ and this one)
> are on same rte_pci.h file. Is it Ok with you? or you prefer separate patch for both
> (RTE_PCI_DRV_ and this one)?

I'm fine with you squashing this part into patch 1.

Thanks,
Maxime

>> Maxime
> 

* Re: [PATCH v3 04/11] linuxapp/eal_pci: get iommu class
  2017-07-11  6:16     ` [PATCH v3 04/11] linuxapp/eal_pci: " Santosh Shukla
  2017-07-11  9:23       ` Maxime Coquelin
@ 2017-07-12  8:20       ` Sergio Gonzalez Monroy
  2017-07-13  8:23         ` santosh
  2017-07-14  7:39       ` Hemant Agrawal
  2 siblings, 1 reply; 248+ messages in thread
From: Sergio Gonzalez Monroy @ 2017-07-12  8:20 UTC (permalink / raw)
  To: Santosh Shukla, thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, anatoly.burakov, stephen, maxime.coquelin,
	olivier.matz

On 11/07/2017 07:16, Santosh Shukla wrote:
> Get iommu class of PCI device on the bus and returns preferred iova
> mapping mode for that bus.
>
> Algorithm for iova scheme selection for PCI bus:
> 0. Look for device attached to vfio kdrv and has .drv_flag set
> to RTE_PCI_DRV_NEED_IOVA_VA.
> 1. Look for any device attached to UIO class of driver.
> 2. Check for vfio-noiommu mode enabled.
>
> If 1) & 2) is false and 0) is true then select
> mapping scheme as iova=va. Otherwise use default
> mapping scheme (iova_pa).
>
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
> v1 --> v2:
> - Removed Linux version check in vfio_noiommu func. Refer [1].
> - Extending autodetection logic for _iommu_class.
> Refer [2].
>
> [1] https://www.mail-archive.com/dev@dpdk.org/msg70108.html
> [2] https://www.mail-archive.com/dev@dpdk.org/msg70279.html

Just wondering how this all works with device hotplug.
Correct me if I am wrong, but if EAL decides to use the IOVA_AS_VA scheme
and we then attach a device that needs IOVA_AS_PA, it will not work
and should fail to attach, right?

Thanks,
Sergio

>   lib/librte_eal/linuxapp/eal/eal_pci.c           | 66 +++++++++++++++++++++++++
>   lib/librte_eal/linuxapp/eal/eal_vfio.c          | 19 +++++++
>   lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
>   lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>   4 files changed, 90 insertions(+)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
> index 7d9e1a99b..573caa000 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
> @@ -45,6 +45,7 @@
>   #include "eal_filesystem.h"
>   #include "eal_private.h"
>   #include "eal_pci_init.h"
> +#include "eal_vfio.h"
>   
>   /**
>    * @file
> @@ -488,6 +489,71 @@ rte_pci_scan(void)
>   	return -1;
>   }
>   
> +/*
> + * Any one of the device bound to uio
> + */
> +static inline int
> +pci_device_bound_uio(void)
> +{
> +	struct rte_pci_device *dev = NULL;
> +
> +	FOREACH_DEVICE_ON_PCIBUS(dev) {
> +		if (dev->kdrv == RTE_KDRV_IGB_UIO ||
> +		   dev->kdrv == RTE_KDRV_UIO_GENERIC) {
> +			return 1;
> +		}
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Any one of the device has iova as va
> + */
> +static inline int
> +pci_device_has_iova_va(void)
> +{
> +	struct rte_pci_device *dev = NULL;
> +	struct rte_pci_driver *drv = NULL;
> +
> +	FOREACH_DRIVER_ON_PCIBUS(drv) {
> +		if (drv && drv->drv_flags & RTE_PCI_DRV_NEED_IOVA_VA) {
> +			FOREACH_DEVICE_ON_PCIBUS(dev) {
> +				if (dev->kdrv == RTE_KDRV_VFIO &&
> +				    rte_pci_match(drv, dev))
> +					return 1;
> +			}
> +		}
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Get iommu class of PCI devices on the bus.
> + */
> +enum rte_iova_mode
> +rte_pci_get_iommu_class(void)
> +{
> +	bool is_vfio_noiommu_enabled;
> +	bool has_iova_va;
> +	bool is_bound_uio;
> +
> +	has_iova_va = pci_device_has_iova_va();
> +	is_bound_uio = pci_device_bound_uio();
> +	is_vfio_noiommu_enabled = vfio_noiommu_is_enabled() == 1 ? 1 : 0;
> +
> +	if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled)
> +		return RTE_IOVA_VA;
> +
> +	if (has_iova_va) {
> +		if (is_vfio_noiommu_enabled)
> +			RTE_LOG(WARNING, EAL, "vfio-noiommu mode configured\n");
> +		if (is_bound_uio)
> +			RTE_LOG(WARNING, EAL, "Some device attached to UIO\n");
> +	}
> +
> +	return RTE_IOVA_PA;
> +}
> +
>   /* Read PCI config space. */
>   int rte_pci_read_config(const struct rte_pci_device *device,
>   		void *buf, size_t len, off_t offset)
> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> index 946df7e31..c8a97b7e7 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> @@ -816,4 +816,23 @@ vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
>   	return 0;
>   }
>   
> +int
> +vfio_noiommu_is_enabled(void)
> +{
> +	int fd, ret, cnt __rte_unused;
> +	char c;
> +
> +	ret = -1;
> +	fd = open(VFIO_NOIOMMU_MODE, O_RDONLY);
> +	if (fd < 0)
> +		return -1;
> +
> +	cnt = read(fd, &c, 1);
> +	if (c == 'Y')
> +		ret = 1;
> +
> +	close(fd);
> +	return ret;
> +}
> +
>   #endif
> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h b/lib/librte_eal/linuxapp/eal/eal_vfio.h
> index 5ff63e5d7..26ea8e119 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
> @@ -150,6 +150,8 @@ struct vfio_config {
>   #define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
>   #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
>   #define VFIO_GET_REGION_IDX(x) (x >> 40)
> +#define VFIO_NOIOMMU_MODE      \
> +	"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
>   
>   /* DMA mapping function prototype.
>    * Takes VFIO container fd as a parameter.
> @@ -210,6 +212,8 @@ int pci_vfio_is_enabled(void);
>   
>   int vfio_mp_sync_setup(void);
>   
> +int vfio_noiommu_is_enabled(void);
> +
>   #define SOCKET_REQ_CONTAINER 0x100
>   #define SOCKET_REQ_GROUP 0x200
>   #define SOCKET_CLR_GROUP 0x300
> diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> index c91dd44c4..044f89c7c 100644
> --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> @@ -206,6 +206,7 @@ DPDK_17.08 {
>   	rte_bus_find_by_device;
>   	rte_bus_find_by_name;
>   	rte_pci_match;
> +	rte_pci_get_iommu_class;
>   
>   } DPDK_17.05;
>   

* Re: [PATCH v3 04/11] linuxapp/eal_pci: get iommu class
  2017-07-12  8:20       ` Sergio Gonzalez Monroy
@ 2017-07-13  8:23         ` santosh
  2017-07-14  7:43           ` Sergio Gonzalez Monroy
  0 siblings, 1 reply; 248+ messages in thread
From: santosh @ 2017-07-13  8:23 UTC (permalink / raw)
  To: Sergio Gonzalez Monroy, thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, anatoly.burakov, stephen, maxime.coquelin,
	olivier.matz

Hi Sergio,

On Wednesday 12 July 2017 01:50 PM, Sergio Gonzalez Monroy wrote:

> On 11/07/2017 07:16, Santosh Shukla wrote:
>> Get iommu class of PCI device on the bus and returns preferred iova
>> mapping mode for that bus.
>>
>> Algorithm for iova scheme selection for PCI bus:
>> 0. Look for device attached to vfio kdrv and has .drv_flag set
>> to RTE_PCI_DRV_NEED_IOVA_VA.
>> 1. Look for any device attached to UIO class of driver.
>> 2. Check for vfio-noiommu mode enabled.
>>
>> If 1) & 2) is false and 0) is true then select
>> mapping scheme as iova=va. Otherwise use default
>> mapping scheme (iova_pa).
>>
>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> ---
>> v1 --> v2:
>> - Removed Linux version check in vfio_noiommu func. Refer [1].
>> - Extending autodetection logic for _iommu_class.
>> Refer [2].
>>
>> [1] https://www.mail-archive.com/dev@dpdk.org/msg70108.html
>> [2] https://www.mail-archive.com/dev@dpdk.org/msg70279.html
>
> Just wondering how it all works with device hotplug.
> Correct me if I am wrong but if EAL decides to use IOVA_AS_VA scheme,
> if we were to attach a device that needs IOVA_AS_PA, it will not work and should fail to attach, right?
>
It will work for the igb_uio case, and won't work for the vfio-noiommu hotplug case (an invalid case).

Yes, we can make the hotplug/unplug area IOVA-aware.
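
A minimal sketch of what such a check at attach time could look like is
below. Nothing here exists in this series: pci_hotplug_iova_check() is a
hypothetical name, and whether a given device needs IOVA as PA is left to
the caller, since that policy is exactly what is still being discussed.

```c
#include <errno.h>
#include <stdbool.h>

/* Mirrors the enum introduced by the series. */
enum rte_iova_mode {
	RTE_IOVA_PA = 1,
	RTE_IOVA_VA
};

/*
 * Hypothetical attach-time check: once EAL has settled on an IOVA mode,
 * refuse to hotplug a device that can only work with IOVA as PA while
 * the process runs with IOVA as VA.
 *
 * Returns 0 if the attach may proceed, -ENOTSUP otherwise.
 */
static int
pci_hotplug_iova_check(enum rte_iova_mode eal_mode, bool dev_needs_iova_pa)
{
	if (eal_mode == RTE_IOVA_VA && dev_needs_iova_pa)
		return -ENOTSUP;

	/*
	 * PA mode accepts every device: a driver that prefers VA still
	 * works there, only with the extra address conversion cost.
	 */
	return 0;
}
```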

> Thanks,
> Sergio
>
>>   lib/librte_eal/linuxapp/eal/eal_pci.c           | 66 +++++++++++++++++++++++++
>>   lib/librte_eal/linuxapp/eal/eal_vfio.c          | 19 +++++++
>>   lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
>>   lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>>   4 files changed, 90 insertions(+)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
>> index 7d9e1a99b..573caa000 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
>> @@ -45,6 +45,7 @@
>>   #include "eal_filesystem.h"
>>   #include "eal_private.h"
>>   #include "eal_pci_init.h"
>> +#include "eal_vfio.h"
>>     /**
>>    * @file
>> @@ -488,6 +489,71 @@ rte_pci_scan(void)
>>       return -1;
>>   }
>>   +/*
>> + * Any one of the device bound to uio
>> + */
>> +static inline int
>> +pci_device_bound_uio(void)
>> +{
>> +    struct rte_pci_device *dev = NULL;
>> +
>> +    FOREACH_DEVICE_ON_PCIBUS(dev) {
>> +        if (dev->kdrv == RTE_KDRV_IGB_UIO ||
>> +           dev->kdrv == RTE_KDRV_UIO_GENERIC) {
>> +            return 1;
>> +        }
>> +    }
>> +    return 0;
>> +}
>> +
>> +/*
>> + * Any one of the device has iova as va
>> + */
>> +static inline int
>> +pci_device_has_iova_va(void)
>> +{
>> +    struct rte_pci_device *dev = NULL;
>> +    struct rte_pci_driver *drv = NULL;
>> +
>> +    FOREACH_DRIVER_ON_PCIBUS(drv) {
>> +        if (drv && drv->drv_flags & RTE_PCI_DRV_NEED_IOVA_VA) {
>> +            FOREACH_DEVICE_ON_PCIBUS(dev) {
>> +                if (dev->kdrv == RTE_KDRV_VFIO &&
>> +                    rte_pci_match(drv, dev))
>> +                    return 1;
>> +            }
>> +        }
>> +    }
>> +    return 0;
>> +}
>> +
>> +/*
>> + * Get iommu class of PCI devices on the bus.
>> + */
>> +enum rte_iova_mode
>> +rte_pci_get_iommu_class(void)
>> +{
>> +    bool is_vfio_noiommu_enabled;
>> +    bool has_iova_va;
>> +    bool is_bound_uio;
>> +
>> +    has_iova_va = pci_device_has_iova_va();
>> +    is_bound_uio = pci_device_bound_uio();
>> +    is_vfio_noiommu_enabled = vfio_noiommu_is_enabled() == 1 ? 1 : 0;
>> +
>> +    if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled)
>> +        return RTE_IOVA_VA;
>> +
>> +    if (has_iova_va) {
>> +        if (is_vfio_noiommu_enabled)
>> +            RTE_LOG(WARNING, EAL, "vfio-noiommu mode configured\n");
>> +        if (is_bound_uio)
>> +            RTE_LOG(WARNING, EAL, "Some device attached to UIO\n");
>> +    }
>> +
>> +    return RTE_IOVA_PA;
>> +}
>> +
>>   /* Read PCI config space. */
>>   int rte_pci_read_config(const struct rte_pci_device *device,
>>           void *buf, size_t len, off_t offset)
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>> index 946df7e31..c8a97b7e7 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>> @@ -816,4 +816,23 @@ vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
>>       return 0;
>>   }
>>   +int
>> +vfio_noiommu_is_enabled(void)
>> +{
>> +    int fd, ret, cnt __rte_unused;
>> +    char c;
>> +
>> +    ret = -1;
>> +    fd = open(VFIO_NOIOMMU_MODE, O_RDONLY);
>> +    if (fd < 0)
>> +        return -1;
>> +
>> +    cnt = read(fd, &c, 1);
>> +    if (c == 'Y')
>> +        ret = 1;
>> +
>> +    close(fd);
>> +    return ret;
>> +}
>> +
>>   #endif
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h b/lib/librte_eal/linuxapp/eal/eal_vfio.h
>> index 5ff63e5d7..26ea8e119 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
>> @@ -150,6 +150,8 @@ struct vfio_config {
>>   #define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
>>   #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
>>   #define VFIO_GET_REGION_IDX(x) (x >> 40)
>> +#define VFIO_NOIOMMU_MODE      \
>> +    "/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
>>     /* DMA mapping function prototype.
>>    * Takes VFIO container fd as a parameter.
>> @@ -210,6 +212,8 @@ int pci_vfio_is_enabled(void);
>>     int vfio_mp_sync_setup(void);
>>   +int vfio_noiommu_is_enabled(void);
>> +
>>   #define SOCKET_REQ_CONTAINER 0x100
>>   #define SOCKET_REQ_GROUP 0x200
>>   #define SOCKET_CLR_GROUP 0x300
>> diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>> index c91dd44c4..044f89c7c 100644
>> --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>> +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>> @@ -206,6 +206,7 @@ DPDK_17.08 {
>>       rte_bus_find_by_device;
>>       rte_bus_find_by_name;
>>       rte_pci_match;
>> +    rte_pci_get_iommu_class;
>>     } DPDK_17.05;
>>   
>
>


* Re: [PATCH v3 07/11] linuxapp/eal: auto detect iova mode
  2017-07-11  6:16     ` [PATCH v3 07/11] linuxapp/eal: auto detect iova mode Santosh Shukla
@ 2017-07-13 11:29       ` Hemant Agrawal
  2017-07-13 11:45         ` Hemant Agrawal
  2017-07-13 18:25         ` santosh
  0 siblings, 2 replies; 248+ messages in thread
From: Hemant Agrawal @ 2017-07-13 11:29 UTC (permalink / raw)
  To: Santosh Shukla, thomas, dev
  Cc: bruce.richardson, jerin.jacob, shreyansh.jain, gaetan.rivet,
	sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz

On 7/11/2017 11:46 AM, Santosh Shukla wrote:
> - Moving late bus scanning to up..just after eal_parsing.
> - Auto detect iova mapping mode, based on the result of
>   rte_bus_scan_iommu_class.
>
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
>  lib/librte_eal/linuxapp/eal/eal.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
> index 2546b55e4..7b4dd70de 100644
> --- a/lib/librte_eal/linuxapp/eal/eal.c
> +++ b/lib/librte_eal/linuxapp/eal/eal.c
> @@ -799,6 +799,16 @@ rte_eal_init(int argc, char **argv)
>  		return -1;
>  	}
>
> +	if (rte_bus_scan()) {
> +		rte_eal_init_alert("Cannot scan the buses for devices\n");
> +		rte_errno = ENODEV;
> +		return -1;
> +	}
> +

The original placement of the bus scan was based on the following factors:
1. The bus scan requires VFIO to be enabled, at least in the dpaa2 case.
(The VFIO code still needs cleanup to support non-PCI buses cleanly.) I tried
moving it before bus_scan, and this helped in bus scanning.

2. During scan, the bus may allocate memory for devices or for its own
use. rte_malloc or mempool is required in some cases to support
multi-process environments (e.g. dpaa2 creates dpbp or dpio device memory
using the rte_malloc call).

Since none of the other rte libraries (mempool, memzone, tailq) are
available at this point, this will create a significant restriction on the
bus scan.

We would prefer that you re-introduce the "iova_mode" option and allow the
application to choose which mode it wants to run in.

This auto-detect logic may not work for many buses and it is going
to create serious restrictions on the bus_scan code.

> +	/* autodetect the iova mapping mode (default is iova_pa) */
> +	if (rte_bus_get_iommu_class() == RTE_IOVA_VA)
> +		rte_eal_get_configuration()->iova_mode = RTE_IOVA_VA;
> +
>  	if (internal_config.no_hugetlbfs == 0 &&
>  			internal_config.process_type != RTE_PROC_SECONDARY &&
>  			internal_config.xen_dom0_support == 0 &&
> @@ -896,12 +906,6 @@ rte_eal_init(int argc, char **argv)
>  		return -1;
>  	}
>
> -	if (rte_bus_scan()) {
> -		rte_eal_init_alert("Cannot scan the buses for devices\n");
> -		rte_errno = ENODEV;
> -		return -1;
> -	}
> -
>  	RTE_LCORE_FOREACH_SLAVE(i) {
>
>  		/*
>


* Re: [PATCH v3 07/11] linuxapp/eal: auto detect iova mode
  2017-07-13 11:29       ` Hemant Agrawal
@ 2017-07-13 11:45         ` Hemant Agrawal
  2017-07-13 18:25         ` santosh
  1 sibling, 0 replies; 248+ messages in thread
From: Hemant Agrawal @ 2017-07-13 11:45 UTC (permalink / raw)
  To: Santosh Shukla, thomas, dev
  Cc: bruce.richardson, jerin.jacob, shreyansh.jain, gaetan.rivet,
	sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz

On 7/13/2017 4:59 PM, Hemant Agrawal wrote:
> On 7/11/2017 11:46 AM, Santosh Shukla wrote:
>> - Moving late bus scanning to up..just after eal_parsing.
>> - Auto detect iova mapping mode, based on the result of
>>   rte_bus_scan_iommu_class.
>>
>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> ---
>>  lib/librte_eal/linuxapp/eal/eal.c | 16 ++++++++++------
>>  1 file changed, 10 insertions(+), 6 deletions(-)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal.c
>> b/lib/librte_eal/linuxapp/eal/eal.c
>> index 2546b55e4..7b4dd70de 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal.c
>> @@ -799,6 +799,16 @@ rte_eal_init(int argc, char **argv)
>>          return -1;
>>      }
>>
>> +    if (rte_bus_scan()) {
>> +        rte_eal_init_alert("Cannot scan the buses for devices\n");
>> +        rte_errno = ENODEV;
>> +        return -1;
>> +    }
>> +
>
> The original placement of the bus scan was based on the following factors:
> 1. The bus scan requires VFIO to be enabled, at least in the dpaa2 case.
> (The VFIO code still needs cleanup to support non-PCI buses cleanly.) I tried
> moving it before bus_scan, and this helped in bus scanning.
>
> 2. During scan, the bus may allocate memory for devices or for its own
> use. rte_malloc or mempool is required in some cases to support
> multi-process environments (e.g. dpaa2 creates dpbp or dpio device memory
> using the rte_malloc call).
>
> Since none of the other rte libraries (mempool, memzone, tailq) are
> available at this point, this will create a significant restriction on the
> bus scan.
>
> We would prefer that you re-introduce the "iova_mode" option and allow the
> application to choose which mode it wants to run in.
>
> This auto-detect logic may not work for many buses and it is going
> to create serious restrictions on the bus_scan code.
>

Would it be possible to offer a *rte_bus_pre_scan* kind of infrastructure that
detects only the bus IOMMU class? That would address all the concerns.

>> +    /* autodetect the iova mapping mode (default is iova_pa) */
>> +    if (rte_bus_get_iommu_class() == RTE_IOVA_VA)
>> +        rte_eal_get_configuration()->iova_mode = RTE_IOVA_VA;
>> +
>>      if (internal_config.no_hugetlbfs == 0 &&
>>              internal_config.process_type != RTE_PROC_SECONDARY &&
>>              internal_config.xen_dom0_support == 0 &&
>> @@ -896,12 +906,6 @@ rte_eal_init(int argc, char **argv)
>>          return -1;
>>      }
>>
>> -    if (rte_bus_scan()) {
>> -        rte_eal_init_alert("Cannot scan the buses for devices\n");
>> -        rte_errno = ENODEV;
>> -        return -1;
>> -    }
>> -
>>      RTE_LCORE_FOREACH_SLAVE(i) {
>>
>>          /*
>>
>
>
>


* Re: [PATCH v3 07/11] linuxapp/eal: auto detect iova mode
  2017-07-13 11:29       ` Hemant Agrawal
  2017-07-13 11:45         ` Hemant Agrawal
@ 2017-07-13 18:25         ` santosh
  2017-07-14  8:49           ` Hemant Agrawal
  1 sibling, 1 reply; 248+ messages in thread
From: santosh @ 2017-07-13 18:25 UTC (permalink / raw)
  To: Hemant Agrawal, thomas, dev
  Cc: bruce.richardson, jerin.jacob, shreyansh.jain, gaetan.rivet,
	sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz

On Thursday 13 July 2017 04:59 PM, Hemant Agrawal wrote:

> On 7/11/2017 11:46 AM, Santosh Shukla wrote:
>> - Moving late bus scanning to up..just after eal_parsing.
>> - Auto detect iova mapping mode, based on the result of
>>   rte_bus_scan_iommu_class.
>>
>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> ---
>>  lib/librte_eal/linuxapp/eal/eal.c | 16 ++++++++++------
>>  1 file changed, 10 insertions(+), 6 deletions(-)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
>> index 2546b55e4..7b4dd70de 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal.c
>> @@ -799,6 +799,16 @@ rte_eal_init(int argc, char **argv)
>>          return -1;
>>      }
>>
>> +    if (rte_bus_scan()) {
>> +        rte_eal_init_alert("Cannot scan the buses for devices\n");
>> +        rte_errno = ENODEV;
>> +        return -1;
>> +    }
>> +
>
> The original placement of the bus scan was based on the following factors:
> 1. The bus scan requires VFIO to be enabled, at least in the dpaa2 case.
> (The VFIO code still needs cleanup to support non-PCI buses cleanly.) I tried moving it before bus_scan, and this helped in bus scanning.
>
bus_scan should do scanning and device enumeration: detecting devices and the
interface each device is bound to. That interface could be VFIO, UIO, UIO_GENERIC, etc.

PCI bus scanning (in eal/) strictly complies with the above, so
auto-detection works gracefully.

However, the fslmc_bus 'scan' doesn't do device scanning. Instead it calls VFIO-dependent
code, which falls into the 'resource mapping' category and ideally should
happen at bus probe time.

Example: 
rte_fslmc_bus_scan()
	--> fslmc_vfio_setup_group
	--> fslmc_vfio_process_group

So it is doing _setup_ inside the scan op, which in the PCI (vfio-pci) case happens
at probe time (`vfio_setup_device`).

To benefit from the iova auto-detection infrastructure, the fslmc bus should
do two things:

0) The fslmc bus scan should look at /sys/bus/platform/drivers/vfio-platform/*
and find out whether its devices are bound to vfio-platform. If they are, update the kdrv
entry with the interface type, for example VFIO. That way the fslmc bus gains the ability to
inform rte_bus about an IOMMU-capable interface. Right now, the existing implementation
has no means of informing rte_bus about its devices the way pci_bus does.

1) Defer the vfio_setup from scan to bus->probe().
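
Step 0) could be sketched roughly as below. This is only an illustration and not code from the patch set: the enum `kdrv_model` and the helper `kdrv_from_driver_link()` are made-up names, and the sketch assumes the caller has already read the target of the device's `driver` symlink in sysfs (for example with readlink(2)). Only the classification of that link target is shown.

```c
#include <string.h>

/* Illustrative stand-in for the kdrv field; not the real DPDK enum. */
enum kdrv_model {
	KDRV_MODEL_UNKNOWN,
	KDRV_MODEL_VFIO,
	KDRV_MODEL_UIO
};

/*
 * Hypothetical helper: classify the kernel driver from the target of a
 * sysfs "driver" symlink, e.g. "../../../bus/platform/drivers/vfio-platform".
 * Only the last path component is compared.
 */
static enum kdrv_model
kdrv_from_driver_link(const char *target)
{
	const char *name;

	if (target == NULL)
		return KDRV_MODEL_UNKNOWN;

	/* Keep only what follows the last '/', if there is one. */
	name = strrchr(target, '/');
	name = (name != NULL) ? name + 1 : target;

	if (strcmp(name, "vfio-platform") == 0 || strcmp(name, "vfio-pci") == 0)
		return KDRV_MODEL_VFIO;
	if (strcmp(name, "igb_uio") == 0 || strcmp(name, "uio_pci_generic") == 0)
		return KDRV_MODEL_UIO;

	return KDRV_MODEL_UNKNOWN;
}
```

A scan that only records this classification per device needs no rte_malloc or VFIO container setup, which is the point of deferring the rest to probe.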

 

> 2. During scan, the bus may allocate memory for devices or for its own use. rte_malloc or mempool is required in some cases to support multi-process environments (e.g. dpaa2 creates dpbp or dpio device memory using the rte_malloc call).
>
If bus scanning is limited to device detection and enumeration, then rte_malloc/mempool
is not required; eal/pci bus scanning is an example.

And in the fslmc bus case: if vfio_setup is deferred to bus->probe time, then
bus->scan won't have a memory dependency.

> Since none of the other rte libraries (mempool, memzone, tailq) are available at this point, this will create a significant restriction on the bus scan.
>
> We would prefer that you re-introduce the "iova_mode" option and allow the application to choose which mode it wants to run in.
>
> This auto-detect logic may not work for many buses and it is going
> to create serious restrictions on the bus_scan code.
>
fslmc is the only bus besides PCI, and auto-detection works gracefully for the PCI bus.
Can you give the proposal above a try?

Ideally the vfio-platform code should sit in eal/vfio, as the eal/vfio-pci code does.
Otherwise it will keep creating problems for new generic frameworks like the one
we're discussing.

If the proposal doesn't work for you, then I will re-introduce iova-mode as an
EAL argument that overrides the iova mapping mode. But IMHO, the EAL argument should be an
intermediate solution. Once the vfio-platform code is properly refactored and merged,
we should remove the iova-mode EAL argument.
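
For reference, the override could look roughly like the sketch below. It is not the code from the earlier revision of the series: the enum and both helper names are made up for illustration, and the only thing taken from the cover letter is the mapping 'pa' --> IOVA_PA, 'va' --> IOVA_VA.

```c
#include <string.h>

/* Stand-in for the enum in the cover letter; names are illustrative. */
enum iova_mode_model {
	IOVA_MODEL_DC,	/* don't care */
	IOVA_MODEL_PA,
	IOVA_MODEL_VA
};

/*
 * Hypothetical parser for the --iova-mode=<string> value.
 * Returns 0 and fills *mode on success, -1 on an unknown string.
 */
static int
parse_iova_mode_model(const char *arg, enum iova_mode_model *mode)
{
	if (arg == NULL || mode == NULL)
		return -1;

	if (strcmp(arg, "pa") == 0) {
		*mode = IOVA_MODEL_PA;
		return 0;
	}
	if (strcmp(arg, "va") == 0) {
		*mode = IOVA_MODEL_VA;
		return 0;
	}
	return -1;
}

/*
 * Hypothetical policy: an explicit user choice wins over whatever the
 * buses detected; otherwise the detected mode is used unchanged.
 */
static enum iova_mode_model
effective_iova_mode(int user_set, enum iova_mode_model user_mode,
		enum iova_mode_model detected)
{
	return user_set ? user_mode : detected;
}
```

With this shape the auto-detection result is only a default, so a bus that cannot take part in detection yet can still be forced into the mode it needs.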

>> +    /* autodetect the iova mapping mode (default is iova_pa) */
>> +    if (rte_bus_get_iommu_class() == RTE_IOVA_VA)
>> +        rte_eal_get_configuration()->iova_mode = RTE_IOVA_VA;
>> +
>>      if (internal_config.no_hugetlbfs == 0 &&
>>              internal_config.process_type != RTE_PROC_SECONDARY &&
>>              internal_config.xen_dom0_support == 0 &&
>> @@ -896,12 +906,6 @@ rte_eal_init(int argc, char **argv)
>>          return -1;
>>      }
>>
>> -    if (rte_bus_scan()) {
>> -        rte_eal_init_alert("Cannot scan the buses for devices\n");
>> -        rte_errno = ENODEV;
>> -        return -1;
>> -    }
>> -
>>      RTE_LCORE_FOREACH_SLAVE(i) {
>>
>>          /*
>>
>
>


* Re: [PATCH v3 04/11] linuxapp/eal_pci: get iommu class
  2017-07-11  6:16     ` [PATCH v3 04/11] linuxapp/eal_pci: " Santosh Shukla
  2017-07-11  9:23       ` Maxime Coquelin
  2017-07-12  8:20       ` Sergio Gonzalez Monroy
@ 2017-07-14  7:39       ` Hemant Agrawal
  2017-07-14  7:55         ` santosh
  2 siblings, 1 reply; 248+ messages in thread
From: Hemant Agrawal @ 2017-07-14  7:39 UTC (permalink / raw)
  To: Santosh Shukla, thomas, dev
  Cc: bruce.richardson, jerin.jacob, shreyansh.jain, gaetan.rivet,
	sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz

On 7/11/2017 11:46 AM, Santosh Shukla wrote:
> Get iommu class of PCI device on the bus and returns preferred iova
> mapping mode for that bus.
>
> Algorithm for iova scheme selection for PCI bus:
> 0. Look for device attached to vfio kdrv and has .drv_flag set
> to RTE_PCI_DRV_NEED_IOVA_VA.
> 1. Look for any device attached to UIO class of driver.
> 2. Check for vfio-noiommu mode enabled.
>
> If 1) & 2) is false and 0) is true then select
> mapping scheme as iova=va. Otherwise use default
> mapping scheme (iova_pa).
>
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
> v1 --> v2:
> - Removed Linux version check in vfio_noiommu func. Refer [1].
> - Extending autodetection logic for _iommu_class.
> Refer [2].
>
> [1] https://www.mail-archive.com/dev@dpdk.org/msg70108.html
> [2] https://www.mail-archive.com/dev@dpdk.org/msg70279.html
>
>  lib/librte_eal/linuxapp/eal/eal_pci.c           | 66 +++++++++++++++++++++++++
>  lib/librte_eal/linuxapp/eal/eal_vfio.c          | 19 +++++++
>  lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>  4 files changed, 90 insertions(+)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
> index 7d9e1a99b..573caa000 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
> @@ -45,6 +45,7 @@
>  #include "eal_filesystem.h"
>  #include "eal_private.h"
>  #include "eal_pci_init.h"
> +#include "eal_vfio.h"
>
>  /**
>   * @file
> @@ -488,6 +489,71 @@ rte_pci_scan(void)
>  	return -1;
>  }
>
> +/*
> + * Any one of the device bound to uio
> + */
> +static inline int
> +pci_device_bound_uio(void)
> +{
> +	struct rte_pci_device *dev = NULL;
> +
> +	FOREACH_DEVICE_ON_PCIBUS(dev) {
> +		if (dev->kdrv == RTE_KDRV_IGB_UIO ||
> +		   dev->kdrv == RTE_KDRV_UIO_GENERIC) {
> +			return 1;
> +		}
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Any one of the device has iova as va
> + */
> +static inline int
> +pci_device_has_iova_va(void)
> +{
> +	struct rte_pci_device *dev = NULL;
> +	struct rte_pci_driver *drv = NULL;
> +
> +	FOREACH_DRIVER_ON_PCIBUS(drv) {
> +		if (drv && drv->drv_flags & RTE_PCI_DRV_NEED_IOVA_VA) {
> +			FOREACH_DEVICE_ON_PCIBUS(dev) {
> +				if (dev->kdrv == RTE_KDRV_VFIO &&
> +				    rte_pci_match(drv, dev))
> +					return 1;
> +			}
> +		}
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Get iommu class of PCI devices on the bus.
> + */
> +enum rte_iova_mode
> +rte_pci_get_iommu_class(void)
> +{
> +	bool is_vfio_noiommu_enabled;
> +	bool has_iova_va;
> +	bool is_bound_uio;
> +
> +	has_iova_va = pci_device_has_iova_va();
> +	is_bound_uio = pci_device_bound_uio();
> +	is_vfio_noiommu_enabled = vfio_noiommu_is_enabled() == 1 ? 1 : 0;
> +
> +	if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled)
> +		return RTE_IOVA_VA;
> +

PCI is generally present on all platforms, including dpaa2.
In such cases there may not be any PCI device found or available for DPDK
use. The PCI bus will still return RTE_IOVA_PA, which forces the
system mode to PA.
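
To illustrate the concern, here is a small model of how per-bus answers might be combined. It is an assumption for discussion, not the rte_bus_get_iommu_class() implementation from this series: the enum, the function name `combine_bus_modes()` and the combining rule (any PA wins, otherwise VA, otherwise don't care) are all hypothetical.

```c
/* Stand-in for the enum in the cover letter; names are illustrative. */
enum iova_mode_model {
	IOVA_MODEL_DC,	/* don't care */
	IOVA_MODEL_PA,
	IOVA_MODEL_VA
};

/*
 * Hypothetical combiner over the answers of n buses:
 *  - any bus asking for PA forces PA,
 *  - otherwise VA if at least one bus asks for VA,
 *  - otherwise DC (no bus has an opinion).
 */
static enum iova_mode_model
combine_bus_modes(const enum iova_mode_model *modes, int n)
{
	enum iova_mode_model result = IOVA_MODEL_DC;
	int i;

	for (i = 0; i < n; i++) {
		if (modes[i] == IOVA_MODEL_PA)
			return IOVA_MODEL_PA;
		if (modes[i] == IOVA_MODEL_VA)
			result = IOVA_MODEL_VA;
	}
	return result;
}
```

Under such a rule, an empty PCI bus that answers PA pins the whole system to PA even when another bus asks for VA; an empty bus answering "don't care" would not.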

> +	if (has_iova_va) {
> +		if (is_vfio_noiommu_enabled)
> +			RTE_LOG(WARNING, EAL, "vfio-noiommu mode configured\n");
> +		if (is_bound_uio)
> +			RTE_LOG(WARNING, EAL, "Some device attached to UIO\n");
> +	}
> +
> +	return RTE_IOVA_PA;
> +}
> +
>  /* Read PCI config space. */
>  int rte_pci_read_config(const struct rte_pci_device *device,
>  		void *buf, size_t len, off_t offset)
> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> index 946df7e31..c8a97b7e7 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> @@ -816,4 +816,23 @@ vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
>  	return 0;
>  }
>
> +int
> +vfio_noiommu_is_enabled(void)
> +{
> +	int fd, ret, cnt __rte_unused;
> +	char c;
> +
> +	ret = -1;
> +	fd = open(VFIO_NOIOMMU_MODE, O_RDONLY);
> +	if (fd < 0)
> +		return -1;
> +
> +	cnt = read(fd, &c, 1);
> +	if (c == 'Y')
> +		ret = 1;
> +
> +	close(fd);
> +	return ret;
> +}
> +
>  #endif
> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h b/lib/librte_eal/linuxapp/eal/eal_vfio.h
> index 5ff63e5d7..26ea8e119 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
> @@ -150,6 +150,8 @@ struct vfio_config {
>  #define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
>  #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
>  #define VFIO_GET_REGION_IDX(x) (x >> 40)
> +#define VFIO_NOIOMMU_MODE      \
> +	"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
>
>  /* DMA mapping function prototype.
>   * Takes VFIO container fd as a parameter.
> @@ -210,6 +212,8 @@ int pci_vfio_is_enabled(void);
>
>  int vfio_mp_sync_setup(void);
>
> +int vfio_noiommu_is_enabled(void);
> +
>  #define SOCKET_REQ_CONTAINER 0x100
>  #define SOCKET_REQ_GROUP 0x200
>  #define SOCKET_CLR_GROUP 0x300
> diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> index c91dd44c4..044f89c7c 100644
> --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> @@ -206,6 +206,7 @@ DPDK_17.08 {
>  	rte_bus_find_by_device;
>  	rte_bus_find_by_name;
>  	rte_pci_match;
> +	rte_pci_get_iommu_class;
>
>  } DPDK_17.05;
>
>


* Re: [PATCH v3 04/11] linuxapp/eal_pci: get iommu class
  2017-07-13  8:23         ` santosh
@ 2017-07-14  7:43           ` Sergio Gonzalez Monroy
  2017-07-14  8:11             ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Sergio Gonzalez Monroy @ 2017-07-14  7:43 UTC (permalink / raw)
  To: santosh, thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, anatoly.burakov, stephen, maxime.coquelin,
	olivier.matz

On 13/07/2017 09:23, santosh wrote:
> Hi Sergio,
>
> On Wednesday 12 July 2017 01:50 PM, Sergio Gonzalez Monroy wrote:
>
>> On 11/07/2017 07:16, Santosh Shukla wrote:
>>> Get iommu class of PCI device on the bus and returns preferred iova
>>> mapping mode for that bus.
>>>
>>> Algorithm for iova scheme selection for PCI bus:
>>> 0. Look for device attached to vfio kdrv and has .drv_flag set
>>> to RTE_PCI_DRV_NEED_IOVA_VA.
>>> 1. Look for any device attached to UIO class of driver.
>>> 2. Check for vfio-noiommu mode enabled.
>>>
>>> If 1) & 2) is false and 0) is true then select
>>> mapping scheme as iova=va. Otherwise use default
>>> mapping scheme (iova_pa).
>>>
>>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>> ---
>>> v1 --> v2:
>>> - Removed Linux version check in vfio_noiommu func. Refer [1].
>>> - Extending autodetection logic for _iommu_class.
>>> Refer [2].
>>>
>>> [1] https://www.mail-archive.com/dev@dpdk.org/msg70108.html
>>> [2] https://www.mail-archive.com/dev@dpdk.org/msg70279.html
>> Just wondering how it all works with device hotplug.
>> Correct me if I am wrong but if EAL decides to use IOVA_AS_VA scheme,
>> if we were to attach a device that needs IOVA_AS_PA, it will not work and should fail to attach, right?
>>
> It will work for igb_uio case, and won't work for vfio-noiommu hotplug case(Invalid case).

Why are those two cases (igb_uio, vfio-noiommu) different? Do they not
have the same requirement, i.e. needing the IOVA_PA scheme?

Thanks,
Sergio

> Yes, we can dictate iova awareness to hotplug/unplug area.
>
>> Thanks,
>> Sergio
>>
>>>    lib/librte_eal/linuxapp/eal/eal_pci.c           | 66 +++++++++++++++++++++++++
>>>    lib/librte_eal/linuxapp/eal/eal_vfio.c          | 19 +++++++
>>>    lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
>>>    lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>>>    4 files changed, 90 insertions(+)
>>>
>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
>>> index 7d9e1a99b..573caa000 100644
>>> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
>>> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
>>> @@ -45,6 +45,7 @@
>>>    #include "eal_filesystem.h"
>>>    #include "eal_private.h"
>>>    #include "eal_pci_init.h"
>>> +#include "eal_vfio.h"
>>>      /**
>>>     * @file
>>> @@ -488,6 +489,71 @@ rte_pci_scan(void)
>>>        return -1;
>>>    }
>>>    +/*
>>> + * Any one of the device bound to uio
>>> + */
>>> +static inline int
>>> +pci_device_bound_uio(void)
>>> +{
>>> +    struct rte_pci_device *dev = NULL;
>>> +
>>> +    FOREACH_DEVICE_ON_PCIBUS(dev) {
>>> +        if (dev->kdrv == RTE_KDRV_IGB_UIO ||
>>> +           dev->kdrv == RTE_KDRV_UIO_GENERIC) {
>>> +            return 1;
>>> +        }
>>> +    }
>>> +    return 0;
>>> +}
>>> +
>>> +/*
>>> + * Any one of the device has iova as va
>>> + */
>>> +static inline int
>>> +pci_device_has_iova_va(void)
>>> +{
>>> +    struct rte_pci_device *dev = NULL;
>>> +    struct rte_pci_driver *drv = NULL;
>>> +
>>> +    FOREACH_DRIVER_ON_PCIBUS(drv) {
>>> +        if (drv && drv->drv_flags & RTE_PCI_DRV_NEED_IOVA_VA) {
>>> +            FOREACH_DEVICE_ON_PCIBUS(dev) {
>>> +                if (dev->kdrv == RTE_KDRV_VFIO &&
>>> +                    rte_pci_match(drv, dev))
>>> +                    return 1;
>>> +            }
>>> +        }
>>> +    }
>>> +    return 0;
>>> +}
>>> +
>>> +/*
>>> + * Get iommu class of PCI devices on the bus.
>>> + */
>>> +enum rte_iova_mode
>>> +rte_pci_get_iommu_class(void)
>>> +{
>>> +    bool is_vfio_noiommu_enabled;
>>> +    bool has_iova_va;
>>> +    bool is_bound_uio;
>>> +
>>> +    has_iova_va = pci_device_has_iova_va();
>>> +    is_bound_uio = pci_device_bound_uio();
>>> +    is_vfio_noiommu_enabled = vfio_noiommu_is_enabled() == 1 ? 1 : 0;
>>> +
>>> +    if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled)
>>> +        return RTE_IOVA_VA;
>>> +
>>> +    if (has_iova_va) {
>>> +        if (is_vfio_noiommu_enabled)
>>> +            RTE_LOG(WARNING, EAL, "vfio-noiommu mode configured\n");
>>> +        if (is_bound_uio)
>>> +            RTE_LOG(WARNING, EAL, "Some device attached to UIO\n");
>>> +    }
>>> +
>>> +    return RTE_IOVA_PA;
>>> +}
>>> +
>>>    /* Read PCI config space. */
>>>    int rte_pci_read_config(const struct rte_pci_device *device,
>>>            void *buf, size_t len, off_t offset)
>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>> index 946df7e31..c8a97b7e7 100644
>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>> @@ -816,4 +816,23 @@ vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
>>>        return 0;
>>>    }
>>>    +int
>>> +vfio_noiommu_is_enabled(void)
>>> +{
>>> +    int fd, ret, cnt __rte_unused;
>>> +    char c;
>>> +
>>> +    ret = -1;
>>> +    fd = open(VFIO_NOIOMMU_MODE, O_RDONLY);
>>> +    if (fd < 0)
>>> +        return -1;
>>> +
>>> +    cnt = read(fd, &c, 1);
>>> +    if (c == 'Y')
>>> +        ret = 1;
>>> +
>>> +    close(fd);
>>> +    return ret;
>>> +}
>>> +
>>>    #endif
>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h b/lib/librte_eal/linuxapp/eal/eal_vfio.h
>>> index 5ff63e5d7..26ea8e119 100644
>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
>>> @@ -150,6 +150,8 @@ struct vfio_config {
>>>    #define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
>>>    #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
>>>    #define VFIO_GET_REGION_IDX(x) (x >> 40)
>>> +#define VFIO_NOIOMMU_MODE      \
>>> +    "/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
>>>      /* DMA mapping function prototype.
>>>     * Takes VFIO container fd as a parameter.
>>> @@ -210,6 +212,8 @@ int pci_vfio_is_enabled(void);
>>>      int vfio_mp_sync_setup(void);
>>>    +int vfio_noiommu_is_enabled(void);
>>> +
>>>    #define SOCKET_REQ_CONTAINER 0x100
>>>    #define SOCKET_REQ_GROUP 0x200
>>>    #define SOCKET_CLR_GROUP 0x300
>>> diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>>> index c91dd44c4..044f89c7c 100644
>>> --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>>> +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>>> @@ -206,6 +206,7 @@ DPDK_17.08 {
>>>        rte_bus_find_by_device;
>>>        rte_bus_find_by_name;
>>>        rte_pci_match;
>>> +    rte_pci_get_iommu_class;
>>>      } DPDK_17.05;
>>>    
>>


* Re: [PATCH v3 04/11] linuxapp/eal_pci: get iommu class
  2017-07-14  7:39       ` Hemant Agrawal
@ 2017-07-14  7:55         ` santosh
  2017-07-14  8:06           ` Hemant Agrawal
  0 siblings, 1 reply; 248+ messages in thread
From: santosh @ 2017-07-14  7:55 UTC (permalink / raw)
  To: Hemant Agrawal, thomas, dev
  Cc: bruce.richardson, jerin.jacob, shreyansh.jain, gaetan.rivet,
	sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz

On Friday 14 July 2017 01:09 PM, Hemant Agrawal wrote:

> On 7/11/2017 11:46 AM, Santosh Shukla wrote:
>> Get iommu class of PCI device on the bus and returns preferred iova
>> mapping mode for that bus.
>>
>> Algorithm for iova scheme selection for PCI bus:
>> 0. Look for device attached to vfio kdrv and has .drv_flag set
>> to RTE_PCI_DRV_NEED_IOVA_VA.
>> 1. Look for any device attached to UIO class of driver.
>> 2. Check for vfio-noiommu mode enabled.
>>
>> If 1) & 2) is false and 0) is true then select
>> mapping scheme as iova=va. Otherwise use default
>> mapping scheme (iova_pa).
>>
>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> ---
>> v1 --> v2:
>> - Removed Linux version check in vfio_noiommu func. Refer [1].
>> - Extending autodetection logic for _iommu_class.
>> Refer [2].
>>
>> [1] https://www.mail-archive.com/dev@dpdk.org/msg70108.html
>> [2] https://www.mail-archive.com/dev@dpdk.org/msg70279.html
>>
>>  lib/librte_eal/linuxapp/eal/eal_pci.c           | 66 +++++++++++++++++++++++++
>>  lib/librte_eal/linuxapp/eal/eal_vfio.c          | 19 +++++++
>>  lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
>>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>>  4 files changed, 90 insertions(+)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
>> index 7d9e1a99b..573caa000 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
>> @@ -45,6 +45,7 @@
>>  #include "eal_filesystem.h"
>>  #include "eal_private.h"
>>  #include "eal_pci_init.h"
>> +#include "eal_vfio.h"
>>
>>  /**
>>   * @file
>> @@ -488,6 +489,71 @@ rte_pci_scan(void)
>>      return -1;
>>  }
>>
>> +/*
>> + * Any one of the device bound to uio
>> + */
>> +static inline int
>> +pci_device_bound_uio(void)
>> +{
>> +    struct rte_pci_device *dev = NULL;
>> +
>> +    FOREACH_DEVICE_ON_PCIBUS(dev) {
>> +        if (dev->kdrv == RTE_KDRV_IGB_UIO ||
>> +           dev->kdrv == RTE_KDRV_UIO_GENERIC) {
>> +            return 1;
>> +        }
>> +    }
>> +    return 0;
>> +}
>> +
>> +/*
>> + * Any one of the device has iova as va
>> + */
>> +static inline int
>> +pci_device_has_iova_va(void)
>> +{
>> +    struct rte_pci_device *dev = NULL;
>> +    struct rte_pci_driver *drv = NULL;
>> +
>> +    FOREACH_DRIVER_ON_PCIBUS(drv) {
>> +        if (drv && drv->drv_flags & RTE_PCI_DRV_NEED_IOVA_VA) {
>> +            FOREACH_DEVICE_ON_PCIBUS(dev) {
>> +                if (dev->kdrv == RTE_KDRV_VFIO &&
>> +                    rte_pci_match(drv, dev))
>> +                    return 1;
>> +            }
>> +        }
>> +    }
>> +    return 0;
>> +}
>> +
>> +/*
>> + * Get iommu class of PCI devices on the bus.
>> + */
>> +enum rte_iova_mode
>> +rte_pci_get_iommu_class(void)
>> +{
>> +    bool is_vfio_noiommu_enabled;
>> +    bool has_iova_va;
>> +    bool is_bound_uio;
>> +
>> +    has_iova_va = pci_device_has_iova_va();
>> +    is_bound_uio = pci_device_bound_uio();
>> +    is_vfio_noiommu_enabled = vfio_noiommu_is_enabled() == 1 ? 1 : 0;
>> +
>> +    if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled)
>> +        return RTE_IOVA_VA;
>> +
>
> PCI is generally present on all platforms, including dpaa2.
> In such cases there may not be any PCI device found or available for DPDK use. The PCI bus will still return RTE_IOVA_PA, which forces the system mode to PA.
>
That's the expected behavior. The implementation makes sure
that the PCI bus returns the default mode, i.e. _PA, if no PCI device is found.

Isn't the code taking care of that?

Let me walk through the code:

has_iova_va = 0 (if there is no PCI device, pci_device_has_iova_va() returns 0).

The `if (has_iova_va && ...)` check therefore fails, and rte_pci_get_iommu_class() returns RTE_IOVA_PA,
which is the default mode. Right?
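
The walk-through can be checked with a small standalone model of the selection logic. The enum and the function name `pci_iommu_class_model()` are stand-ins for illustration only; the three flags replace the results of pci_device_has_iova_va(), pci_device_bound_uio() and vfio_noiommu_is_enabled(), so this models the decision and nothing else (the warning logs are left out).

```c
#include <stdbool.h>

/* Stand-in for the enum in the patch set; names are illustrative. */
enum iova_mode_model {
	IOVA_MODEL_DC,	/* don't care */
	IOVA_MODEL_PA,
	IOVA_MODEL_VA
};

/*
 * Hypothetical model of rte_pci_get_iommu_class(): the flags stand in
 * for the results of the real bus queries.
 */
static enum iova_mode_model
pci_iommu_class_model(bool has_iova_va, bool is_bound_uio,
		bool is_vfio_noiommu_enabled)
{
	/* VA only if a VFIO-bound device needs it and nothing forbids it. */
	if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled)
		return IOVA_MODEL_VA;

	/* Everything else, including "no PCI device at all", ends up as PA. */
	return IOVA_MODEL_PA;
}
```

With all three inputs false (no PCI device found), the model returns PA and never the "don't care" value.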

>> +    if (has_iova_va) {
>> +        if (is_vfio_noiommu_enabled)
>> +            RTE_LOG(WARNING, EAL, "vfio-noiommu mode configured\n");
>> +        if (is_bound_uio)
>> +            RTE_LOG(WARNING, EAL, "Some device attached to UIO\n");
>> +    }
>> +
>> +    return RTE_IOVA_PA;
>> +}
>> +
>>  /* Read PCI config space. */
>>  int rte_pci_read_config(const struct rte_pci_device *device,
>>          void *buf, size_t len, off_t offset)
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>> index 946df7e31..c8a97b7e7 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>> @@ -816,4 +816,23 @@ vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
>>      return 0;
>>  }
>>
>> +int
>> +vfio_noiommu_is_enabled(void)
>> +{
>> +    int fd, ret, cnt __rte_unused;
>> +    char c;
>> +
>> +    ret = -1;
>> +    fd = open(VFIO_NOIOMMU_MODE, O_RDONLY);
>> +    if (fd < 0)
>> +        return -1;
>> +
>> +    cnt = read(fd, &c, 1);
>> +    if (c == 'Y')
>> +        ret = 1;
>> +
>> +    close(fd);
>> +    return ret;
>> +}
>> +
>>  #endif
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h b/lib/librte_eal/linuxapp/eal/eal_vfio.h
>> index 5ff63e5d7..26ea8e119 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
>> @@ -150,6 +150,8 @@ struct vfio_config {
>>  #define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
>>  #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
>>  #define VFIO_GET_REGION_IDX(x) (x >> 40)
>> +#define VFIO_NOIOMMU_MODE      \
>> +    "/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
>>
>>  /* DMA mapping function prototype.
>>   * Takes VFIO container fd as a parameter.
>> @@ -210,6 +212,8 @@ int pci_vfio_is_enabled(void);
>>
>>  int vfio_mp_sync_setup(void);
>>
>> +int vfio_noiommu_is_enabled(void);
>> +
>>  #define SOCKET_REQ_CONTAINER 0x100
>>  #define SOCKET_REQ_GROUP 0x200
>>  #define SOCKET_CLR_GROUP 0x300
>> diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>> index c91dd44c4..044f89c7c 100644
>> --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>> +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>> @@ -206,6 +206,7 @@ DPDK_17.08 {
>>      rte_bus_find_by_device;
>>      rte_bus_find_by_name;
>>      rte_pci_match;
>> +    rte_pci_get_iommu_class;
>>
>>  } DPDK_17.05;
>>
>>
>
>

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v3 04/11] linuxapp/eal_pci: get iommu class
  2017-07-14  7:55         ` santosh
@ 2017-07-14  8:06           ` Hemant Agrawal
  2017-07-14  8:46             ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Hemant Agrawal @ 2017-07-14  8:06 UTC (permalink / raw)
  To: santosh, thomas, dev
  Cc: bruce.richardson, jerin.jacob, shreyansh.jain, gaetan.rivet,
	sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz

On 7/14/2017 1:25 PM, santosh wrote:
> On Friday 14 July 2017 01:09 PM, Hemant Agrawal wrote:
>
>> On 7/11/2017 11:46 AM, Santosh Shukla wrote:
>>> Get iommu class of PCI device on the bus and returns preferred iova
>>> mapping mode for that bus.
>>>
>>> Algorithm for iova scheme selection for PCI bus:
>>> 0. Look for device attached to vfio kdrv and has .drv_flag set
>>> to RTE_PCI_DRV_NEED_IOVA_VA.
>>> 1. Look for any device attached to UIO class of driver.
>>> 2. Check for vfio-noiommu mode enabled.
>>>
>>> If 1) & 2) is false and 0) is true then select
>>> mapping scheme as iova=va. Otherwise use default
>>> mapping scheme (iova_pa).
>>>
>>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>> ---
>>> v1 --> v2:
>>> - Removed Linux version check in vfio_noiommu func. Refer [1].
>>> - Extending autodetction logic for _iommu_class.
>>> Refer [2].
>>>
>>> [1] https://www.mail-archive.com/dev@dpdk.org/msg70108.html
>>> [2] https://www.mail-archive.com/dev@dpdk.org/msg70279.html
>>>
>>>  lib/librte_eal/linuxapp/eal/eal_pci.c           | 66 +++++++++++++++++++++++++
>>>  lib/librte_eal/linuxapp/eal/eal_vfio.c          | 19 +++++++
>>>  lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
>>>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>>>  4 files changed, 90 insertions(+)
>>>
>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
>>> index 7d9e1a99b..573caa000 100644
>>> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
>>> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
>>> @@ -45,6 +45,7 @@
>>>  #include "eal_filesystem.h"
>>>  #include "eal_private.h"
>>>  #include "eal_pci_init.h"
>>> +#include "eal_vfio.h"
>>>
>>>  /**
>>>   * @file
>>> @@ -488,6 +489,71 @@ rte_pci_scan(void)
>>>      return -1;
>>>  }
>>>
>>> +/*
>>> + * Any one of the device bound to uio
>>> + */
>>> +static inline int
>>> +pci_device_bound_uio(void)
>>> +{
>>> +    struct rte_pci_device *dev = NULL;
>>> +
>>> +    FOREACH_DEVICE_ON_PCIBUS(dev) {
>>> +        if (dev->kdrv == RTE_KDRV_IGB_UIO ||
>>> +           dev->kdrv == RTE_KDRV_UIO_GENERIC) {
>>> +            return 1;
>>> +        }
>>> +    }
>>> +    return 0;
>>> +}
>>> +
>>> +/*
>>> + * Any one of the device has iova as va
>>> + */
>>> +static inline int
>>> +pci_device_has_iova_va(void)
>>> +{
>>> +    struct rte_pci_device *dev = NULL;
>>> +    struct rte_pci_driver *drv = NULL;
>>> +
>>> +    FOREACH_DRIVER_ON_PCIBUS(drv) {
>>> +        if (drv && drv->drv_flags & RTE_PCI_DRV_NEED_IOVA_VA) {
>>> +            FOREACH_DEVICE_ON_PCIBUS(dev) {
>>> +                if (dev->kdrv == RTE_KDRV_VFIO &&
>>> +                    rte_pci_match(drv, dev))
>>> +                    return 1;
>>> +            }
>>> +        }
>>> +    }
>>> +    return 0;
>>> +}
>>> +
>>> +/*
>>> + * Get iommu class of PCI devices on the bus.
>>> + */
>>> +enum rte_iova_mode
>>> +rte_pci_get_iommu_class(void)
>>> +{
>>> +    bool is_vfio_noiommu_enabled;
>>> +    bool has_iova_va;
>>> +    bool is_bound_uio;
>>> +
>>> +    has_iova_va = pci_device_has_iova_va();
>>> +    is_bound_uio = pci_device_bound_uio();
>>> +    is_vfio_noiommu_enabled = vfio_noiommu_is_enabled() == 1 ? 1 : 0;
>>> +
>>> +    if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled)
>>> +        return RTE_IOVA_VA;
>>> +
>>
>> PCI is generally present in all platform including dpaa2.
>> There may not be any device found or available for dpdk usages in such cases. The PCI bus will still return RTE_IOVA_PA, which will make the system mode as PA.
>>
> That's the expected behavior. And implementation makes sure
> that PCI_bus return default mode aka _PA if no-pci device found.
>
> Isn't code taking care of same?
>

I have attached a PCI device to the board, but it is managed by the
kernel only.

EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL:   probe driver: 8086:10d3 net_e1000_em
EAL:   Not managed by a supported kernel driver, skipped

So there are devices in the PCI list, but none of them is probed or
used by DPDK.


> Let me walk through the code:
>
> has_iova_va = 0 (if no pci device then pci_device_has_iov_va() will return 0).
>
> And if (has_iova_va & ,,,) will fail therefore rte_pci_get_iommu_class() retuns RTE_IOVA_PA mode.
> which is default mode. Right?
>
This will create an issue for the 2nd bus, which is a VA bus. The combined
mode becomes '3' (PA | VA), so the system mode will be PA.
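
To make the concern concrete, here is a small sketch of the bus-level
combination, assuming the values PA = 1 and VA = 2 from the 05/11 commit
message. The names iova_mode and combine_bus_modes() are hypothetical and
only mirror the OR-then-compare logic of rte_bus_get_iommu_class():

```c
#include <stddef.h>

/* Local stand-in for the enum described in the cover letter. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1, IOVA_VA = 2 };

/*
 * Hypothetical helper: OR together what each bus reports.
 * Anything other than a pure VA result falls back to PA.
 */
static enum iova_mode
combine_bus_modes(const enum iova_mode *modes, size_t n)
{
	int mode = 0;
	size_t i;

	for (i = 0; i < n; i++)
		mode |= modes[i];

	return mode == IOVA_VA ? IOVA_VA : IOVA_PA;
}
```

A PCI bus reporting PA next to a second bus reporting VA gives 1 | 2 = 3,
which is not VA, so the system ends up in PA mode.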

>>> +    if (has_iova_va) {
>>> +        if (is_vfio_noiommu_enabled)
>>> +            RTE_LOG(WARNING, EAL, "vfio-noiommu mode configured\n");
>>> +        if (is_bound_uio)
>>> +            RTE_LOG(WARNING, EAL, "Some device attached to UIO\n");
>>> +    }
>>> +
>>> +    return RTE_IOVA_PA;
>>> +}
>>> +
>>>  /* Read PCI config space. */
>>>  int rte_pci_read_config(const struct rte_pci_device *device,
>>>          void *buf, size_t len, off_t offset)
>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>> index 946df7e31..c8a97b7e7 100644
>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>> @@ -816,4 +816,23 @@ vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
>>>      return 0;
>>>  }
>>>
>>> +int
>>> +vfio_noiommu_is_enabled(void)
>>> +{
>>> +    int fd, ret, cnt __rte_unused;
>>> +    char c;
>>> +
>>> +    ret = -1;
>>> +    fd = open(VFIO_NOIOMMU_MODE, O_RDONLY);
>>> +    if (fd < 0)
>>> +        return -1;
>>> +
>>> +    cnt = read(fd, &c, 1);
>>> +    if (c == 'Y')
>>> +        ret = 1;
>>> +
>>> +    close(fd);
>>> +    return ret;
>>> +}
>>> +
>>>  #endif
>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h b/lib/librte_eal/linuxapp/eal/eal_vfio.h
>>> index 5ff63e5d7..26ea8e119 100644
>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
>>> @@ -150,6 +150,8 @@ struct vfio_config {
>>>  #define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
>>>  #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
>>>  #define VFIO_GET_REGION_IDX(x) (x >> 40)
>>> +#define VFIO_NOIOMMU_MODE      \
>>> +    "/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
>>>
>>>  /* DMA mapping function prototype.
>>>   * Takes VFIO container fd as a parameter.
>>> @@ -210,6 +212,8 @@ int pci_vfio_is_enabled(void);
>>>
>>>  int vfio_mp_sync_setup(void);
>>>
>>> +int vfio_noiommu_is_enabled(void);
>>> +
>>>  #define SOCKET_REQ_CONTAINER 0x100
>>>  #define SOCKET_REQ_GROUP 0x200
>>>  #define SOCKET_CLR_GROUP 0x300
>>> diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>>> index c91dd44c4..044f89c7c 100644
>>> --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>>> +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>>> @@ -206,6 +206,7 @@ DPDK_17.08 {
>>>      rte_bus_find_by_device;
>>>      rte_bus_find_by_name;
>>>      rte_pci_match;
>>> +    rte_pci_get_iommu_class;
>>>
>>>  } DPDK_17.05;
>>>
>>>
>>
>>
>
>

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v3 05/11] bus: get iommu class
  2017-07-11  6:16     ` [PATCH v3 05/11] bus: " Santosh Shukla
@ 2017-07-14  8:07       ` Hemant Agrawal
  2017-07-14  8:30         ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Hemant Agrawal @ 2017-07-14  8:07 UTC (permalink / raw)
  To: Santosh Shukla, thomas, dev
  Cc: bruce.richardson, jerin.jacob, shreyansh.jain, gaetan.rivet,
	sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz

On 7/11/2017 11:46 AM, Santosh Shukla wrote:
> API(rte_bus_get_iommu_class) helps to automatically detect and select
> appropriate iova mapping scheme for iommu capable device on that bus.
>
> Algorithm for iova scheme selection for bus:
> 0. Iterate through bus_list.
> 1. Collect each bus iova mode value and update into 'mode' var.
> 2. Here value '1' is _pa and value '2' is _va mode.
> So mode selection scheme is like:
> if mode == 2 then iova mode is _va.
> if mode == 1 then iova mode is _pa
> if mode  == 3 then iova mode ia _pa.
>
> So mode !=2  will be default iova mode.
>
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
>  lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
>  lib/librte_eal/common/eal_common_bus.c          | 23 +++++++++++++++++++++++
>  lib/librte_eal/common/eal_common_pci.c          |  1 +
>  lib/librte_eal/common/include/rte_bus.h         | 22 ++++++++++++++++++++++
>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>  5 files changed, 48 insertions(+)
>
> diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> index 33c2c32c0..a2dd65a33 100644
> --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> @@ -202,6 +202,7 @@ DPDK_17.08 {
>  	rte_bus_find_by_name;
>  	rte_pci_match;
>  	rte_pci_get_iommu_class;
> +	rte_bus_get_iommu_class;
>
>  } DPDK_17.05;
>
> diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
> index 08bec2d93..5d5753ac9 100644
> --- a/lib/librte_eal/common/eal_common_bus.c
> +++ b/lib/librte_eal/common/eal_common_bus.c
> @@ -222,3 +222,26 @@ rte_bus_find_by_device_name(const char *str)
>  		c[0] = '\0';
>  	return rte_bus_find(NULL, bus_can_parse, name);
>  }
> +
> +
> +/*
> + * Get iommu class of devices on the bus.
> + */
> +enum rte_iova_mode
> +rte_bus_get_iommu_class(void)
> +{
> +	int mode = 0;
> +	struct rte_bus *bus;
> +
> +	TAILQ_FOREACH(bus, &rte_bus_list, next) {
> +
> +		if (bus->get_iommu_class)
> +			mode |= bus->get_iommu_class();
> +	}
> +

If you change the default return value to '0' for buses, this code will work.
E.g. PCI will return '0' when no device is probed and FSL MC will return
VA, so the resulting mode will be VA.

If fslmc is not present, the default mode will be PA.
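
A rough sketch of this suggestion, assuming '0' corresponds to the
"don't care" value from the cover letter. The names iova_mode,
bus_report() and system_mode() are hypothetical and only illustrate the
idea:

```c
/* Local stand-in for the enum described in the cover letter. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1, IOVA_VA = 2 };

/*
 * Hypothetical per-bus report: a bus on which DPDK probed nothing
 * says "don't care" instead of forcing its default onto the system.
 */
static enum iova_mode
bus_report(int probed_devices, enum iova_mode preferred)
{
	return probed_devices > 0 ? preferred : IOVA_DC;
}

/* Combine two bus reports the same way the patch does: OR, then compare. */
static enum iova_mode
system_mode(enum iova_mode pci, enum iova_mode fslmc)
{
	int mode = pci | fslmc;

	return mode == IOVA_VA ? IOVA_VA : IOVA_PA;
}
```

With no probed PCI device and an fslmc bus reporting VA, the result is
0 | 2 = 2, i.e. VA. With neither bus reporting anything, the result is 0,
which still falls back to PA.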

> +	if (mode != RTE_IOVA_VA) {
> +		/* Use default IOVA mode */
> +		mode = RTE_IOVA_PA;
> +	}
> +	return mode;
> +}
> diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
> index 8b6ecebd6..bdf2e7c3a 100644
> --- a/lib/librte_eal/common/eal_common_pci.c
> +++ b/lib/librte_eal/common/eal_common_pci.c
> @@ -552,6 +552,7 @@ struct rte_pci_bus rte_pci_bus = {
>  		.plug = pci_plug,
>  		.unplug = pci_unplug,
>  		.parse = pci_parse,
> +		.get_iommu_class = rte_pci_get_iommu_class,
>  	},
>  	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
>  	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
> diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
> index 7a0cfb165..8b2805b7f 100644
> --- a/lib/librte_eal/common/include/rte_bus.h
> +++ b/lib/librte_eal/common/include/rte_bus.h
> @@ -181,6 +181,17 @@ struct rte_bus_conf {
>  	enum rte_bus_scan_mode scan_mode; /**< Scan policy. */
>  };
>
> +
> +/**
> + * Get iommu class of devices on the bus.
> + * Check that those devices are attached to iommu driver.
> + *
> + * @return
> + *      enum rte_iova_mode value.
> + */
> +typedef enum rte_iova_mode (*rte_bus_get_iommu_class_t)(void);
> +
> +
>  /**
>   * A structure describing a generic bus.
>   */
> @@ -194,6 +205,7 @@ struct rte_bus {
>  	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
>  	rte_bus_parse_t parse;       /**< Parse a device name */
>  	struct rte_bus_conf conf;    /**< Bus configuration */
> +	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
>  };
>
>  /**
> @@ -293,6 +305,16 @@ struct rte_bus *rte_bus_find_by_device(const struct rte_device *dev);
>   */
>  struct rte_bus *rte_bus_find_by_name(const char *busname);
>
> +
> +/**
> + * Get iommu class of devices on the bus.
> + * Check that those devices are attached to iommu driver.
> + *
> + * @return
> + *     enum rte_iova_mode value.
> + */
> +enum rte_iova_mode rte_bus_get_iommu_class(void);
> +
>  /**
>   * Helper for Bus registration.
>   * The constructor has higher priority than PMD constructors.
> diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> index 044f89c7c..186c7b0fd 100644
> --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> @@ -207,6 +207,7 @@ DPDK_17.08 {
>  	rte_bus_find_by_name;
>  	rte_pci_match;
>  	rte_pci_get_iommu_class;
> +	rte_bus_get_iommu_class;
>
>  } DPDK_17.05;
>
>

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v3 04/11] linuxapp/eal_pci: get iommu class
  2017-07-14  7:43           ` Sergio Gonzalez Monroy
@ 2017-07-14  8:11             ` santosh
  0 siblings, 0 replies; 248+ messages in thread
From: santosh @ 2017-07-14  8:11 UTC (permalink / raw)
  To: Sergio Gonzalez Monroy, thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, anatoly.burakov, stephen, maxime.coquelin,
	olivier.matz

Hi Sergio,

On Friday 14 July 2017 01:13 PM, Sergio Gonzalez Monroy wrote:

> On 13/07/2017 09:23, santosh wrote:
>> Hi Sergio,
>>
>> On Wednesday 12 July 2017 01:50 PM, Sergio Gonzalez Monroy wrote:
>>
>>> On 11/07/2017 07:16, Santosh Shukla wrote:
>>>> Get iommu class of PCI device on the bus and returns preferred iova
>>>> mapping mode for that bus.
>>>>
>>>> Algorithm for iova scheme selection for PCI bus:
>>>> 0. Look for device attached to vfio kdrv and has .drv_flag set
>>>> to RTE_PCI_DRV_NEED_IOVA_VA.
>>>> 1. Look for any device attached to UIO class of driver.
>>>> 2. Check for vfio-noiommu mode enabled.
>>>>
>>>> If 1) & 2) is false and 0) is true then select
>>>> mapping scheme as iova=va. Otherwise use default
>>>> mapping scheme (iova_pa).
>>>>
>>>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>>>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>>> ---
>>>> v1 --> v2:
>>>> - Removed Linux version check in vfio_noiommu func. Refer [1].
>>>> - Extending autodetction logic for _iommu_class.
>>>> Refer [2].
>>>>
>>>> [1] https://www.mail-archive.com/dev@dpdk.org/msg70108.html
>>>> [2] https://www.mail-archive.com/dev@dpdk.org/msg70279.html
>>> Just wondering how it all works with device hotplug.
>>> Correct me if I am wrong but if EAL decides to use IOVA_AS_VA scheme,
>>> if we were to attach a device that needs IOVA_AS_PA, it will not work and should fail to attach, right?
>>>
>> It will work for igb_uio case, and won't work for vfio-noiommu hotplug case(Invalid case).
>
> Why are those two cases (igb_uio, vfio-noiommu) different? do they not have the same requirements, ie. need IOVA_PA sheme?
>
The behavior remains the same.

For the vfio-noiommu case in the context of hot-plugging, the rest of the
VFIO (IOMMU) devices will be functionally affected; that's why I called it
an invalid case.

> Thanks,
> Sergio
>
>> Yes, we can dictate iova awareness to hotplug/unplug area.
>>
>>> Thanks,
>>> Sergio
>>>
>>>>    lib/librte_eal/linuxapp/eal/eal_pci.c           | 66 +++++++++++++++++++++++++
>>>>    lib/librte_eal/linuxapp/eal/eal_vfio.c          | 19 +++++++
>>>>    lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
>>>>    lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>>>>    4 files changed, 90 insertions(+)
>>>>
>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
>>>> index 7d9e1a99b..573caa000 100644
>>>> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
>>>> @@ -45,6 +45,7 @@
>>>>    #include "eal_filesystem.h"
>>>>    #include "eal_private.h"
>>>>    #include "eal_pci_init.h"
>>>> +#include "eal_vfio.h"
>>>>      /**
>>>>     * @file
>>>> @@ -488,6 +489,71 @@ rte_pci_scan(void)
>>>>        return -1;
>>>>    }
>>>>    +/*
>>>> + * Any one of the device bound to uio
>>>> + */
>>>> +static inline int
>>>> +pci_device_bound_uio(void)
>>>> +{
>>>> +    struct rte_pci_device *dev = NULL;
>>>> +
>>>> +    FOREACH_DEVICE_ON_PCIBUS(dev) {
>>>> +        if (dev->kdrv == RTE_KDRV_IGB_UIO ||
>>>> +           dev->kdrv == RTE_KDRV_UIO_GENERIC) {
>>>> +            return 1;
>>>> +        }
>>>> +    }
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +/*
>>>> + * Any one of the device has iova as va
>>>> + */
>>>> +static inline int
>>>> +pci_device_has_iova_va(void)
>>>> +{
>>>> +    struct rte_pci_device *dev = NULL;
>>>> +    struct rte_pci_driver *drv = NULL;
>>>> +
>>>> +    FOREACH_DRIVER_ON_PCIBUS(drv) {
>>>> +        if (drv && drv->drv_flags & RTE_PCI_DRV_NEED_IOVA_VA) {
>>>> +            FOREACH_DEVICE_ON_PCIBUS(dev) {
>>>> +                if (dev->kdrv == RTE_KDRV_VFIO &&
>>>> +                    rte_pci_match(drv, dev))
>>>> +                    return 1;
>>>> +            }
>>>> +        }
>>>> +    }
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +/*
>>>> + * Get iommu class of PCI devices on the bus.
>>>> + */
>>>> +enum rte_iova_mode
>>>> +rte_pci_get_iommu_class(void)
>>>> +{
>>>> +    bool is_vfio_noiommu_enabled;
>>>> +    bool has_iova_va;
>>>> +    bool is_bound_uio;
>>>> +
>>>> +    has_iova_va = pci_device_has_iova_va();
>>>> +    is_bound_uio = pci_device_bound_uio();
>>>> +    is_vfio_noiommu_enabled = vfio_noiommu_is_enabled() == 1 ? 1 : 0;
>>>> +
>>>> +    if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled)
>>>> +        return RTE_IOVA_VA;
>>>> +
>>>> +    if (has_iova_va) {
>>>> +        if (is_vfio_noiommu_enabled)
>>>> +            RTE_LOG(WARNING, EAL, "vfio-noiommu mode configured\n");
>>>> +        if (is_bound_uio)
>>>> +            RTE_LOG(WARNING, EAL, "Some device attached to UIO\n");
>>>> +    }
>>>> +
>>>> +    return RTE_IOVA_PA;
>>>> +}
>>>> +
>>>>    /* Read PCI config space. */
>>>>    int rte_pci_read_config(const struct rte_pci_device *device,
>>>>            void *buf, size_t len, off_t offset)
>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>> index 946df7e31..c8a97b7e7 100644
>>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>> @@ -816,4 +816,23 @@ vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
>>>>        return 0;
>>>>    }
>>>>    +int
>>>> +vfio_noiommu_is_enabled(void)
>>>> +{
>>>> +    int fd, ret, cnt __rte_unused;
>>>> +    char c;
>>>> +
>>>> +    ret = -1;
>>>> +    fd = open(VFIO_NOIOMMU_MODE, O_RDONLY);
>>>> +    if (fd < 0)
>>>> +        return -1;
>>>> +
>>>> +    cnt = read(fd, &c, 1);
>>>> +    if (c == 'Y')
>>>> +        ret = 1;
>>>> +
>>>> +    close(fd);
>>>> +    return ret;
>>>> +}
>>>> +
>>>>    #endif
>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h b/lib/librte_eal/linuxapp/eal/eal_vfio.h
>>>> index 5ff63e5d7..26ea8e119 100644
>>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
>>>> @@ -150,6 +150,8 @@ struct vfio_config {
>>>>    #define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
>>>>    #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
>>>>    #define VFIO_GET_REGION_IDX(x) (x >> 40)
>>>> +#define VFIO_NOIOMMU_MODE      \
>>>> +    "/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
>>>>      /* DMA mapping function prototype.
>>>>     * Takes VFIO container fd as a parameter.
>>>> @@ -210,6 +212,8 @@ int pci_vfio_is_enabled(void);
>>>>      int vfio_mp_sync_setup(void);
>>>>    +int vfio_noiommu_is_enabled(void);
>>>> +
>>>>    #define SOCKET_REQ_CONTAINER 0x100
>>>>    #define SOCKET_REQ_GROUP 0x200
>>>>    #define SOCKET_CLR_GROUP 0x300
>>>> diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>>>> index c91dd44c4..044f89c7c 100644
>>>> --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>>>> +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>>>> @@ -206,6 +206,7 @@ DPDK_17.08 {
>>>>        rte_bus_find_by_device;
>>>>        rte_bus_find_by_name;
>>>>        rte_pci_match;
>>>> +    rte_pci_get_iommu_class;
>>>>      } DPDK_17.05;
>>>>    
>>>
>

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v3 05/11] bus: get iommu class
  2017-07-14  8:07       ` Hemant Agrawal
@ 2017-07-14  8:30         ` santosh
  2017-07-14  9:39           ` Hemant Agrawal
  0 siblings, 1 reply; 248+ messages in thread
From: santosh @ 2017-07-14  8:30 UTC (permalink / raw)
  To: Hemant Agrawal, thomas, dev
  Cc: bruce.richardson, jerin.jacob, shreyansh.jain, gaetan.rivet,
	sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz

On Friday 14 July 2017 01:37 PM, Hemant Agrawal wrote:

> On 7/11/2017 11:46 AM, Santosh Shukla wrote:
>> API(rte_bus_get_iommu_class) helps to automatically detect and select
>> appropriate iova mapping scheme for iommu capable device on that bus.
>>
>> Algorithm for iova scheme selection for bus:
>> 0. Iterate through bus_list.
>> 1. Collect each bus iova mode value and update into 'mode' var.
>> 2. Here value '1' is _pa and value '2' is _va mode.
>> So mode selection scheme is like:
>> if mode == 2 then iova mode is _va.
>> if mode == 1 then iova mode is _pa
>> if mode  == 3 then iova mode ia _pa.
>>
>> So mode !=2  will be default iova mode.
>>
>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> ---
>>  lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
>>  lib/librte_eal/common/eal_common_bus.c          | 23 +++++++++++++++++++++++
>>  lib/librte_eal/common/eal_common_pci.c          |  1 +
>>  lib/librte_eal/common/include/rte_bus.h         | 22 ++++++++++++++++++++++
>>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>>  5 files changed, 48 insertions(+)
>>
>> diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>> index 33c2c32c0..a2dd65a33 100644
>> --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>> +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>> @@ -202,6 +202,7 @@ DPDK_17.08 {
>>      rte_bus_find_by_name;
>>      rte_pci_match;
>>      rte_pci_get_iommu_class;
>> +    rte_bus_get_iommu_class;
>>
>>  } DPDK_17.05;
>>
>> diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
>> index 08bec2d93..5d5753ac9 100644
>> --- a/lib/librte_eal/common/eal_common_bus.c
>> +++ b/lib/librte_eal/common/eal_common_bus.c
>> @@ -222,3 +222,26 @@ rte_bus_find_by_device_name(const char *str)
>>          c[0] = '\0';
>>      return rte_bus_find(NULL, bus_can_parse, name);
>>  }
>> +
>> +
>> +/*
>> + * Get iommu class of devices on the bus.
>> + */
>> +enum rte_iova_mode
>> +rte_bus_get_iommu_class(void)
>> +{
>> +    int mode = 0;
>> +    struct rte_bus *bus;
>> +
>> +    TAILQ_FOREACH(bus, &rte_bus_list, next) {
>> +
>> +        if (bus->get_iommu_class)
>> +            mode |= bus->get_iommu_class();
>> +    }
>> +
>
> If you change the default return as '0' for buses. This code will work.
> e.g. PCI will return '0' - when no device is probed. FSL MC will return VA. the default mode will be 'VA'
>
I'm confused: why won't it work for the fslmc case?

Let me walk through the code:

If no PCI device or (in future) no platform device is probed, then the bus
opts for the default mapping scheme, which is iova_pa.

Let's take the PCI bus as an example:
bus->get_iommu_class()
	---> bus->_pci_get_iommu_class()
		* Now consider that no PCI device is bound to DPDK; then
		  it returns RTE_IOVA_PA mode to the rte_bus layer (aka bus->get_iommu_class).
		  So the iova mapping result from the iommu_class scan is RTE_IOVA_PA (default).
		  This works for the PCI bus; I tested the iova_va, iova_pa and no-PCI-device cases.

Now in the fslmc bus case:
bus->get_iommu_class()
	---> bus->_fslmc_get_iommu_class()

		* IIUC your comment: you want the fslmc bus to return RTE_IOVA_VA if no
		  device is detected, right?
		  If so, your fslmc bus handler should do something like the below:
			-- If no device on the fslmc bus: return RTE_IOVA_VA.
			-- If a device is detected on the fslmc bus and bound to an iommu driver: return RTE_IOVA_VA.
			-- If a device is detected on the fslmc bus but not bound to an iommu driver: return RTE_IOVA_PA.

Does that make sense? If not, can you describe the fslmc mapping scheme?
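
The three cases listed for the fslmc bus could be sketched roughly as
below. This is only an illustration: iova_mode and fslmc_pick_iova_mode()
are hypothetical names, and the real fslmc bus may need different inputs.

```c
#include <stdbool.h>

/* Local stand-in for the enum described in the cover letter. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1, IOVA_VA = 2 };

/*
 * Hypothetical fslmc policy:
 *   - no device on the bus                  -> VA
 *   - device present, bound to iommu driver -> VA
 *   - device present, not bound to iommu    -> PA
 */
static enum iova_mode
fslmc_pick_iova_mode(int ndev, bool bound_to_iommu_drv)
{
	if (ndev == 0)
		return IOVA_VA;

	return bound_to_iommu_drv ? IOVA_VA : IOVA_PA;
}
```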

> if fslmc is not present. The default mode will be PA.
>
>> +    if (mode != RTE_IOVA_VA) {
>> +        /* Use default IOVA mode */
>> +        mode = RTE_IOVA_PA;
>> +    }
>> +    return mode;
>> +}
>> diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
>> index 8b6ecebd6..bdf2e7c3a 100644
>> --- a/lib/librte_eal/common/eal_common_pci.c
>> +++ b/lib/librte_eal/common/eal_common_pci.c
>> @@ -552,6 +552,7 @@ struct rte_pci_bus rte_pci_bus = {
>>          .plug = pci_plug,
>>          .unplug = pci_unplug,
>>          .parse = pci_parse,
>> +        .get_iommu_class = rte_pci_get_iommu_class,
>>      },
>>      .device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
>>      .driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
>> diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
>> index 7a0cfb165..8b2805b7f 100644
>> --- a/lib/librte_eal/common/include/rte_bus.h
>> +++ b/lib/librte_eal/common/include/rte_bus.h
>> @@ -181,6 +181,17 @@ struct rte_bus_conf {
>>      enum rte_bus_scan_mode scan_mode; /**< Scan policy. */
>>  };
>>
>> +
>> +/**
>> + * Get iommu class of devices on the bus.
>> + * Check that those devices are attached to iommu driver.
>> + *
>> + * @return
>> + *      enum rte_iova_mode value.
>> + */
>> +typedef enum rte_iova_mode (*rte_bus_get_iommu_class_t)(void);
>> +
>> +
>>  /**
>>   * A structure describing a generic bus.
>>   */
>> @@ -194,6 +205,7 @@ struct rte_bus {
>>      rte_bus_unplug_t unplug;     /**< Remove single device from driver */
>>      rte_bus_parse_t parse;       /**< Parse a device name */
>>      struct rte_bus_conf conf;    /**< Bus configuration */
>> +    rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
>>  };
>>
>>  /**
>> @@ -293,6 +305,16 @@ struct rte_bus *rte_bus_find_by_device(const struct rte_device *dev);
>>   */
>>  struct rte_bus *rte_bus_find_by_name(const char *busname);
>>
>> +
>> +/**
>> + * Get iommu class of devices on the bus.
>> + * Check that those devices are attached to iommu driver.
>> + *
>> + * @return
>> + *     enum rte_iova_mode value.
>> + */
>> +enum rte_iova_mode rte_bus_get_iommu_class(void);
>> +
>>  /**
>>   * Helper for Bus registration.
>>   * The constructor has higher priority than PMD constructors.
>> diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>> index 044f89c7c..186c7b0fd 100644
>> --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>> +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>> @@ -207,6 +207,7 @@ DPDK_17.08 {
>>      rte_bus_find_by_name;
>>      rte_pci_match;
>>      rte_pci_get_iommu_class;
>> +    rte_bus_get_iommu_class;
>>
>>  } DPDK_17.05;
>>
>>
>
>

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v3 04/11] linuxapp/eal_pci: get iommu class
  2017-07-14  8:06           ` Hemant Agrawal
@ 2017-07-14  8:46             ` santosh
  2017-07-14  9:13               ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: santosh @ 2017-07-14  8:46 UTC (permalink / raw)
  To: Hemant Agrawal, thomas, dev
  Cc: bruce.richardson, jerin.jacob, shreyansh.jain, gaetan.rivet,
	sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz

On Friday 14 July 2017 01:36 PM, Hemant Agrawal wrote:

> On 7/14/2017 1:25 PM, santosh wrote:
>> On Friday 14 July 2017 01:09 PM, Hemant Agrawal wrote:
>>
>>> On 7/11/2017 11:46 AM, Santosh Shukla wrote:
>>>> Get iommu class of PCI device on the bus and returns preferred iova
>>>> mapping mode for that bus.
>>>>
>>>> Algorithm for iova scheme selection for PCI bus:
>>>> 0. Look for device attached to vfio kdrv and has .drv_flag set
>>>> to RTE_PCI_DRV_NEED_IOVA_VA.
>>>> 1. Look for any device attached to UIO class of driver.
>>>> 2. Check for vfio-noiommu mode enabled.
>>>>
>>>> If 1) & 2) is false and 0) is true then select
>>>> mapping scheme as iova=va. Otherwise use default
>>>> mapping scheme (iova_pa).
>>>>
>>>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>>>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>>> ---
>>>> v1 --> v2:
>>>> - Removed Linux version check in vfio_noiommu func. Refer [1].
>>>> - Extending autodetction logic for _iommu_class.
>>>> Refer [2].
>>>>
>>>> [1] https://www.mail-archive.com/dev@dpdk.org/msg70108.html
>>>> [2] https://www.mail-archive.com/dev@dpdk.org/msg70279.html
>>>>
>>>>  lib/librte_eal/linuxapp/eal/eal_pci.c           | 66 +++++++++++++++++++++++++
>>>>  lib/librte_eal/linuxapp/eal/eal_vfio.c          | 19 +++++++
>>>>  lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
>>>>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>>>>  4 files changed, 90 insertions(+)
>>>>
>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
>>>> index 7d9e1a99b..573caa000 100644
>>>> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
>>>> @@ -45,6 +45,7 @@
>>>>  #include "eal_filesystem.h"
>>>>  #include "eal_private.h"
>>>>  #include "eal_pci_init.h"
>>>> +#include "eal_vfio.h"
>>>>
>>>>  /**
>>>>   * @file
>>>> @@ -488,6 +489,71 @@ rte_pci_scan(void)
>>>>      return -1;
>>>>  }
>>>>
>>>> +/*
>>>> + * Any one of the device bound to uio
>>>> + */
>>>> +static inline int
>>>> +pci_device_bound_uio(void)
>>>> +{
>>>> +    struct rte_pci_device *dev = NULL;
>>>> +
>>>> +    FOREACH_DEVICE_ON_PCIBUS(dev) {
>>>> +        if (dev->kdrv == RTE_KDRV_IGB_UIO ||
>>>> +           dev->kdrv == RTE_KDRV_UIO_GENERIC) {
>>>> +            return 1;
>>>> +        }
>>>> +    }
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +/*
>>>> + * Any one of the device has iova as va
>>>> + */
>>>> +static inline int
>>>> +pci_device_has_iova_va(void)
>>>> +{
>>>> +    struct rte_pci_device *dev = NULL;
>>>> +    struct rte_pci_driver *drv = NULL;
>>>> +
>>>> +    FOREACH_DRIVER_ON_PCIBUS(drv) {
>>>> +        if (drv && drv->drv_flags & RTE_PCI_DRV_NEED_IOVA_VA) {
>>>> +            FOREACH_DEVICE_ON_PCIBUS(dev) {
>>>> +                if (dev->kdrv == RTE_KDRV_VFIO &&
>>>> +                    rte_pci_match(drv, dev))
>>>> +                    return 1;
>>>> +            }
>>>> +        }
>>>> +    }
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +/*
>>>> + * Get iommu class of PCI devices on the bus.
>>>> + */
>>>> +enum rte_iova_mode
>>>> +rte_pci_get_iommu_class(void)
>>>> +{
>>>> +    bool is_vfio_noiommu_enabled;
>>>> +    bool has_iova_va;
>>>> +    bool is_bound_uio;
>>>> +
>>>> +    has_iova_va = pci_device_has_iova_va();
>>>> +    is_bound_uio = pci_device_bound_uio();
>>>> +    is_vfio_noiommu_enabled = vfio_noiommu_is_enabled() == 1 ? 1 : 0;
>>>> +
>>>> +    if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled)
>>>> +        return RTE_IOVA_VA;
>>>> +
>>>
>>> PCI is generally present in all platform including dpaa2.
>>> There may not be any device found or available for dpdk usages in such cases. The PCI bus will still return RTE_IOVA_PA, which will make the system mode as PA.
>>>
>> That's the expected behavior. And implementation makes sure
>> that PCI_bus return default mode aka _PA if no-pci device found.
>>
>> Isn't code taking care of same?
>>
>
> I have attached a PCI device to the board. But it is being managed by kernel only.
>
> EAL: PCI device 0000:01:00.0 on NUMA socket 0
> EAL:   probe driver: 8086:10d3 net_e1000_em
> EAL:   Not managed by a supported kernel driver, skipped
>
> So, there are devices in the PCI list. But none of them is probed or being used by dpdk.
>
>
Therefore the _pci_get_iommu_class scan result would be _PA, as no device is bound to dpdk.

>> Let me walk through the code:
>>
>> has_iova_va = 0 (if no pci device then pci_device_has_iov_va() will return 0).
>>
>> And if (has_iova_va & ,,,) will fail therefore rte_pci_get_iommu_class() retuns RTE_IOVA_PA mode.
>> which is default mode. Right?
>>
> This will create issue for the 2nd bus, which is a VA bus. The combined mode will becomes '3', so the system mode will be PA.
>
Yes, if both modes are detected on two different buses,
then the policy is to use the default iova mapping mode across the buses (which is _pa).

Are you operating in two different modes, like _pa for the PCI bus and _va for the fslmc bus in dpaa2?
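
To make the walk-through concrete, here is a minimal standalone sketch of the
PCI-bus decision. It is not the patch code: the enum, struct and function
names are made up for illustration, and the per-device flags are an assumed
simplification of what the scan records (kdrv plus the driver's
NEED_IOVA_VA flag). The enum values follow the cover letter (DC=0, PA=1, VA=2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the values in the cover letter (DC=0, PA=1, VA=2). */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1, IOVA_VA = 2 };

/* Hypothetical, simplified view of the kernel interface a device is bound to. */
enum kdrv { KDRV_NONE, KDRV_UIO, KDRV_VFIO };

struct pci_dev_info {
	enum kdrv kdrv;     /* KDRV_NONE: kernel-managed, not usable by dpdk */
	bool pmd_needs_va;  /* matching PMD asks for iova=va */
};

/*
 * Same rule as in the patch: VA only if some VFIO-bound device needs it,
 * no device is on UIO, and vfio-noiommu is off. Everything else is PA.
 */
static enum iova_mode
pci_iommu_class(const struct pci_dev_info *devs, size_t n, bool noiommu)
{
	bool has_iova_va = false;
	bool is_bound_uio = false;
	size_t i;

	assert(devs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (devs[i].kdrv == KDRV_VFIO && devs[i].pmd_needs_va)
			has_iova_va = true;
		if (devs[i].kdrv == KDRV_UIO)
			is_bound_uio = true;
	}

	if (has_iova_va && !is_bound_uio && !noiommu)
		return IOVA_VA;
	return IOVA_PA;
}
```

With one kernel-managed device in the list (the e1000 case above), the sketch
returns PA, the same as for an empty list.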

>>>> +    if (has_iova_va) {
>>>> +        if (is_vfio_noiommu_enabled)
>>>> +            RTE_LOG(WARNING, EAL, "vfio-noiommu mode configured\n");
>>>> +        if (is_bound_uio)
>>>> +            RTE_LOG(WARNING, EAL, "Some device attached to UIO\n");
>>>> +    }
>>>> +
>>>> +    return RTE_IOVA_PA;
>>>> +}
>>>> +
>>>>  /* Read PCI config space. */
>>>>  int rte_pci_read_config(const struct rte_pci_device *device,
>>>>          void *buf, size_t len, off_t offset)
>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>> index 946df7e31..c8a97b7e7 100644
>>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>> @@ -816,4 +816,23 @@ vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
>>>>      return 0;
>>>>  }
>>>>
>>>> +int
>>>> +vfio_noiommu_is_enabled(void)
>>>> +{
>>>> +    int fd, ret, cnt __rte_unused;
>>>> +    char c;
>>>> +
>>>> +    ret = -1;
>>>> +    fd = open(VFIO_NOIOMMU_MODE, O_RDONLY);
>>>> +    if (fd < 0)
>>>> +        return -1;
>>>> +
>>>> +    cnt = read(fd, &c, 1);
>>>> +    if (c == 'Y')
>>>> +        ret = 1;
>>>> +
>>>> +    close(fd);
>>>> +    return ret;
>>>> +}
>>>> +
>>>>  #endif
>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h b/lib/librte_eal/linuxapp/eal/eal_vfio.h
>>>> index 5ff63e5d7..26ea8e119 100644
>>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
>>>> @@ -150,6 +150,8 @@ struct vfio_config {
>>>>  #define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
>>>>  #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
>>>>  #define VFIO_GET_REGION_IDX(x) (x >> 40)
>>>> +#define VFIO_NOIOMMU_MODE      \
>>>> +    "/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
>>>>
>>>>  /* DMA mapping function prototype.
>>>>   * Takes VFIO container fd as a parameter.
>>>> @@ -210,6 +212,8 @@ int pci_vfio_is_enabled(void);
>>>>
>>>>  int vfio_mp_sync_setup(void);
>>>>
>>>> +int vfio_noiommu_is_enabled(void);
>>>> +
>>>>  #define SOCKET_REQ_CONTAINER 0x100
>>>>  #define SOCKET_REQ_GROUP 0x200
>>>>  #define SOCKET_CLR_GROUP 0x300
>>>> diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>>>> index c91dd44c4..044f89c7c 100644
>>>> --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>>>> +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>>>> @@ -206,6 +206,7 @@ DPDK_17.08 {
>>>>      rte_bus_find_by_device;
>>>>      rte_bus_find_by_name;
>>>>      rte_pci_match;
>>>> +    rte_pci_get_iommu_class;
>>>>
>>>>  } DPDK_17.05;
>>>>
>>>>
>>>
>>>
>>
>>
>
>
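
Side note on vfio_noiommu_is_enabled() in the diff above: it tests 'c' even
when read() fails, so 'c' may be uninitialized. Below is a hedged sketch of a
stricter variant, not the patch code. The function name and the path argument
are hypothetical (the patch uses the fixed VFIO_NOIOMMU_MODE path), and
returning 0 for "readable but not 'Y'" is my assumption; the caller's
"== 1" check behaves the same either way.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Returns 1 if the file's first byte is 'Y', 0 if it is anything else,
 * and -1 if the file cannot be opened or no byte can be read.
 */
static int
param_is_enabled(const char *path)
{
	char c = '\0';
	ssize_t cnt;
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	cnt = read(fd, &c, 1);
	close(fd);
	if (cnt != 1)
		return -1;

	return c == 'Y' ? 1 : 0;
}
```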

* Re: [PATCH v3 07/11] linuxapp/eal: auto detect iova mode
  2017-07-13 18:25         ` santosh
@ 2017-07-14  8:49           ` Hemant Agrawal
  2017-07-14  9:21             ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Hemant Agrawal @ 2017-07-14  8:49 UTC (permalink / raw)
  To: santosh, thomas, dev
  Cc: bruce.richardson, jerin.jacob, shreyansh.jain, gaetan.rivet,
	sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz

On 7/13/2017 11:55 PM, santosh wrote:
> On Thursday 13 July 2017 04:59 PM, Hemant Agrawal wrote:
>
>> On 7/11/2017 11:46 AM, Santosh Shukla wrote:
>>> - Moving late bus scanning to up..just after eal_parsing.
>>> - Auto detect iova mapping mode, based on the result of
>>>   rte_bus_scan_iommu_class.
>>>
>>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>> ---
>>>  lib/librte_eal/linuxapp/eal/eal.c | 16 ++++++++++------
>>>  1 file changed, 10 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
>>> index 2546b55e4..7b4dd70de 100644
>>> --- a/lib/librte_eal/linuxapp/eal/eal.c
>>> +++ b/lib/librte_eal/linuxapp/eal/eal.c
>>> @@ -799,6 +799,16 @@ rte_eal_init(int argc, char **argv)
>>>          return -1;
>>>      }
>>>
>>> +    if (rte_bus_scan()) {
>>> +        rte_eal_init_alert("Cannot scan the buses for devices\n");
>>> +        rte_errno = ENODEV;
>>> +        return -1;
>>> +    }
>>> +
>>
>> The original place of the bus scan was with the following factors:
>> 1. The bus scan requires the VFIO to be enabled atleast in dpaa2 case.
>> (VFIO code still need cleanup to be support non-pci cleanly). I tried moving it before bus_scan, this helped in bus scanning.
>>
> bus_scan should do scanning, device enumeration, detecting devices and
> interface that device bound to, that interface could be VFIO, UIO, UIO_GENERIC etc..
>
> PCI bus scanning (in eal/) strictly comply to what I mentioned above, thus
> aut-detection works gracefully.
>
> However fslmc_bus 'scan' doesn't do device scanning, instead It call vfio dependent
> code which ideally should fall in 'resource mapping' category,. ideally should
> happen at bus probe time.
>
> Example:
> rte_fslmc_bus_scan()
> 	--> fslmc_vfio_setup_group
> 	--> fslmc_vfio_process_group
>
> So it is doing _setup_ inside scan ops, which in PCI(/vfio-pci) case happens
> at probe time (`vfio_setup_device`).
>
> In order to benefit iova auto-detection infrastructure: fslmc bus should
> do to two things:
>
> 0) fslmc bus scan should look at /sys/bus/platform/drivers/vfio-platform/*
> and find out that devices bind to vfio-platform or not, if yes then update kdrv
> entry mentioning interface type example VFIO. That-way flsmc bus gets capability to
> inform rte_bus about IOMMU capable interface. Right now, existing implementation
> don't have means to inform rte_bus about his devices like pci_bus has!.
>
vfio_fsl_mc is a bit different from pci: we first get the resource 
container and then look for resources as its children.

In any case, the reworking of the bus is pending, since support for 
many other features is being extended to non-pci buses as well in 
dpdk, e.g. devargs.
It is on my priority list to clean it up for the next release.


> 1) defer the vfio_seup from scan to bus->probe().

This is a good suggestion. This can solve the initialization issue.

>
>
>
>> 2. During SCAN, the bus may allocate memory to devices or for it's own usages.  rte_malloc or mempool is required in cases to support multi-process environment. (e.g. dpaa2 create dpbp or dpio device memory using the rte_malloc call).
>>
> If bus scanning adheres to device detection or enumeration then rte_malloc/mempool
>
> not required, Example eal/pci bus scanning.
>
>
> And in fslmc bus case: if vfio_setup deferred at bus->probe time then
> bus->scan won't have memory dependency.
>
>> Since none of the other rte library (mempool, memzone, tailq) is available at this point, it will create significant restriction on the bus scan.
>>
>> We will prefer if you can re-introduce the "iova_mode" and allow the application choose, which mode it want to run.
>>
>> This auto-detect logic may not work for many buses and it is going
>> to create serious restrictions on the bus_scan code.
>>
> fslmc is only bus besides PCI. Auto-detection works gracefully for PCI-bus.
> Can you give a try to said proposal?
>
> Ideally vfio-platform code should sit into eal/vfio like eal/vfio-pci is.
> Otherwise it will keep creating problems for new generic framework like we're
> discussing one.
>
> if said proposal doesn't work for you then I will re-introduce iova-mode as
> eal arg, that will override iova mapping mode. But IMHO, eal arg should be
> intermediate solution. Once vfio-platform code properly re-factored and merged,
> We should remove those eal iova-mode args.

Thanks for digging into the fslmc code.  As I said, getting the fslmc 
bus code refactored is now my priority item. We will target it 
immediately after the 17.08 validation.

However, until then, we can only support the iova-mode.

>
>>> +    /* autodetect the iova mapping mode (default is iova_pa) */
>>> +    if (rte_bus_get_iommu_class() == RTE_IOVA_VA)
>>> +        rte_eal_get_configuration()->iova_mode = RTE_IOVA_VA;
>>> +
>>>      if (internal_config.no_hugetlbfs == 0 &&
>>>              internal_config.process_type != RTE_PROC_SECONDARY &&
>>>              internal_config.xen_dom0_support == 0 &&
>>> @@ -896,12 +906,6 @@ rte_eal_init(int argc, char **argv)
>>>          return -1;
>>>      }
>>>
>>> -    if (rte_bus_scan()) {
>>> -        rte_eal_init_alert("Cannot scan the buses for devices\n");
>>> -        rte_errno = ENODEV;
>>> -        return -1;
>>> -    }
>>> -
>>>      RTE_LCORE_FOREACH_SLAVE(i) {
>>>
>>>          /*
>>>
>>
>>
>
>

* Re: [PATCH v3 04/11] linuxapp/eal_pci: get iommu class
  2017-07-14  8:46             ` santosh
@ 2017-07-14  9:13               ` santosh
  0 siblings, 0 replies; 248+ messages in thread
From: santosh @ 2017-07-14  9:13 UTC (permalink / raw)
  To: Hemant Agrawal, thomas, dev
  Cc: bruce.richardson, jerin.jacob, shreyansh.jain, gaetan.rivet,
	sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz

On Friday 14 July 2017 02:16 PM, santosh wrote:

> On Friday 14 July 2017 01:36 PM, Hemant Agrawal wrote:
>
>> On 7/14/2017 1:25 PM, santosh wrote:
>>> On Friday 14 July 2017 01:09 PM, Hemant Agrawal wrote:
>>>
>>>> On 7/11/2017 11:46 AM, Santosh Shukla wrote:
>>>>> Get iommu class of PCI device on the bus and returns preferred iova
>>>>> mapping mode for that bus.
>>>>>
>>>>> Algorithm for iova scheme selection for PCI bus:
>>>>> 0. Look for device attached to vfio kdrv and has .drv_flag set
>>>>> to RTE_PCI_DRV_NEED_IOVA_VA.
>>>>> 1. Look for any device attached to UIO class of driver.
>>>>> 2. Check for vfio-noiommu mode enabled.
>>>>>
>>>>> If 1) & 2) is false and 0) is true then select
>>>>> mapping scheme as iova=va. Otherwise use default
>>>>> mapping scheme (iova_pa).
>>>>>
>>>>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>>>>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>>>> ---
>>>>> v1 --> v2:
>>>>> - Removed Linux version check in vfio_noiommu func. Refer [1].
>>>>> - Extending autodetction logic for _iommu_class.
>>>>> Refer [2].
>>>>>
>>>>> [1] https://www.mail-archive.com/dev@dpdk.org/msg70108.html
>>>>> [2] https://www.mail-archive.com/dev@dpdk.org/msg70279.html
>>>>>
>>>>>  lib/librte_eal/linuxapp/eal/eal_pci.c           | 66 +++++++++++++++++++++++++
>>>>>  lib/librte_eal/linuxapp/eal/eal_vfio.c          | 19 +++++++
>>>>>  lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
>>>>>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>>>>>  4 files changed, 90 insertions(+)
>>>>>
>>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
>>>>> index 7d9e1a99b..573caa000 100644
>>>>> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
>>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
>>>>> @@ -45,6 +45,7 @@
>>>>>  #include "eal_filesystem.h"
>>>>>  #include "eal_private.h"
>>>>>  #include "eal_pci_init.h"
>>>>> +#include "eal_vfio.h"
>>>>>
>>>>>  /**
>>>>>   * @file
>>>>> @@ -488,6 +489,71 @@ rte_pci_scan(void)
>>>>>      return -1;
>>>>>  }
>>>>>
>>>>> +/*
>>>>> + * Any one of the device bound to uio
>>>>> + */
>>>>> +static inline int
>>>>> +pci_device_bound_uio(void)
>>>>> +{
>>>>> +    struct rte_pci_device *dev = NULL;
>>>>> +
>>>>> +    FOREACH_DEVICE_ON_PCIBUS(dev) {
>>>>> +        if (dev->kdrv == RTE_KDRV_IGB_UIO ||
>>>>> +           dev->kdrv == RTE_KDRV_UIO_GENERIC) {
>>>>> +            return 1;
>>>>> +        }
>>>>> +    }
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +/*
>>>>> + * Any one of the device has iova as va
>>>>> + */
>>>>> +static inline int
>>>>> +pci_device_has_iova_va(void)
>>>>> +{
>>>>> +    struct rte_pci_device *dev = NULL;
>>>>> +    struct rte_pci_driver *drv = NULL;
>>>>> +
>>>>> +    FOREACH_DRIVER_ON_PCIBUS(drv) {
>>>>> +        if (drv && drv->drv_flags & RTE_PCI_DRV_NEED_IOVA_VA) {
>>>>> +            FOREACH_DEVICE_ON_PCIBUS(dev) {
>>>>> +                if (dev->kdrv == RTE_KDRV_VFIO &&
>>>>> +                    rte_pci_match(drv, dev))
>>>>> +                    return 1;
>>>>> +            }
>>>>> +        }
>>>>> +    }
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +/*
>>>>> + * Get iommu class of PCI devices on the bus.
>>>>> + */
>>>>> +enum rte_iova_mode
>>>>> +rte_pci_get_iommu_class(void)
>>>>> +{
>>>>> +    bool is_vfio_noiommu_enabled;
>>>>> +    bool has_iova_va;
>>>>> +    bool is_bound_uio;
>>>>> +
>>>>> +    has_iova_va = pci_device_has_iova_va();
>>>>> +    is_bound_uio = pci_device_bound_uio();
>>>>> +    is_vfio_noiommu_enabled = vfio_noiommu_is_enabled() == 1 ? 1 : 0;
>>>>> +
>>>>> +    if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled)
>>>>> +        return RTE_IOVA_VA;
>>>>> +
>>>> PCI is generally present in all platform including dpaa2.
>>>> There may not be any device found or available for dpdk usages in such cases. The PCI bus will still return RTE_IOVA_PA, which will make the system mode as PA.
>>>>
>>> That's the expected behavior. And implementation makes sure
>>> that PCI_bus return default mode aka _PA if no-pci device found.
>>>
>>> Isn't code taking care of same?
>>>
>> I have attached a PCI device to the board. But it is being managed by kernel only.
>>
>> EAL: PCI device 0000:01:00.0 on NUMA socket 0
>> EAL:   probe driver: 8086:10d3 net_e1000_em
>> EAL:   Not managed by a supported kernel driver, skipped
>>
>> So, there are devices in the PCI list. But none of them is probed or being used by dpdk.
>>
>>
> Therefore _pci_get_iommu_class scan result would be _PA, As no device bound to dpdk.
>
>>> Let me walk through the code:
>>>
>>> has_iova_va = 0 (if no pci device then pci_device_has_iov_va() will return 0).
>>>
>>> And if (has_iova_va & ,,,) will fail therefore rte_pci_get_iommu_class() retuns RTE_IOVA_PA mode.
>>> which is default mode. Right?
>>>
>> This will create issue for the 2nd bus, which is a VA bus. The combined mode will becomes '3', so the system mode will be PA.
>>
> Yes, If both modes detected at two different bus 
> then policy is to use default iova mapping mode across the buses(which is _pa).
>
> Are you operating on two different mode like _pa for PCI-bus and _va for fslmc bus in dpaa2? 

Does the vfio kernel infrastructure for dpaa2 allow a case like the one below:
0) Use PCI vfio(/iommu) mode and map vfio.dma_map to RTE_IOVA_PA
AND
1) Use platform/fslmc vfio-platform mode and map vfio.dma_map to RTE_IOVA_VA?

Does dpaa2 support this?

(Speculating) Let's say the dpaa2 platform supports the above case;
would you then see any issue if both buses use the default iova mapping (_pa),
as dpdk pci does currently?

* Re: [PATCH v3 07/11] linuxapp/eal: auto detect iova mode
  2017-07-14  8:49           ` Hemant Agrawal
@ 2017-07-14  9:21             ` santosh
  0 siblings, 0 replies; 248+ messages in thread
From: santosh @ 2017-07-14  9:21 UTC (permalink / raw)
  To: Hemant Agrawal, thomas, dev
  Cc: bruce.richardson, jerin.jacob, shreyansh.jain, gaetan.rivet,
	sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz

On Friday 14 July 2017 02:19 PM, Hemant Agrawal wrote:

> On 7/13/2017 11:55 PM, santosh wrote:
>> On Thursday 13 July 2017 04:59 PM, Hemant Agrawal wrote:
>>
>>> On 7/11/2017 11:46 AM, Santosh Shukla wrote:
>>>> - Moving late bus scanning to up..just after eal_parsing.
>>>> - Auto detect iova mapping mode, based on the result of
>>>>   rte_bus_scan_iommu_class.
>>>>
>>>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>>>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>>> ---
>>>>  lib/librte_eal/linuxapp/eal/eal.c | 16 ++++++++++------
>>>>  1 file changed, 10 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
>>>> index 2546b55e4..7b4dd70de 100644
>>>> --- a/lib/librte_eal/linuxapp/eal/eal.c
>>>> +++ b/lib/librte_eal/linuxapp/eal/eal.c
>>>> @@ -799,6 +799,16 @@ rte_eal_init(int argc, char **argv)
>>>>          return -1;
>>>>      }
>>>>
>>>> +    if (rte_bus_scan()) {
>>>> +        rte_eal_init_alert("Cannot scan the buses for devices\n");
>>>> +        rte_errno = ENODEV;
>>>> +        return -1;
>>>> +    }
>>>> +
>>>
>>> The original place of the bus scan was with the following factors:
>>> 1. The bus scan requires the VFIO to be enabled atleast in dpaa2 case.
>>> (VFIO code still need cleanup to be support non-pci cleanly). I tried moving it before bus_scan, this helped in bus scanning.
>>>
>> bus_scan should do scanning, device enumeration, detecting devices and
>> interface that device bound to, that interface could be VFIO, UIO, UIO_GENERIC etc..
>>
>> PCI bus scanning (in eal/) strictly comply to what I mentioned above, thus
>> aut-detection works gracefully.
>>
>> However fslmc_bus 'scan' doesn't do device scanning, instead It call vfio dependent
>> code which ideally should fall in 'resource mapping' category,. ideally should
>> happen at bus probe time.
>>
>> Example:
>> rte_fslmc_bus_scan()
>>     --> fslmc_vfio_setup_group
>>     --> fslmc_vfio_process_group
>>
>> So it is doing _setup_ inside scan ops, which in PCI(/vfio-pci) case happens
>> at probe time (`vfio_setup_device`).
>>
>> In order to benefit iova auto-detection infrastructure: fslmc bus should
>> do to two things:
>>
>> 0) fslmc bus scan should look at /sys/bus/platform/drivers/vfio-platform/*
>> and find out that devices bind to vfio-platform or not, if yes then update kdrv
>> entry mentioning interface type example VFIO. That-way flsmc bus gets capability to
>> inform rte_bus about IOMMU capable interface. Right now, existing implementation
>> don't have means to inform rte_bus about his devices like pci_bus has!.
>>
> vfio_fsl_mc is bit different from pci, we first get the resource container and then look for resources as children.
>
> In any case, the reworking of the bus is pending since the support for many other features are being extended  for non-pci buses as well in dpdk e.g. devargs.
> It is in my priority list to clean it up for next release.
>
>
>> 1) defer the vfio_seup from scan to bus->probe().
>
> This is a good suggestion. This can solve the initialization issue.
>
>>
>>
>>
>>> 2. During SCAN, the bus may allocate memory to devices or for it's own usages.  rte_malloc or mempool is required in cases to support multi-process environment. (e.g. dpaa2 create dpbp or dpio device memory using the rte_malloc call).
>>>
>> If bus scanning adheres to device detection or enumeration then rte_malloc/mempool
>>
>> not required, Example eal/pci bus scanning.
>>
>>
>> And in fslmc bus case: if vfio_setup deferred at bus->probe time then
>> bus->scan won't have memory dependency.
>>
>>> Since none of the other rte library (mempool, memzone, tailq) is available at this point, it will create significant restriction on the bus scan.
>>>
>>> We will prefer if you can re-introduce the "iova_mode" and allow the application choose, which mode it want to run.
>>>
>>> This auto-detect logic may not work for many buses and it is going
>>> to create serious restrictions on the bus_scan code.
>>>
>> fslmc is only bus besides PCI. Auto-detection works gracefully for PCI-bus.
>> Can you give a try to said proposal?
>>
>> Ideally vfio-platform code should sit into eal/vfio like eal/vfio-pci is.
>> Otherwise it will keep creating problems for new generic framework like we're
>> discussing one.
>>
>> if said proposal doesn't work for you then I will re-introduce iova-mode as
>> eal arg, that will override iova mapping mode. But IMHO, eal arg should be
>> intermediate solution. Once vfio-platform code properly re-factored and merged,
>> We should remove those eal iova-mode args.
>
> Thanks for digging into the fslmc code.  As I said, this is now my priority item to get the fslmc bus code refactored. We will target immediately after 17.08 validation.
>
> However till then, we can only support the iova-mode.
>
Can you please try out the said changes in your fslmc bus? If they work for you, then
we don't need to re-introduce the iova-mode eal arg in a future revision.
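
For reference, if the eal arg does come back, the override could look roughly
like the sketch below. This is only an illustration under assumptions: the
function names are hypothetical, the 'pa'/'va' strings are the ones from the
cover letter, and using the don't-care value to mean "no override given" is my
choice, not something the series defines.

```c
#include <assert.h>
#include <string.h>

/* Local stand-ins for the values in the cover letter (DC=0, PA=1, VA=2). */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1, IOVA_VA = 2 };

/* Map the option string to a mode; returns 0 on success, -1 on an unknown string. */
static int
parse_iova_mode(const char *arg, enum iova_mode *out)
{
	assert(out != NULL);
	if (arg == NULL)
		return -1;
	if (strcmp(arg, "pa") == 0)
		*out = IOVA_PA;
	else if (strcmp(arg, "va") == 0)
		*out = IOVA_VA;
	else
		return -1;
	return 0;
}

/* An explicit user choice wins; IOVA_DC here means "no override given". */
static enum iova_mode
effective_iova_mode(enum iova_mode user, enum iova_mode detected)
{
	if (user != IOVA_DC)
		return user;
	return detected == IOVA_VA ? IOVA_VA : IOVA_PA;
}
```

So an fslmc user could force 'va' while PCI-only setups keep relying on the
detected value.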

* Re: [PATCH v3 05/11] bus: get iommu class
  2017-07-14  8:30         ` santosh
@ 2017-07-14  9:39           ` Hemant Agrawal
  2017-07-14 10:22             ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Hemant Agrawal @ 2017-07-14  9:39 UTC (permalink / raw)
  To: santosh, thomas, dev
  Cc: bruce.richardson, jerin.jacob, shreyansh.jain, gaetan.rivet,
	sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz

On 7/14/2017 2:00 PM, santosh wrote:
> On Friday 14 July 2017 01:37 PM, Hemant Agrawal wrote:
>
>> On 7/11/2017 11:46 AM, Santosh Shukla wrote:
>>> API(rte_bus_get_iommu_class) helps to automatically detect and select
>>> appropriate iova mapping scheme for iommu capable device on that bus.
>>>
>>> Algorithm for iova scheme selection for bus:
>>> 0. Iterate through bus_list.
>>> 1. Collect each bus iova mode value and update into 'mode' var.
>>> 2. Here value '1' is _pa and value '2' is _va mode.
>>> So mode selection scheme is like:
>>> if mode == 2 then iova mode is _va.
>>> if mode == 1 then iova mode is _pa
>>> if mode  == 3 then iova mode ia _pa.
>>>
>>> So mode !=2  will be default iova mode.
>>>
>>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>> ---
>>>  lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
>>>  lib/librte_eal/common/eal_common_bus.c          | 23 +++++++++++++++++++++++
>>>  lib/librte_eal/common/eal_common_pci.c          |  1 +
>>>  lib/librte_eal/common/include/rte_bus.h         | 22 ++++++++++++++++++++++
>>>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>>>  5 files changed, 48 insertions(+)
>>>
>>> diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>>> index 33c2c32c0..a2dd65a33 100644
>>> --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>>> +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>>> @@ -202,6 +202,7 @@ DPDK_17.08 {
>>>      rte_bus_find_by_name;
>>>      rte_pci_match;
>>>      rte_pci_get_iommu_class;
>>> +    rte_bus_get_iommu_class;
>>>
>>>  } DPDK_17.05;
>>>
>>> diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
>>> index 08bec2d93..5d5753ac9 100644
>>> --- a/lib/librte_eal/common/eal_common_bus.c
>>> +++ b/lib/librte_eal/common/eal_common_bus.c
>>> @@ -222,3 +222,26 @@ rte_bus_find_by_device_name(const char *str)
>>>          c[0] = '\0';
>>>      return rte_bus_find(NULL, bus_can_parse, name);
>>>  }
>>> +
>>> +
>>> +/*
>>> + * Get iommu class of devices on the bus.
>>> + */
>>> +enum rte_iova_mode
>>> +rte_bus_get_iommu_class(void)
>>> +{
>>> +    int mode = 0;
>>> +    struct rte_bus *bus;
>>> +
>>> +    TAILQ_FOREACH(bus, &rte_bus_list, next) {
>>> +
>>> +        if (bus->get_iommu_class)
>>> +            mode |= bus->get_iommu_class();
>>> +    }
>>> +
>>
>> If you change the default return as '0' for buses. This code will work.
>> e.g. PCI will return '0' - when no device is probed. FSL MC will return VA. the default mode will be 'VA'
>>
> I'm confused why it won't work for fslmc case?
>
> Let me walk through the code:
>
> If no-pci device Or (future) no-platform device probed then bus opt
> to use default mapping scheme .. which is iova_pa(default scheme).
>
> Lets take PCI_bus example:
> bus->get_iommu_class()
> 	---> bus->_pci_get_iommu_class()
> 		* Now consider that no interface bound to any of PCI device, then
> 		  it will return RTE_IOVA_PA mode to rte_bus layer (aka bus->get_iommu_class).
> 		  So the iova mapping result from iommu_class scan is RTE_IOVA_PA (default).
> 		  It works for PCI_bus case, tested for both iova_va and iova_pa case, no-pci device case.
>
> Now in fslmc bus case:
> bus->get_iommu_class()
> 	---> bus->_fslmc_get_iommu_class()
> 		
> 		* IIUC your comment - You want fslmc bus to return RTE_IOVA_VA if no device
> 		  detected, Right?
why?

If a bus is just present but no device on it is in use by dpdk, then the 
bus should return 0 and it *should not* participate in the IOMMU class 
decision.

Right now there are only two buses. There can be more buses (e.g. PCI, 
platform, and fslmc as well in the case of dpaa2).

If a bus is not being used at all, why should it influence the decision 
of other buses?

If no bus has any device, the system default is PA anyway.
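
A small standalone sketch of what I mean, assuming the same OR-based
combination as in the patch. The names are hypothetical and the per-bus
answers are passed in as an array instead of walking rte_bus_list; the enum
values are the ones from the cover letter (DC=0, PA=1, VA=2).

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the values in the cover letter (DC=0, PA=1, VA=2). */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1, IOVA_VA = 2 };

/* OR the per-bus answers; only an unmixed VA result selects VA. */
static enum iova_mode
combine_bus_modes(const enum iova_mode *per_bus, size_t n)
{
	int mode = IOVA_DC;
	size_t i;

	assert(per_bus != NULL || n == 0);
	for (i = 0; i < n; i++)
		mode |= per_bus[i];

	/* 0 (nobody cares), 1 (PA) and 3 (PA|VA) all fall back to PA. */
	return mode == IOVA_VA ? IOVA_VA : IOVA_PA;
}
```

If an idle PCI bus reports PA, {PA, VA} combines to 3 and the system ends up
in PA. If it reports 0 instead, {0, VA} combines to 2 and the fslmc bus gets
VA, while {0, 0} still gives the PA default.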


> 		  if so then your fslmc bus handle should do something like below
> 			-- If no device on fslmc bus : return RTE_IOVA_VA.
> 			-- If device detected on fslmc bus and bound to iommu driver : return RTE_IOVA_VA
> 			-- If device detected fslmc but not bound to iommu drv : return RTE_IOVA_PA..
>
> make sense? If not then can you describe fslmc mapping scheme?
>
>> if fslmc is not present. The default mode will be PA.
>>
>>> +    if (mode != RTE_IOVA_VA) {
>>> +        /* Use default IOVA mode */
>>> +        mode = RTE_IOVA_PA;
>>> +    }

The system default is PA anyway.

>>> +    return mode;
>>> +}
>>> diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
>>> index 8b6ecebd6..bdf2e7c3a 100644
>>> --- a/lib/librte_eal/common/eal_common_pci.c
>>> +++ b/lib/librte_eal/common/eal_common_pci.c
>>> @@ -552,6 +552,7 @@ struct rte_pci_bus rte_pci_bus = {
>>>          .plug = pci_plug,
>>>          .unplug = pci_unplug,
>>>          .parse = pci_parse,
>>> +        .get_iommu_class = rte_pci_get_iommu_class,
>>>      },
>>>      .device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
>>>      .driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
>>> diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
>>> index 7a0cfb165..8b2805b7f 100644
>>> --- a/lib/librte_eal/common/include/rte_bus.h
>>> +++ b/lib/librte_eal/common/include/rte_bus.h
>>> @@ -181,6 +181,17 @@ struct rte_bus_conf {
>>>      enum rte_bus_scan_mode scan_mode; /**< Scan policy. */
>>>  };
>>>
>>> +
>>> +/**
>>> + * Get iommu class of devices on the bus.
>>> + * Check that those devices are attached to iommu driver.
>>> + *
>>> + * @return
>>> + *      enum rte_iova_mode value.
>>> + */
>>> +typedef enum rte_iova_mode (*rte_bus_get_iommu_class_t)(void);
>>> +
>>> +
>>>  /**
>>>   * A structure describing a generic bus.
>>>   */
>>> @@ -194,6 +205,7 @@ struct rte_bus {
>>>      rte_bus_unplug_t unplug;     /**< Remove single device from driver */
>>>      rte_bus_parse_t parse;       /**< Parse a device name */
>>>      struct rte_bus_conf conf;    /**< Bus configuration */
>>> +    rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
>>>  };
>>>
>>>  /**
>>> @@ -293,6 +305,16 @@ struct rte_bus *rte_bus_find_by_device(const struct rte_device *dev);
>>>   */
>>>  struct rte_bus *rte_bus_find_by_name(const char *busname);
>>>
>>> +
>>> +/**
>>> + * Get iommu class of devices on the bus.
>>> + * Check that those devices are attached to iommu driver.
>>> + *
>>> + * @return
>>> + *     enum rte_iova_mode value.
>>> + */
>>> +enum rte_iova_mode rte_bus_get_iommu_class(void);
>>> +
>>>  /**
>>>   * Helper for Bus registration.
>>>   * The constructor has higher priority than PMD constructors.
>>> diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>>> index 044f89c7c..186c7b0fd 100644
>>> --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>>> +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>>> @@ -207,6 +207,7 @@ DPDK_17.08 {
>>>      rte_bus_find_by_name;
>>>      rte_pci_match;
>>>      rte_pci_get_iommu_class;
>>> +    rte_bus_get_iommu_class;
>>>
>>>  } DPDK_17.05;
>>>
>>>
>>
>>
>
>

* Re: [PATCH v3 05/11] bus: get iommu class
  2017-07-14  9:39           ` Hemant Agrawal
@ 2017-07-14 10:22             ` santosh
  2017-07-14 10:29               ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: santosh @ 2017-07-14 10:22 UTC (permalink / raw)
  To: Hemant Agrawal, thomas, dev
  Cc: bruce.richardson, jerin.jacob, shreyansh.jain, gaetan.rivet,
	sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz

On Friday 14 July 2017 03:09 PM, Hemant Agrawal wrote:

> On 7/14/2017 2:00 PM, santosh wrote:
>> On Friday 14 July 2017 01:37 PM, Hemant Agrawal wrote:
>>
>>> On 7/11/2017 11:46 AM, Santosh Shukla wrote:
>>>> API(rte_bus_get_iommu_class) helps to automatically detect and select
>>>> appropriate iova mapping scheme for iommu capable device on that bus.
>>>>
>>>> Algorithm for iova scheme selection for bus:
>>>> 0. Iterate through bus_list.
>>>> 1. Collect each bus iova mode value and update into 'mode' var.
>>>> 2. Here value '1' is _pa and value '2' is _va mode.
>>>> So mode selection scheme is like:
>>>> if mode == 2 then iova mode is _va.
>>>> if mode == 1 then iova mode is _pa
>>>> if mode  == 3 then iova mode ia _pa.
>>>>
>>>> So mode !=2  will be default iova mode.
>>>>
>>>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>>>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>>> ---
>>>>  lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
>>>>  lib/librte_eal/common/eal_common_bus.c          | 23 +++++++++++++++++++++++
>>>>  lib/librte_eal/common/eal_common_pci.c          |  1 +
>>>>  lib/librte_eal/common/include/rte_bus.h         | 22 ++++++++++++++++++++++
>>>>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>>>>  5 files changed, 48 insertions(+)
>>>>
>>>> diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>>>> index 33c2c32c0..a2dd65a33 100644
>>>> --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>>>> +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>>>> @@ -202,6 +202,7 @@ DPDK_17.08 {
>>>>      rte_bus_find_by_name;
>>>>      rte_pci_match;
>>>>      rte_pci_get_iommu_class;
>>>> +    rte_bus_get_iommu_class;
>>>>
>>>>  } DPDK_17.05;
>>>>
>>>> diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
>>>> index 08bec2d93..5d5753ac9 100644
>>>> --- a/lib/librte_eal/common/eal_common_bus.c
>>>> +++ b/lib/librte_eal/common/eal_common_bus.c
>>>> @@ -222,3 +222,26 @@ rte_bus_find_by_device_name(const char *str)
>>>>          c[0] = '\0';
>>>>      return rte_bus_find(NULL, bus_can_parse, name);
>>>>  }
>>>> +
>>>> +
>>>> +/*
>>>> + * Get iommu class of devices on the bus.
>>>> + */
>>>> +enum rte_iova_mode
>>>> +rte_bus_get_iommu_class(void)
>>>> +{
>>>> +    int mode = 0;
>>>> +    struct rte_bus *bus;
>>>> +
>>>> +    TAILQ_FOREACH(bus, &rte_bus_list, next) {
>>>> +
>>>> +        if (bus->get_iommu_class)
>>>> +            mode |= bus->get_iommu_class();
>>>> +    }
>>>> +
>>>
>>> If you change the default return as '0' for buses. This code will work.
>>> e.g. PCI will return '0' - when no device is probed. FSL MC will return VA. the default mode will be 'VA'
>>>
>> I'm confused why it won't work for fslmc case?
>>
>> Let me walk through the code:
>>
>> If no-pci device Or (future) no-platform device probed then bus opt
>> to use default mapping scheme .. which is iova_pa(default scheme).
>>
>> Lets take PCI_bus example:
>> bus->get_iommu_class()
>>     ---> bus->_pci_get_iommu_class()
>>         * Now consider that no interface bound to any of PCI device, then
>>           it will return RTE_IOVA_PA mode to rte_bus layer (aka bus->get_iommu_class).
>>           So the iova mapping result from iommu_class scan is RTE_IOVA_PA (default).
>>           It works for PCI_bus case, tested for both iova_va and iova_pa case, no-pci device case.
>>
>> Now in fslmc bus case:
>> bus->get_iommu_class()
>>     ---> bus->_fslmc_get_iommu_class()
>>        
>>         * IIUC your comment - You want fslmc bus to return RTE_IOVA_VA if no device
>>           detected, Right?
> why?
>
Because I didn't understand your previous reply:
`e.g. PCI will return '0' - when no device is probed. FSL MC will return VA. the default mode will be 'VA'`

So I'm asking: in the fslmc bus case, if no device is found, do you opt for the _va scheme or not?
It seems _not_, per your comment below.
 

> If bus is just present but no device is in use for dpdk, then bus should return 0 and it *should not* participate in the IOMMU class decision.
>
I think I understand your point. For example, if no PCI device is found on the first (PCI) bus
but a device is found on the second (platform) bus, then you don't want to fall back to the default (_pa) mode;
instead you want to use the second bus's mode for mapping, which is _va. Right?

If so, my first version did introduce a case called _DC.
_DC:0 --> stands for the no-device-found case.

> Right now there are only two buses. There can be more buses. (e.g. PCI, platform, fslmc in case of dpaa2 as well).
>
> If the bus is not being used at all, why it influence the decision of other buses.
>
If you're referring to the above case then I agree. We'll re-introduce the _DC state from v1 in the next revision.
That will look like:
enum rte_iova_mode
rte_pci_get_iommu_class(void)
{
	enum rte_iova_mode mode = RTE_IOVA_DC; /* '0' */

	return mode; /* still _DC if no device found */
}

Right?

> if no bus has any device, the System default is anyway PA.
>
Right. If no bus is present, then it is also the responsibility of `rte_bus_get_iommu_class`
to use the default mapping scheme, which is _pa, and it does.

>
>>           if so then your fslmc bus handle should do something like below
>>             -- If no device on fslmc bus : return RTE_IOVA_VA.
>>             -- If device detected on fslmc bus and bound to iommu driver : return RTE_IOVA_VA
>>             -- If device detected fslmc but not bound to iommu drv : return RTE_IOVA_PA..
>>
>> make sense? If not then can you describe fslmc mapping scheme?
>>
>>> if fslmc is not present. The default mode will be PA.
>>>
>>>> +    if (mode != RTE_IOVA_VA) {
>>>> +        /* Use default IOVA mode */
>>>> +        mode = RTE_IOVA_PA;
>>>> +    }
>
> The system default is anyway PA.
>
No, that check is needed for a case where the 1st bus returns _PA and the 2nd bus returns _VA.
Then mode = 3 (mix mode), which we don't support, so (as I mentioned before) it is the responsibility of
rte_bus_get_iommu_class() to return the default mode (_pa). That's why.
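
The selection rule discussed here can be sketched as a small standalone C
function. This is only an illustration, not the patch itself: `enum iova_mode`,
`get_iommu_class_t`, `select_iova_mode()` and the `bus_*` helpers are
hypothetical names, and the values 0/1/2 mirror the _DC/_pa/_va values
mentioned in this thread.

```c
#include <stddef.h>

/* Local stand-ins for the values discussed in this thread. */
enum iova_mode {
	IOVA_DC = 0, /* don't care: no device found on the bus */
	IOVA_PA = 1,
	IOVA_VA = 2
};

/* Hypothetical per-bus callback type; NULL means the bus has no handler. */
typedef enum iova_mode (*get_iommu_class_t)(void);

/* OR together the answer of every bus, then collapse the result. */
static enum iova_mode
select_iova_mode(const get_iommu_class_t *cb, size_t n)
{
	int mode = IOVA_DC;
	size_t i;

	for (i = 0; i < n; i++) {
		if (cb[i] != NULL)
			mode |= cb[i]();
	}

	/*
	 * 0 (no bus answered), 1 (_pa) and 3 (mix of _pa and _va) all
	 * fall back to the default; only a pure 2 selects _va.
	 */
	if (mode != IOVA_VA)
		mode = IOVA_PA;

	return (enum iova_mode)mode;
}

/* Trivial example buses, one per possible answer. */
static enum iova_mode bus_dc(void) { return IOVA_DC; }
static enum iova_mode bus_pa(void) { return IOVA_PA; }
static enum iova_mode bus_va(void) { return IOVA_VA; }
```

With this rule a bus that reports 0 (_DC) does not change the OR result, so
{ _DC, _va } yields _va, while { _pa, _va } yields 3 and falls back to _pa.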

>>>> +    return mode;
>>>> +}
>>>> diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
>>>> index 8b6ecebd6..bdf2e7c3a 100644
>>>> --- a/lib/librte_eal/common/eal_common_pci.c
>>>> +++ b/lib/librte_eal/common/eal_common_pci.c
>>>> @@ -552,6 +552,7 @@ struct rte_pci_bus rte_pci_bus = {
>>>>          .plug = pci_plug,
>>>>          .unplug = pci_unplug,
>>>>          .parse = pci_parse,
>>>> +        .get_iommu_class = rte_pci_get_iommu_class,
>>>>      },
>>>>      .device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
>>>>      .driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
>>>> diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
>>>> index 7a0cfb165..8b2805b7f 100644
>>>> --- a/lib/librte_eal/common/include/rte_bus.h
>>>> +++ b/lib/librte_eal/common/include/rte_bus.h
>>>> @@ -181,6 +181,17 @@ struct rte_bus_conf {
>>>>      enum rte_bus_scan_mode scan_mode; /**< Scan policy. */
>>>>  };
>>>>
>>>> +
>>>> +/**
>>>> + * Get iommu class of devices on the bus.
>>>> + * Check that those devices are attached to iommu driver.
>>>> + *
>>>> + * @return
>>>> + *      enum rte_iova_mode value.
>>>> + */
>>>> +typedef enum rte_iova_mode (*rte_bus_get_iommu_class_t)(void);
>>>> +
>>>> +
>>>>  /**
>>>>   * A structure describing a generic bus.
>>>>   */
>>>> @@ -194,6 +205,7 @@ struct rte_bus {
>>>>      rte_bus_unplug_t unplug;     /**< Remove single device from driver */
>>>>      rte_bus_parse_t parse;       /**< Parse a device name */
>>>>      struct rte_bus_conf conf;    /**< Bus configuration */
>>>> +    rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
>>>>  };
>>>>
>>>>  /**
>>>> @@ -293,6 +305,16 @@ struct rte_bus *rte_bus_find_by_device(const struct rte_device *dev);
>>>>   */
>>>>  struct rte_bus *rte_bus_find_by_name(const char *busname);
>>>>
>>>> +
>>>> +/**
>>>> + * Get iommu class of devices on the bus.
>>>> + * Check that those devices are attached to iommu driver.
>>>> + *
>>>> + * @return
>>>> + *     enum rte_iova_mode value.
>>>> + */
>>>> +enum rte_iova_mode rte_bus_get_iommu_class(void);
>>>> +
>>>>  /**
>>>>   * Helper for Bus registration.
>>>>   * The constructor has higher priority than PMD constructors.
>>>> diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>>>> index 044f89c7c..186c7b0fd 100644
>>>> --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>>>> +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>>>> @@ -207,6 +207,7 @@ DPDK_17.08 {
>>>>      rte_bus_find_by_name;
>>>>      rte_pci_match;
>>>>      rte_pci_get_iommu_class;
>>>> +    rte_bus_get_iommu_class;
>>>>
>>>>  } DPDK_17.05;
>>>>
>>>>
>>>
>>>
>>
>>
>
>


* Re: [PATCH v3 05/11] bus: get iommu class
  2017-07-14 10:22             ` santosh
@ 2017-07-14 10:29               ` santosh
  2017-07-14 10:51                 ` Hemant Agrawal
  0 siblings, 1 reply; 248+ messages in thread
From: santosh @ 2017-07-14 10:29 UTC (permalink / raw)
  To: Hemant Agrawal, thomas, dev
  Cc: bruce.richardson, jerin.jacob, shreyansh.jain, gaetan.rivet,
	sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz

On Friday 14 July 2017 03:52 PM, santosh wrote:

> On Friday 14 July 2017 03:09 PM, Hemant Agrawal wrote:
>
>> On 7/14/2017 2:00 PM, santosh wrote:
>>> On Friday 14 July 2017 01:37 PM, Hemant Agrawal wrote:
>>>
>>>> On 7/11/2017 11:46 AM, Santosh Shukla wrote:
>>>>> API(rte_bus_get_iommu_class) helps to automatically detect and select
>>>>> appropriate iova mapping scheme for iommu capable device on that bus.
>>>>>
>>>>> Algorithm for iova scheme selection for bus:
>>>>> 0. Iterate through bus_list.
>>>>> 1. Collect each bus iova mode value and update into 'mode' var.
>>>>> 2. Here value '1' is _pa and value '2' is _va mode.
>>>>> So mode selection scheme is like:
>>>>> if mode == 2 then iova mode is _va.
>>>>> if mode == 1 then iova mode is _pa
>>>>> if mode  == 3 then iova mode ia _pa.
>>>>>
>>>>> So mode !=2  will be default iova mode.
>>>>>
>>>>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>>>>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>>>> ---
>>>>>  lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
>>>>>  lib/librte_eal/common/eal_common_bus.c          | 23 +++++++++++++++++++++++
>>>>>  lib/librte_eal/common/eal_common_pci.c          |  1 +
>>>>>  lib/librte_eal/common/include/rte_bus.h         | 22 ++++++++++++++++++++++
>>>>>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>>>>>  5 files changed, 48 insertions(+)
>>>>>
>>>>> diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>>>>> index 33c2c32c0..a2dd65a33 100644
>>>>> --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>>>>> +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>>>>> @@ -202,6 +202,7 @@ DPDK_17.08 {
>>>>>      rte_bus_find_by_name;
>>>>>      rte_pci_match;
>>>>>      rte_pci_get_iommu_class;
>>>>> +    rte_bus_get_iommu_class;
>>>>>
>>>>>  } DPDK_17.05;
>>>>>
>>>>> diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
>>>>> index 08bec2d93..5d5753ac9 100644
>>>>> --- a/lib/librte_eal/common/eal_common_bus.c
>>>>> +++ b/lib/librte_eal/common/eal_common_bus.c
>>>>> @@ -222,3 +222,26 @@ rte_bus_find_by_device_name(const char *str)
>>>>>          c[0] = '\0';
>>>>>      return rte_bus_find(NULL, bus_can_parse, name);
>>>>>  }
>>>>> +
>>>>> +
>>>>> +/*
>>>>> + * Get iommu class of devices on the bus.
>>>>> + */
>>>>> +enum rte_iova_mode
>>>>> +rte_bus_get_iommu_class(void)
>>>>> +{
>>>>> +    int mode = 0;
>>>>> +    struct rte_bus *bus;
>>>>> +
>>>>> +    TAILQ_FOREACH(bus, &rte_bus_list, next) {
>>>>> +
>>>>> +        if (bus->get_iommu_class)
>>>>> +            mode |= bus->get_iommu_class();
>>>>> +    }
>>>>> +
>>>> If you change the default return as '0' for buses. This code will work.
>>>> e.g. PCI will return '0' - when no device is probed. FSL MC will return VA. the default mode will be 'VA'
>>>>
>>> I'm confused why it won't work for fslmc case?
>>>
>>> Let me walk through the code:
>>>
>>> If no-pci device Or (future) no-platform device probed then bus opt
>>> to use default mapping scheme .. which is iova_pa(default scheme).
>>>
>>> Lets take PCI_bus example:
>>> bus->get_iommu_class()
>>>     ---> bus->_pci_get_iommu_class()
>>>         * Now consider that no interface bound to any of PCI device, then
>>>           it will return RTE_IOVA_PA mode to rte_bus layer (aka bus->get_iommu_class).
>>>           So the iova mapping result from iommu_class scan is RTE_IOVA_PA (default).
>>>           It works for PCI_bus case, tested for both iova_va and iova_pa case, no-pci device case.
>>>
>>> Now in fslmc bus case:
>>> bus->get_iommu_class()
>>>     ---> bus->_fslmc_get_iommu_class()
>>>        
>>>         * IIUC your comment - You want fslmc bus to return RTE_IOVA_VA if no device
>>>           detected, Right?
>> why?
>>
> As I didn't understood your previous reply:
> `e.g. PCI will return '0' - when no device is probed. FSL MC will return VA. the default mode will be 'VA'`
>
> So, I'm asking you that in fslmc bus case - if no device found then are you opting _va scheme or not?
> Seems like _not_ per your below comment.
>  
>
>> If bus is just present but no device is in use for dpdk, then bus should return 0 and it *should not* participate in the IOMMU class decision.
>>
> I think, I understand your point..Example if you have no-pci on first PCI bus
> but device found on 2nd platform bus then you don't want to fallback to default (/_pa) mode.. 
> instead you want to use 2nd bus mode for mapping, which is _va. Right?
>
> If so then In my first version - We did introduced the case called _DC.
> _DC:0 --> stands for no-device found case.
>
>> Right now there are only two buses. There can be more buses. (e.g. PCI, platform, fslmc in case of dpaa2 as well).
>>
>> If the bus is not being used at all, why it influence the decision of other buses.
>>
> If your referring to above case then I agree, We'll re-introduce _DC state from v1 in next revision.
> That will look like
> rte_pci_get_iommu_class() {
> 	int mode = RTE_IOVA_DC; /* '0' */
>
> 	return _DC; /* if no device found */
> }
>
> Right?
>
>> if no bus has any device, the System default is anyway PA.
>>
> Right, If no bus present then It's also responsibility of `rte_bus_get_iommu_class`
> to use default mapping scheme which is _pa and which It does.
>
>>>           if so then your fslmc bus handle should do something like below
>>>             -- If no device on fslmc bus : return RTE_IOVA_VA.
>>>             -- If device detected on fslmc bus and bound to iommu driver : return RTE_IOVA_VA
>>>             -- If device detected fslmc but not bound to iommu drv : return RTE_IOVA_PA..
>>>
>>> make sense? If not then can you describe fslmc mapping scheme?
>>>
>>>> if fslmc is not present. The default mode will be PA.
>>>>
>>>>> +    if (mode != RTE_IOVA_VA) {
>>>>> +        /* Use default IOVA mode */
>>>>> +        mode = RTE_IOVA_PA;
>>>>> +    }
>> The system default is anyway PA.
>>
> No, That check is needed for case like 1st bus return with _PA and 2nd bus returns with _VA,
> then mode = 3 (Mix mode), which we don't support so (as I mentioned before) its responsibility of
> rte_bus_get_iommu_class() to return default mode (_pa). That's why!.
>
>
Does your platform support `mix mode`? I asked the same question in thread [04/11] too.
Let's say that dpaa2 supports mix mode: is it OK if the bus opts for the default mapping
in the mix mode case? Do you see any issue if the bus uses the default scheme for mix mode?


* Re: [PATCH v3 05/11] bus: get iommu class
  2017-07-14 10:29               ` santosh
@ 2017-07-14 10:51                 ` Hemant Agrawal
  2017-07-14 11:03                   ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Hemant Agrawal @ 2017-07-14 10:51 UTC (permalink / raw)
  To: santosh, thomas, dev
  Cc: bruce.richardson, jerin.jacob, shreyansh.jain, gaetan.rivet,
	sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz

On 7/14/2017 3:59 PM, santosh wrote:
> On Friday 14 July 2017 03:52 PM, santosh wrote:
>
>> On Friday 14 July 2017 03:09 PM, Hemant Agrawal wrote:
>>
>>> On 7/14/2017 2:00 PM, santosh wrote:
>>>> On Friday 14 July 2017 01:37 PM, Hemant Agrawal wrote:
>>>>
>>>>> On 7/11/2017 11:46 AM, Santosh Shukla wrote:
>>>>>> API(rte_bus_get_iommu_class) helps to automatically detect and select
>>>>>> appropriate iova mapping scheme for iommu capable device on that bus.
>>>>>>
>>>>>> Algorithm for iova scheme selection for bus:
>>>>>> 0. Iterate through bus_list.
>>>>>> 1. Collect each bus iova mode value and update into 'mode' var.
>>>>>> 2. Here value '1' is _pa and value '2' is _va mode.
>>>>>> So mode selection scheme is like:
>>>>>> if mode == 2 then iova mode is _va.
>>>>>> if mode == 1 then iova mode is _pa
>>>>>> if mode  == 3 then iova mode ia _pa.
>>>>>>
>>>>>> So mode !=2  will be default iova mode.
>>>>>>
>>>>>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>>>>>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>>>>> ---
>>>>>>  lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
>>>>>>  lib/librte_eal/common/eal_common_bus.c          | 23 +++++++++++++++++++++++
>>>>>>  lib/librte_eal/common/eal_common_pci.c          |  1 +
>>>>>>  lib/librte_eal/common/include/rte_bus.h         | 22 ++++++++++++++++++++++
>>>>>>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>>>>>>  5 files changed, 48 insertions(+)
>>>>>>
>>>>>> diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>>>>>> index 33c2c32c0..a2dd65a33 100644
>>>>>> --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>>>>>> +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>>>>>> @@ -202,6 +202,7 @@ DPDK_17.08 {
>>>>>>      rte_bus_find_by_name;
>>>>>>      rte_pci_match;
>>>>>>      rte_pci_get_iommu_class;
>>>>>> +    rte_bus_get_iommu_class;
>>>>>>
>>>>>>  } DPDK_17.05;
>>>>>>
>>>>>> diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
>>>>>> index 08bec2d93..5d5753ac9 100644
>>>>>> --- a/lib/librte_eal/common/eal_common_bus.c
>>>>>> +++ b/lib/librte_eal/common/eal_common_bus.c
>>>>>> @@ -222,3 +222,26 @@ rte_bus_find_by_device_name(const char *str)
>>>>>>          c[0] = '\0';
>>>>>>      return rte_bus_find(NULL, bus_can_parse, name);
>>>>>>  }
>>>>>> +
>>>>>> +
>>>>>> +/*
>>>>>> + * Get iommu class of devices on the bus.
>>>>>> + */
>>>>>> +enum rte_iova_mode
>>>>>> +rte_bus_get_iommu_class(void)
>>>>>> +{
>>>>>> +    int mode = 0;
>>>>>> +    struct rte_bus *bus;
>>>>>> +
>>>>>> +    TAILQ_FOREACH(bus, &rte_bus_list, next) {
>>>>>> +
>>>>>> +        if (bus->get_iommu_class)
>>>>>> +            mode |= bus->get_iommu_class();
>>>>>> +    }
>>>>>> +
>>>>> If you change the default return as '0' for buses. This code will work.
>>>>> e.g. PCI will return '0' - when no device is probed. FSL MC will return VA. the default mode will be 'VA'
>>>>>
>>>> I'm confused why it won't work for fslmc case?
>>>>
>>>> Let me walk through the code:
>>>>
>>>> If no-pci device Or (future) no-platform device probed then bus opt
>>>> to use default mapping scheme .. which is iova_pa(default scheme).
>>>>
>>>> Lets take PCI_bus example:
>>>> bus->get_iommu_class()
>>>>     ---> bus->_pci_get_iommu_class()
>>>>         * Now consider that no interface bound to any of PCI device, then
>>>>           it will return RTE_IOVA_PA mode to rte_bus layer (aka bus->get_iommu_class).
>>>>           So the iova mapping result from iommu_class scan is RTE_IOVA_PA (default).
>>>>           It works for PCI_bus case, tested for both iova_va and iova_pa case, no-pci device case.
>>>>
>>>> Now in fslmc bus case:
>>>> bus->get_iommu_class()
>>>>     ---> bus->_fslmc_get_iommu_class()
>>>>
>>>>         * IIUC your comment - You want fslmc bus to return RTE_IOVA_VA if no device
>>>>           detected, Right?
>>> why?
>>>
>> As I didn't understood your previous reply:
>> `e.g. PCI will return '0' - when no device is probed. FSL MC will return VA. the default mode will be 'VA'`
>>
>> So, I'm asking you that in fslmc bus case - if no device found then are you opting _va scheme or not?
>> Seems like _not_ per your below comment.
>>
>>
>>> If bus is just present but no device is in use for dpdk, then bus should return 0 and it *should not* participate in the IOMMU class decision.
>>>
>> I think, I understand your point..Example if you have no-pci on first PCI bus
>> but device found on 2nd platform bus then you don't want to fallback to default (/_pa) mode..
>> instead you want to use 2nd bus mode for mapping, which is _va. Right?
>>
>> If so then In my first version - We did introduced the case called _DC.
>> _DC:0 --> stands for no-device found case.
>>
>>> Right now there are only two buses. There can be more buses. (e.g. PCI, platform, fslmc in case of dpaa2 as well).
>>>
>>> If the bus is not being used at all, why it influence the decision of other buses.
>>>
>> If your referring to above case then I agree, We'll re-introduce _DC state from v1 in next revision.
>> That will look like
>> rte_pci_get_iommu_class() {
>> 	int mode = RTE_IOVA_DC; /* '0' */
>>
>> 	return _DC; /* if no device found */
>> }
>>
>> Right?

Yes! Thanks!

As I explained in the other thread, PCI devices can be present while
none of them is used by DPDK:
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL:   probe driver: 8086:10d3 net_e1000_em
EAL:   Not managed by a supported kernel driver, skipped


>>
>>> if no bus has any device, the System default is anyway PA.
>>>
>> Right, If no bus present then It's also responsibility of `rte_bus_get_iommu_class`
>> to use default mapping scheme which is _pa and which It does.
>>
>>>>           if so then your fslmc bus handle should do something like below
>>>>             -- If no device on fslmc bus : return RTE_IOVA_VA.
>>>>             -- If device detected on fslmc bus and bound to iommu driver : return RTE_IOVA_VA
>>>>             -- If device detected fslmc but not bound to iommu drv : return RTE_IOVA_PA..
>>>>
>>>> make sense? If not then can you describe fslmc mapping scheme?
>>>>
>>>>> if fslmc is not present. The default mode will be PA.
>>>>>
>>>>>> +    if (mode != RTE_IOVA_VA) {
>>>>>> +        /* Use default IOVA mode */
>>>>>> +        mode = RTE_IOVA_PA;
>>>>>> +    }
>>> The system default is anyway PA.
>>>
>> No, That check is needed for case like 1st bus return with _PA and 2nd bus returns with _VA,
>> then mode = 3 (Mix mode), which we don't support so (as I mentioned before) its responsibility of
>> rte_bus_get_iommu_class() to return default mode (_pa). That's why!.
>>
>>
> Does your platform supports `mix mode`, I asked same question in thread [04/11] too?
> Let's say that dpaa2 supports mix mode then it is Ok if bus chose to opt default mapping
> for mix mode case? Do you see any issue if bus opt to use default scheme for mix mode?
>
>

Yes! We can support mix mode. However, with your suggested changes in
the mempool etc. APIs, DPDK will no longer work for us in mix mode (when
both PCI and DPAA2 devices are available) with VA support only for DPAA2 :)

In the mix mode case, your logic already defaults to PA. That
is fine.

But when PCI devices are not hooked to DPDK, we should be able to use
VA for dpaa2.
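
A minimal sketch of a per-bus handler that behaves this way, assuming a
simplified device record: `struct sketch_dev`, its two flags and
`sketch_bus_iommu_class()` are hypothetical and do not correspond to the real
PCI or fslmc bus structures. Devices that are skipped (not managed by a
supported kernel driver) do not vote, so a bus with only such devices reports
0 (_DC).

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins; values follow the _DC/_pa/_va numbering used in this thread. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1, IOVA_VA = 2 };

/* Hypothetical, simplified device record for illustration only. */
struct sketch_dev {
	bool bound_to_dpdk; /* managed by a supported kernel driver */
	bool behind_iommu;  /* that driver provides IOMMU-backed mapping */
};

static enum iova_mode
sketch_bus_iommu_class(const struct sketch_dev *devs, size_t n)
{
	bool any_used = false;
	bool all_iommu = true;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!devs[i].bound_to_dpdk)
			continue; /* skipped device: must not vote */
		any_used = true;
		if (!devs[i].behind_iommu)
			all_iommu = false;
	}

	if (!any_used)
		return IOVA_DC; /* bus present but unused by DPDK */
	return all_iommu ? IOVA_VA : IOVA_PA;
}
```

Under these assumptions a PCI bus whose only device is skipped returns _DC, so
a second bus reporting _va would decide the final mode on its own.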


* Re: [PATCH v3 05/11] bus: get iommu class
  2017-07-14 10:51                 ` Hemant Agrawal
@ 2017-07-14 11:03                   ` santosh
  2017-07-14 11:15                     ` Hemant Agrawal
  0 siblings, 1 reply; 248+ messages in thread
From: santosh @ 2017-07-14 11:03 UTC (permalink / raw)
  To: Hemant Agrawal, thomas, dev
  Cc: bruce.richardson, jerin.jacob, shreyansh.jain, gaetan.rivet,
	sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz

On Friday 14 July 2017 04:21 PM, Hemant Agrawal wrote:

> On 7/14/2017 3:59 PM, santosh wrote:
>> On Friday 14 July 2017 03:52 PM, santosh wrote:
>>
>>> On Friday 14 July 2017 03:09 PM, Hemant Agrawal wrote:
>>>
>>>> On 7/14/2017 2:00 PM, santosh wrote:
>>>>> On Friday 14 July 2017 01:37 PM, Hemant Agrawal wrote:
>>>>>
>>>>>> On 7/11/2017 11:46 AM, Santosh Shukla wrote:
>>>>>>> API(rte_bus_get_iommu_class) helps to automatically detect and select
>>>>>>> appropriate iova mapping scheme for iommu capable device on that bus.
>>>>>>>
>>>>>>> Algorithm for iova scheme selection for bus:
>>>>>>> 0. Iterate through bus_list.
>>>>>>> 1. Collect each bus iova mode value and update into 'mode' var.
>>>>>>> 2. Here value '1' is _pa and value '2' is _va mode.
>>>>>>> So mode selection scheme is like:
>>>>>>> if mode == 2 then iova mode is _va.
>>>>>>> if mode == 1 then iova mode is _pa
>>>>>>> if mode  == 3 then iova mode ia _pa.
>>>>>>>
>>>>>>> So mode !=2  will be default iova mode.
>>>>>>>
>>>>>>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>>>>>>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>>>>>> ---
>>>>>>>  lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
>>>>>>>  lib/librte_eal/common/eal_common_bus.c          | 23 +++++++++++++++++++++++
>>>>>>>  lib/librte_eal/common/eal_common_pci.c          |  1 +
>>>>>>>  lib/librte_eal/common/include/rte_bus.h         | 22 ++++++++++++++++++++++
>>>>>>>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>>>>>>>  5 files changed, 48 insertions(+)
>>>>>>>
>>>>>>> diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>>>>>>> index 33c2c32c0..a2dd65a33 100644
>>>>>>> --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>>>>>>> +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>>>>>>> @@ -202,6 +202,7 @@ DPDK_17.08 {
>>>>>>>      rte_bus_find_by_name;
>>>>>>>      rte_pci_match;
>>>>>>>      rte_pci_get_iommu_class;
>>>>>>> +    rte_bus_get_iommu_class;
>>>>>>>
>>>>>>>  } DPDK_17.05;
>>>>>>>
>>>>>>> diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
>>>>>>> index 08bec2d93..5d5753ac9 100644
>>>>>>> --- a/lib/librte_eal/common/eal_common_bus.c
>>>>>>> +++ b/lib/librte_eal/common/eal_common_bus.c
>>>>>>> @@ -222,3 +222,26 @@ rte_bus_find_by_device_name(const char *str)
>>>>>>>          c[0] = '\0';
>>>>>>>      return rte_bus_find(NULL, bus_can_parse, name);
>>>>>>>  }
>>>>>>> +
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * Get iommu class of devices on the bus.
>>>>>>> + */
>>>>>>> +enum rte_iova_mode
>>>>>>> +rte_bus_get_iommu_class(void)
>>>>>>> +{
>>>>>>> +    int mode = 0;
>>>>>>> +    struct rte_bus *bus;
>>>>>>> +
>>>>>>> +    TAILQ_FOREACH(bus, &rte_bus_list, next) {
>>>>>>> +
>>>>>>> +        if (bus->get_iommu_class)
>>>>>>> +            mode |= bus->get_iommu_class();
>>>>>>> +    }
>>>>>>> +
>>>>>> If you change the default return as '0' for buses. This code will work.
>>>>>> e.g. PCI will return '0' - when no device is probed. FSL MC will return VA. the default mode will be 'VA'
>>>>>>
>>>>> I'm confused why it won't work for fslmc case?
>>>>>
>>>>> Let me walk through the code:
>>>>>
>>>>> If no-pci device Or (future) no-platform device probed then bus opt
>>>>> to use default mapping scheme .. which is iova_pa(default scheme).
>>>>>
>>>>> Lets take PCI_bus example:
>>>>> bus->get_iommu_class()
>>>>>     ---> bus->_pci_get_iommu_class()
>>>>>         * Now consider that no interface bound to any of PCI device, then
>>>>>           it will return RTE_IOVA_PA mode to rte_bus layer (aka bus->get_iommu_class).
>>>>>           So the iova mapping result from iommu_class scan is RTE_IOVA_PA (default).
>>>>>           It works for PCI_bus case, tested for both iova_va and iova_pa case, no-pci device case.
>>>>>
>>>>> Now in fslmc bus case:
>>>>> bus->get_iommu_class()
>>>>>     ---> bus->_fslmc_get_iommu_class()
>>>>>
>>>>>         * IIUC your comment - You want fslmc bus to return RTE_IOVA_VA if no device
>>>>>           detected, Right?
>>>> why?
>>>>
>>> As I didn't understood your previous reply:
>>> `e.g. PCI will return '0' - when no device is probed. FSL MC will return VA. the default mode will be 'VA'`
>>>
>>> So, I'm asking you that in fslmc bus case - if no device found then are you opting _va scheme or not?
>>> Seems like _not_ per your below comment.
>>>
>>>
>>>> If bus is just present but no device is in use for dpdk, then bus should return 0 and it *should not* participate in the IOMMU class decision.
>>>>
>>> I think, I understand your point..Example if you have no-pci on first PCI bus
>>> but device found on 2nd platform bus then you don't want to fallback to default (/_pa) mode..
>>> instead you want to use 2nd bus mode for mapping, which is _va. Right?
>>>
>>> If so then In my first version - We did introduced the case called _DC.
>>> _DC:0 --> stands for no-device found case.
>>>
>>>> Right now there are only two buses. There can be more buses. (e.g. PCI, platform, fslmc in case of dpaa2 as well).
>>>>
>>>> If the bus is not being used at all, why it influence the decision of other buses.
>>>>
>>> If your referring to above case then I agree, We'll re-introduce _DC state from v1 in next revision.
>>> That will look like
>>> rte_pci_get_iommu_class() {
>>>     int mode = RTE_IOVA_DC; /* '0' */
>>>
>>>     return _DC; /* if no device found */
>>> }
>>>
>>> Right?
>
> Yes! Thanks!
>
> As I explained in the other thread. The PCI devices can be there, but none of them is for DPDK:
> EAL: PCI device 0000:01:00.0 on NUMA socket 0
> EAL:   probe driver: 8086:10d3 net_e1000_em
> EAL:   Not managed by a supported kernel driver, skipped
>
>
OK, I will queue the _DC changes for the next version. Thanks for confirming.

>>>
>>>> if no bus has any device, the System default is anyway PA.
>>>>
>>> Right, If no bus present then It's also responsibility of `rte_bus_get_iommu_class`
>>> to use default mapping scheme which is _pa and which It does.
>>>
>>>>>           if so then your fslmc bus handle should do something like below
>>>>>             -- If no device on fslmc bus : return RTE_IOVA_VA.
>>>>>             -- If device detected on fslmc bus and bound to iommu driver : return RTE_IOVA_VA
>>>>>             -- If device detected fslmc but not bound to iommu drv : return RTE_IOVA_PA..
>>>>>
>>>>> make sense? If not then can you describe fslmc mapping scheme?
>>>>>
>>>>>> if fslmc is not present. The default mode will be PA.
>>>>>>
>>>>>>> +    if (mode != RTE_IOVA_VA) {
>>>>>>> +        /* Use default IOVA mode */
>>>>>>> +        mode = RTE_IOVA_PA;
>>>>>>> +    }
>>>> The system default is anyway PA.
>>>>
>>> No, That check is needed for case like 1st bus return with _PA and 2nd bus returns with _VA,
>>> then mode = 3 (Mix mode), which we don't support so (as I mentioned before) its responsibility of
>>> rte_bus_get_iommu_class() to return default mode (_pa). That's why!.
>>>
>>>
>> Does your platform supports `mix mode`, I asked same question in thread [04/11] too?
>> Let's say that dpaa2 supports mix mode then it is Ok if bus chose to opt default mapping
>> for mix mode case? Do you see any issue if bus opt to use default scheme for mix mode?
>>
>>
>
> yes! We can support mix mode. However with your suggested changes in mempool etc APIs, now the DPDK will not work for us in mix mode (when both PCI and DPAA2 devices are available) with VA support only for DPAA2 :)
>
> In case of mix mode, you logic is already there to default to PA. That is fine.
>
> But, when PCI devices are not hooked to dpdk. We should be able to use VA for dpaa2.
>
OK.
I assume that you'll implement a bus handler for fslmc, something like
`rte_fslmc_get_iommu_class()`, and make sure that it returns:
- _VA in the no-device-found case.
- _VA if an iommu-capable interface is detected for the device.
- _PA if no-iommu.

And the only change you expect at the bus layer (rte_bus_get_iommu_class()) is to
honor the `no device found` situation in the multiple-bus case. Right?
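
The three rules listed above could be written roughly as follows. This is a
sketch under stated assumptions, not the fslmc implementation:
`sketch_fslmc_iommu_class()` and its `ndev` / `iommu_capable` inputs are
hypothetical placeholders for whatever state the real bus would inspect.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins; values follow the _DC/_pa/_va numbering used in this thread. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1, IOVA_VA = 2 };

/*
 * Hypothetical fslmc-style handler implementing the three rules listed
 * above. ndev and iommu_capable are illustrative inputs, not real
 * fslmc bus state.
 */
static enum iova_mode
sketch_fslmc_iommu_class(size_t ndev, bool iommu_capable)
{
	if (ndev == 0)
		return IOVA_VA; /* rule 1: no device found */
	if (iommu_capable)
		return IOVA_VA; /* rule 2: iommu-capable interface detected */
	return IOVA_PA;         /* rule 3: no-iommu */
}
```

Note that a handler written to these rules never reports 0 (_DC) itself.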


* Re: [PATCH v3 05/11] bus: get iommu class
  2017-07-14 11:03                   ` santosh
@ 2017-07-14 11:15                     ` Hemant Agrawal
  0 siblings, 0 replies; 248+ messages in thread
From: Hemant Agrawal @ 2017-07-14 11:15 UTC (permalink / raw)
  To: santosh, thomas, dev
  Cc: bruce.richardson, jerin.jacob, shreyansh.jain, gaetan.rivet,
	sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz

On 7/14/2017 4:33 PM, santosh wrote:
> On Friday 14 July 2017 04:21 PM, Hemant Agrawal wrote:
>
>> On 7/14/2017 3:59 PM, santosh wrote:
>>> On Friday 14 July 2017 03:52 PM, santosh wrote:
>>>
>>>> On Friday 14 July 2017 03:09 PM, Hemant Agrawal wrote:
>>>>
>>>>> On 7/14/2017 2:00 PM, santosh wrote:
>>>>>> On Friday 14 July 2017 01:37 PM, Hemant Agrawal wrote:
>>>>>>
>>>>>>> On 7/11/2017 11:46 AM, Santosh Shukla wrote:
>>>>>>>> API(rte_bus_get_iommu_class) helps to automatically detect and select
>>>>>>>> appropriate iova mapping scheme for iommu capable device on that bus.
>>>>>>>>
>>>>>>>> Algorithm for iova scheme selection for bus:
>>>>>>>> 0. Iterate through bus_list.
>>>>>>>> 1. Collect each bus iova mode value and update into 'mode' var.
>>>>>>>> 2. Here value '1' is _pa and value '2' is _va mode.
>>>>>>>> So mode selection scheme is like:
>>>>>>>> if mode == 2 then iova mode is _va.
>>>>>>>> if mode == 1 then iova mode is _pa
>>>>>>>> if mode  == 3 then iova mode ia _pa.
>>>>>>>>
>>>>>>>> So mode !=2  will be default iova mode.
>>>>>>>>
>>>>>>>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>>>>>>>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>>>>>>> ---
>>>>>>>>  lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
>>>>>>>>  lib/librte_eal/common/eal_common_bus.c          | 23 +++++++++++++++++++++++
>>>>>>>>  lib/librte_eal/common/eal_common_pci.c          |  1 +
>>>>>>>>  lib/librte_eal/common/include/rte_bus.h         | 22 ++++++++++++++++++++++
>>>>>>>>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>>>>>>>>  5 files changed, 48 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>>>>>>>> index 33c2c32c0..a2dd65a33 100644
>>>>>>>> --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>>>>>>>> +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>>>>>>>> @@ -202,6 +202,7 @@ DPDK_17.08 {
>>>>>>>>      rte_bus_find_by_name;
>>>>>>>>      rte_pci_match;
>>>>>>>>      rte_pci_get_iommu_class;
>>>>>>>> +    rte_bus_get_iommu_class;
>>>>>>>>
>>>>>>>>  } DPDK_17.05;
>>>>>>>>
>>>>>>>> diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
>>>>>>>> index 08bec2d93..5d5753ac9 100644
>>>>>>>> --- a/lib/librte_eal/common/eal_common_bus.c
>>>>>>>> +++ b/lib/librte_eal/common/eal_common_bus.c
>>>>>>>> @@ -222,3 +222,26 @@ rte_bus_find_by_device_name(const char *str)
>>>>>>>>          c[0] = '\0';
>>>>>>>>      return rte_bus_find(NULL, bus_can_parse, name);
>>>>>>>>  }
>>>>>>>> +
>>>>>>>> +
>>>>>>>> +/*
>>>>>>>> + * Get iommu class of devices on the bus.
>>>>>>>> + */
>>>>>>>> +enum rte_iova_mode
>>>>>>>> +rte_bus_get_iommu_class(void)
>>>>>>>> +{
>>>>>>>> +    int mode = 0;
>>>>>>>> +    struct rte_bus *bus;
>>>>>>>> +
>>>>>>>> +    TAILQ_FOREACH(bus, &rte_bus_list, next) {
>>>>>>>> +
>>>>>>>> +        if (bus->get_iommu_class)
>>>>>>>> +            mode |= bus->get_iommu_class();
>>>>>>>> +    }
>>>>>>>> +
>>>>>>> If you change the default return as '0' for buses. This code will work.
>>>>>>> e.g. PCI will return '0' - when no device is probed. FSL MC will return VA. the default mode will be 'VA'
>>>>>>>
>>>>>> I'm confused why it won't work for fslmc case?
>>>>>>
>>>>>> Let me walk through the code:
>>>>>>
>>>>>> If no-pci device Or (future) no-platform device probed then bus opt
>>>>>> to use default mapping scheme .. which is iova_pa(default scheme).
>>>>>>
>>>>>> Lets take PCI_bus example:
>>>>>> bus->get_iommu_class()
>>>>>>     ---> bus->_pci_get_iommu_class()
>>>>>>         * Now consider that no interface bound to any of PCI device, then
>>>>>>           it will return RTE_IOVA_PA mode to rte_bus layer (aka bus->get_iommu_class).
>>>>>>           So the iova mapping result from iommu_class scan is RTE_IOVA_PA (default).
>>>>>>           It works for PCI_bus case, tested for both iova_va and iova_pa case, no-pci device case.
>>>>>>
>>>>>> Now in fslmc bus case:
>>>>>> bus->get_iommu_class()
>>>>>>     ---> bus->_fslmc_get_iommu_class()
>>>>>>
>>>>>>         * IIUC your comment - You want fslmc bus to return RTE_IOVA_VA if no device
>>>>>>           detected, Right?
>>>>> why?
>>>>>
>>>> As I didn't understood your previous reply:
>>>> `e.g. PCI will return '0' - when no device is probed. FSL MC will return VA. the default mode will be 'VA'`
>>>>
>>>> So, I'm asking you that in fslmc bus case - if no device found then are you opting _va scheme or not?
>>>> Seems like _not_ per your below comment.
>>>>
>>>>
>>>>> If bus is just present but no device is in use for dpdk, then bus should return 0 and it *should not* participate in the IOMMU class decision.
>>>>>
>>>> I think, I understand your point..Example if you have no-pci on first PCI bus
>>>> but device found on 2nd platform bus then you don't want to fallback to default (/_pa) mode..
>>>> instead you want to use 2nd bus mode for mapping, which is _va. Right?
>>>>
>>>> If so then In my first version - We did introduced the case called _DC.
>>>> _DC:0 --> stands for no-device found case.
>>>>
>>>>> Right now there are only two buses. There can be more buses. (e.g. PCI, platform, fslmc in case of dpaa2 as well).
>>>>>
>>>>> If the bus is not being used at all, why it influence the decision of other buses.
>>>>>
>>>> If your referring to above case then I agree, We'll re-introduce _DC state from v1 in next revision.
>>>> That will look like
>>>> rte_pci_get_iommu_class() {
>>>>     int mode = RTE_IOVA_DC; /* '0' */
>>>>
>>>>     return _DC; /* if no device found */
>>>> }
>>>>
>>>> Right?
>>
>> Yes! Thanks!
>>
>> As I explained in the other thread. The PCI devices can be there, but none of them is for DPDK:
>> EAL: PCI device 0000:01:00.0 on NUMA socket 0
>> EAL:   probe driver: 8086:10d3 net_e1000_em
>> EAL:   Not managed by a supported kernel driver, skipped
>>
>>
> Ok, I will queue _DC changes in next verions. Thanks for confirming.
>
>>>>
>>>>> if no bus has any device, the System default is anyway PA.
>>>>>
>>>> Right, If no bus present then It's also responsibility of `rte_bus_get_iommu_class`
>>>> to use default mapping scheme which is _pa and which It does.
>>>>
>>>>>>           if so then your fslmc bus handle should do something like below
>>>>>>             -- If no device on fslmc bus : return RTE_IOVA_VA.
>>>>>>             -- If device detected on fslmc bus and bound to iommu driver : return RTE_IOVA_VA
>>>>>>             -- If device detected fslmc but not bound to iommu drv : return RTE_IOVA_PA..
>>>>>>
>>>>>> make sense? If not then can you describe fslmc mapping scheme?
>>>>>>
>>>>>>> if fslmc is not present. The default mode will be PA.
>>>>>>>
>>>>>>>> +    if (mode != RTE_IOVA_VA) {
>>>>>>>> +        /* Use default IOVA mode */
>>>>>>>> +        mode = RTE_IOVA_PA;
>>>>>>>> +    }
>>>>> The system default is anyway PA.
>>>>>
>>>> No, That check is needed for case like 1st bus return with _PA and 2nd bus returns with _VA,
>>>> then mode = 3 (Mix mode), which we don't support so (as I mentioned before) its responsibility of
>>>> rte_bus_get_iommu_class() to return default mode (_pa). That's why!.
>>>>
>>>>
>>> Does your platform supports `mix mode`, I asked same question in thread [04/11] too?
>>> Let's say that dpaa2 supports mix mode then it is Ok if bus chose to opt default mapping
>>> for mix mode case? Do you see any issue if bus opt to use default scheme for mix mode?
>>>
>>>
>>
>> yes! We can support mix mode. However with your suggested changes in mempool etc APIs, now the DPDK will not work for us in mix mode (when both PCI and DPAA2 devices are available) with VA support only for DPAA2 :)
>>
>> In case of mix mode, you logic is already there to default to PA. That is fine.
>>
>> But, when PCI devices are not hooked to dpdk. We should be able to use VA for dpaa2.
>>
> Ok.
> I assume that You'll implement bus handle for fslmc something like
> `rte_fslmc_get_iommu_class()` and make sure that you return:
> - _VA in no-device found case.
I will return NO DC or '0', whichever you define for the no-dev case.
> - _VA if iommu capable interface detected for device.
> - _PA if no-iommu.
>
> And only change which you expect at bus layer /rte_bus_get_iommu_class() is to
> honor `no device found` situation for multiple bus case. Right?
>

yes!
>
>

* [PATCH v4 00/12] Infrastructure to detect iova mapping on the bus
  2017-07-11  6:16   ` [PATCH v3 00/11] Infrastructure to detect iova mapping on the bus Santosh Shukla
                       ` (10 preceding siblings ...)
  2017-07-11  6:16     ` [PATCH v3 11/11] eal/rte_malloc: " Santosh Shukla
@ 2017-07-18  5:59     ` Santosh Shukla
  2017-07-18  5:59       ` [PATCH v4 01/12] eal/pci: introduce PCI driver iova as va flag Santosh Shukla
                         ` (13 more replies)
  11 siblings, 14 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-18  5:59 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

v4:
Introducing the RTE_PCI_DRV_IOVA_AS_VA flag for autodetection of iova va mapping.
If a PCI driver requires the IOVA as VA scheme then the driver can add the flag
in the PCI driver registration function.

Algorithm to select IOVA as VA for PCI bus case:
    0. If no device is bound then return with RTE_IOVA_DC mapping mode,
    else goto 1).
    1. Look for a device attached to vfio kdrv that has .drv_flag set
    to RTE_PCI_DRV_IOVA_AS_VA.
    2. Look for any device attached to a UIO class of driver.
    3. Check whether vfio-noiommu mode is enabled.
    
    If 2) & 3) are false and 1) is true then select
    mapping scheme as RTE_IOVA_VA. Otherwise use the default
    mapping scheme (RTE_IOVA_PA).

That way, the bus can truly autodetect the iova mapping mode for
a device or a set of devices.
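
For illustration only, the selection steps above can be written as a small
standalone function. This is a sketch and not the code from the series: the
helper name pci_select_iova_mode() and its four boolean inputs are
hypothetical stand-ins for the results of the bus scans, and the enum is
copied from patch [03/12].

```c
#include <stdbool.h>

/* Copied from the enum proposed in patch 03/12 of this series. */
enum rte_iova_mode {
	RTE_IOVA_DC = 0,	/* Don't care: no device bound */
	RTE_IOVA_PA = (1 << 0),
	RTE_IOVA_VA = (1 << 1)
};

/*
 * Hypothetical helper, not part of the series. Each argument stands in
 * for the result of one scan over the PCI bus.
 */
static enum rte_iova_mode
pci_select_iova_mode(bool any_bound, bool vfio_dev_wants_va,
		     bool any_uio, bool vfio_noiommu)
{
	/* Step 0: nothing bound, so this bus has no preference. */
	if (!any_bound)
		return RTE_IOVA_DC;

	/* 1) is true while 2) and 3) are both false. */
	if (vfio_dev_wants_va && !any_uio && !vfio_noiommu)
		return RTE_IOVA_VA;

	/* Every other combination falls back to the default scheme. */
	return RTE_IOVA_PA;
}
```

With these inputs, a single UIO-bound device or an enabled vfio-noiommu mode
is enough to force RTE_IOVA_PA, even if some driver asked for iova as va.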


Patch series rebased on 'a599eb31f2e477674fc6176cdf989ee17432b552'.

* Re-introduced RTE_IOVA_DC (Don't care mode) for no-device found case.
  (Identified by Hemant [5]).
* Renamed flag from RTE_PCI_DRV_NEED_IOVA_VA to RTE_PCI_DRV_IOVA_AS_VA
  (Suggested by Maxime[6]).
* Based on the discussion on the thread [3], [6] and [5].

v3 --> v4:
- Re-introduced RTE_IOVA_DC mode (Suggested by Hemant [5]).
- Renamed flag to RTE_PCI_DRV_IOVA_AS_VA (Suggested by Maxime).
- Reworded WARNING message(suggested by Maxime[7]).
- Created a separate patch for rte_pci_get_iommu_class (suggested by Maxime[]).
- Added VFIO_PRESENT ifdef build fix.

v2 --> v3:
- Removed rte_mempool_virt2phy (suggested by Olivier [4])

v1 --> v2:
- Removed override eal option i.e. (--iova-mode=<>) Because we have means to
  truly autodetect the iova mode.
- Introduced RTE_PCI_DRV_NEED_IOVA_VA drv_flag (Suggested by Maxime [3]).
- Using NEED_IOVA_VA drv_flag in autodetection logic.
- Removed Linux version check macro in vfio code, As per Maxime feedback.
- Moved rte_pci_match API from local to global.

Patch Summary:
0) 1st: Introducing a new flag in rte_pci_drv
1) 2nd: declare rte_pci_match api in pci header. Required for autodetection in
follow up patches.
2) 3rd: declare rte_pci_get_iommu_class.
3) 4th - 5th: autodetection mapping infrastructure for Linux/bsdapp.
4) 6th: Introduces global bus API named rte_bus_get_iommu_class.
5) 7th: iova mode helper API.
6) 8th - 9th: Calls rte_bus_get_iommu_class API for Linux/bsdapp and returns
their iova mode.
7) 10th: Check iova mode and accordingly map vfio.dma_map to _pa or _va.
8) 11th - 12th: Check for IOVA_VA mode in below APIs
        - rte_mem_virt2phy
        - rte_malloc_virt2phy

Test History:
- Tested for x86/XL710 40G NIC card for both modes (iova_va/pa).
- Tested for arm64/thunderx vNIC Integrated NIC for both modes
- Tested for arm64/Octeontx integrated NICs for only
  iova_va mode (it supports only one mode).
- Ran standalone tests like mempool_autotest, mbuf_autotest.
- Verified for Doxygen.

Work History:
For v1, Refer [1].
For v2, Refer [2].
For v3, Refer [9].


Checkpatch result:
* Debug message - WARNING: line over 80 characters

Thanks,

[1] https://www.mail-archive.com/dev@dpdk.org/msg67438.html
[2] https://www.mail-archive.com/dev@dpdk.org/msg70674.html
[3] https://www.mail-archive.com/dev@dpdk.org/msg70279.html
[4] https://www.mail-archive.com/dev@dpdk.org/msg70692.html
[5] http://dpdk.org/ml/archives/dev/2017-July/071282.html
[6] http://dpdk.org/ml/archives/dev/2017-July/070951.html
[7] http://dpdk.org/ml/archives/dev/2017-July/070941.html
[8] http://dpdk.org/ml/archives/dev/2017-July/070952.html 
[9] http://dpdk.org/ml/archives/dev/2017-July/070918.html


Santosh Shukla (12):
  eal/pci: introduce PCI driver iova as va flag
  eal/pci: export match function
  eal/pci: get iommu class
  bsdapp/eal_pci: get iommu class
  linuxapp/eal_pci: get iommu class
  bus: get iommu class
  eal: introduce iova mode helper api
  linuxapp/eal: auto detect iova mode
  bsdapp/eal: auto detect iova mapping mode
  linuxapp/eal_vfio: honor iova mode before mapping
  linuxapp/eal_memory: honor iova mode in virt2phy
  eal/rte_malloc: honor iova mode in virt2phy

 lib/librte_eal/bsdapp/eal/eal.c                 | 21 ++++--
 lib/librte_eal/bsdapp/eal/eal_pci.c             | 10 +++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  4 ++
 lib/librte_eal/common/eal_common_bus.c          | 23 ++++++
 lib/librte_eal/common/eal_common_pci.c          | 11 +--
 lib/librte_eal/common/include/rte_bus.h         | 32 +++++++++
 lib/librte_eal/common/include/rte_eal.h         | 12 ++++
 lib/librte_eal/common/include/rte_pci.h         | 28 ++++++++
 lib/librte_eal/common/rte_malloc.c              |  9 ++-
 lib/librte_eal/linuxapp/eal/eal.c               | 21 ++++--
 lib/librte_eal/linuxapp/eal/eal_memory.c        |  3 +
 lib/librte_eal/linuxapp/eal/eal_pci.c           | 95 +++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c          | 29 +++++++-
 lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  4 ++
 15 files changed, 282 insertions(+), 24 deletions(-)

-- 
2.11.0

* [PATCH v4 01/12] eal/pci: introduce PCI driver iova as va flag
  2017-07-18  5:59     ` [PATCH v4 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
@ 2017-07-18  5:59       ` Santosh Shukla
  2017-07-18  5:59       ` [PATCH v4 02/12] eal/pci: export match function Santosh Shukla
                         ` (12 subsequent siblings)
  13 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-18  5:59 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Introducing the RTE_PCI_DRV_IOVA_AS_VA flag. The flag is used when a driver
needs to operate in iova=va mode.

Why does a driver need iova=va mapping?

On NPU style co-processors like Octeontx, buffer recycling is done in HW,
unlike the SW model. Here is the data flow:
1) On the control path, fill the HW mempool with buffers (iova as pa address).
2) On rx_burst, HW gives you an IOVA address (iova as pa address).
3) As the application expects a VA to operate on, rx_burst() needs to
convert from _pa to _va, which is very expensive.
With iova as va mapping instead, we can avoid the cost of
converting with the help of the IOMMU/SMMU.
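
As a rough illustration of how a driver might advertise this, the sketch
below sets the flag in a driver's flag word and tests for it. The struct
mock_pci_driver, the mock_npu_pmd instance and drv_wants_iova_as_va() are
hypothetical stand-ins so that the example compiles on its own; the real
field lives in struct rte_pci_driver. The flag values are taken from this
patch and its diff context.

```c
#include <stdint.h>

/* Existing flag, value as shown in the diff context below. */
#define RTE_PCI_DRV_INTR_RMV 0x0010
/* New flag, value as proposed by this patch. */
#define RTE_PCI_DRV_IOVA_AS_VA 0x0040

/* Hypothetical stand-in for struct rte_pci_driver. */
struct mock_pci_driver {
	const char *name;
	uint32_t drv_flags;
};

/* A made-up PMD that asks for iova as va at registration time. */
static const struct mock_pci_driver mock_npu_pmd = {
	.name = "mock_npu_pmd",
	.drv_flags = RTE_PCI_DRV_INTR_RMV | RTE_PCI_DRV_IOVA_AS_VA,
};

/* Returns 1 if the driver set RTE_PCI_DRV_IOVA_AS_VA, 0 otherwise. */
static int
drv_wants_iova_as_va(const struct mock_pci_driver *drv)
{
	return (drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) != 0;
}
```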

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
v3 --> v4:
- Renamed RTE_PCI_DRV_NEED_IOVA_VA to RTE_PCI_DRV_IOVA_AS_VA.
(Suggested by Maxime)

lib/librte_eal/common/include/rte_pci.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 8b123391c..743392f91 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -202,6 +202,8 @@ struct rte_pci_bus {
 #define RTE_PCI_DRV_INTR_RMV 0x0010
 /** Device driver needs to keep mapped resources if unsupported dev detected */
 #define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020
+/** Device driver supports iova as va */
+#define RTE_PCI_DRV_IOVA_AS_VA 0x0040
 
 /**
  * A structure describing a PCI mapping.
-- 
2.11.0

* [PATCH v4 02/12] eal/pci: export match function
  2017-07-18  5:59     ` [PATCH v4 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
  2017-07-18  5:59       ` [PATCH v4 01/12] eal/pci: introduce PCI driver iova as va flag Santosh Shukla
@ 2017-07-18  5:59       ` Santosh Shukla
  2017-07-18  5:59       ` [PATCH v4 03/12] eal/pci: get iommu class Santosh Shukla
                         ` (11 subsequent siblings)
  13 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-18  5:59 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Export the rte_pci_match() function as it is needed in the follow-up patch.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
 lib/librte_eal/common/eal_common_pci.c          | 10 +---------
 lib/librte_eal/common/include/rte_pci.h         | 15 +++++++++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 4 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 480ad234c..e81cbb286 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -200,6 +200,7 @@ DPDK_17.08 {
 	rte_bus_find;
 	rte_bus_find_by_device;
 	rte_bus_find_by_name;
+	rte_pci_match;
 
 } DPDK_17.05;
 
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 76bbcc853..8b6ecebd6 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -128,16 +128,8 @@ pci_unmap_resource(void *requested_addr, size_t size)
 
 /*
  * Match the PCI Driver and Device using the ID Table
- *
- * @param pci_drv
- *	PCI driver from which ID table would be extracted
- * @param pci_dev
- *	PCI device to match against the driver
- * @return
- *	1 for successful match
- *	0 for unsuccessful match
  */
-static int
+int
 rte_pci_match(const struct rte_pci_driver *pci_drv,
 	      const struct rte_pci_device *pci_dev)
 {
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 743392f91..47f0532e4 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -368,6 +368,21 @@ int rte_pci_scan(void);
 int
 rte_pci_probe(void);
 
+/*
+ * Match the PCI Driver and Device using the ID Table
+ *
+ * @param pci_drv
+ *      PCI driver from which ID table would be extracted
+ * @param pci_dev
+ *      PCI device to match against the driver
+ * @return
+ *      1 for successful match
+ *      0 for unsuccessful match
+ */
+int
+rte_pci_match(const struct rte_pci_driver *pci_drv,
+	      const struct rte_pci_device *pci_dev);
+
 /**
  * Map the PCI device resources in user space virtual memory address
  *
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index fbaec39f7..a69bbb599 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -205,6 +205,7 @@ DPDK_17.08 {
 	rte_bus_find;
 	rte_bus_find_by_device;
 	rte_bus_find_by_name;
+	rte_pci_match;
 
 } DPDK_17.05;
 
-- 
2.11.0

* [PATCH v4 03/12] eal/pci: get iommu class
  2017-07-18  5:59     ` [PATCH v4 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
  2017-07-18  5:59       ` [PATCH v4 01/12] eal/pci: introduce PCI driver iova as va flag Santosh Shukla
  2017-07-18  5:59       ` [PATCH v4 02/12] eal/pci: export match function Santosh Shukla
@ 2017-07-18  5:59       ` Santosh Shukla
  2017-07-18  5:59       ` [PATCH v4 04/12] bsdapp/eal_pci: " Santosh Shukla
                         ` (10 subsequent siblings)
  13 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-18  5:59 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Introducing the rte_pci_get_iommu_class API, which gets the iommu class
of the PCI devices on the bus and returns the preferred iova mapping mode
for the PCI bus.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
v3 --> v4:
- Created a separate patch per suggestion from Maxime.
Initially thought to squash patch into [01/12] but
then [01/12] will have more context so decided to
keep it as separate patch.

 lib/librte_eal/common/include/rte_bus.h | 10 ++++++++++
 lib/librte_eal/common/include/rte_pci.h | 11 +++++++++++
 2 files changed, 21 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index af9f0e13f..e06084253 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -55,6 +55,16 @@ extern "C" {
 /** Double linked list of buses */
 TAILQ_HEAD(rte_bus_list, rte_bus);
 
+
+/**
+ * IOVA mapping mode.
+ */
+enum rte_iova_mode {
+	RTE_IOVA_DC = 0,	/* Don't care mode */
+	RTE_IOVA_PA = (1 << 0),
+	RTE_IOVA_VA = (1 << 1)
+};
+
 /**
  * Bus specific scan for devices attached on the bus.
  * For each bus object, the scan would be responsible for finding devices and
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 47f0532e4..a67d77f22 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -383,6 +383,17 @@ int
 rte_pci_match(const struct rte_pci_driver *pci_drv,
 	      const struct rte_pci_device *pci_dev);
 
+
+/**
+ * Get iommu class of PCI devices on the bus.
+ * And return their preferred iova mapping mode.
+ *
+ * @return
+ *   - enum rte_iova_mode.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void);
+
 /**
  * Map the PCI device resources in user space virtual memory address
  *
-- 
2.11.0

* [PATCH v4 04/12] bsdapp/eal_pci: get iommu class
  2017-07-18  5:59     ` [PATCH v4 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
                         ` (2 preceding siblings ...)
  2017-07-18  5:59       ` [PATCH v4 03/12] eal/pci: get iommu class Santosh Shukla
@ 2017-07-18  5:59       ` Santosh Shukla
  2017-07-18  5:59       ` [PATCH v4 05/12] linuxapp/eal_pci: " Santosh Shukla
                         ` (9 subsequent siblings)
  13 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-18  5:59 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

The bsdapp case always returns the default iova mode (RTE_IOVA_PA).

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
v3 --> v4:
- Removed rte_pci_get_iommu_class api declaration. Now that
  sits into separate patch [03/12].

 lib/librte_eal/bsdapp/eal/eal_pci.c           | 10 ++++++++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map |  1 +
 2 files changed, 11 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index dcb3b51ad..965255f79 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -403,6 +403,16 @@ rte_pci_scan(void)
 	return -1;
 }
 
+/*
+ * Get iommu class of pci devices on the bus.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void)
+{
+	/* Supports only RTE_KDRV_NIC_UIO */
+	return RTE_IOVA_PA;
+}
+
 int
 pci_update_device(const struct rte_pci_addr *addr)
 {
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index e81cbb286..4b25318be 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -201,6 +201,7 @@ DPDK_17.08 {
 	rte_bus_find_by_device;
 	rte_bus_find_by_name;
 	rte_pci_match;
+	rte_pci_get_iommu_class;
 
 } DPDK_17.05;
 
-- 
2.11.0

* [PATCH v4 05/12] linuxapp/eal_pci: get iommu class
  2017-07-18  5:59     ` [PATCH v4 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
                         ` (3 preceding siblings ...)
  2017-07-18  5:59       ` [PATCH v4 04/12] bsdapp/eal_pci: " Santosh Shukla
@ 2017-07-18  5:59       ` Santosh Shukla
  2017-07-18 10:55         ` Hemant Agrawal
  2017-07-18  5:59       ` [PATCH v4 06/12] bus: " Santosh Shukla
                         ` (8 subsequent siblings)
  13 siblings, 1 reply; 248+ messages in thread
From: Santosh Shukla @ 2017-07-18  5:59 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Get the iommu class of the PCI devices on the bus and return the preferred
iova mapping mode for that bus.

Algorithm for iova scheme selection for PCI bus:
0. If no device is bound then return with RTE_IOVA_DC mapping mode,
else goto 1).
1. Look for a device attached to vfio kdrv that has .drv_flag set
to RTE_PCI_DRV_IOVA_AS_VA.
2. Look for any device attached to a UIO class of driver.
3. Check whether vfio-noiommu mode is enabled.

If 2) & 3) are false and 1) is true then select
mapping scheme as RTE_IOVA_VA. Otherwise use the default
mapping scheme (RTE_IOVA_PA).

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
v3 --> v4 :
- Reworded WARNING message (suggested by Maxime)
- Added pci_device_is_bound func to check for no device case
  (suggested by Hemant).
- Added ifdef vfio_present.

v1 --> v2:
- Removed Linux version check in vfio_noiommu func. Refer [1].
- Extended autodetection logic for _iommu_class.
  Refer [2].

[1] https://www.mail-archive.com/dev@dpdk.org/msg70108.html
[2] https://www.mail-archive.com/dev@dpdk.org/msg70279.html

 lib/librte_eal/linuxapp/eal/eal_pci.c           | 95 +++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c          | 19 +++++
 lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 4 files changed, 119 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 7d9e1a99b..ecd946250 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -45,6 +45,7 @@
 #include "eal_filesystem.h"
 #include "eal_private.h"
 #include "eal_pci_init.h"
+#include "eal_vfio.h"
 
 /**
  * @file
@@ -488,6 +489,100 @@ rte_pci_scan(void)
 	return -1;
 }
 
+/*
+ * Is pci device bound to any kdrv
+ */
+static inline int
+pci_device_is_bound(void)
+{
+	struct rte_pci_device *dev = NULL;
+	int ret = 0;
+
+	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		if (dev->kdrv == RTE_KDRV_UNKNOWN ||
+		    dev->kdrv == RTE_KDRV_NONE) {
+			continue;
+		} else {
+			ret = 1;
+			break;
+		}
+	}
+	return ret;
+}
+
+/*
+ * Any one of the device bound to uio
+ */
+static inline int
+pci_device_bound_uio(void)
+{
+	struct rte_pci_device *dev = NULL;
+
+	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		if (dev->kdrv == RTE_KDRV_IGB_UIO ||
+		   dev->kdrv == RTE_KDRV_UIO_GENERIC) {
+			return 1;
+		}
+	}
+	return 0;
+}
+
+/*
+ * Any one of the device has iova as va
+ */
+static inline int
+pci_device_has_iova_va(void)
+{
+	struct rte_pci_device *dev = NULL;
+	struct rte_pci_driver *drv = NULL;
+
+	FOREACH_DRIVER_ON_PCIBUS(drv) {
+		if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
+			FOREACH_DEVICE_ON_PCIBUS(dev) {
+				if (dev->kdrv == RTE_KDRV_VFIO &&
+				    rte_pci_match(drv, dev))
+					return 1;
+			}
+		}
+	}
+	return 0;
+}
+
+/*
+ * Get iommu class of PCI devices on the bus.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void)
+{
+	bool is_bound;
+	bool is_vfio_noiommu_enabled = true;
+	bool has_iova_va;
+	bool is_bound_uio;
+
+	is_bound = pci_device_is_bound();
+	if (!is_bound)
+		return RTE_IOVA_DC;
+
+	has_iova_va = pci_device_has_iova_va();
+	is_bound_uio = pci_device_bound_uio();
+#ifdef VFIO_PRESENT
+	is_vfio_noiommu_enabled = vfio_noiommu_is_enabled() == 1 ? 1 : 0;
+#endif
+
+	if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled)
+		return RTE_IOVA_VA;
+
+	if (has_iova_va) {
+		RTE_LOG(WARNING, EAL, "Some devices want iova as va but pa will be used because.. ");
+		if (is_vfio_noiommu_enabled)
+			RTE_LOG(WARNING, EAL, "vfio-noiommu mode configured\n");
+		if (is_bound_uio)
+			RTE_LOG(WARNING, EAL, "few device bound to UIO\n");
+	}
+
+	return RTE_IOVA_PA;
+}
+
 /* Read PCI config space. */
 int rte_pci_read_config(const struct rte_pci_device *device,
 		void *buf, size_t len, off_t offset)
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 946df7e31..c8a97b7e7 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -816,4 +816,23 @@ vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
 	return 0;
 }
 
+int
+vfio_noiommu_is_enabled(void)
+{
+	int fd, ret, cnt;
+	char c;
+
+	ret = -1;
+	fd = open(VFIO_NOIOMMU_MODE, O_RDONLY);
+	if (fd < 0)
+		return -1;
+
+	cnt = read(fd, &c, 1);
+	if (cnt == 1 && c == 'Y')
+		ret = 1;
+
+	close(fd);
+	return ret;
+}
+
 #endif
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h b/lib/librte_eal/linuxapp/eal/eal_vfio.h
index 5ff63e5d7..26ea8e119 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
@@ -150,6 +150,8 @@ struct vfio_config {
 #define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
 #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
 #define VFIO_GET_REGION_IDX(x) (x >> 40)
+#define VFIO_NOIOMMU_MODE      \
+	"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
 
 /* DMA mapping function prototype.
  * Takes VFIO container fd as a parameter.
@@ -210,6 +212,8 @@ int pci_vfio_is_enabled(void);
 
 int vfio_mp_sync_setup(void);
 
+int vfio_noiommu_is_enabled(void);
+
 #define SOCKET_REQ_CONTAINER 0x100
 #define SOCKET_REQ_GROUP 0x200
 #define SOCKET_CLR_GROUP 0x300
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index a69bbb599..5dd40f948 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -206,6 +206,7 @@ DPDK_17.08 {
 	rte_bus_find_by_device;
 	rte_bus_find_by_name;
 	rte_pci_match;
+	rte_pci_get_iommu_class;
 
 } DPDK_17.05;
 
-- 
2.11.0

* [PATCH v4 06/12] bus: get iommu class
  2017-07-18  5:59     ` [PATCH v4 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
                         ` (4 preceding siblings ...)
  2017-07-18  5:59       ` [PATCH v4 05/12] linuxapp/eal_pci: " Santosh Shukla
@ 2017-07-18  5:59       ` Santosh Shukla
  2017-07-18 11:05         ` Hemant Agrawal
  2017-07-18  5:59       ` [PATCH v4 07/12] eal: introduce iova mode helper api Santosh Shukla
                         ` (7 subsequent siblings)
  13 siblings, 1 reply; 248+ messages in thread
From: Santosh Shukla @ 2017-07-18  5:59 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

The rte_bus_get_iommu_class() API automatically detects and selects the
appropriate iova mapping scheme for the iommu-capable devices on the buses.

Algorithm for iova scheme selection across buses:
0. Iterate through bus_list.
1. Collect each bus's iova mode value and OR it into the 'mode' var.
2. Mode selection scheme is:
if mode == 0 then iova mode is _pa,
if mode == 1 then iova mode is _pa,
if mode == 2 then iova mode is _va,
if mode == 3 then iova mode is _pa.

So any mode != 2 falls back to the default iova mode (_pa).
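
The selection scheme above can be sketched in isolation, away from the bus
list. This is only an illustration: the sketch_* names are hypothetical, and
the enum values (0, 1, 2) are an assumption chosen to match the mode table,
not a copy of the real rte_iova_mode definition.

```c
#include <stddef.h>

/* Local stand-ins for enum rte_iova_mode; the numeric values are an
 * assumption chosen to match the mode table above. */
enum sketch_iova_mode {
	SKETCH_IOVA_DC = 0,      /* don't care: no device bound */
	SKETCH_IOVA_PA = 1 << 0, /* iova is the physical address */
	SKETCH_IOVA_VA = 1 << 1  /* iova is the virtual address */
};

/* OR together the per-bus answers, then fall back to PA unless the
 * combined value is exactly VA. */
static enum sketch_iova_mode
sketch_combine_bus_modes(const enum sketch_iova_mode *per_bus, size_t n)
{
	int mode = SKETCH_IOVA_DC;
	size_t i;

	for (i = 0; i < n; i++)
		mode |= per_bus[i];

	return mode == SKETCH_IOVA_VA ? SKETCH_IOVA_VA : SKETCH_IOVA_PA;
}
```

With this shape, a single bus reporting _pa is enough to force _pa for the
whole process (mode becomes 1 or 3), while buses reporting "don't care" leave
the result unchanged.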

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
v3 --> v4:
 - Initialized mode to RTE_IOVA_DC in rte_bus_get_iommu_class.

 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
 lib/librte_eal/common/eal_common_bus.c          | 23 +++++++++++++++++++++++
 lib/librte_eal/common/eal_common_pci.c          |  1 +
 lib/librte_eal/common/include/rte_bus.h         | 22 ++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 5 files changed, 48 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 4b25318be..b9ee82b6b 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -202,6 +202,7 @@ DPDK_17.08 {
 	rte_bus_find_by_name;
 	rte_pci_match;
 	rte_pci_get_iommu_class;
+	rte_bus_get_iommu_class;
 
 } DPDK_17.05;
 
diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 08bec2d93..a30a8982e 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -222,3 +222,26 @@ rte_bus_find_by_device_name(const char *str)
 		c[0] = '\0';
 	return rte_bus_find(NULL, bus_can_parse, name);
 }
+
+
+/*
+ * Get iommu class of devices on the bus.
+ */
+enum rte_iova_mode
+rte_bus_get_iommu_class(void)
+{
+	int mode = RTE_IOVA_DC;
+	struct rte_bus *bus;
+
+	TAILQ_FOREACH(bus, &rte_bus_list, next) {
+
+		if (bus->get_iommu_class)
+			mode |= bus->get_iommu_class();
+	}
+
+	if (mode != RTE_IOVA_VA) {
+		/* Use default IOVA mode */
+		mode = RTE_IOVA_PA;
+	}
+	return mode;
+}
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 8b6ecebd6..bdf2e7c3a 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -552,6 +552,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.plug = pci_plug,
 		.unplug = pci_unplug,
 		.parse = pci_parse,
+		.get_iommu_class = rte_pci_get_iommu_class,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index e06084253..94f1fdfca 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -182,6 +182,17 @@ struct rte_bus_conf {
 	enum rte_bus_scan_mode scan_mode; /**< Scan policy. */
 };
 
+
+/**
+ * Get iommu class of devices on the bus.
+ * Check that those devices are attached to iommu driver.
+ *
+ * @return
+ *      enum rte_iova_mode value.
+ */
+typedef enum rte_iova_mode (*rte_bus_get_iommu_class_t)(void);
+
+
 /**
  * A structure describing a generic bus.
  */
@@ -195,6 +206,7 @@ struct rte_bus {
 	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
 	rte_bus_parse_t parse;       /**< Parse a device name */
 	struct rte_bus_conf conf;    /**< Bus configuration */
+	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 };
 
 /**
@@ -294,6 +306,16 @@ struct rte_bus *rte_bus_find_by_device(const struct rte_device *dev);
  */
 struct rte_bus *rte_bus_find_by_name(const char *busname);
 
+
+/**
+ * Get iommu class of devices on the bus.
+ * Check that those devices are attached to iommu driver.
+ *
+ * @return
+ *     enum rte_iova_mode value.
+ */
+enum rte_iova_mode rte_bus_get_iommu_class(void);
+
 /**
  * Helper for Bus registration.
  * The constructor has higher priority than PMD constructors.
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 5dd40f948..705af3adc 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -207,6 +207,7 @@ DPDK_17.08 {
 	rte_bus_find_by_name;
 	rte_pci_match;
 	rte_pci_get_iommu_class;
+	rte_bus_get_iommu_class;
 
 } DPDK_17.05;
 
-- 
2.11.0


* [PATCH v4 07/12] eal: introduce iova mode helper api
  2017-07-18  5:59     ` [PATCH v4 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
                         ` (5 preceding siblings ...)
  2017-07-18  5:59       ` [PATCH v4 06/12] bus: " Santosh Shukla
@ 2017-07-18  5:59       ` Santosh Shukla
  2017-07-18  5:59       ` [PATCH v4 08/12] linuxapp/eal: auto detect iova mode Santosh Shukla
                         ` (6 subsequent siblings)
  13 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-18  5:59 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Introduce the rte_eal_iova_mode() helper API. Non-EAL libraries use
this API to detect the iova mode.
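
As a rough illustration of how a non-EAL library might consume such a helper,
the sketch below stubs the accessor locally. Everything named sketch_* is a
hypothetical stand-in (including the config struct and the enum values); it
is not the EAL code, only the same accessor-over-config shape.

```c
#include <stdint.h>

/* Local stand-ins; values are assumed, not taken from rte_bus.h. */
enum sketch_iova_mode {
	SKETCH_IOVA_DC = 0,
	SKETCH_IOVA_PA = 1,
	SKETCH_IOVA_VA = 2
};

/* Hypothetical stand-in for the EAL-wide configuration. */
struct sketch_config {
	enum sketch_iova_mode iova_mode;
};

static struct sketch_config sketch_cfg = { .iova_mode = SKETCH_IOVA_PA };

/* Same shape as the helper: a read-only accessor over the config. */
static enum sketch_iova_mode
sketch_eal_iova_mode(void)
{
	return sketch_cfg.iova_mode;
}

/* A library picks the address it programs into a device descriptor. */
static uint64_t
sketch_desc_addr(const void *vaddr, uint64_t paddr)
{
	if (sketch_eal_iova_mode() == SKETCH_IOVA_VA)
		return (uint64_t)(uintptr_t)vaddr;
	return paddr;
}
```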

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
---
 lib/librte_eal/bsdapp/eal/eal.c                 |  6 ++++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
 lib/librte_eal/common/include/rte_eal.h         | 12 ++++++++++++
 lib/librte_eal/linuxapp/eal/eal.c               |  6 ++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 5 files changed, 26 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 80fe21de3..2a49e9fde 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -119,6 +119,12 @@ rte_eal_get_configuration(void)
 	return &rte_config;
 }
 
+enum rte_iova_mode
+rte_eal_iova_mode(void)
+{
+	return rte_eal_get_configuration()->iova_mode;
+}
+
 /* parse a sysfs (or other) file containing one integer value */
 int
 eal_parse_sysfs_value(const char *filename, unsigned long *val)
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index b9ee82b6b..75a86a9d7 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -203,6 +203,7 @@ DPDK_17.08 {
 	rte_pci_match;
 	rte_pci_get_iommu_class;
 	rte_bus_get_iommu_class;
+	rte_eal_iova_mode;
 
 } DPDK_17.05;
 
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 0e7363d77..932dc1a96 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -45,6 +45,7 @@
 
 #include <rte_per_lcore.h>
 #include <rte_config.h>
+#include <rte_bus.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -87,6 +88,9 @@ struct rte_config {
 	/** Primary or secondary configuration */
 	enum rte_proc_type_t process_type;
 
+	/** PA or VA mapping mode */
+	enum rte_iova_mode iova_mode;
+
 	/**
 	 * Pointer to memory configuration, which may be shared across multiple
 	 * DPDK instances
@@ -287,6 +291,14 @@ static inline int rte_gettid(void)
 	return RTE_PER_LCORE(_thread_id);
 }
 
+/**
+ * Get the iova mode
+ *
+ * @return
+ *   enum rte_iova_mode value.
+ */
+enum rte_iova_mode rte_eal_iova_mode(void);
+
 #define RTE_INIT(func) \
 static void __attribute__((constructor, used)) func(void)
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index b28bbab54..fffdf0d15 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -128,6 +128,12 @@ rte_eal_get_configuration(void)
 	return &rte_config;
 }
 
+enum rte_iova_mode
+rte_eal_iova_mode(void)
+{
+	return rte_eal_get_configuration()->iova_mode;
+}
+
 /* parse a sysfs (or other) file containing one integer value */
 int
 eal_parse_sysfs_value(const char *filename, unsigned long *val)
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 705af3adc..7161d1d83 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -208,6 +208,7 @@ DPDK_17.08 {
 	rte_pci_match;
 	rte_pci_get_iommu_class;
 	rte_bus_get_iommu_class;
+	rte_eal_iova_mode;
 
 } DPDK_17.05;
 
-- 
2.11.0


* [PATCH v4 08/12] linuxapp/eal: auto detect iova mode
  2017-07-18  5:59     ` [PATCH v4 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
                         ` (6 preceding siblings ...)
  2017-07-18  5:59       ` [PATCH v4 07/12] eal: introduce iova mode helper api Santosh Shukla
@ 2017-07-18  5:59       ` Santosh Shukla
  2017-07-18 11:34         ` Hemant Agrawal
  2017-07-18  5:59       ` [PATCH v4 09/12] bsdapp/eal: auto detect iova mapping mode Santosh Shukla
                         ` (5 subsequent siblings)
  13 siblings, 1 reply; 248+ messages in thread
From: Santosh Shukla @ 2017-07-18  5:59 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

- Move the late bus scan up, to just after eal_parsing.
- Auto detect the iova mapping mode, based on the result of
  rte_bus_get_iommu_class.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/linuxapp/eal/eal.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index fffdf0d15..49b52ce4f 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -798,6 +798,15 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	if (rte_bus_scan()) {
+		rte_eal_init_alert("Cannot scan the buses for devices\n");
+		rte_errno = ENODEV;
+		return -1;
+	}
+
+	/* autodetect the iova mapping mode (default is iova_pa) */
+	rte_eal_get_configuration()->iova_mode = rte_bus_get_iommu_class();
+
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&
 			internal_config.xen_dom0_support == 0 &&
@@ -895,12 +904,6 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
-	if (rte_bus_scan()) {
-		rte_eal_init_alert("Cannot scan the buses for devices\n");
-		rte_errno = ENODEV;
-		return -1;
-	}
-
 	RTE_LCORE_FOREACH_SLAVE(i) {
 
 		/*
-- 
2.11.0


* [PATCH v4 09/12] bsdapp/eal: auto detect iova mapping mode
  2017-07-18  5:59     ` [PATCH v4 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
                         ` (7 preceding siblings ...)
  2017-07-18  5:59       ` [PATCH v4 08/12] linuxapp/eal: auto detect iova mode Santosh Shukla
@ 2017-07-18  5:59       ` Santosh Shukla
  2017-07-18  5:59       ` [PATCH v4 10/12] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
                         ` (4 subsequent siblings)
  13 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-18  5:59 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

- Move the late bus scan up, to just after eal_parsing.
- The mapping mode stays at the default for bsdapp, which supports
  only one pass-through mode (RTE_KDRV_NIC_UIO).

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/bsdapp/eal/eal.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 2a49e9fde..3cb1bd22f 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -541,6 +541,15 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	if (rte_bus_scan()) {
+		rte_eal_init_alert("Cannot scan the buses for devices\n");
+		rte_errno = ENODEV;
+		return -1;
+	}
+
+	/* autodetect the iova mapping mode (default is iova_pa) */
+	rte_eal_get_configuration()->iova_mode = rte_bus_get_iommu_class();
+
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&
 			eal_hugepage_info_init() < 0) {
@@ -620,12 +629,6 @@ rte_eal_init(int argc, char **argv)
 		rte_config.master_lcore, thread_id, cpuset,
 		ret == 0 ? "" : "...");
 
-	if (rte_bus_scan()) {
-		rte_eal_init_alert("Cannot scan the buses for devices\n");
-		rte_errno = ENODEV;
-		return -1;
-	}
-
 	RTE_LCORE_FOREACH_SLAVE(i) {
 
 		/*
-- 
2.11.0


* [PATCH v4 10/12] linuxapp/eal_vfio: honor iova mode before mapping
  2017-07-18  5:59     ` [PATCH v4 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
                         ` (8 preceding siblings ...)
  2017-07-18  5:59       ` [PATCH v4 09/12] bsdapp/eal: auto detect iova mapping mode Santosh Shukla
@ 2017-07-18  5:59       ` Santosh Shukla
  2017-07-18  5:59       ` [PATCH v4 11/12] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
                         ` (3 subsequent siblings)
  13 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-18  5:59 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Check the iova mode and map iova to pa or va accordingly.
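
The choice made per memory segment can be sketched without the VFIO headers.
The struct below is a hypothetical subset of a DMA-map request holding only
the fields this patch touches, and the sketch_* names and enum values are
assumptions made for the example.

```c
#include <stdint.h>

/* Local stand-ins; values are assumed, not taken from rte_bus.h. */
enum sketch_iova_mode {
	SKETCH_IOVA_DC = 0,
	SKETCH_IOVA_PA = 1,
	SKETCH_IOVA_VA = 2
};

/* Hypothetical subset of a DMA-map request. */
struct sketch_dma_map {
	uint64_t vaddr; /* process virtual address of the segment */
	uint64_t iova;  /* address the device will use for DMA */
	uint64_t size;  /* segment length in bytes */
};

/* Fill the request: in VA mode the device sees the same address as the
 * process, otherwise it sees the segment's physical address. */
static struct sketch_dma_map
sketch_fill_dma_map(enum sketch_iova_mode mode, uint64_t vaddr,
		uint64_t phys_addr, uint64_t len)
{
	struct sketch_dma_map m;

	m.vaddr = vaddr;
	m.size = len;
	m.iova = (mode == SKETCH_IOVA_VA) ? vaddr : phys_addr;
	return m;
}
```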

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c8a97b7e7..b32cd09a2 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd)
 		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
 		dma_map.vaddr = ms[i].addr_64;
 		dma_map.size = ms[i].len;
-		dma_map.iova = ms[i].phys_addr;
+		if (rte_eal_iova_mode() == RTE_IOVA_VA)
+			dma_map.iova = dma_map.vaddr;
+		else
+			dma_map.iova = ms[i].phys_addr;
 		dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
 
 		ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
@@ -792,7 +795,10 @@ vfio_spapr_dma_map(int vfio_container_fd)
 		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
 		dma_map.vaddr = ms[i].addr_64;
 		dma_map.size = ms[i].len;
-		dma_map.iova = ms[i].phys_addr;
+		if (rte_eal_iova_mode() == RTE_IOVA_VA)
+			dma_map.iova = dma_map.vaddr;
+		else
+			dma_map.iova = ms[i].phys_addr;
 		dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
 				 VFIO_DMA_MAP_FLAG_WRITE;
 
-- 
2.11.0


* [PATCH v4 11/12] linuxapp/eal_memory: honor iova mode in virt2phy
  2017-07-18  5:59     ` [PATCH v4 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
                         ` (9 preceding siblings ...)
  2017-07-18  5:59       ` [PATCH v4 10/12] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
@ 2017-07-18  5:59       ` Santosh Shukla
  2017-07-18  5:59       ` [PATCH v4 12/12] eal/rte_malloc: " Santosh Shukla
                         ` (2 subsequent siblings)
  13 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-18  5:59 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Check the iova mode and return the matching address: the virtual
address in RTE_IOVA_VA mode, otherwise the phy addr.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index daead31c2..249740645 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -139,6 +139,9 @@ rte_mem_virt2phy(const void *virtaddr)
 	int page_size;
 	off_t offset;
 
+	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+		return (uintptr_t)virtaddr;
+
 	/* when using dom0, /proc/self/pagemap always returns 0, check in
 	 * dpdk memory by browsing the memsegs */
 	if (rte_xen_dom0_supported()) {
-- 
2.11.0


* [PATCH v4 12/12] eal/rte_malloc: honor iova mode in virt2phy
  2017-07-18  5:59     ` [PATCH v4 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
                         ` (10 preceding siblings ...)
  2017-07-18  5:59       ` [PATCH v4 11/12] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
@ 2017-07-18  5:59       ` Santosh Shukla
  2017-07-21  8:07       ` [PATCH v4 00/12] Infrastructure to detect iova mapping on the bus Maxime Coquelin
  2017-07-24  8:39       ` [PATCH v5 " Santosh Shukla
  13 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-18  5:59 UTC (permalink / raw)
  To: thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Check the iova mode and return the matching address: the virtual
address in RTE_IOVA_VA mode, otherwise the phy addr.
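
A minimal sketch of the translation, assuming a simplified memory segment
with only a base virtual address and a base physical address. The sketch_*
names, the enum values and SKETCH_BAD_ADDR are hypothetical stand-ins for
this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins; values are assumed, not taken from rte_bus.h. */
enum sketch_iova_mode {
	SKETCH_IOVA_DC = 0,
	SKETCH_IOVA_PA = 1,
	SKETCH_IOVA_VA = 2
};

#define SKETCH_BAD_ADDR ((uint64_t)-1)

/* Hypothetical, simplified memory segment. */
struct sketch_memseg {
	uint64_t phys_addr; /* physical address of the segment start */
	const void *addr;   /* virtual address of the segment start */
};

/* VA mode: the address is returned unchanged.
 * PA mode: segment base plus the offset of addr inside the segment. */
static uint64_t
sketch_virt2iova(enum sketch_iova_mode mode, const struct sketch_memseg *ms,
		const void *addr)
{
	if (ms == NULL || ms->phys_addr == SKETCH_BAD_ADDR)
		return SKETCH_BAD_ADDR;

	if (mode == SKETCH_IOVA_VA)
		return (uint64_t)(uintptr_t)addr;

	return ms->phys_addr + ((uintptr_t)addr - (uintptr_t)ms->addr);
}
```

As in the patch, the bad-address checks run before the mode check, so a
segment without a valid physical address is rejected in both modes.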

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/common/rte_malloc.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 5c0627bf4..d65c05a4d 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -251,10 +251,17 @@ rte_malloc_set_limit(__rte_unused const char *type,
 phys_addr_t
 rte_malloc_virt2phy(const void *addr)
 {
+	phys_addr_t paddr;
 	const struct malloc_elem *elem = malloc_elem_from_data(addr);
 	if (elem == NULL)
 		return RTE_BAD_PHYS_ADDR;
 	if (elem->ms->phys_addr == RTE_BAD_PHYS_ADDR)
 		return RTE_BAD_PHYS_ADDR;
-	return elem->ms->phys_addr + ((uintptr_t)addr - (uintptr_t)elem->ms->addr);
+
+	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+		paddr = (uintptr_t)addr;
+	else
+		paddr = elem->ms->phys_addr +
+			((uintptr_t)addr - (uintptr_t)elem->ms->addr);
+	return paddr;
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* Re: [PATCH v4 05/12] linuxapp/eal_pci: get iommu class
  2017-07-18  5:59       ` [PATCH v4 05/12] linuxapp/eal_pci: " Santosh Shukla
@ 2017-07-18 10:55         ` Hemant Agrawal
  0 siblings, 0 replies; 248+ messages in thread
From: Hemant Agrawal @ 2017-07-18 10:55 UTC (permalink / raw)
  To: Santosh Shukla, thomas, dev
  Cc: bruce.richardson, jerin.jacob, shreyansh.jain, gaetan.rivet,
	sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz

On 7/18/2017 11:29 AM, Santosh Shukla wrote:
> Get iommu class of PCI device on the bus and returns preferred iova
> mapping mode for that bus.
>
> Algorithm for iova scheme selection for PCI bus:
> 0. If no device bound then return with RTE_IOVA_DC mapping mode,
> else goto 1).
> 1. Look for device attached to vfio kdrv and has .drv_flag set
> to RTE_PCI_DRV_IOVA_AS_VA.
> 2. Look for any device attached to UIO class of driver.
> 3. Check for vfio-noiommu mode enabled.
>
> If 2) & 3) is false and 1) is true then select
> mapping scheme as RTE_IOVA_VA. Otherwise use default
> mapping scheme (RTE_IOVA_PA).
>
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
> v3 --> v4 :
> - Reworded WARNING message (suggested by Maxime)
> - Added pci_device_is_bound func to check for no device case
>   (suggested by Hemant).
> - Added ifdef vfio_present.
>
> v1 --> v2:
> - Removed Linux version check in vfio_noiommu func. Refer [1].
>   - Extending autodetction logic for _iommu_class.
>     Refer [2].
>
> [1] https://www.mail-archive.com/dev@dpdk.org/msg70108.html
> [2] https://www.mail-archive.com/dev@dpdk.org/msg70279.html
>
>  lib/librte_eal/linuxapp/eal/eal_pci.c           | 95 +++++++++++++++++++++++++
>  lib/librte_eal/linuxapp/eal/eal_vfio.c          | 19 +++++
>  lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>  4 files changed, 119 insertions(+)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
> index 7d9e1a99b..ecd946250 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
> @@ -45,6 +45,7 @@
>  #include "eal_filesystem.h"
>  #include "eal_private.h"
>  #include "eal_pci_init.h"
> +#include "eal_vfio.h"
>
>  /**
>   * @file
> @@ -488,6 +489,100 @@ rte_pci_scan(void)
>  	return -1;
>  }
>
> +/*
> + * Is pci device bound to any kdrv
> + */
> +static inline int
> +pci_device_is_bound(void)
> +{
> +	struct rte_pci_device *dev = NULL;
> +	int ret = 0;
> +
> +	FOREACH_DEVICE_ON_PCIBUS(dev) {
> +		if (dev->kdrv == RTE_KDRV_UNKNOWN ||
> +		    dev->kdrv == RTE_KDRV_NONE) {
> +			continue;
> +		} else {
> +			ret = 1;
> +			break;
> +		}
> +	}
> +	return ret;
> +}
> +
> +/*
> + * Any one of the device bound to uio
> + */
> +static inline int
> +pci_device_bound_uio(void)
> +{
> +	struct rte_pci_device *dev = NULL;
> +
> +	FOREACH_DEVICE_ON_PCIBUS(dev) {
> +		if (dev->kdrv == RTE_KDRV_IGB_UIO ||
> +		   dev->kdrv == RTE_KDRV_UIO_GENERIC) {
> +			return 1;
> +		}
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Any one of the device has iova as va
> + */
> +static inline int
> +pci_device_has_iova_va(void)
> +{
> +	struct rte_pci_device *dev = NULL;
> +	struct rte_pci_driver *drv = NULL;
> +
> +	FOREACH_DRIVER_ON_PCIBUS(drv) {
> +		if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
> +			FOREACH_DEVICE_ON_PCIBUS(dev) {
> +				if (dev->kdrv == RTE_KDRV_VFIO &&
> +				    rte_pci_match(drv, dev))
> +					return 1;
> +			}
> +		}
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Get iommu class of PCI devices on the bus.
> + */
> +enum rte_iova_mode
> +rte_pci_get_iommu_class(void)
> +{
> +	bool is_bound;
> +	bool is_vfio_noiommu_enabled = true;
> +	bool has_iova_va;
> +	bool is_bound_uio;
> +
> +	is_bound = pci_device_is_bound();
> +	if (!is_bound)
> +		return RTE_IOVA_DC;
> +
> +	has_iova_va = pci_device_has_iova_va();
> +	is_bound_uio = pci_device_bound_uio();
> +#ifdef VFIO_PRESENT
> +	is_vfio_noiommu_enabled = vfio_noiommu_is_enabled() == 1 ? 1 : 0;
> +#endif
> +
> +	if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled)
> +		return RTE_IOVA_VA;
> +
> +	if (has_iova_va) {
> +		RTE_LOG(WARNING, EAL, "Some devices want iova as va but pa will be used because.. ");
> +		if (is_vfio_noiommu_enabled)
> +			RTE_LOG(WARNING, EAL, "vfio-noiommu mode configured\n");
> +		if (is_bound_uio)
> +			RTE_LOG(WARNING, EAL, "few device bound to UIO\n");
> +	}
> +
> +	return RTE_IOVA_PA;
> +}
> +
>  /* Read PCI config space. */
>  int rte_pci_read_config(const struct rte_pci_device *device,
>  		void *buf, size_t len, off_t offset)
> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> index 946df7e31..c8a97b7e7 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> @@ -816,4 +816,23 @@ vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
>  	return 0;
>  }
>
> +int
> +vfio_noiommu_is_enabled(void)
> +{
> +	int fd, ret, cnt __rte_unused;
> +	char c;
> +
> +	ret = -1;
> +	fd = open(VFIO_NOIOMMU_MODE, O_RDONLY);
> +	if (fd < 0)
> +		return -1;
> +
> +	cnt = read(fd, &c, 1);
> +	if (c == 'Y')
> +		ret = 1;
> +
> +	close(fd);
> +	return ret;
> +}
> +
>  #endif
> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h b/lib/librte_eal/linuxapp/eal/eal_vfio.h
> index 5ff63e5d7..26ea8e119 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
> @@ -150,6 +150,8 @@ struct vfio_config {
>  #define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
>  #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
>  #define VFIO_GET_REGION_IDX(x) (x >> 40)
> +#define VFIO_NOIOMMU_MODE      \
> +	"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
>
>  /* DMA mapping function prototype.
>   * Takes VFIO container fd as a parameter.
> @@ -210,6 +212,8 @@ int pci_vfio_is_enabled(void);
>
>  int vfio_mp_sync_setup(void);
>
> +int vfio_noiommu_is_enabled(void);
> +
>  #define SOCKET_REQ_CONTAINER 0x100
>  #define SOCKET_REQ_GROUP 0x200
>  #define SOCKET_CLR_GROUP 0x300
> diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> index a69bbb599..5dd40f948 100644
> --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> @@ -206,6 +206,7 @@ DPDK_17.08 {
>  	rte_bus_find_by_device;
>  	rte_bus_find_by_name;
>  	rte_pci_match;
> +	rte_pci_get_iommu_class;
>
>  } DPDK_17.05;
>
>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>


* Re: [PATCH v4 06/12] bus: get iommu class
  2017-07-18  5:59       ` [PATCH v4 06/12] bus: " Santosh Shukla
@ 2017-07-18 11:05         ` Hemant Agrawal
  2017-07-18 11:16           ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Hemant Agrawal @ 2017-07-18 11:05 UTC (permalink / raw)
  To: Santosh Shukla, thomas, dev
  Cc: bruce.richardson, jerin.jacob, shreyansh.jain, gaetan.rivet,
	sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz

On 7/18/2017 11:29 AM, Santosh Shukla wrote:
> API(rte_bus_get_iommu_class) helps to automatically detect and select
> appropriate iova mapping scheme for iommu capable device on that bus.
>
> Algorithm for iova scheme selection for bus:
> 0. Iterate through bus_list.
> 1. Collect each bus iova mode value and update into 'mode' var.
> 2. Mode selection scheme is:
> if mode == 0 then iova mode is _pa,
> if mode == 1 then iova mode is _pa,
> if mode == 2 then iova mode is _va,
> if mode == 3 then iova mode ia _pa.
>
> So mode !=2  will be default iova mode (_pa).
>
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
> v3 --> v4:
>  - Initialized mode to RTE_IOVA_DC in rte_bus_get_iommu_class.
>
>  lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
>  lib/librte_eal/common/eal_common_bus.c          | 23 +++++++++++++++++++++++
>  lib/librte_eal/common/eal_common_pci.c          |  1 +
>  lib/librte_eal/common/include/rte_bus.h         | 22 ++++++++++++++++++++++
>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>  5 files changed, 48 insertions(+)
>
> diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> index 4b25318be..b9ee82b6b 100644
> --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> @@ -202,6 +202,7 @@ DPDK_17.08 {
>  	rte_bus_find_by_name;
>  	rte_pci_match;
>  	rte_pci_get_iommu_class;
> +	rte_bus_get_iommu_class;
>
>  } DPDK_17.05;
>
> diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
> index 08bec2d93..a30a8982e 100644
> --- a/lib/librte_eal/common/eal_common_bus.c
> +++ b/lib/librte_eal/common/eal_common_bus.c
> @@ -222,3 +222,26 @@ rte_bus_find_by_device_name(const char *str)
>  		c[0] = '\0';
>  	return rte_bus_find(NULL, bus_can_parse, name);
>  }
> +
> +
> +/*
> + * Get iommu class of devices on the bus.
> + */
> +enum rte_iova_mode
> +rte_bus_get_iommu_class(void)
> +{
> +	int mode = RTE_IOVA_DC;
> +	struct rte_bus *bus;
> +
> +	TAILQ_FOREACH(bus, &rte_bus_list, next) {
> +
> +		if (bus->get_iommu_class)
> +			mode |= bus->get_iommu_class();
> +	}
> +
> +	if (mode != RTE_IOVA_VA) {
> +		/* Use default IOVA mode */
> +		mode = RTE_IOVA_PA;
> +	}
> +	return mode;
> +}
> diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
> index 8b6ecebd6..bdf2e7c3a 100644
> --- a/lib/librte_eal/common/eal_common_pci.c
> +++ b/lib/librte_eal/common/eal_common_pci.c
> @@ -552,6 +552,7 @@ struct rte_pci_bus rte_pci_bus = {
>  		.plug = pci_plug,
>  		.unplug = pci_unplug,
>  		.parse = pci_parse,
> +		.get_iommu_class = rte_pci_get_iommu_class,
>  	},
>  	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
>  	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
> diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
> index e06084253..94f1fdfca 100644
> --- a/lib/librte_eal/common/include/rte_bus.h
> +++ b/lib/librte_eal/common/include/rte_bus.h
> @@ -182,6 +182,17 @@ struct rte_bus_conf {
>  	enum rte_bus_scan_mode scan_mode; /**< Scan policy. */
>  };
>
> +
> +/**
> + * Get iommu class of devices on the bus.
> + * Check that those devices are attached to iommu driver.

Can we try to improve this description?
"Get the common iommu class of all the devices on the bus. The bus may
check that those devices are attached to an iommu driver.
If no devices are attached to the bus, the bus may return the don't care
value."

otherwise
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>

> + *
> + * @return
> + *      enum rte_iova_mode value.
> + */
> +typedef enum rte_iova_mode (*rte_bus_get_iommu_class_t)(void);
> +
> +
>  /**
>   * A structure describing a generic bus.
>   */
> @@ -195,6 +206,7 @@ struct rte_bus {
>  	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
>  	rte_bus_parse_t parse;       /**< Parse a device name */
>  	struct rte_bus_conf conf;    /**< Bus configuration */
> +	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
>  };
>
>  /**
> @@ -294,6 +306,16 @@ struct rte_bus *rte_bus_find_by_device(const struct rte_device *dev);
>   */
>  struct rte_bus *rte_bus_find_by_name(const char *busname);
>
> +
> +/**
> + * Get iommu class of devices on the bus.
> + * Check that those devices are attached to iommu driver.

Get the common iommu class of the devices bound to the buses available in
the system. The default mode is PA.

> + *
> + * @return
> + *     enum rte_iova_mode value.
> + */
> +enum rte_iova_mode rte_bus_get_iommu_class(void);
> +
>  /**
>   * Helper for Bus registration.
>   * The constructor has higher priority than PMD constructors.
> diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> index 5dd40f948..705af3adc 100644
> --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> @@ -207,6 +207,7 @@ DPDK_17.08 {
>  	rte_bus_find_by_name;
>  	rte_pci_match;
>  	rte_pci_get_iommu_class;
> +	rte_bus_get_iommu_class;
>
>  } DPDK_17.05;
>
>


* Re: [PATCH v4 06/12] bus: get iommu class
  2017-07-18 11:05         ` Hemant Agrawal
@ 2017-07-18 11:16           ` santosh
  0 siblings, 0 replies; 248+ messages in thread
From: santosh @ 2017-07-18 11:16 UTC (permalink / raw)
  To: Hemant Agrawal, thomas, dev
  Cc: bruce.richardson, jerin.jacob, shreyansh.jain, gaetan.rivet,
	sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz

Hi Hemant,

On Tuesday 18 July 2017 04:35 PM, Hemant Agrawal wrote:

> On 7/18/2017 11:29 AM, Santosh Shukla wrote:
>> API(rte_bus_get_iommu_class) helps to automatically detect and select
>> appropriate iova mapping scheme for iommu capable device on that bus.
>>
>> Algorithm for iova scheme selection for bus:
>> 0. Iterate through bus_list.
>> 1. Collect each bus iova mode value and update into 'mode' var.
>> 2. Mode selection scheme is:
>> if mode == 0 then iova mode is _pa,
>> if mode == 1 then iova mode is _pa,
>> if mode == 2 then iova mode is _va,
>> if mode == 3 then iova mode is _pa.
>>
>> So mode != 2 selects the default iova mode (_pa).
>>
>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> ---
>> v3 --> v4:
>>  - Initialized mode to RTE_IOVA_DC in rte_bus_get_iommu_class.
>>
>>  lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
>>  lib/librte_eal/common/eal_common_bus.c          | 23 +++++++++++++++++++++++
>>  lib/librte_eal/common/eal_common_pci.c          |  1 +
>>  lib/librte_eal/common/include/rte_bus.h         | 22 ++++++++++++++++++++++
>>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>>  5 files changed, 48 insertions(+)
>>
>> diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>> index 4b25318be..b9ee82b6b 100644
>> --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>> +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>> @@ -202,6 +202,7 @@ DPDK_17.08 {
>>      rte_bus_find_by_name;
>>      rte_pci_match;
>>      rte_pci_get_iommu_class;
>> +    rte_bus_get_iommu_class;
>>
>>  } DPDK_17.05;
>>
>> diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
>> index 08bec2d93..a30a8982e 100644
>> --- a/lib/librte_eal/common/eal_common_bus.c
>> +++ b/lib/librte_eal/common/eal_common_bus.c
>> @@ -222,3 +222,26 @@ rte_bus_find_by_device_name(const char *str)
>>          c[0] = '\0';
>>      return rte_bus_find(NULL, bus_can_parse, name);
>>  }
>> +
>> +
>> +/*
>> + * Get iommu class of devices on the bus.
>> + */
>> +enum rte_iova_mode
>> +rte_bus_get_iommu_class(void)
>> +{
>> +    int mode = RTE_IOVA_DC;
>> +    struct rte_bus *bus;
>> +
>> +    TAILQ_FOREACH(bus, &rte_bus_list, next) {
>> +
>> +        if (bus->get_iommu_class)
>> +            mode |= bus->get_iommu_class();
>> +    }
>> +
>> +    if (mode != RTE_IOVA_VA) {
>> +        /* Use default IOVA mode */
>> +        mode = RTE_IOVA_PA;
>> +    }
>> +    return mode;
>> +}
>> diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
>> index 8b6ecebd6..bdf2e7c3a 100644
>> --- a/lib/librte_eal/common/eal_common_pci.c
>> +++ b/lib/librte_eal/common/eal_common_pci.c
>> @@ -552,6 +552,7 @@ struct rte_pci_bus rte_pci_bus = {
>>          .plug = pci_plug,
>>          .unplug = pci_unplug,
>>          .parse = pci_parse,
>> +        .get_iommu_class = rte_pci_get_iommu_class,
>>      },
>>      .device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
>>      .driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
>> diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
>> index e06084253..94f1fdfca 100644
>> --- a/lib/librte_eal/common/include/rte_bus.h
>> +++ b/lib/librte_eal/common/include/rte_bus.h
>> @@ -182,6 +182,17 @@ struct rte_bus_conf {
>>      enum rte_bus_scan_mode scan_mode; /**< Scan policy. */
>>  };
>>
>> +
>> +/**
>> + * Get iommu class of devices on the bus.
>> + * Check that those devices are attached to iommu driver.
>
> Can we try to improve this description?
> "Get the common iommu class of all the devices on the bus. The bus may check that those devices are attached to an iommu driver.
> If no devices are attached to the bus, the bus may return the don't-care value."
>
> otherwise
> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
>
We'll reword the description in v5. Thanks for the suggestion.

>> + *
>> + * @return
>> + *      enum rte_iova_mode value.
>> + */
>> +typedef enum rte_iova_mode (*rte_bus_get_iommu_class_t)(void);
>> +
>> +
>>  /**
>>   * A structure describing a generic bus.
>>   */
>> @@ -195,6 +206,7 @@ struct rte_bus {
>>      rte_bus_unplug_t unplug;     /**< Remove single device from driver */
>>      rte_bus_parse_t parse;       /**< Parse a device name */
>>      struct rte_bus_conf conf;    /**< Bus configuration */
>> +    rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
>>  };
>>
>>  /**
>> @@ -294,6 +306,16 @@ struct rte_bus *rte_bus_find_by_device(const struct rte_device *dev);
>>   */
>>  struct rte_bus *rte_bus_find_by_name(const char *busname);
>>
>> +
>> +/**
>> + * Get iommu class of devices on the bus.
>> + * Check that those devices are attached to iommu driver.
>
> Get the common iommu class of devices bound on to buses available in the system. The default mode is PA.
>
ditto... in v5.

>> + *
>> + * @return
>> + *     enum rte_iova_mode value.
>> + */
>> +enum rte_iova_mode rte_bus_get_iommu_class(void);
>> +
>>  /**
>>   * Helper for Bus registration.
>>   * The constructor has higher priority than PMD constructors.
>> diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>> index 5dd40f948..705af3adc 100644
>> --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>> +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>> @@ -207,6 +207,7 @@ DPDK_17.08 {
>>      rte_bus_find_by_name;
>>      rte_pci_match;
>>      rte_pci_get_iommu_class;
>> +    rte_bus_get_iommu_class;
>>
>>  } DPDK_17.05;
>>
>>
>
>

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v4 08/12] linuxapp/eal: auto detect iova mode
  2017-07-18  5:59       ` [PATCH v4 08/12] linuxapp/eal: auto detect iova mode Santosh Shukla
@ 2017-07-18 11:34         ` Hemant Agrawal
  2017-07-18 11:56           ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Hemant Agrawal @ 2017-07-18 11:34 UTC (permalink / raw)
  To: Santosh Shukla, thomas, dev
  Cc: bruce.richardson, jerin.jacob, shreyansh.jain, gaetan.rivet,
	sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz

On 7/18/2017 11:29 AM, Santosh Shukla wrote:
> - Move bus scanning earlier, to just after eal_parsing.
> - Auto detect iova mapping mode, based on the result of
>   rte_bus_get_iommu_class.
>
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
>  lib/librte_eal/linuxapp/eal/eal.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
> index fffdf0d15..49b52ce4f 100644
> --- a/lib/librte_eal/linuxapp/eal/eal.c
> +++ b/lib/librte_eal/linuxapp/eal/eal.c
> @@ -798,6 +798,15 @@ rte_eal_init(int argc, char **argv)
>  		return -1;
>  	}
>
> +	if (rte_bus_scan()) {
> +		rte_eal_init_alert("Cannot scan the buses for devices\n");
> +		rte_errno = ENODEV;
> +		return -1;
> +	}
> +
> +	/* autodetect the iova mapping mode (default is iova_pa) */
> +	rte_eal_get_configuration()->iova_mode = rte_bus_get_iommu_class();
> +
Santosh,
      With some workarounds in the fslmc bus scanning/probe code, I am
able to test it. It works OK.

Post 17.08, we will submit a rework of the fslmc bus so that this
patch does not break dpaa2 platform support.

Regards,
Hemant

>  	if (internal_config.no_hugetlbfs == 0 &&
>  			internal_config.process_type != RTE_PROC_SECONDARY &&
>  			internal_config.xen_dom0_support == 0 &&
> @@ -895,12 +904,6 @@ rte_eal_init(int argc, char **argv)
>  		return -1;
>  	}
>
> -	if (rte_bus_scan()) {
> -		rte_eal_init_alert("Cannot scan the buses for devices\n");
> -		rte_errno = ENODEV;
> -		return -1;
> -	}
> -
>  	RTE_LCORE_FOREACH_SLAVE(i) {
>
>  		/*
>

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v4 08/12] linuxapp/eal: auto detect iova mode
  2017-07-18 11:34         ` Hemant Agrawal
@ 2017-07-18 11:56           ` santosh
  0 siblings, 0 replies; 248+ messages in thread
From: santosh @ 2017-07-18 11:56 UTC (permalink / raw)
  To: Hemant Agrawal, thomas, dev
  Cc: bruce.richardson, jerin.jacob, shreyansh.jain, gaetan.rivet,
	sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz

On Tuesday 18 July 2017 05:04 PM, Hemant Agrawal wrote:

> On 7/18/2017 11:29 AM, Santosh Shukla wrote:
>> - Move bus scanning earlier, to just after eal_parsing.
>> - Auto detect iova mapping mode, based on the result of
>>   rte_bus_get_iommu_class.
>>
>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> ---
>>  lib/librte_eal/linuxapp/eal/eal.c | 15 +++++++++------
>>  1 file changed, 9 insertions(+), 6 deletions(-)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
>> index fffdf0d15..49b52ce4f 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal.c
>> @@ -798,6 +798,15 @@ rte_eal_init(int argc, char **argv)
>>          return -1;
>>      }
>>
>> +    if (rte_bus_scan()) {
>> +        rte_eal_init_alert("Cannot scan the buses for devices\n");
>> +        rte_errno = ENODEV;
>> +        return -1;
>> +    }
>> +
>> +    /* autodetect the iova mapping mode (default is iova_pa) */
>> +    rte_eal_get_configuration()->iova_mode = rte_bus_get_iommu_class();
>> +
> Santosh,
>      With some workaround in fslmc bus scanning/probe code. I am able to test it. It works ok.
>
> Post 17.08, we will be submitting the rework of fslmc bus so that this patch will not break the dpaa2 platform support.
>
Cool ;).

> Regards,
> Hemant
>
>>      if (internal_config.no_hugetlbfs == 0 &&
>>              internal_config.process_type != RTE_PROC_SECONDARY &&
>>              internal_config.xen_dom0_support == 0 &&
>> @@ -895,12 +904,6 @@ rte_eal_init(int argc, char **argv)
>>          return -1;
>>      }
>>
>> -    if (rte_bus_scan()) {
>> -        rte_eal_init_alert("Cannot scan the buses for devices\n");
>> -        rte_errno = ENODEV;
>> -        return -1;
>> -    }
>> -
>>      RTE_LCORE_FOREACH_SLAVE(i) {
>>
>>          /*
>>
>
>

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v4 00/12] Infrastructure to detect iova mapping on the bus
  2017-07-18  5:59     ` [PATCH v4 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
                         ` (11 preceding siblings ...)
  2017-07-18  5:59       ` [PATCH v4 12/12] eal/rte_malloc: " Santosh Shukla
@ 2017-07-21  8:07       ` Maxime Coquelin
  2017-07-24  8:39       ` [PATCH v5 " Santosh Shukla
  13 siblings, 0 replies; 248+ messages in thread
From: Maxime Coquelin @ 2017-07-21  8:07 UTC (permalink / raw)
  To: Santosh Shukla, thomas, dev
  Cc: bruce.richardson, jerin.jacob, hemant.agrawal, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	olivier.matz

Hi Santosh,

On 07/18/2017 07:59 AM, Santosh Shukla wrote:
> v4:
> Introducing RTE_PCI_DRV_IOVA_AS_VA flag for autodetection of iova va mapping.
> If a PCI driver requires the IOVA as VA scheme, it can set this flag at
> PCI driver registration.
> 
> Algorithm to select IOVA as VA for PCI bus case:
>      0. If no device bound then return with RTE_IOVA_DC mapping mode,
>      else goto 1).
>      1. Look for device attached to vfio kdrv and has .drv_flag set
>      to RTE_PCI_DRV_IOVA_AS_VA.
>      2. Look for any device attached to UIO class of driver.
>      3. Check for vfio-noiommu mode enabled.
>      
>      If 2) and 3) are false and 1) is true, then select
>      mapping scheme as RTE_IOVA_VA. Otherwise use the default
>      mapping scheme (RTE_IOVA_PA).
> 
> That way, the bus can truly autodetect the iova mapping mode for
> a device or a set of devices.
> 
> 
> Patch series rebased on 'a599eb31f2e477674fc6176cdf989ee17432b552'.
> 
> * Re-introduced RTE_IOVA_DC (Don't care mode) for no-device found case.
>    (Identified by Hemant [5]).
> * Renamed flag from RTE_PCI_DRV_NEED_IOVA_VA to RTE_PCI_DRV_IOVA_AS_VA
>    (Suggested by Maxime[6]).
> * Based on the discussion on the thread [3], [6] and [5].
> 
> v3 --> v4:
> - Re-introduced RTE_IOVA_DC mode (Suggested by Hemant [5]).
> - Renamed flag to RTE_PCI_DRV_IOVA_AS_VA (Suggested by Maxime).
> - Reworded WARNING message(suggested by Maxime[7]).
> - Created a separate patch for rte_pci_get_iommu_class (suggested by Maxime[]).
> - Added VFIO_PRESENT ifdef build fix.
> 
> v2 --> v3:
> - Removed rte_mempool_virt2phy (suggested by Olivier [4])
> 
> v1 --> v2:
> - Removed override eal option i.e. (--iova-mode=<>) Because we have means to
>    truly autodetect the iova mode.
> - Introduced RTE_PCI_DRV_NEED_IOVA_VA drv_flag (Suggested by Maxime [3]).
> - Using NEED_IOVA_VA drv_flag in autodetection logic.
> - Removed Linux version check macro in vfio code, As per Maxime feedback.
> - Moved rte_pci_match API from local to global.
> 
> Patch Summary:
> 0) 1st: Introducing a new flag in rte_pci_drv
> 1) 2nd: declare rte_pci_match api in pci header. Required for autodetection in
> follow up patches.
> 2) 3rd: declare rte_pci_get_iommu_class.
> 3) 4th - 5th: autodetection mapping infrastructure for Linux/bsdapp.
> 4) 6th: Introduces global bus API named rte_bus_get_iommu_class.
> 5) 7th: iova mode helper API.
> 6) 8th - 9th: Calls rte_bus_get_iommu_class API for Linux/bsdapp and returns
> their iova mode.
> 7) 10th: Check iova mode and accordingly map vfio.dma_map to _pa or _va.
> 8) 11th - 12th: Check for IOVA_VA mode in below APIs
>          - rte_mem_virt2phy
>          - rte_malloc_virt2phy
> 
> Test History:
> - Tested for x86/XL710 40G NIC card for both modes (iova_va/pa).
> - Tested for arm64/thunderx vNIC Integrated NIC for both modes
> - Tested for arm64/Octeontx integrated NICs for only
>    Iova_va mode(It supports only one mode.)
> - Ran standalone tests like mempool_autotest, mbuf_autotest.
> - Verified for Doxygen.
> 
> Work History:
> For v1, Refer [1].
> For v2, Refer [2].
> For v3, Refer [9].
> 
> 
> Checkpatch result:
> * Debug message - WARNING: line over 80 characters
> 
> Thanks.,
> 
> [1] https://www.mail-archive.com/dev@dpdk.org/msg67438.html
> [2] https://www.mail-archive.com/dev@dpdk.org/msg70674.html
> [3] https://www.mail-archive.com/dev@dpdk.org/msg70279.html
> [4] https://www.mail-archive.com/dev@dpdk.org/msg70692.html
> [5] http://dpdk.org/ml/archives/dev/2017-July/071282.html
> [6] http://dpdk.org/ml/archives/dev/2017-July/070951.html
> [7] http://dpdk.org/ml/archives/dev/2017-July/070941.html
> [8] http://dpdk.org/ml/archives/dev/2017-July/070952.html
> [9] http://dpdk.org/ml/archives/dev/2017-July/070918.html
> 
> 
> Santosh Shukla (12):
>    eal/pci: introduce PCI driver iova as va flag
>    eal/pci: export match function
>    eal/pci: get iommu class
>    bsdapp/eal_pci: get iommu class
>    linuxapp/eal_pci: get iommu class
>    bus: get iommu class
>    eal: introduce iova mode helper api
>    linuxapp/eal: auto detect iova mode
>    bsdapp/eal: auto detect iova mapping mode
>    linuxapp/eal_vfio: honor iova mode before mapping
>    linuxapp/eal_memory: honor iova mode in virt2phy
>    eal/rte_malloc: honor iova mode in virt2phy
> 
>   lib/librte_eal/bsdapp/eal/eal.c                 | 21 ++++--
>   lib/librte_eal/bsdapp/eal/eal_pci.c             | 10 +++
>   lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  4 ++
>   lib/librte_eal/common/eal_common_bus.c          | 23 ++++++
>   lib/librte_eal/common/eal_common_pci.c          | 11 +--
>   lib/librte_eal/common/include/rte_bus.h         | 32 +++++++++
>   lib/librte_eal/common/include/rte_eal.h         | 12 ++++
>   lib/librte_eal/common/include/rte_pci.h         | 28 ++++++++
>   lib/librte_eal/common/rte_malloc.c              |  9 ++-
>   lib/librte_eal/linuxapp/eal/eal.c               | 21 ++++--
>   lib/librte_eal/linuxapp/eal/eal_memory.c        |  3 +
>   lib/librte_eal/linuxapp/eal/eal_pci.c           | 95 +++++++++++++++++++++++++
>   lib/librte_eal/linuxapp/eal/eal_vfio.c          | 29 +++++++-
>   lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
>   lib/librte_eal/linuxapp/eal/rte_eal_version.map |  4 ++
>   15 files changed, 282 insertions(+), 24 deletions(-)
> 

With Hemant's comments on patch 6 taken into account, feel free
to add my:
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 248+ messages in thread

* [PATCH v5 00/12] Infrastructure to detect iova mapping on the bus
  2017-07-18  5:59     ` [PATCH v4 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
                         ` (12 preceding siblings ...)
  2017-07-21  8:07       ` [PATCH v4 00/12] Infrastructure to detect iova mapping on the bus Maxime Coquelin
@ 2017-07-24  8:39       ` Santosh Shukla
  2017-07-24  8:39         ` [PATCH v5 01/12] eal/pci: introduce PCI driver iova as va flag Santosh Shukla
                           ` (12 more replies)
  13 siblings, 13 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-24  8:39 UTC (permalink / raw)
  To: thomas, dev
  Cc: hemant.agrawal, bruce.richardson, jerin.jacob, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

v5:
Introducing RTE_PCI_DRV_IOVA_AS_VA flag for autodetection of iova va mapping.
If a PCI driver requires the IOVA as VA scheme, it can set this flag at
PCI driver registration.

Algorithm to select IOVA as VA for PCI bus case:
     0. If no device bound then return with RTE_IOVA_DC mapping mode,
     else goto 1).
     1. Look for device attached to vfio kdrv and has .drv_flag set
     to RTE_PCI_DRV_IOVA_AS_VA.
     2. Look for any device attached to UIO class of driver.
     3. Check for vfio-noiommu mode enabled.
     
     If 2) and 3) are false and 1) is true, then select
     mapping scheme as RTE_IOVA_VA. Otherwise use the default
     mapping scheme (RTE_IOVA_PA).

That way, the bus can truly autodetect the iova mapping mode for
a device or a set of devices.
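
As a rough illustration of the PCI selection steps above, here is a minimal
sketch. It is not the code from patch 05/12: struct sketch_pci_dev, its
fields and sketch_pci_get_iommu_class() are hypothetical simplifications,
and the vfio-noiommu state is passed in as a plain boolean. Only the enum
values and the decision order are taken from this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values mirror the enum proposed in patch 03/12. */
enum rte_iova_mode {
	RTE_IOVA_DC = 0,	/* Don't care mode */
	RTE_IOVA_PA = (1 << 0),
	RTE_IOVA_VA = (1 << 1)
};

/* Hypothetical, simplified view of one scanned PCI device. */
struct sketch_pci_dev {
	bool bound;		/* bound to a kernel driver at all */
	bool kdrv_vfio;		/* bound to the vfio kernel driver */
	bool kdrv_uio;		/* bound to a UIO class driver */
	bool drv_iova_as_va;	/* matching PMD sets RTE_PCI_DRV_IOVA_AS_VA */
};

static enum rte_iova_mode
sketch_pci_get_iommu_class(const struct sketch_pci_dev *devs, size_t n,
			   bool vfio_noiommu)
{
	bool any_bound = false, has_iova_va = false, has_uio = false;
	size_t i;

	assert(devs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!devs[i].bound)
			continue;
		any_bound = true;
		/* Step 1: vfio-bound device whose driver asks for iova=va */
		if (devs[i].kdrv_vfio && devs[i].drv_iova_as_va)
			has_iova_va = true;
		/* Step 2: any device on a UIO class driver */
		if (devs[i].kdrv_uio)
			has_uio = true;
	}

	/* Step 0: nothing bound, so the bus has no preference */
	if (!any_bound)
		return RTE_IOVA_DC;

	/* VA only if 1) holds and neither 2) nor 3) does */
	if (has_iova_va && !has_uio && !vfio_noiommu)
		return RTE_IOVA_VA;

	return RTE_IOVA_PA;
}
```

In this sketch a single UIO-bound device, or vfio-noiommu being enabled, is
enough to force the whole bus back to RTE_IOVA_PA.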

Patch series rebased on version-17.08-rc2: 
'67c4b6db68e199247b5dbd63f560582640b180bf'.

v4 --> v5:
- Change DPDK_17.08 to DPDK_17.11 in _version.map.
- Reworded bus api description (suggested by Hemant).
- Added reviewed-by from Maxime in v5.
- Added acked-by from Hemant for pci and bus patches.

v3 --> v4:
- Re-introduced RTE_IOVA_DC mode (Suggested by Hemant [5]).
- Renamed flag to RTE_PCI_DRV_IOVA_AS_VA (Suggested by Maxime).
- Reworded WARNING message(suggested by Maxime[7]).
- Created a separate patch for rte_pci_get_iommu_class (suggested by Maxime[]).
- Added VFIO_PRESENT ifdef build fix.

v2 --> v3:
- Removed rte_mempool_virt2phy (suggested by Olivier [4])

v1 --> v2:
- Removed override eal option i.e. (--iova-mode=<>) Because we have means to
   truly autodetect the iova mode.
- Introduced RTE_PCI_DRV_NEED_IOVA_VA drv_flag (Suggested by Maxime [3]).
- Using NEED_IOVA_VA drv_flag in autodetection logic.
- Removed Linux version check macro in vfio code, As per Maxime feedback.
- Moved rte_pci_match API from local to global.

Patch Summary:
0) 1st: Introducing a new flag in rte_pci_drv
1) 2nd: declare rte_pci_match api in pci header. Required for autodetection in
follow up patches.
2) 3rd: declare rte_pci_get_iommu_class.
3) 4th - 5th: autodetection mapping infrastructure for Linux/bsdapp.
4) 6th: Introduces global bus API named rte_bus_get_iommu_class.
5) 7th: iova mode helper API.
6) 8th - 9th: Calls rte_bus_get_iommu_class API for Linux/bsdapp and returns
their iova mode.
7) 10th: Check iova mode and accordingly map vfio.dma_map to _pa or _va.
8) 11th - 12th: Check for IOVA_VA mode in below APIs
         - rte_mem_virt2phy
         - rte_malloc_virt2phy
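
For item 4), the bus-level combination can be sketched as below. This
follows the OR-and-fallback logic of rte_bus_get_iommu_class as posted in
v4, but as a simplification it takes a plain array of per-bus modes
instead of walking rte_bus_list and calling each bus's get_iommu_class
callback. sketch_combine_iova_modes() is a hypothetical name used only
for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Values mirror the enum proposed in patch 03/12. */
enum rte_iova_mode {
	RTE_IOVA_DC = 0,	/* Don't care mode */
	RTE_IOVA_PA = (1 << 0),
	RTE_IOVA_VA = (1 << 1)
};

/*
 * Combine the modes reported by each bus. The modes are bit flags, so
 * OR-ing them yields:
 *   0 (all DC)   -> PA (default)
 *   1 (PA)       -> PA
 *   2 (VA only)  -> VA
 *   3 (PA | VA)  -> PA
 */
static enum rte_iova_mode
sketch_combine_iova_modes(const enum rte_iova_mode *modes, size_t n)
{
	int mode = RTE_IOVA_DC;
	size_t i;

	assert(modes != NULL || n == 0);

	for (i = 0; i < n; i++)
		mode |= modes[i];

	if (mode != RTE_IOVA_VA) {
		/* Use default IOVA mode */
		mode = RTE_IOVA_PA;
	}
	return (enum rte_iova_mode)mode;
}
```

A bus that reports RTE_IOVA_DC contributes no bits, so it never changes
the outcome decided by the other buses.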

Test History:
- Tested for x86/XL710 40G NIC card for both modes (iova_va/pa).
- Tested for arm64/thunderx vNIC Integrated NIC for both modes
- Tested for arm64/Octeontx integrated NICs for only
   Iova_va mode(It supports only one mode.)
- Ran standalone tests like mempool_autotest, mbuf_autotest.
- Verified for Doxygen.

Work History:
For v1, Refer [1].
For v2, Refer [2].
For v3, Refer [9].
For v4, refer [10].

Checkpatch result:
* Debug message - WARNING: line over 80 characters

Thanks,

[1] https://www.mail-archive.com/dev@dpdk.org/msg67438.html
[2] https://www.mail-archive.com/dev@dpdk.org/msg70674.html
[3] https://www.mail-archive.com/dev@dpdk.org/msg70279.html
[4] https://www.mail-archive.com/dev@dpdk.org/msg70692.html
[5] http://dpdk.org/ml/archives/dev/2017-July/071282.html
[6] http://dpdk.org/ml/archives/dev/2017-July/070951.html
[7] http://dpdk.org/ml/archives/dev/2017-July/070941.html
[8] http://dpdk.org/ml/archives/dev/2017-July/070952.html
[9] http://dpdk.org/ml/archives/dev/2017-July/070918.html
[10] http://dpdk.org/ml/archives/dev/2017-July/071754.html

Santosh Shukla (12):
  eal/pci: introduce PCI driver iova as va flag
  eal/pci: export match function
  eal/pci: get iommu class
  bsdapp/eal_pci: get iommu class
  linuxapp/eal_pci: get iommu class
  bus: get iommu class
  eal: introduce iova mode helper api
  linuxapp/eal: auto detect iova mode
  bsdapp/eal: auto detect iova mapping mode
  linuxapp/eal_vfio: honor iova mode before mapping
  linuxapp/eal_memory: honor iova mode in virt2phy
  eal/rte_malloc: honor iova mode in virt2phy

 lib/librte_eal/bsdapp/eal/eal.c                 | 21 ++++--
 lib/librte_eal/bsdapp/eal/eal_pci.c             | 10 +++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   | 10 +++
 lib/librte_eal/common/eal_common_bus.c          | 23 ++++++
 lib/librte_eal/common/eal_common_pci.c          | 11 +--
 lib/librte_eal/common/include/rte_bus.h         | 35 +++++++++
 lib/librte_eal/common/include/rte_eal.h         | 12 ++++
 lib/librte_eal/common/include/rte_pci.h         | 28 ++++++++
 lib/librte_eal/common/rte_malloc.c              |  9 ++-
 lib/librte_eal/linuxapp/eal/eal.c               | 21 ++++--
 lib/librte_eal/linuxapp/eal/eal_memory.c        |  3 +
 lib/librte_eal/linuxapp/eal/eal_pci.c           | 95 +++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c          | 29 +++++++-
 lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map | 10 +++
 15 files changed, 297 insertions(+), 24 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 248+ messages in thread

* [PATCH v5 01/12] eal/pci: introduce PCI driver iova as va flag
  2017-07-24  8:39       ` [PATCH v5 " Santosh Shukla
@ 2017-07-24  8:39         ` Santosh Shukla
  2017-07-24  8:39         ` [PATCH v5 02/12] eal/pci: export match function Santosh Shukla
                           ` (11 subsequent siblings)
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-24  8:39 UTC (permalink / raw)
  To: thomas, dev
  Cc: hemant.agrawal, bruce.richardson, jerin.jacob, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Introduce the RTE_PCI_DRV_IOVA_AS_VA flag. A driver sets this flag when
it needs to operate in iova=va mode.

Why does a driver need iova=va mapping?

On NPU-style co-processors like Octeontx, buffer recycling is done in
HW, unlike the SW model. Here is the data flow:
1) On the control path, fill the HW mempool with buffers (iova as pa address).
2) On rx_burst, HW gives you an IOVA address (iova as pa address).
3) As the application expects a VA to operate on, rx_burst() needs to
convert from _pa to _va, which is very expensive.
With iova as va mapping instead, we can avoid the cost of that
conversion with the help of the IOMMU/SMMU.
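
To illustrate the cost difference only, here is a minimal sketch. The
segment table and sketch_iova_to_va() are hypothetical stand-ins for the
_pa to _va conversion and are not code from this series: in iova=pa mode
every address handed back by HW needs a lookup, while in iova=va mode the
address can be used as is.

```c
#include <stddef.h>
#include <stdint.h>

enum sketch_iova_mode { SKETCH_IOVA_PA, SKETCH_IOVA_VA };

/* Hypothetical description of one physically contiguous memory segment. */
struct sketch_memseg {
	uintptr_t va_base;	/* virtual address of the segment start */
	uint64_t iova_base;	/* bus address of the segment start */
	uint64_t len;		/* segment length in bytes */
};

/*
 * Translate an address received from HW into a usable pointer.
 * Returns NULL when, in PA mode, no segment covers the address.
 */
static void *
sketch_iova_to_va(enum sketch_iova_mode mode, const struct sketch_memseg *segs,
		  size_t nsegs, uint64_t iova)
{
	size_t i;

	/* iova=va: HW already works on virtual addresses, nothing to do */
	if (mode == SKETCH_IOVA_VA)
		return (void *)(uintptr_t)iova;

	/* iova=pa: per-packet search for the segment holding this address */
	for (i = 0; i < nsegs; i++) {
		if (iova >= segs[i].iova_base &&
		    iova - segs[i].iova_base < segs[i].len)
			return (void *)(segs[i].va_base +
					(uintptr_t)(iova - segs[i].iova_base));
	}
	return NULL;
}
```

The linear search is only an assumption made for the sketch; whatever
lookup is used, it runs once per received buffer in iova=pa mode and is
skipped entirely in iova=va mode.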

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
v3 --> v4:
- Renamed RTE_PCI_DRV_NEED_IOVA_VA to RTE_PCI_DRV_IOVA_AS_VA.
(Suggested by Maxime)

 lib/librte_eal/common/include/rte_pci.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 8b123391c..743392f91 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -202,6 +202,8 @@ struct rte_pci_bus {
 #define RTE_PCI_DRV_INTR_RMV 0x0010
 /** Device driver needs to keep mapped resources if unsupported dev detected */
 #define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020
+/** Device driver supports iova as va */
+#define RTE_PCI_DRV_IOVA_AS_VA 0x0040
 
 /**
  * A structure describing a PCI mapping.
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v5 02/12] eal/pci: export match function
  2017-07-24  8:39       ` [PATCH v5 " Santosh Shukla
  2017-07-24  8:39         ` [PATCH v5 01/12] eal/pci: introduce PCI driver iova as va flag Santosh Shukla
@ 2017-07-24  8:39         ` Santosh Shukla
  2017-07-24  8:39         ` [PATCH v5 03/12] eal/pci: get iommu class Santosh Shukla
                           ` (10 subsequent siblings)
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-24  8:39 UTC (permalink / raw)
  To: thomas, dev
  Cc: hemant.agrawal, bruce.richardson, jerin.jacob, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Export the rte_pci_match() function, as it is needed in a follow-up patch.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
v4 --> v5:
- Changed DPDK_17.08 to DPDK_17.11 in _version.map

 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  7 +++++++
 lib/librte_eal/common/eal_common_pci.c          | 10 +---------
 lib/librte_eal/common/include/rte_pci.h         | 15 +++++++++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  7 +++++++
 4 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index f689f0c8f..3d3c70a88 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -236,3 +236,10 @@ EXPERIMENTAL {
 	rte_service_unregister;
 
 } DPDK_17.08;
+
+DPDK_17.11 {
+	global:
+
+	rte_pci_match;
+
+} DPDK_17.08;
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 52fd38cdd..3b7d0a0ee 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -150,16 +150,8 @@ pci_unmap_resource(void *requested_addr, size_t size)
 
 /*
  * Match the PCI Driver and Device using the ID Table
- *
- * @param pci_drv
- *	PCI driver from which ID table would be extracted
- * @param pci_dev
- *	PCI device to match against the driver
- * @return
- *	1 for successful match
- *	0 for unsuccessful match
  */
-static int
+int
 rte_pci_match(const struct rte_pci_driver *pci_drv,
 	      const struct rte_pci_device *pci_dev)
 {
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 743392f91..47f0532e4 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -368,6 +368,21 @@ int rte_pci_scan(void);
 int
 rte_pci_probe(void);
 
+/**
+ * Match the PCI Driver and Device using the ID Table
+ *
+ * @param pci_drv
+ *      PCI driver from which ID table would be extracted
+ * @param pci_dev
+ *      PCI device to match against the driver
+ * @return
+ *      1 for successful match
+ *      0 for unsuccessful match
+ */
+int
+rte_pci_match(const struct rte_pci_driver *pci_drv,
+	      const struct rte_pci_device *pci_dev);
+
 /**
  * Map the PCI device resources in user space virtual memory address
  *
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 202072189..7d7fff496 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -241,3 +241,10 @@ EXPERIMENTAL {
 	rte_service_unregister;
 
 } DPDK_17.08;
+
+DPDK_17.11 {
+	global:
+
+	rte_pci_match;
+
+} DPDK_17.08;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v5 03/12] eal/pci: get iommu class
  2017-07-24  8:39       ` [PATCH v5 " Santosh Shukla
  2017-07-24  8:39         ` [PATCH v5 01/12] eal/pci: introduce PCI driver iova as va flag Santosh Shukla
  2017-07-24  8:39         ` [PATCH v5 02/12] eal/pci: export match function Santosh Shukla
@ 2017-07-24  8:39         ` Santosh Shukla
  2017-07-24  8:39         ` [PATCH v5 04/12] bsdapp/eal_pci: " Santosh Shukla
                           ` (9 subsequent siblings)
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-24  8:39 UTC (permalink / raw)
  To: thomas, dev
  Cc: hemant.agrawal, bruce.richardson, jerin.jacob, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Introduce the rte_pci_get_iommu_class API, which gets the iommu class
of the PCI devices on the bus and returns the preferred iova mapping
mode for the PCI bus.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
v3 --> v4:
- Created a separate patch per suggestion from Maxime.
Initially thought of squashing this patch into [01/12], but
[01/12] would then carry too much context, so decided to
keep it as a separate patch.

 lib/librte_eal/common/include/rte_bus.h | 10 ++++++++++
 lib/librte_eal/common/include/rte_pci.h | 11 +++++++++++
 2 files changed, 21 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index c79368d3c..9e40687e5 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -55,6 +55,16 @@ extern "C" {
 /** Double linked list of buses */
 TAILQ_HEAD(rte_bus_list, rte_bus);
 
+
+/**
+ * IOVA mapping mode.
+ */
+enum rte_iova_mode {
+	RTE_IOVA_DC = 0,	/* Don't care mode */
+	RTE_IOVA_PA = (1 << 0),
+	RTE_IOVA_VA = (1 << 1)
+};
+
 /**
  * Bus specific scan for devices attached on the bus.
  * For each bus object, the scan would be responsible for finding devices and
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 47f0532e4..a67d77f22 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -383,6 +383,17 @@ int
 rte_pci_match(const struct rte_pci_driver *pci_drv,
 	      const struct rte_pci_device *pci_dev);
 
+
+/**
+ * Get iommu class of PCI devices on the bus.
+ * And return their preferred iova mapping mode.
+ *
+ * @return
+ *   - enum rte_iova_mode.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void);
+
 /**
  * Map the PCI device resources in user space virtual memory address
  *
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v5 04/12] bsdapp/eal_pci: get iommu class
  2017-07-24  8:39       ` [PATCH v5 " Santosh Shukla
                           ` (2 preceding siblings ...)
  2017-07-24  8:39         ` [PATCH v5 03/12] eal/pci: get iommu class Santosh Shukla
@ 2017-07-24  8:39         ` Santosh Shukla
  2017-07-24  8:39         ` [PATCH v5 05/12] linuxapp/eal_pci: " Santosh Shukla
                           ` (8 subsequent siblings)
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-24  8:39 UTC (permalink / raw)
  To: thomas, dev
  Cc: hemant.agrawal, bruce.richardson, jerin.jacob, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

The bsdapp implementation returns the default iova mode (RTE_IOVA_PA).

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
v3 --> v4:
- Removed rte_pci_get_iommu_class api declaration. Now that
  sits into separate patch [03/12].

 lib/librte_eal/bsdapp/eal/eal_pci.c           | 10 ++++++++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map |  1 +
 2 files changed, 11 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index d3fb3c2d0..b45649428 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -403,6 +403,16 @@ rte_pci_scan(void)
 	return -1;
 }
 
+/*
+ * Get iommu class of pci devices on the bus.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void)
+{
+	/* Supports only RTE_KDRV_NIC_UIO */
+	return RTE_IOVA_PA;
+}
+
 int
 pci_update_device(const struct rte_pci_addr *addr)
 {
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 3d3c70a88..8d5bc5000 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -241,5 +241,6 @@ DPDK_17.11 {
 	global:
 
 	rte_pci_match;
+	rte_pci_get_iommu_class;
 
 } DPDK_17.08;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v5 05/12] linuxapp/eal_pci: get iommu class
  2017-07-24  8:39       ` [PATCH v5 " Santosh Shukla
                           ` (3 preceding siblings ...)
  2017-07-24  8:39         ` [PATCH v5 04/12] bsdapp/eal_pci: " Santosh Shukla
@ 2017-07-24  8:39         ` Santosh Shukla
  2017-07-24  8:39         ` [PATCH v5 06/12] bus: " Santosh Shukla
                           ` (7 subsequent siblings)
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-24  8:39 UTC (permalink / raw)
  To: thomas, dev
  Cc: hemant.agrawal, bruce.richardson, jerin.jacob, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Get the iommu class of the PCI devices on the bus and return the
preferred iova mapping mode for that bus.

Algorithm for iova scheme selection for PCI bus:
0. If no device is bound, return the RTE_IOVA_DC mapping mode,
else go to 1).
1. Look for a device attached to the vfio kdrv whose driver has
RTE_PCI_DRV_IOVA_AS_VA set in .drv_flags.
2. Look for any device attached to a UIO class of driver.
3. Check whether vfio-noiommu mode is enabled.

If 2) and 3) are false and 1) is true, select RTE_IOVA_VA as the
mapping scheme. Otherwise use the default
mapping scheme (RTE_IOVA_PA).
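
Step 3 relies on reading the vfio module parameter
enable_unsafe_noiommu_mode from sysfs. Below is a minimal sketch of such a
check, not the code from this patch: the helper name noiommu_param_is_set()
and its path argument are assumptions made for illustration, and unlike
vfio_noiommu_is_enabled() it checks the result of read() and returns 0
(not -1) when the parameter is readable but not 'Y'.

```c
#include <fcntl.h>
#include <unistd.h>

/* Path taken from the VFIO_NOIOMMU_MODE define added in eal_vfio.h. */
#define NOIOMMU_PARAM_PATH \
	"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"

/*
 * noiommu_param_is_set() is a hypothetical helper, not part of the patch.
 * Returns 1 when the file starts with 'Y', 0 when it starts with anything
 * else, and -1 when the file cannot be opened or read.
 */
static int
noiommu_param_is_set(const char *path)
{
	char c = '\0';
	ssize_t n;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	n = read(fd, &c, 1);
	close(fd);

	/* An empty file or a read error gives no answer at all. */
	if (n != 1)
		return -1;

	return c == 'Y';
}
```

A caller would pass NOIOMMU_PARAM_PATH and treat only a return value of 1
as "noiommu enabled", which matches how rte_pci_get_iommu_class() uses the
result of vfio_noiommu_is_enabled().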

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
v3 --> v4 :
- Reworded WARNING message (suggested by Maxime)
- Added pci_device_is_bound func to check for no device case
  (suggested by Hemant).
- Added ifdef vfio_present.

v1 --> v2:
- Removed Linux version check in vfio_noiommu func. Refer [1].
  - Extending autodetection logic for _iommu_class.
    Refer [2].

[1] https://www.mail-archive.com/dev@dpdk.org/msg70108.html
[2] https://www.mail-archive.com/dev@dpdk.org/msg70279.html

 lib/librte_eal/linuxapp/eal/eal_pci.c           | 95 +++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c          | 19 +++++
 lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 4 files changed, 119 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 2041d5f34..81d980817 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -45,6 +45,7 @@
 #include "eal_filesystem.h"
 #include "eal_private.h"
 #include "eal_pci_init.h"
+#include "eal_vfio.h"
 
 /**
  * @file
@@ -483,6 +484,100 @@ rte_pci_scan(void)
 	return -1;
 }
 
+/*
+ * Is pci device bound to any kdrv
+ */
+static inline int
+pci_device_is_bound(void)
+{
+	struct rte_pci_device *dev = NULL;
+	int ret = 0;
+
+	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		if (dev->kdrv == RTE_KDRV_UNKNOWN ||
+		    dev->kdrv == RTE_KDRV_NONE) {
+			continue;
+		} else {
+			ret = 1;
+			break;
+		}
+	}
+	return ret;
+}
+
+/*
+ * Any one of the device bound to uio
+ */
+static inline int
+pci_device_bound_uio(void)
+{
+	struct rte_pci_device *dev = NULL;
+
+	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		if (dev->kdrv == RTE_KDRV_IGB_UIO ||
+		   dev->kdrv == RTE_KDRV_UIO_GENERIC) {
+			return 1;
+		}
+	}
+	return 0;
+}
+
+/*
+ * Any one of the device has iova as va
+ */
+static inline int
+pci_device_has_iova_va(void)
+{
+	struct rte_pci_device *dev = NULL;
+	struct rte_pci_driver *drv = NULL;
+
+	FOREACH_DRIVER_ON_PCIBUS(drv) {
+		if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
+			FOREACH_DEVICE_ON_PCIBUS(dev) {
+				if (dev->kdrv == RTE_KDRV_VFIO &&
+				    rte_pci_match(drv, dev))
+					return 1;
+			}
+		}
+	}
+	return 0;
+}
+
+/*
+ * Get iommu class of PCI devices on the bus.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void)
+{
+	bool is_bound;
+	bool is_vfio_noiommu_enabled = true;
+	bool has_iova_va;
+	bool is_bound_uio;
+
+	is_bound = pci_device_is_bound();
+	if (!is_bound)
+		return RTE_IOVA_DC;
+
+	has_iova_va = pci_device_has_iova_va();
+	is_bound_uio = pci_device_bound_uio();
+#ifdef VFIO_PRESENT
+	is_vfio_noiommu_enabled = vfio_noiommu_is_enabled() == 1 ? 1 : 0;
+#endif
+
+	if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled)
+		return RTE_IOVA_VA;
+
+	if (has_iova_va) {
+		RTE_LOG(WARNING, EAL, "Some devices want iova as va but pa will be used because.. ");
+		if (is_vfio_noiommu_enabled)
+			RTE_LOG(WARNING, EAL, "vfio-noiommu mode configured\n");
+		if (is_bound_uio)
+			RTE_LOG(WARNING, EAL, "few device bound to UIO\n");
+	}
+
+	return RTE_IOVA_PA;
+}
+
 /* Read PCI config space. */
 int rte_pci_read_config(const struct rte_pci_device *device,
 		void *buf, size_t len, off_t offset)
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 946df7e31..c8a97b7e7 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -816,4 +816,23 @@ vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
 	return 0;
 }
 
+int
+vfio_noiommu_is_enabled(void)
+{
+	int fd, ret, cnt __rte_unused;
+	char c;
+
+	ret = -1;
+	fd = open(VFIO_NOIOMMU_MODE, O_RDONLY);
+	if (fd < 0)
+		return -1;
+
+	cnt = read(fd, &c, 1);
+	if (cnt == 1 && c == 'Y')
+		ret = 1;
+
+	close(fd);
+	return ret;
+}
+
 #endif
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h b/lib/librte_eal/linuxapp/eal/eal_vfio.h
index 5ff63e5d7..26ea8e119 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
@@ -150,6 +150,8 @@ struct vfio_config {
 #define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
 #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
 #define VFIO_GET_REGION_IDX(x) (x >> 40)
+#define VFIO_NOIOMMU_MODE      \
+	"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
 
 /* DMA mapping function prototype.
  * Takes VFIO container fd as a parameter.
@@ -210,6 +212,8 @@ int pci_vfio_is_enabled(void);
 
 int vfio_mp_sync_setup(void);
 
+int vfio_noiommu_is_enabled(void);
+
 #define SOCKET_REQ_CONTAINER 0x100
 #define SOCKET_REQ_GROUP 0x200
 #define SOCKET_CLR_GROUP 0x300
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 7d7fff496..bf68f02bc 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -246,5 +246,6 @@ DPDK_17.11 {
 	global:
 
 	rte_pci_match;
+	rte_pci_get_iommu_class;
 
 } DPDK_17.08;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v5 06/12] bus: get iommu class
  2017-07-24  8:39       ` [PATCH v5 " Santosh Shukla
                           ` (4 preceding siblings ...)
  2017-07-24  8:39         ` [PATCH v5 05/12] linuxapp/eal_pci: " Santosh Shukla
@ 2017-07-24  8:39         ` Santosh Shukla
  2017-07-24  8:39         ` [PATCH v5 07/12] eal: introduce iova mode helper api Santosh Shukla
                           ` (6 subsequent siblings)
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-24  8:39 UTC (permalink / raw)
  To: thomas, dev
  Cc: hemant.agrawal, bruce.richardson, jerin.jacob, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

The rte_bus_get_iommu_class() API helps to automatically detect and select
the appropriate iova mapping scheme for the iommu capable devices on a bus.

Algorithm for iova scheme selection for bus:
0. Iterate through bus_list.
1. Collect each bus iova mode value and OR it into the 'mode' var.
2. Mode selection scheme is:
if mode == 0 then iova mode is _pa,
if mode == 1 then iova mode is _pa,
if mode == 2 then iova mode is _va,
if mode == 3 then iova mode is _pa.

So mode != 2 selects the default iova mode (_pa).
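
The selection table above can be sketched as a standalone function. This is
only an illustration of the OR-and-compare logic: combine_bus_modes() and
its array argument are hypothetical stand-ins for the TAILQ walk over
rte_bus_list, and the enum is copied from the rte_bus.h hunk of this series.

```c
#include <stddef.h>

/* Mirrors the enum added to rte_bus.h by this series. */
enum rte_iova_mode {
	RTE_IOVA_DC = 0,	/* Don't care mode */
	RTE_IOVA_PA = (1 << 0),
	RTE_IOVA_VA = (1 << 1)
};

/*
 * combine_bus_modes() is a hypothetical stand-in for the loop in
 * rte_bus_get_iommu_class(): it ORs the per-bus answers together and
 * keeps RTE_IOVA_VA only when the result is exactly RTE_IOVA_VA.
 */
static enum rte_iova_mode
combine_bus_modes(const enum rte_iova_mode *bus_modes, size_t nb_buses)
{
	int mode = RTE_IOVA_DC;
	size_t i;

	for (i = 0; i < nb_buses; i++)
		mode |= bus_modes[i];

	/* 0 (all DC), 1 (PA) and 3 (PA | VA) all fall back to PA. */
	if (mode != RTE_IOVA_VA)
		mode = RTE_IOVA_PA;

	return (enum rte_iova_mode)mode;
}
```

Because the modes are distinct bits, a single bus answering _pa is enough
to turn the combined value into 1 or 3, so _va survives only when every bus
answers either _va or don't care.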

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
v4 --> v5:
 - Reworded bus API description (Suggested by Hemant).

v3 --> v4:
 - Initialized mode to RTE_IOVA_DC in rte_bus_get_iommu_class.

 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
 lib/librte_eal/common/eal_common_bus.c          | 23 +++++++++++++++++++++++
 lib/librte_eal/common/eal_common_pci.c          |  1 +
 lib/librte_eal/common/include/rte_bus.h         | 25 +++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 5 files changed, 51 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 8d5bc5000..a30085a32 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -242,5 +242,6 @@ DPDK_17.11 {
 
 	rte_pci_match;
 	rte_pci_get_iommu_class;
+	rte_bus_get_iommu_class;
 
 } DPDK_17.08;
diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 08bec2d93..a30a8982e 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -222,3 +222,26 @@ rte_bus_find_by_device_name(const char *str)
 		c[0] = '\0';
 	return rte_bus_find(NULL, bus_can_parse, name);
 }
+
+
+/*
+ * Get iommu class of devices on the bus.
+ */
+enum rte_iova_mode
+rte_bus_get_iommu_class(void)
+{
+	int mode = RTE_IOVA_DC;
+	struct rte_bus *bus;
+
+	TAILQ_FOREACH(bus, &rte_bus_list, next) {
+
+		if (bus->get_iommu_class)
+			mode |= bus->get_iommu_class();
+	}
+
+	if (mode != RTE_IOVA_VA) {
+		/* Use default IOVA mode */
+		mode = RTE_IOVA_PA;
+	}
+	return mode;
+}
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 3b7d0a0ee..0f0e4b93b 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -564,6 +564,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.plug = pci_plug,
 		.unplug = pci_unplug,
 		.parse = pci_parse,
+		.get_iommu_class = rte_pci_get_iommu_class,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 9e40687e5..70a291a4d 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -178,6 +178,20 @@ struct rte_bus_conf {
 	enum rte_bus_scan_mode scan_mode; /**< Scan policy. */
 };
 
+
+/**
+ * Get the common iommu class of all the devices on the bus. The bus may
+ * check that those devices are attached to an iommu driver.
+ * If no devices are attached to the bus, the bus may return the don't care
+ * (_DC) value.
+ * Otherwise, the bus will return the appropriate _pa or _va iova mode.
+ *
+ * @return
+ *      enum rte_iova_mode value.
+ */
+typedef enum rte_iova_mode (*rte_bus_get_iommu_class_t)(void);
+
+
 /**
  * A structure describing a generic bus.
  */
@@ -191,6 +205,7 @@ struct rte_bus {
 	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
 	rte_bus_parse_t parse;       /**< Parse a device name */
 	struct rte_bus_conf conf;    /**< Bus configuration */
+	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 };
 
 /**
@@ -290,6 +305,16 @@ struct rte_bus *rte_bus_find_by_device(const struct rte_device *dev);
  */
 struct rte_bus *rte_bus_find_by_name(const char *busname);
 
+
+/**
+ * Get the common iommu class of devices bound on to buses available in the
+ * system. The default mode is PA.
+ *
+ * @return
+ *     enum rte_iova_mode value.
+ */
+enum rte_iova_mode rte_bus_get_iommu_class(void);
+
 /**
  * Helper for Bus registration.
  * The constructor has higher priority than PMD constructors.
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index bf68f02bc..780539dc7 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -247,5 +247,6 @@ DPDK_17.11 {
 
 	rte_pci_match;
 	rte_pci_get_iommu_class;
+	rte_bus_get_iommu_class;
 
 } DPDK_17.08;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v5 07/12] eal: introduce iova mode helper api
  2017-07-24  8:39       ` [PATCH v5 " Santosh Shukla
                           ` (5 preceding siblings ...)
  2017-07-24  8:39         ` [PATCH v5 06/12] bus: " Santosh Shukla
@ 2017-07-24  8:39         ` Santosh Shukla
  2017-07-24  8:40         ` [PATCH v5 08/12] linuxapp/eal: auto detect iova mode Santosh Shukla
                           ` (5 subsequent siblings)
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-24  8:39 UTC (permalink / raw)
  To: thomas, dev
  Cc: hemant.agrawal, bruce.richardson, jerin.jacob, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Introduce the rte_eal_iova_mode() helper API. This API is
used by non-EAL libraries to detect the iova mode.
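
As a rough illustration of how a non-EAL caller might use the helper, the
sketch below picks the address handed to a device based on the iova mode.
Everything except the enum and the helper's name is an assumption:
rte_eal_iova_mode() is stubbed with a local variable instead of reading
rte_config, and struct mem_segment and segment_virt2iova() are hypothetical.

```c
#include <stdint.h>

/* Mirrors the enum added to rte_bus.h by this series. */
enum rte_iova_mode {
	RTE_IOVA_DC = 0,	/* Don't care mode */
	RTE_IOVA_PA = (1 << 0),
	RTE_IOVA_VA = (1 << 1)
};

/* Stand-in for the EAL configuration; the real helper reads rte_config. */
static enum rte_iova_mode configured_mode = RTE_IOVA_PA;

static enum rte_iova_mode
rte_eal_iova_mode(void)
{
	return configured_mode;
}

/* Hypothetical memory segment: one VA range backed by one PA range. */
struct mem_segment {
	void *addr;		/* start virtual address */
	uint64_t phys_addr;	/* start physical address */
};

/* Return the address a device should use for 'va' inside 'seg'. */
static uint64_t
segment_virt2iova(const struct mem_segment *seg, const void *va)
{
	/* In VA mode the IOMMU is programmed so that iova == va. */
	if (rte_eal_iova_mode() == RTE_IOVA_VA)
		return (uint64_t)(uintptr_t)va;

	/* Otherwise apply the segment's VA to PA offset. */
	return seg->phys_addr + ((uintptr_t)va - (uintptr_t)seg->addr);
}
```

The later patches in this series apply the same branch inside
rte_mem_virt2phy() and rte_malloc_virt2phy().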

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_eal/bsdapp/eal/eal.c                 |  6 ++++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
 lib/librte_eal/common/include/rte_eal.h         | 12 ++++++++++++
 lib/librte_eal/linuxapp/eal/eal.c               |  6 ++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 5 files changed, 26 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 80fe21de3..2a49e9fde 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -119,6 +119,12 @@ rte_eal_get_configuration(void)
 	return &rte_config;
 }
 
+enum rte_iova_mode
+rte_eal_iova_mode(void)
+{
+	return rte_eal_get_configuration()->iova_mode;
+}
+
 /* parse a sysfs (or other) file containing one integer value */
 int
 eal_parse_sysfs_value(const char *filename, unsigned long *val)
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index a30085a32..2a3a592b2 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -243,5 +243,6 @@ DPDK_17.11 {
 	rte_pci_match;
 	rte_pci_get_iommu_class;
 	rte_bus_get_iommu_class;
+	rte_eal_iova_mode;
 
 } DPDK_17.08;
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 0e7363d77..932dc1a96 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -45,6 +45,7 @@
 
 #include <rte_per_lcore.h>
 #include <rte_config.h>
+#include <rte_bus.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -87,6 +88,9 @@ struct rte_config {
 	/** Primary or secondary configuration */
 	enum rte_proc_type_t process_type;
 
+	/** PA or VA mapping mode */
+	enum rte_iova_mode iova_mode;
+
 	/**
 	 * Pointer to memory configuration, which may be shared across multiple
 	 * DPDK instances
@@ -287,6 +291,14 @@ static inline int rte_gettid(void)
 	return RTE_PER_LCORE(_thread_id);
 }
 
+/**
+ * Get the iova mode
+ *
+ * @return
+ *   enum rte_iova_mode value.
+ */
+enum rte_iova_mode rte_eal_iova_mode(void);
+
 #define RTE_INIT(func) \
 static void __attribute__((constructor, used)) func(void)
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index b28bbab54..fffdf0d15 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -128,6 +128,12 @@ rte_eal_get_configuration(void)
 	return &rte_config;
 }
 
+enum rte_iova_mode
+rte_eal_iova_mode(void)
+{
+	return rte_eal_get_configuration()->iova_mode;
+}
+
 /* parse a sysfs (or other) file containing one integer value */
 int
 eal_parse_sysfs_value(const char *filename, unsigned long *val)
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 780539dc7..8b9a13fd8 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -248,5 +248,6 @@ DPDK_17.11 {
 	rte_pci_match;
 	rte_pci_get_iommu_class;
 	rte_bus_get_iommu_class;
+	rte_eal_iova_mode;
 
 } DPDK_17.08;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v5 08/12] linuxapp/eal: auto detect iova mode
  2017-07-24  8:39       ` [PATCH v5 " Santosh Shukla
                           ` (6 preceding siblings ...)
  2017-07-24  8:39         ` [PATCH v5 07/12] eal: introduce iova mode helper api Santosh Shukla
@ 2017-07-24  8:40         ` Santosh Shukla
  2017-07-24  8:40         ` [PATCH v5 09/12] bsdapp/eal: auto detect iova mapping mode Santosh Shukla
                           ` (4 subsequent siblings)
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-24  8:40 UTC (permalink / raw)
  To: thomas, dev
  Cc: hemant.agrawal, bruce.richardson, jerin.jacob, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

- Move the late bus scan up, to just after eal_parsing.
- Auto detect the iova mapping mode, based on the result of
  rte_bus_get_iommu_class.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_eal/linuxapp/eal/eal.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index fffdf0d15..49b52ce4f 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -798,6 +798,15 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	if (rte_bus_scan()) {
+		rte_eal_init_alert("Cannot scan the buses for devices\n");
+		rte_errno = ENODEV;
+		return -1;
+	}
+
+	/* autodetect the iova mapping mode (default is iova_pa) */
+	rte_eal_get_configuration()->iova_mode = rte_bus_get_iommu_class();
+
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&
 			internal_config.xen_dom0_support == 0 &&
@@ -895,12 +904,6 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
-	if (rte_bus_scan()) {
-		rte_eal_init_alert("Cannot scan the buses for devices\n");
-		rte_errno = ENODEV;
-		return -1;
-	}
-
 	RTE_LCORE_FOREACH_SLAVE(i) {
 
 		/*
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v5 09/12] bsdapp/eal: auto detect iova mapping mode
  2017-07-24  8:39       ` [PATCH v5 " Santosh Shukla
                           ` (7 preceding siblings ...)
  2017-07-24  8:40         ` [PATCH v5 08/12] linuxapp/eal: auto detect iova mode Santosh Shukla
@ 2017-07-24  8:40         ` Santosh Shukla
  2017-07-24  8:40         ` [PATCH v5 10/12] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
                           ` (3 subsequent siblings)
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-24  8:40 UTC (permalink / raw)
  To: thomas, dev
  Cc: hemant.agrawal, bruce.richardson, jerin.jacob, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

- Move the late bus scan up, to just after eal_parsing.
- The mapping mode is the default one for bsdapp, which supports
  only one pass-through mode (RTE_KDRV_NIC_UIO).

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_eal/bsdapp/eal/eal.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 2a49e9fde..3cb1bd22f 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -541,6 +541,15 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	if (rte_bus_scan()) {
+		rte_eal_init_alert("Cannot scan the buses for devices\n");
+		rte_errno = ENODEV;
+		return -1;
+	}
+
+	/* autodetect the iova mapping mode (default is iova_pa) */
+	rte_eal_get_configuration()->iova_mode = rte_bus_get_iommu_class();
+
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&
 			eal_hugepage_info_init() < 0) {
@@ -620,12 +629,6 @@ rte_eal_init(int argc, char **argv)
 		rte_config.master_lcore, thread_id, cpuset,
 		ret == 0 ? "" : "...");
 
-	if (rte_bus_scan()) {
-		rte_eal_init_alert("Cannot scan the buses for devices\n");
-		rte_errno = ENODEV;
-		return -1;
-	}
-
 	RTE_LCORE_FOREACH_SLAVE(i) {
 
 		/*
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v5 10/12] linuxapp/eal_vfio: honor iova mode before mapping
  2017-07-24  8:39       ` [PATCH v5 " Santosh Shukla
                           ` (8 preceding siblings ...)
  2017-07-24  8:40         ` [PATCH v5 09/12] bsdapp/eal: auto detect iova mapping mode Santosh Shukla
@ 2017-07-24  8:40         ` Santosh Shukla
  2017-07-24  8:40         ` [PATCH v5 11/12] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
                           ` (2 subsequent siblings)
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-24  8:40 UTC (permalink / raw)
  To: thomas, dev
  Cc: hemant.agrawal, bruce.richardson, jerin.jacob, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Check the iova mode and map iova to pa or va accordingly.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c8a97b7e7..b32cd09a2 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd)
 		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
 		dma_map.vaddr = ms[i].addr_64;
 		dma_map.size = ms[i].len;
-		dma_map.iova = ms[i].phys_addr;
+		if (rte_eal_iova_mode() == RTE_IOVA_VA)
+			dma_map.iova = dma_map.vaddr;
+		else
+			dma_map.iova = ms[i].phys_addr;
 		dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
 
 		ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
@@ -792,7 +795,10 @@ vfio_spapr_dma_map(int vfio_container_fd)
 		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
 		dma_map.vaddr = ms[i].addr_64;
 		dma_map.size = ms[i].len;
-		dma_map.iova = ms[i].phys_addr;
+		if (rte_eal_iova_mode() == RTE_IOVA_VA)
+			dma_map.iova = dma_map.vaddr;
+		else
+			dma_map.iova = ms[i].phys_addr;
 		dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
 				 VFIO_DMA_MAP_FLAG_WRITE;
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v5 11/12] linuxapp/eal_memory: honor iova mode in virt2phy
  2017-07-24  8:39       ` [PATCH v5 " Santosh Shukla
                           ` (9 preceding siblings ...)
  2017-07-24  8:40         ` [PATCH v5 10/12] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
@ 2017-07-24  8:40         ` Santosh Shukla
  2017-07-24  8:40         ` [PATCH v5 12/12] eal/rte_malloc: " Santosh Shukla
  2017-08-14 16:10         ` [PATCH v6 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-24  8:40 UTC (permalink / raw)
  To: thomas, dev
  Cc: hemant.agrawal, bruce.richardson, jerin.jacob, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Check the iova mode and return the matching address: the virtual
address for RTE_IOVA_VA, the physical address otherwise.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index daead31c2..249740645 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -139,6 +139,9 @@ rte_mem_virt2phy(const void *virtaddr)
 	int page_size;
 	off_t offset;
 
+	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+		return (uintptr_t)virtaddr;
+
 	/* when using dom0, /proc/self/pagemap always returns 0, check in
 	 * dpdk memory by browsing the memsegs */
 	if (rte_xen_dom0_supported()) {
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v5 12/12] eal/rte_malloc: honor iova mode in virt2phy
  2017-07-24  8:39       ` [PATCH v5 " Santosh Shukla
                           ` (10 preceding siblings ...)
  2017-07-24  8:40         ` [PATCH v5 11/12] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
@ 2017-07-24  8:40         ` Santosh Shukla
  2017-08-14 16:10         ` [PATCH v6 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-07-24  8:40 UTC (permalink / raw)
  To: thomas, dev
  Cc: hemant.agrawal, bruce.richardson, jerin.jacob, shreyansh.jain,
	gaetan.rivet, sergio.gonzalez.monroy, anatoly.burakov, stephen,
	maxime.coquelin, olivier.matz, Santosh Shukla

Check the iova mode and return the matching address: the virtual
address for RTE_IOVA_VA, the physical address otherwise.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_eal/common/rte_malloc.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 5c0627bf4..d65c05a4d 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -251,10 +251,17 @@ rte_malloc_set_limit(__rte_unused const char *type,
 phys_addr_t
 rte_malloc_virt2phy(const void *addr)
 {
+	phys_addr_t paddr;
 	const struct malloc_elem *elem = malloc_elem_from_data(addr);
 	if (elem == NULL)
 		return RTE_BAD_PHYS_ADDR;
 	if (elem->ms->phys_addr == RTE_BAD_PHYS_ADDR)
 		return RTE_BAD_PHYS_ADDR;
-	return elem->ms->phys_addr + ((uintptr_t)addr - (uintptr_t)elem->ms->addr);
+
+	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+		paddr = (uintptr_t)addr;
+	else
+		paddr = elem->ms->phys_addr +
+			((uintptr_t)addr - (uintptr_t)elem->ms->addr);
+	return paddr;
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* Re: [PATCH v2 11/12] mempool: honor iova mode in virt2phy
  2017-07-10 14:37                 ` Thomas Monjalon
@ 2017-08-04  4:00                   ` santosh
  0 siblings, 0 replies; 248+ messages in thread
From: santosh @ 2017-08-04  4:00 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Olivier Matz, dev, bruce.richardson, jerin.jacob, hemant.agrawal,
	shreyansh.jain, gaetan.rivet, sergio.gonzalez.monroy,
	anatoly.burakov, stephen, maxime.coquelin

Hi Thomas,

On Monday 10 July 2017 08:07 PM, Thomas Monjalon wrote:

> 10/07/2017 16:22, santosh:
>> On Monday 10 July 2017 07:39 PM, Thomas Monjalon wrote:
>>
>>> 10/07/2017 15:56, santosh:
>>>> On Monday 10 July 2017 07:21 PM, Thomas Monjalon wrote:
>>>>
>>>>> 10/07/2017 15:30, santosh:
>>>>>> Hi Olivier,
>>>>>>
>>>>>> On Monday 10 July 2017 05:57 PM, Olivier Matz wrote:
>>>>>>> I didn't check the patchset in detail, but in my understanding,
>>>>>>> what we call physaddr in dpdk is actually a bus address. Shouldn't
>>>>>>> we start to rename some of these fields and functions to avoid
>>>>>>> confusion?
>>>>>> Agree.
>>>>>> While working on iova mode thing and reading these vir2phy api -
>>>>>> confused me more. Actually it should be iova2va, va2iova or pa2iova,iova2pa..
>>>>>> where iova address is nothing but bus address Or we should refer to linux
>>>>>> semantics.
>>>>>>
>>>>>> We thought of addressing semantics after this series, Not a priority in IMO.
>>>>> I think it is a priority to start with semantics.
>>>>> The work is too hard with wrong semantic otherwise.
>>>> Sorry, I don;t agree with you. Semantic shouldn't lower the iova priority.
>>>> iova framework is blocking SoC's. w/o iova framework : One has to live with
>>>> hackish solution for their SoC.
>>>>
>>>> Semantic change in any-case could be pipelined. It shouldn't be like
>>>> Semantics change gets priority and therefore it blocks other SoCs.
>>> I am not saying it is blocking.
>>> I just say that you have not started your work by the beginning,
>>> and now it make reviews difficult (from what I understand).
>>> You must make all the efforts to make your patches easier to
>>> understand and accept.
>> It's just about changing name for virt2phy api's.. But changing those function
>> names require deprecation notice, Once iova patchset is merged then I'll
>> take up responsibility for sending deprecation notice and change those api
>> name in the next release.
> This series is not going to be integrated in 17.08.
> Anyway, you should probably send the deprecation notice now,
> in order to change the semantic in 17.11.
> Olivier was also talking about physaddr wording in EAL code.

Per the above discussion, we sent out the deprecation notice [1]
and agreed to keep the iova patch series on hold for the 17.08 release.

The v5 [2] iova series is now reviewed and ready for 17.11,
so it shouldn't be blocked or delayed in case the iova deprecation
notice is not merged into the 17.08 release.

[1] http://dpdk.org/dev/patchwork/patch/26771/
[2] http://dpdk.org/ml/archives/dev/2017-July/071809.html

^ permalink raw reply	[flat|nested] 248+ messages in thread

* [PATCH v6 00/12] Infrastructure to detect iova mapping on the bus
  2017-07-24  8:39       ` [PATCH v5 " Santosh Shukla
                           ` (11 preceding siblings ...)
  2017-07-24  8:40         ` [PATCH v5 12/12] eal/rte_malloc: " Santosh Shukla
@ 2017-08-14 16:10         ` Santosh Shukla
  2017-08-14 16:10           ` [PATCH v6 01/12] eal/pci: introduce PCI driver iova as va flag Santosh Shukla
                             ` (12 more replies)
  12 siblings, 13 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-08-14 16:10 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen,
	Santosh Shukla

v6:
Sending v5 series rebased on top of version: 17.11-rc0.

v5:
Introduce the RTE_PCI_DRV_IOVA_AS_VA flag for autodetection of the iova=va
mapping. If a PCI driver requires the IOVA as VA scheme, it can set this flag
at PCI driver registration.

Algorithm to select IOVA as VA for the PCI bus case:
     0. If no device is bound, return the RTE_IOVA_DC mapping mode,
     else go to 1).
     1. Look for a device attached to the vfio kdrv whose driver has
     RTE_PCI_DRV_IOVA_AS_VA set in .drv_flags.
     2. Look for any device attached to a UIO class of driver.
     3. Check whether vfio-noiommu mode is enabled.

     If 2) and 3) are false and 1) is true, select RTE_IOVA_VA as
     the mapping scheme. Otherwise use the default mapping scheme
     (RTE_IOVA_PA).

That way, the bus can truly autodetect the iova mapping mode for
a device or a set of devices.
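
The selection rule above can be read as a small truth table. Below is a
minimal sketch of it, not the code from this series: the enum is a local
stand-in for the one the series adds to rte_bus.h, and the
pci_scan_summary struct and select_pci_iova_mode() are hypothetical names
used only for illustration.

```c
#include <stdbool.h>

/* Local stand-in for the enum this series adds to rte_bus.h. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1 << 0, IOVA_VA = 1 << 1 };

/* Hypothetical summary of what a scan of the PCI bus found. */
struct pci_scan_summary {
	bool any_bound;      /* step 0: at least one device bound to a kdrv */
	bool vfio_wants_va;  /* step 1: vfio device whose driver sets IOVA_AS_VA */
	bool any_uio;        /* step 2: at least one device bound to UIO */
	bool vfio_noiommu;   /* step 3: vfio-noiommu mode is enabled */
};

static enum iova_mode
select_pci_iova_mode(const struct pci_scan_summary *s)
{
	/* Nothing bound: the bus has no preference. */
	if (!s->any_bound)
		return IOVA_DC;

	/* VA only when 1) holds and neither 2) nor 3) does. */
	if (s->vfio_wants_va && !s->any_uio && !s->vfio_noiommu)
		return IOVA_VA;

	/* Every other combination falls back to the default. */
	return IOVA_PA;
}
```

Note that a single UIO-bound device, or vfio-noiommu mode, is enough to
force the PA fallback in this sketch, even when another device asks for VA.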

Patch series rebased on version-17.08-rc2:
'67c4b6db68e199247b5dbd63f560582640b180bf'.

v5 --> v6:
- Added api info in eal's version.map (release DPDK_v17.11).

v4 --> v5:
- Change DPDK_17.08 to DPDK_17.11 in _version.map.
- Reworded bus api description (suggested by Hemant).
- Added reviewed-by from Maxime in v5.
- Added acked-by from Hemant for pci and bus patches.

v3 --> v4:
- Re-introduced RTE_IOVA_DC mode (suggested by Hemant [5]).
- Renamed flag to RTE_PCI_DRV_IOVA_AS_VA (suggested by Maxime).
- Reworded WARNING message (suggested by Maxime [7]).
- Created a separate patch for rte_pci_get_iommu_class (suggested by Maxime[]).
- Added VFIO_PRESENT ifdef build fix.

v2 --> v3:
- Removed rte_mempool_virt2phy (suggested by Olivier [4])

v1 --> v2:
- Removed override eal option i.e. (--iova-mode=<>) Because we have means to
   truly autodetect the iova mode.
- Introduced RTE_PCI_DRV_NEED_IOVA_VA drv_flag (Suggested by Maxime [3]).
- Using NEED_IOVA_VA drv_flag in autodetection logic.
- Removed Linux version check macro in vfio code, As per Maxime feedback.
- Moved rte_pci_match API from local to global.

Patch Summary:
0) 1st: Introducing a new flag in rte_pci_drv
1) 2nd: declare rte_pci_match api in pci header. Required for autodetection in
follow up patches.
2) 3rd: declare rte_pci_get_iommu_class.
3) 4th - 5th: autodetection mapping infrastructure for Linux/bsdapp.
4) 6th: Introduces global bus API named rte_bus_get_iommu_class.
5) 7th: iova mode helper API.
6) 8th - 9th: Calls rte_bus_get_iommu_class API for Linux/bsdapp and returns
their iova mode.
7) 10th: Check iova mode and accordingly map vfio.dma_map to _pa or _va.
8) 11th - 12th: Check for IOVA_VA mode in below APIs
         - rte_mem_virt2phy
         - rte_malloc_virt2phy

Test History:
- Tested on x86/XL710 40G NIC card in both modes (iova_va/pa).
- Tested on arm64/thunderx integrated vNIC in both modes.
- Tested on arm64/Octeontx integrated NICs in iova_va mode only
   (it supports only one mode).
- Ran standalone tests like mempool_autotest, mbuf_autotest.
- Verified for Doxygen.

Work History:
For v1, refer to [1].
For v2, refer to [2].
For v3, refer to [9].
For v4, refer to [10].

Checkpatch result:
* Debug message - WARNING: line over 80 characters

Thanks,

[1] https://www.mail-archive.com/dev@dpdk.org/msg67438.html
[2] https://www.mail-archive.com/dev@dpdk.org/msg70674.html
[3] https://www.mail-archive.com/dev@dpdk.org/msg70279.html
[4] https://www.mail-archive.com/dev@dpdk.org/msg70692.html
[5] http://dpdk.org/ml/archives/dev/2017-July/071282.html
[6] http://dpdk.org/ml/archives/dev/2017-July/070951.html
[7] http://dpdk.org/ml/archives/dev/2017-July/070941.html
[8] http://dpdk.org/ml/archives/dev/2017-July/070952.html
[9] http://dpdk.org/ml/archives/dev/2017-July/070918.html
[10] http://dpdk.org/ml/archives/dev/2017-July/071754.html

Santosh Shukla (12):
  eal/pci: introduce PCI driver iova as va flag
  eal/pci: export match function
  eal/pci: get iommu class
  bsdapp/eal_pci: get iommu class
  linuxapp/eal_pci: get iommu class
  bus: get iommu class
  eal: introduce iova mode helper api
  linuxapp/eal: auto detect iova mode
  bsdapp/eal: auto detect iova mapping mode
  linuxapp/eal_vfio: honor iova mode before mapping
  linuxapp/eal_memory: honor iova mode in virt2phy
  eal/rte_malloc: honor iova mode in virt2phy

 lib/librte_eal/bsdapp/eal/eal.c                 | 20 ++++--
 lib/librte_eal/bsdapp/eal/eal_pci.c             | 10 +++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   | 10 +++
 lib/librte_eal/common/eal_common_bus.c          | 23 ++++++
 lib/librte_eal/common/eal_common_pci.c          | 11 +--
 lib/librte_eal/common/include/rte_bus.h         | 35 +++++++++
 lib/librte_eal/common/include/rte_eal.h         | 12 ++++
 lib/librte_eal/common/include/rte_pci.h         | 28 ++++++++
 lib/librte_eal/common/rte_malloc.c              |  9 ++-
 lib/librte_eal/linuxapp/eal/eal.c               | 21 ++++--
 lib/librte_eal/linuxapp/eal/eal_memory.c        |  3 +
 lib/librte_eal/linuxapp/eal/eal_pci.c           | 95 +++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c          | 29 +++++++-
 lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map | 10 +++
 15 files changed, 296 insertions(+), 24 deletions(-)

-- 
2.11.0


* [PATCH v6 01/12] eal/pci: introduce PCI driver iova as va flag
  2017-08-14 16:10         ` [PATCH v6 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
@ 2017-08-14 16:10           ` Santosh Shukla
  2017-08-17 12:35             ` Aaron Conole
  2017-08-14 16:10           ` [PATCH v6 02/12] eal/pci: export match function Santosh Shukla
                             ` (11 subsequent siblings)
  12 siblings, 1 reply; 248+ messages in thread
From: Santosh Shukla @ 2017-08-14 16:10 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen,
	Santosh Shukla

Introduce the RTE_PCI_DRV_IOVA_AS_VA flag. A driver sets this flag when
it needs to operate in iova=va mode.

Why does a driver need iova=va mapping?

On NPU style co-processors like Octeontx, buffer recycling is done in
HW, unlike the SW model. Here is the data flow:
1) On the control path, fill the HW mempool with buffers (iova as pa
address).
2) On rx_burst, HW gives you an IOVA address (iova as pa address).
3) As the application expects a VA to operate on, rx_burst() needs to
convert from _pa to _va, which is very expensive.
With iova as va mapping, we can instead avoid the cost of this
conversion with the help of the IOMMU/SMMU.
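
To make the per-packet cost concrete, here is a rough sketch contrasting
the two cases. It is an illustration under stated assumptions, not driver
code: the mem_seg table and both helper functions are hypothetical, and a
linear segment search merely stands in for whatever lookup a real rx path
would need in iova=pa mode.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory segment known by both of its base addresses. */
struct mem_seg {
	uint64_t pa;   /* physical (bus) base address */
	uintptr_t va;  /* virtual base address in the process */
	size_t len;    /* segment length in bytes */
};

/* iova=pa: every address handed back by HW needs a table lookup. */
static void *
iova_pa_to_va(const struct mem_seg *segs, size_t nb_segs, uint64_t iova)
{
	size_t i;

	for (i = 0; i < nb_segs; i++) {
		if (iova >= segs[i].pa && iova - segs[i].pa < segs[i].len)
			return (void *)(segs[i].va +
					(uintptr_t)(iova - segs[i].pa));
	}
	return NULL; /* address not covered by any segment */
}

/* iova=va: the IOMMU/SMMU translates, so software only casts. */
static inline void *
iova_va_to_va(uint64_t iova)
{
	return (void *)(uintptr_t)iova;
}
```

In the second case the per-packet work on the rx path reduces to a cast,
which is the saving the commit message describes.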

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_eal/common/include/rte_pci.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 8b123391c..743392f91 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -202,6 +202,8 @@ struct rte_pci_bus {
 #define RTE_PCI_DRV_INTR_RMV 0x0010
 /** Device driver needs to keep mapped resources if unsupported dev detected */
 #define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020
+/** Device driver supports iova as va */
+#define RTE_PCI_DRV_IOVA_AS_VA 0x0040
 
 /**
  * A structure describing a PCI mapping.
-- 
2.11.0


* [PATCH v6 02/12] eal/pci: export match function
  2017-08-14 16:10         ` [PATCH v6 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
  2017-08-14 16:10           ` [PATCH v6 01/12] eal/pci: introduce PCI driver iova as va flag Santosh Shukla
@ 2017-08-14 16:10           ` Santosh Shukla
  2017-08-14 16:10           ` [PATCH v6 03/12] eal/pci: get iommu class Santosh Shukla
                             ` (10 subsequent siblings)
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-08-14 16:10 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen,
	Santosh Shukla

Export the rte_pci_match() function, as it is needed in the follow-up patches.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  7 +++++++
 lib/librte_eal/common/eal_common_pci.c          | 10 +---------
 lib/librte_eal/common/include/rte_pci.h         | 15 +++++++++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  7 +++++++
 4 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index aac6fd776..c819e3084 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -237,3 +237,10 @@ EXPERIMENTAL {
 	rte_service_unregister;
 
 } DPDK_17.08;
+
+DPDK_17.11 {
+	global:
+
+	rte_pci_match;
+
+} DPDK_17.08;
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 52fd38cdd..3b7d0a0ee 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -150,16 +150,8 @@ pci_unmap_resource(void *requested_addr, size_t size)
 
 /*
  * Match the PCI Driver and Device using the ID Table
- *
- * @param pci_drv
- *	PCI driver from which ID table would be extracted
- * @param pci_dev
- *	PCI device to match against the driver
- * @return
- *	1 for successful match
- *	0 for unsuccessful match
  */
-static int
+int
 rte_pci_match(const struct rte_pci_driver *pci_drv,
 	      const struct rte_pci_device *pci_dev)
 {
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 743392f91..47f0532e4 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -368,6 +368,21 @@ int rte_pci_scan(void);
 int
 rte_pci_probe(void);
 
+/**
+ * Match the PCI Driver and Device using the ID Table
+ *
+ * @param pci_drv
+ *      PCI driver from which ID table would be extracted
+ * @param pci_dev
+ *      PCI device to match against the driver
+ * @return
+ *      1 for successful match
+ *      0 for unsuccessful match
+ */
+int
+rte_pci_match(const struct rte_pci_driver *pci_drv,
+	      const struct rte_pci_device *pci_dev);
+
 /**
  * Map the PCI device resources in user space virtual memory address
  *
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 3a8f15406..a15b382ff 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -242,3 +242,10 @@ EXPERIMENTAL {
 	rte_service_unregister;
 
 } DPDK_17.08;
+
+DPDK_17.11 {
+	global:
+
+	rte_pci_match;
+
+} DPDK_17.08;
-- 
2.11.0


* [PATCH v6 03/12] eal/pci: get iommu class
  2017-08-14 16:10         ` [PATCH v6 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
  2017-08-14 16:10           ` [PATCH v6 01/12] eal/pci: introduce PCI driver iova as va flag Santosh Shukla
  2017-08-14 16:10           ` [PATCH v6 02/12] eal/pci: export match function Santosh Shukla
@ 2017-08-14 16:10           ` Santosh Shukla
  2017-08-17 12:38             ` Aaron Conole
  2017-08-14 16:10           ` [PATCH v6 04/12] bsdapp/eal_pci: " Santosh Shukla
                             ` (9 subsequent siblings)
  12 siblings, 1 reply; 248+ messages in thread
From: Santosh Shukla @ 2017-08-14 16:10 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen,
	Santosh Shukla

Introduce the rte_pci_get_iommu_class API, which gets the iommu class
of the PCI devices on the bus and returns the preferred iova mapping
mode for the PCI bus.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_eal/common/include/rte_bus.h | 10 ++++++++++
 lib/librte_eal/common/include/rte_pci.h | 11 +++++++++++
 2 files changed, 21 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index c79368d3c..9e40687e5 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -55,6 +55,16 @@ extern "C" {
 /** Double linked list of buses */
 TAILQ_HEAD(rte_bus_list, rte_bus);
 
+
+/**
+ * IOVA mapping mode.
+ */
+enum rte_iova_mode {
+	RTE_IOVA_DC = 0,	/* Don't care mode */
+	RTE_IOVA_PA = (1 << 0),
+	RTE_IOVA_VA = (1 << 1)
+};
+
 /**
  * Bus specific scan for devices attached on the bus.
  * For each bus object, the scan would be responsible for finding devices and
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 47f0532e4..a67d77f22 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -383,6 +383,17 @@ int
 rte_pci_match(const struct rte_pci_driver *pci_drv,
 	      const struct rte_pci_device *pci_dev);
 
+
+/**
+ * Get iommu class of PCI devices on the bus.
+ * And return their preferred iova mapping mode.
+ *
+ * @return
+ *   - enum rte_iova_mode.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void);
+
 /**
  * Map the PCI device resources in user space virtual memory address
  *
-- 
2.11.0


* [PATCH v6 04/12] bsdapp/eal_pci: get iommu class
  2017-08-14 16:10         ` [PATCH v6 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
                             ` (2 preceding siblings ...)
  2017-08-14 16:10           ` [PATCH v6 03/12] eal/pci: get iommu class Santosh Shukla
@ 2017-08-14 16:10           ` Santosh Shukla
  2017-08-14 16:10           ` [PATCH v6 05/12] linuxapp/eal_pci: " Santosh Shukla
                             ` (8 subsequent siblings)
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-08-14 16:10 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen,
	Santosh Shukla

The bsdapp implementation returns the default iova mode (RTE_IOVA_PA).

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_eal/bsdapp/eal/eal_pci.c           | 10 ++++++++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map |  1 +
 2 files changed, 11 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 04eacdcc7..e2c252320 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -403,6 +403,16 @@ rte_pci_scan(void)
 	return -1;
 }
 
+/*
+ * Get iommu class of pci devices on the bus.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void)
+{
+	/* Supports only RTE_KDRV_NIC_UIO */
+	return RTE_IOVA_PA;
+}
+
 int
 pci_update_device(const struct rte_pci_addr *addr)
 {
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index c819e3084..1fdcfb460 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -242,5 +242,6 @@ DPDK_17.11 {
 	global:
 
 	rte_pci_match;
+	rte_pci_get_iommu_class;
 
 } DPDK_17.08;
-- 
2.11.0


* [PATCH v6 05/12] linuxapp/eal_pci: get iommu class
  2017-08-14 16:10         ` [PATCH v6 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
                             ` (3 preceding siblings ...)
  2017-08-14 16:10           ` [PATCH v6 04/12] bsdapp/eal_pci: " Santosh Shukla
@ 2017-08-14 16:10           ` Santosh Shukla
  2017-08-14 16:10           ` [PATCH v6 06/12] bus: " Santosh Shukla
                             ` (7 subsequent siblings)
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-08-14 16:10 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen,
	Santosh Shukla

Get the iommu class of the PCI devices on the bus and return the
preferred iova mapping mode for that bus.

Algorithm for iova scheme selection for the PCI bus:
0. If no device is bound, return the RTE_IOVA_DC mapping mode,
else go to 1).
1. Look for a device attached to the vfio kdrv whose driver has
RTE_PCI_DRV_IOVA_AS_VA set in .drv_flags.
2. Look for any device attached to a UIO class of driver.
3. Check whether vfio-noiommu mode is enabled.

If 2) and 3) are false and 1) is true, select RTE_IOVA_VA as the
mapping scheme. Otherwise use the default mapping scheme
(RTE_IOVA_PA).

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
 lib/librte_eal/linuxapp/eal/eal_pci.c           | 95 +++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c          | 19 +++++
 lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 4 files changed, 119 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 8951ce742..9725fd493 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -45,6 +45,7 @@
 #include "eal_filesystem.h"
 #include "eal_private.h"
 #include "eal_pci_init.h"
+#include "eal_vfio.h"
 
 /**
  * @file
@@ -487,6 +488,100 @@ rte_pci_scan(void)
 	return -1;
 }
 
+/*
+ * Check whether any pci device is bound to a kdrv
+ */
+static inline int
+pci_device_is_bound(void)
+{
+	struct rte_pci_device *dev = NULL;
+	int ret = 0;
+
+	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		if (dev->kdrv == RTE_KDRV_UNKNOWN ||
+		    dev->kdrv == RTE_KDRV_NONE) {
+			continue;
+		} else {
+			ret = 1;
+			break;
+		}
+	}
+	return ret;
+}
+
+/*
+ * Check whether any one of the devices is bound to uio
+ */
+static inline int
+pci_device_bound_uio(void)
+{
+	struct rte_pci_device *dev = NULL;
+
+	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		if (dev->kdrv == RTE_KDRV_IGB_UIO ||
+		   dev->kdrv == RTE_KDRV_UIO_GENERIC) {
+			return 1;
+		}
+	}
+	return 0;
+}
+
+/*
+ * Check whether any one of the devices has iova as va
+ */
+static inline int
+pci_device_has_iova_va(void)
+{
+	struct rte_pci_device *dev = NULL;
+	struct rte_pci_driver *drv = NULL;
+
+	FOREACH_DRIVER_ON_PCIBUS(drv) {
+		if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
+			FOREACH_DEVICE_ON_PCIBUS(dev) {
+				if (dev->kdrv == RTE_KDRV_VFIO &&
+				    rte_pci_match(drv, dev))
+					return 1;
+			}
+		}
+	}
+	return 0;
+}
+
+/*
+ * Get iommu class of PCI devices on the bus.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void)
+{
+	bool is_bound;
+	bool is_vfio_noiommu_enabled = true;
+	bool has_iova_va;
+	bool is_bound_uio;
+
+	is_bound = pci_device_is_bound();
+	if (!is_bound)
+		return RTE_IOVA_DC;
+
+	has_iova_va = pci_device_has_iova_va();
+	is_bound_uio = pci_device_bound_uio();
+#ifdef VFIO_PRESENT
+	is_vfio_noiommu_enabled = vfio_noiommu_is_enabled() == 1 ? 1 : 0;
+#endif
+
+	if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled)
+		return RTE_IOVA_VA;
+
+	if (has_iova_va) {
+		RTE_LOG(WARNING, EAL, "Some devices want iova as va but pa will be used because.. ");
+		if (is_vfio_noiommu_enabled)
+			RTE_LOG(WARNING, EAL, "vfio-noiommu mode configured\n");
+		if (is_bound_uio)
+			RTE_LOG(WARNING, EAL, "few device bound to UIO\n");
+	}
+
+	return RTE_IOVA_PA;
+}
+
 /* Read PCI config space. */
 int rte_pci_read_config(const struct rte_pci_device *device,
 		void *buf, size_t len, off_t offset)
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 946df7e31..c8a97b7e7 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -816,4 +816,23 @@ vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
 	return 0;
 }
 
+int
+vfio_noiommu_is_enabled(void)
+{
+	int fd, ret, cnt __rte_unused;
+	char c;
+
+	ret = -1;
+	fd = open(VFIO_NOIOMMU_MODE, O_RDONLY);
+	if (fd < 0)
+		return -1;
+
+	cnt = read(fd, &c, 1);
+	if (cnt == 1 && c == 'Y')
+		ret = 1;
+
+	close(fd);
+	return ret;
+}
+
 #endif
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h b/lib/librte_eal/linuxapp/eal/eal_vfio.h
index 5ff63e5d7..26ea8e119 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
@@ -150,6 +150,8 @@ struct vfio_config {
 #define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
 #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
 #define VFIO_GET_REGION_IDX(x) (x >> 40)
+#define VFIO_NOIOMMU_MODE      \
+	"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
 
 /* DMA mapping function prototype.
  * Takes VFIO container fd as a parameter.
@@ -210,6 +212,8 @@ int pci_vfio_is_enabled(void);
 
 int vfio_mp_sync_setup(void);
 
+int vfio_noiommu_is_enabled(void);
+
 #define SOCKET_REQ_CONTAINER 0x100
 #define SOCKET_REQ_GROUP 0x200
 #define SOCKET_CLR_GROUP 0x300
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index a15b382ff..40420ded3 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -247,5 +247,6 @@ DPDK_17.11 {
 	global:
 
 	rte_pci_match;
+	rte_pci_get_iommu_class;
 
 } DPDK_17.08;
-- 
2.11.0


* [PATCH v6 06/12] bus: get iommu class
  2017-08-14 16:10         ` [PATCH v6 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
                             ` (4 preceding siblings ...)
  2017-08-14 16:10           ` [PATCH v6 05/12] linuxapp/eal_pci: " Santosh Shukla
@ 2017-08-14 16:10           ` Santosh Shukla
  2017-08-14 16:10           ` [PATCH v6 07/12] eal: introduce iova mode helper api Santosh Shukla
                             ` (6 subsequent siblings)
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-08-14 16:10 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen,
	Santosh Shukla

The rte_bus_get_iommu_class() API automatically detects and selects the
appropriate iova mapping scheme for the iommu-capable devices on the
buses.

Algorithm for iova scheme selection across buses:
0. Iterate through bus_list.
1. Collect each bus's iova mode value and OR it into the 'mode' var.
2. Mode selection scheme is:
if mode == 0 then iova mode is _pa,
if mode == 1 then iova mode is _pa,
if mode == 2 then iova mode is _va,
if mode == 3 then iova mode is _pa.

So any mode != 2 results in the default iova mode (_pa).
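
The table above follows from treating each bus answer as a bit and ORing
the answers together. A minimal standalone sketch of that reduction is
shown here; the enum mirrors the values used in this series, while
combine_bus_modes() and its array argument are hypothetical and only
stand in for the walk over bus_list.

```c
#include <stddef.h>

/* Mirrors the values of enum rte_iova_mode from this series. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1 << 0, IOVA_VA = 1 << 1 };

/* OR together the per-bus answers, then apply the selection table. */
static enum iova_mode
combine_bus_modes(const enum iova_mode *per_bus, size_t nb_buses)
{
	int mode = IOVA_DC;
	size_t i;

	for (i = 0; i < nb_buses; i++)
		mode |= per_bus[i];

	/* Only a pure VA result (mode == 2) selects VA; 0, 1 and 3 give PA. */
	return mode == IOVA_VA ? IOVA_VA : IOVA_PA;
}
```

A bus answering don't care contributes no bits, so it never changes the
outcome, while a single bus answering PA turns a VA result into 3 and
therefore into the PA default.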

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
 lib/librte_eal/common/eal_common_bus.c          | 23 +++++++++++++++++++++++
 lib/librte_eal/common/eal_common_pci.c          |  1 +
 lib/librte_eal/common/include/rte_bus.h         | 25 +++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 5 files changed, 51 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 1fdcfb460..9942f47aa 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -243,5 +243,6 @@ DPDK_17.11 {
 
 	rte_pci_match;
 	rte_pci_get_iommu_class;
+	rte_bus_get_iommu_class;
 
 } DPDK_17.08;
diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 08bec2d93..a30a8982e 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -222,3 +222,26 @@ rte_bus_find_by_device_name(const char *str)
 		c[0] = '\0';
 	return rte_bus_find(NULL, bus_can_parse, name);
 }
+
+
+/*
+ * Get iommu class of devices on the bus.
+ */
+enum rte_iova_mode
+rte_bus_get_iommu_class(void)
+{
+	int mode = RTE_IOVA_DC;
+	struct rte_bus *bus;
+
+	TAILQ_FOREACH(bus, &rte_bus_list, next) {
+
+		if (bus->get_iommu_class)
+			mode |= bus->get_iommu_class();
+	}
+
+	if (mode != RTE_IOVA_VA) {
+		/* Use default IOVA mode */
+		mode = RTE_IOVA_PA;
+	}
+	return mode;
+}
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 3b7d0a0ee..0f0e4b93b 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -564,6 +564,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.plug = pci_plug,
 		.unplug = pci_unplug,
 		.parse = pci_parse,
+		.get_iommu_class = rte_pci_get_iommu_class,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 9e40687e5..70a291a4d 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -178,6 +178,20 @@ struct rte_bus_conf {
 	enum rte_bus_scan_mode scan_mode; /**< Scan policy. */
 };
 
+
+/**
+ * Get the common iommu class of all the devices on the bus. The bus may
+ * check that those devices are attached to an iommu driver.
+ * If no devices are attached to the bus, the bus may return the don't care
+ * (_DC) value.
+ * Otherwise, the bus will return the appropriate _pa or _va iova mode.
+ *
+ * @return
+ *      enum rte_iova_mode value.
+ */
+typedef enum rte_iova_mode (*rte_bus_get_iommu_class_t)(void);
+
+
 /**
  * A structure describing a generic bus.
  */
@@ -191,6 +205,7 @@ struct rte_bus {
 	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
 	rte_bus_parse_t parse;       /**< Parse a device name */
 	struct rte_bus_conf conf;    /**< Bus configuration */
+	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 };
 
 /**
@@ -290,6 +305,16 @@ struct rte_bus *rte_bus_find_by_device(const struct rte_device *dev);
  */
 struct rte_bus *rte_bus_find_by_name(const char *busname);
 
+
+/**
+ * Get the common iommu class of devices bound on to buses available in the
+ * system. The default mode is PA.
+ *
+ * @return
+ *     enum rte_iova_mode value.
+ */
+enum rte_iova_mode rte_bus_get_iommu_class(void);
+
 /**
  * Helper for Bus registration.
  * The constructor has higher priority than PMD constructors.
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 40420ded3..f35031746 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -248,5 +248,6 @@ DPDK_17.11 {
 
 	rte_pci_match;
 	rte_pci_get_iommu_class;
+	rte_bus_get_iommu_class;
 
 } DPDK_17.08;
-- 
2.11.0


* [PATCH v6 07/12] eal: introduce iova mode helper api
  2017-08-14 16:10         ` [PATCH v6 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
                             ` (5 preceding siblings ...)
  2017-08-14 16:10           ` [PATCH v6 06/12] bus: " Santosh Shukla
@ 2017-08-14 16:10           ` Santosh Shukla
  2017-08-14 16:10           ` [PATCH v6 08/12] linuxapp/eal: auto detect iova mode Santosh Shukla
                             ` (5 subsequent siblings)
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-08-14 16:10 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen,
	Santosh Shukla

Introduce the rte_eal_iova_mode() helper API. This API is
used by non-eal libraries to detect the iova mode.
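
As a rough illustration of how a caller might use such a helper, the
sketch below short-circuits an address translation when the mode is VA.
It is self-contained and does not use the real EAL: get_iova_mode() and
configured_mode stand in for rte_eal_iova_mode() and the configuration
field it reads, and virt2iova() and fake_lookup_pa() are hypothetical.

```c
#include <stdint.h>

/* Mirrors the values of enum rte_iova_mode from this series. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1 << 0, IOVA_VA = 1 << 1 };

/* Stand-in for the iova_mode field kept in the EAL configuration. */
static enum iova_mode configured_mode = IOVA_PA;

/* Stand-in for rte_eal_iova_mode(). */
static enum iova_mode
get_iova_mode(void)
{
	return configured_mode;
}

/* Callback type for the slow path that resolves a physical address. */
typedef uint64_t (*pa_lookup_t)(const void *va);

/* Placeholder slow path: returns a fixed sentinel, not a real address. */
static uint64_t
fake_lookup_pa(const void *va)
{
	(void)va;
	return 0xdead0000ULL;
}

/* In VA mode the IOVA is the virtual address; otherwise do the lookup. */
static uint64_t
virt2iova(const void *va, pa_lookup_t lookup_pa)
{
	if (get_iova_mode() == IOVA_VA)
		return (uint64_t)(uintptr_t)va;
	return lookup_pa(va);
}
```

Keeping the mode behind one accessor means callers outside the EAL do not
need to know how the mode was detected.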

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_eal/bsdapp/eal/eal.c                 |  6 ++++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
 lib/librte_eal/common/include/rte_eal.h         | 12 ++++++++++++
 lib/librte_eal/linuxapp/eal/eal.c               |  6 ++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 5 files changed, 26 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 5fa598842..07e72203f 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -119,6 +119,12 @@ rte_eal_get_configuration(void)
 	return &rte_config;
 }
 
+enum rte_iova_mode
+rte_eal_iova_mode(void)
+{
+	return rte_eal_get_configuration()->iova_mode;
+}
+
 /* parse a sysfs (or other) file containing one integer value */
 int
 eal_parse_sysfs_value(const char *filename, unsigned long *val)
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 9942f47aa..1a63f3f05 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -244,5 +244,6 @@ DPDK_17.11 {
 	rte_pci_match;
 	rte_pci_get_iommu_class;
 	rte_bus_get_iommu_class;
+	rte_eal_iova_mode;
 
 } DPDK_17.08;
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 0e7363d77..932dc1a96 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -45,6 +45,7 @@
 
 #include <rte_per_lcore.h>
 #include <rte_config.h>
+#include <rte_bus.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -87,6 +88,9 @@ struct rte_config {
 	/** Primary or secondary configuration */
 	enum rte_proc_type_t process_type;
 
+	/** PA or VA mapping mode */
+	enum rte_iova_mode iova_mode;
+
 	/**
 	 * Pointer to memory configuration, which may be shared across multiple
 	 * DPDK instances
@@ -287,6 +291,14 @@ static inline int rte_gettid(void)
 	return RTE_PER_LCORE(_thread_id);
 }
 
+/**
+ * Get the iova mode
+ *
+ * @return
+ *   enum rte_iova_mode value.
+ */
+enum rte_iova_mode rte_eal_iova_mode(void);
+
 #define RTE_INIT(func) \
 static void __attribute__((constructor, used)) func(void)
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 48f12f44c..febbafdb3 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -128,6 +128,12 @@ rte_eal_get_configuration(void)
 	return &rte_config;
 }
 
+enum rte_iova_mode
+rte_eal_iova_mode(void)
+{
+	return rte_eal_get_configuration()->iova_mode;
+}
+
 /* parse a sysfs (or other) file containing one integer value */
 int
 eal_parse_sysfs_value(const char *filename, unsigned long *val)
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index f35031746..c99f1ed44 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -249,5 +249,6 @@ DPDK_17.11 {
 	rte_pci_match;
 	rte_pci_get_iommu_class;
 	rte_bus_get_iommu_class;
+	rte_eal_iova_mode;
 
 } DPDK_17.08;
-- 
2.11.0


* [PATCH v6 08/12] linuxapp/eal: auto detect iova mode
  2017-08-14 16:10         ` [PATCH v6 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
                             ` (6 preceding siblings ...)
  2017-08-14 16:10           ` [PATCH v6 07/12] eal: introduce iova mode helper api Santosh Shukla
@ 2017-08-14 16:10           ` Santosh Shukla
  2017-08-16 17:38             ` Aaron Conole
  2017-08-14 16:10           ` [PATCH v6 09/12] bsdapp/eal: auto detect iova mapping mode Santosh Shukla
                             ` (4 subsequent siblings)
  12 siblings, 1 reply; 248+ messages in thread
From: Santosh Shukla @ 2017-08-14 16:10 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen,
	Santosh Shukla

- Move the late bus scan up, to just after eal parsing.
- Auto-detect the iova mapping mode, based on the result of
  rte_bus_get_iommu_class().

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_eal/linuxapp/eal/eal.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index febbafdb3..5382f6c00 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -798,6 +798,15 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	if (rte_bus_scan()) {
+		rte_eal_init_alert("Cannot scan the buses for devices\n");
+		rte_errno = ENODEV;
+		return -1;
+	}
+
+	/* autodetect the iova mapping mode (default is iova_pa) */
+	rte_eal_get_configuration()->iova_mode = rte_bus_get_iommu_class();
+
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&
 			internal_config.xen_dom0_support == 0 &&
@@ -900,12 +909,6 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
-	if (rte_bus_scan()) {
-		rte_eal_init_alert("Cannot scan the buses for devices\n");
-		rte_errno = ENODEV;
-		return -1;
-	}
-
 	RTE_LCORE_FOREACH_SLAVE(i) {
 
 		/*
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v6 09/12] bsdapp/eal: auto detect iova mapping mode
  2017-08-14 16:10         ` [PATCH v6 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
                             ` (7 preceding siblings ...)
  2017-08-14 16:10           ` [PATCH v6 08/12] linuxapp/eal: auto detect iova mode Santosh Shukla
@ 2017-08-14 16:10           ` Santosh Shukla
  2017-08-17 12:41             ` Aaron Conole
  2017-08-14 16:10           ` [PATCH v6 10/12] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
                             ` (3 subsequent siblings)
  12 siblings, 1 reply; 248+ messages in thread
From: Santosh Shukla @ 2017-08-14 16:10 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen,
	Santosh Shukla

- Move the late bus scan up, to just after eal parsing.
- bsdapp uses the default mapping mode, since it supports
  only one pass-through mode (RTE_KDRV_NIC_UIO).

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_eal/bsdapp/eal/eal.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 07e72203f..53ad87b95 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -540,6 +540,14 @@ rte_eal_init(int argc, char **argv)
 		rte_atomic32_clear(&run_once);
 		return -1;
 	}
+	if (rte_bus_scan()) {
+		rte_eal_init_alert("Cannot scan the buses for devices\n");
+		rte_errno = ENODEV;
+		return -1;
+	}
+
+	/* autodetect the iova mapping mode (default is iova_pa) */
+	rte_eal_get_configuration()->iova_mode = rte_bus_get_iommu_class();
 
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&
@@ -625,12 +633,6 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
-	if (rte_bus_scan()) {
-		rte_eal_init_alert("Cannot scan the buses for devices\n");
-		rte_errno = ENODEV;
-		return -1;
-	}
-
 	RTE_LCORE_FOREACH_SLAVE(i) {
 
 		/*
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v6 10/12] linuxapp/eal_vfio: honor iova mode before mapping
  2017-08-14 16:10         ` [PATCH v6 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
                             ` (8 preceding siblings ...)
  2017-08-14 16:10           ` [PATCH v6 09/12] bsdapp/eal: auto detect iova mapping mode Santosh Shukla
@ 2017-08-14 16:10           ` Santosh Shukla
  2017-08-14 16:10           ` [PATCH v6 11/12] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
                             ` (2 subsequent siblings)
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-08-14 16:10 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen,
	Santosh Shukla

Check the iova mode and map iova to pa or va accordingly.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c8a97b7e7..b32cd09a2 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd)
 		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
 		dma_map.vaddr = ms[i].addr_64;
 		dma_map.size = ms[i].len;
-		dma_map.iova = ms[i].phys_addr;
+		if (rte_eal_iova_mode() == RTE_IOVA_VA)
+			dma_map.iova = dma_map.vaddr;
+		else
+			dma_map.iova = ms[i].phys_addr;
 		dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
 
 		ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
@@ -792,7 +795,10 @@ vfio_spapr_dma_map(int vfio_container_fd)
 		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
 		dma_map.vaddr = ms[i].addr_64;
 		dma_map.size = ms[i].len;
-		dma_map.iova = ms[i].phys_addr;
+		if (rte_eal_iova_mode() == RTE_IOVA_VA)
+			dma_map.iova = dma_map.vaddr;
+		else
+			dma_map.iova = ms[i].phys_addr;
 		dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
 				 VFIO_DMA_MAP_FLAG_WRITE;
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v6 11/12] linuxapp/eal_memory: honor iova mode in virt2phy
  2017-08-14 16:10         ` [PATCH v6 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
                             ` (9 preceding siblings ...)
  2017-08-14 16:10           ` [PATCH v6 10/12] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
@ 2017-08-14 16:10           ` Santosh Shukla
  2017-08-14 16:10           ` [PATCH v6 12/12] eal/rte_malloc: " Santosh Shukla
  2017-08-31  3:26           ` [PATCH v7 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-08-14 16:10 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen,
	Santosh Shukla

Check the iova mode and return the address accordingly: the virtual
address in RTE_IOVA_VA mode, the physical address otherwise.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 52791282f..2d9d7c2dc 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -139,6 +139,9 @@ rte_mem_virt2phy(const void *virtaddr)
 	int page_size;
 	off_t offset;
 
+	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+		return (uintptr_t)virtaddr;
+
 	/* when using dom0, /proc/self/pagemap always returns 0, check in
 	 * dpdk memory by browsing the memsegs */
 	if (rte_xen_dom0_supported()) {
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v6 12/12] eal/rte_malloc: honor iova mode in virt2phy
  2017-08-14 16:10         ` [PATCH v6 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
                             ` (10 preceding siblings ...)
  2017-08-14 16:10           ` [PATCH v6 11/12] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
@ 2017-08-14 16:10           ` Santosh Shukla
  2017-08-31  3:26           ` [PATCH v7 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
  12 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-08-14 16:10 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen,
	Santosh Shukla

Check the iova mode and return the address accordingly: the virtual
address in RTE_IOVA_VA mode, the physical address otherwise.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_eal/common/rte_malloc.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 5c0627bf4..d65c05a4d 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -251,10 +251,17 @@ rte_malloc_set_limit(__rte_unused const char *type,
 phys_addr_t
 rte_malloc_virt2phy(const void *addr)
 {
+	phys_addr_t paddr;
 	const struct malloc_elem *elem = malloc_elem_from_data(addr);
 	if (elem == NULL)
 		return RTE_BAD_PHYS_ADDR;
 	if (elem->ms->phys_addr == RTE_BAD_PHYS_ADDR)
 		return RTE_BAD_PHYS_ADDR;
-	return elem->ms->phys_addr + ((uintptr_t)addr - (uintptr_t)elem->ms->addr);
+
+	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+		paddr = (uintptr_t)addr;
+	else
+		paddr = elem->ms->phys_addr +
+			((uintptr_t)addr - (uintptr_t)elem->ms->addr);
+	return paddr;
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* Re: [PATCH v6 08/12] linuxapp/eal: auto detect iova mode
  2017-08-14 16:10           ` [PATCH v6 08/12] linuxapp/eal: auto detect iova mode Santosh Shukla
@ 2017-08-16 17:38             ` Aaron Conole
  2017-08-17 14:43               ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Aaron Conole @ 2017-08-16 17:38 UTC (permalink / raw)
  To: Santosh Shukla
  Cc: dev, olivier.matz, thomas, jerin.jacob, hemant.agrawal,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen

Santosh Shukla <santosh.shukla@caviumnetworks.com> writes:

> - Moving late bus scanning to up..just after eal_parsing.
> - Auto detect iova mapping mode, based on the result of
>   rte_bus_scan_iommu_class.
>
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/librte_eal/linuxapp/eal/eal.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
> index febbafdb3..5382f6c00 100644
> --- a/lib/librte_eal/linuxapp/eal/eal.c
> +++ b/lib/librte_eal/linuxapp/eal/eal.c
> @@ -798,6 +798,15 @@ rte_eal_init(int argc, char **argv)
>  		return -1;
>  	}
>  
> +	if (rte_bus_scan()) {
> +		rte_eal_init_alert("Cannot scan the buses for devices\n");
> +		rte_errno = ENODEV;

Since this now happens before hugetlbs are allocated, is it possible to
retry?  If so, then I would say to clear the run_once variable.

> +		return -1;
> +	}
> +
> +	/* autodetect the iova mapping mode (default is iova_pa) */
> +	rte_eal_get_configuration()->iova_mode = rte_bus_get_iommu_class();
> +
>  	if (internal_config.no_hugetlbfs == 0 &&
>  			internal_config.process_type != RTE_PROC_SECONDARY &&
>  			internal_config.xen_dom0_support == 0 &&
> @@ -900,12 +909,6 @@ rte_eal_init(int argc, char **argv)
>  		return -1;
>  	}
>  
> -	if (rte_bus_scan()) {
> -		rte_eal_init_alert("Cannot scan the buses for devices\n");
> -		rte_errno = ENODEV;
> -		return -1;
> -	}
> -
>  	RTE_LCORE_FOREACH_SLAVE(i) {
>  
>  		/*

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v6 01/12] eal/pci: introduce PCI driver iova as va flag
  2017-08-14 16:10           ` [PATCH v6 01/12] eal/pci: introduce PCI driver iova as va flag Santosh Shukla
@ 2017-08-17 12:35             ` Aaron Conole
  0 siblings, 0 replies; 248+ messages in thread
From: Aaron Conole @ 2017-08-17 12:35 UTC (permalink / raw)
  To: Santosh Shukla
  Cc: dev, olivier.matz, thomas, jerin.jacob, hemant.agrawal,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen

Santosh Shukla <santosh.shukla@caviumnetworks.com> writes:

> Introducing RTE_PCI_DRV_IOVA_AS_VA flag. Flag used when driver needs
> to operate in iova=va mode.
>
> Why driver need iova=va mapping?
>
> On NPU style co-processors like Octeontx, the buffer recycling has been
> done in HW, unlike SW model. Here is the data flow:
> 1) On control path, Fill the HW mempool with buffers(iova as pa address)
> 2) on rx_burst, HW gives you IOVA address(iova as pa address)
> 3) As application expects VA to operate on it, rx_burst() needs to
> convert to _va from _pa. Which is very expensive.
> Instead of that if iova as va mapping, we can avoid the cost of
> converting with help of IOMMU/SMMU.
>
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---

This should be folded into patch 5;  there's no clear need for it until
then.

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v6 03/12] eal/pci: get iommu class
  2017-08-14 16:10           ` [PATCH v6 03/12] eal/pci: get iommu class Santosh Shukla
@ 2017-08-17 12:38             ` Aaron Conole
  0 siblings, 0 replies; 248+ messages in thread
From: Aaron Conole @ 2017-08-17 12:38 UTC (permalink / raw)
  To: Santosh Shukla
  Cc: dev, olivier.matz, thomas, jerin.jacob, hemant.agrawal,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen

Santosh Shukla <santosh.shukla@caviumnetworks.com> writes:

> Introducing rte_pci_get_iommu_class API which helps to get iommu class
> of PCI device on the bus and returns preferred iova mapping mode for
> PCI bus.
>
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---

I think 3/12 and 4/12 should be combined with 5/12.  At the very least,
3/12 and 4/12 should be combined.

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v6 09/12] bsdapp/eal: auto detect iova mapping mode
  2017-08-14 16:10           ` [PATCH v6 09/12] bsdapp/eal: auto detect iova mapping mode Santosh Shukla
@ 2017-08-17 12:41             ` Aaron Conole
  0 siblings, 0 replies; 248+ messages in thread
From: Aaron Conole @ 2017-08-17 12:41 UTC (permalink / raw)
  To: Santosh Shukla
  Cc: dev, olivier.matz, thomas, jerin.jacob, hemant.agrawal,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen

Santosh Shukla <santosh.shukla@caviumnetworks.com> writes:

> - Moving late bus scanning to up..just after eal_parsing.
> - Mapping mode would be default for bsdapp. It supports
>   only one pass through mode (RTE_KDRV_NIC_UIO)
>
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---

Same comments as 8/12;  also I think 8/12 and 9/12 can be folded
together.

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v6 08/12] linuxapp/eal: auto detect iova mode
  2017-08-16 17:38             ` Aaron Conole
@ 2017-08-17 14:43               ` santosh
  0 siblings, 0 replies; 248+ messages in thread
From: santosh @ 2017-08-17 14:43 UTC (permalink / raw)
  To: Aaron Conole
  Cc: dev, olivier.matz, thomas, jerin.jacob, hemant.agrawal,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen

On Wednesday 16 August 2017 11:08 PM, Aaron Conole wrote:

> Santosh Shukla <santosh.shukla@caviumnetworks.com> writes:
>
>> - Moving late bus scanning to up..just after eal_parsing.
>> - Auto detect iova mapping mode, based on the result of
>>   rte_bus_scan_iommu_class.
>>
>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>  lib/librte_eal/linuxapp/eal/eal.c | 15 +++++++++------
>>  1 file changed, 9 insertions(+), 6 deletions(-)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
>> index febbafdb3..5382f6c00 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal.c
>> @@ -798,6 +798,15 @@ rte_eal_init(int argc, char **argv)
>>  		return -1;
>>  	}
>>  
>> +	if (rte_bus_scan()) {
>> +		rte_eal_init_alert("Cannot scan the buses for devices\n");
>> +		rte_errno = ENODEV;
> Since this now happens before hugetlbs are allocated, is it possible to
> retry?  If so, then I would say to clear the run_once variable.

Yes, Change queued for v7. Thanks.
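
For readers following the thread, the retry pattern under discussion can be
sketched roughly as below. This is a minimal model and not the EAL code: it
uses C11 atomic_flag in place of the EAL's own atomic helpers, and
init_once() and fake_bus_scan() are hypothetical names invented for the
illustration. The point is only that a failure which happens before anything
irreversible (such as hugepage allocation) can release the run-once guard so
the caller may try again.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Guard that lets only one initialisation attempt proceed at a time. */
static atomic_flag run_once = ATOMIC_FLAG_INIT;

/* Hypothetical stand-in for a bus scan; returns 0 on success. */
static int
fake_bus_scan(bool fail)
{
	return fail ? -1 : 0;
}

/* Hypothetical init routine modelling the early part of EAL init. */
static int
init_once(bool scan_fails)
{
	/* test_and_set returns the previous value: true means already taken */
	if (atomic_flag_test_and_set(&run_once))
		return -1;

	if (fake_bus_scan(scan_fails) != 0) {
		/* Nothing irreversible has happened yet, so allow a retry. */
		atomic_flag_clear(&run_once);
		return -1;
	}

	/* Success: the guard stays set, later calls are rejected. */
	return 0;
}
```

With this shape, a first call that fails in the scan can be followed by a
second call that succeeds, while any call after a successful one is refused.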

^ permalink raw reply	[flat|nested] 248+ messages in thread

* [PATCH v7 0/9] Infrastructure to detect iova mapping on the bus
  2017-08-14 16:10         ` [PATCH v6 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
                             ` (11 preceding siblings ...)
  2017-08-14 16:10           ` [PATCH v6 12/12] eal/rte_malloc: " Santosh Shukla
@ 2017-08-31  3:26           ` Santosh Shukla
  2017-08-31  3:26             ` [PATCH v7 1/9] eal/pci: export match function Santosh Shukla
                               ` (10 more replies)
  12 siblings, 11 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-08-31  3:26 UTC (permalink / raw)
  To: dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen, aconole,
	Santosh Shukla

v7:
No major changes; the minor changes are:
- Patch squashing (Aaron's suggestion).
- Added run_once for device_parse() and bus_scan() in eal init
	(Aaron's suggestion).
- Moved rte_eal_device_parse() up in the eal initialization order.
- Patches rebased on top of version: 17.11-rc0.
For v6 info, refer to [11].

v6:
Sending v5 series rebased on top of version: 17.11-rc0.

v5:
Introduce the RTE_PCI_DRV_IOVA_AS_VA flag for autodetection of iova=va mapping.
If a PCI driver requires the IOVA as VA scheme, the driver can set the flag
when it registers the PCI driver.

Algorithm to select IOVA as VA for the PCI bus case:
     0. If no device is bound, return the RTE_IOVA_DC mapping mode;
     otherwise go to 1).
     1. Look for a device that is attached to the vfio kdrv and has
     .drv_flag set to RTE_PCI_DRV_IOVA_AS_VA.
     2. Look for any device attached to a UIO class of driver.
     3. Check whether vfio-noiommu mode is enabled.

     If 2) and 3) are false and 1) is true, select
     RTE_IOVA_VA as the mapping scheme. Otherwise use the default
     mapping scheme (RTE_IOVA_PA).

That way, the bus can truly autodetect the iova mapping mode for
a device or a set of devices.
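
The selection steps above can be sketched in C roughly as follows. This is
an illustration under stated assumptions, not the code from the series:
struct dev_info, enum kdrv and select_iova_mode() are hypothetical names,
and the per-device facts (which kernel driver is bound, whether the matching
driver sets RTE_PCI_DRV_IOVA_AS_VA) are passed in as plain fields rather
than discovered from the bus.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same values as enum rte_iova_mode in this series. */
enum iova_mode {
	IOVA_DC = 0,        /* don't care: no device bound */
	IOVA_PA = (1 << 0),
	IOVA_VA = (1 << 1)
};

/* Hypothetical kernel-driver classes, for illustration only. */
enum kdrv { KDRV_NONE, KDRV_UIO, KDRV_VFIO };

/* Hypothetical per-device summary; not a DPDK structure. */
struct dev_info {
	enum kdrv kdrv;       /* kernel driver the device is bound to */
	bool drv_iova_as_va;  /* matching driver sets RTE_PCI_DRV_IOVA_AS_VA */
};

static enum iova_mode
select_iova_mode(const struct dev_info *devs, size_t n, bool vfio_noiommu)
{
	bool any_bound = false;   /* step 0 */
	bool has_iova_va = false; /* step 1 */
	bool has_uio = false;     /* step 2 */
	size_t i;

	for (i = 0; i < n; i++) {
		if (devs[i].kdrv == KDRV_NONE)
			continue;
		any_bound = true;
		if (devs[i].kdrv == KDRV_VFIO && devs[i].drv_iova_as_va)
			has_iova_va = true;
		if (devs[i].kdrv == KDRV_UIO)
			has_uio = true;
	}

	if (!any_bound)
		return IOVA_DC;
	/* step 3 (vfio_noiommu) is checked together with steps 1 and 2 */
	if (has_iova_va && !has_uio && !vfio_noiommu)
		return IOVA_VA;
	return IOVA_PA;
}
```

In this model a single UIO-bound device, or vfio-noiommu being enabled, is
enough to force the default RTE_IOVA_PA scheme for the whole bus.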

v6 --> v7:
- Patches squashed per v6 review comments.
- Added run_once in eal per v6 review comments.
- Moved rte_eal_device_parse() up in eal init order.

v5 --> v6:
- Added api info in eal's version.map (release DPDK_17.11).

v4 --> v5:
- Change DPDK_17.08 to DPDK_17.11 in _version.map.
- Reworded bus api description (suggested by Hemant).
- Added reviewed-by from Maxime in v5.
- Added acked-by from Hemant for pci and bus patches.

v3 --> v4:
- Re-introduced RTE_IOVA_DC mode (Suggested by Hemant [5]).
- Renamed flag to RTE_PCI_DRV_IOVA_AS_VA (Suggested by Maxime).
- Reworded WARNING message (suggested by Maxime [7]).
- Created a separate patch for rte_pci_get_iommu_class (suggested by Maxime[]).
- Added VFIO_PRESENT ifdef build fix.

v2 --> v3:
- Removed rte_mempool_virt2phy (suggested by Olivier [4])

v1 --> v2:
- Removed the override eal option (--iova-mode=<>), because we now have the
   means to truly autodetect the iova mode.
- Introduced RTE_PCI_DRV_NEED_IOVA_VA drv_flag (Suggested by Maxime [3]).
- Using NEED_IOVA_VA drv_flag in autodetection logic.
- Removed the Linux version check macro in vfio code, as per Maxime's feedback.
- Moved rte_pci_match API from local to global.

Patch Summary:
1) 1st: declare rte_pci_match api in pci header. Required for autodetection in
follow-up patches.
2) 2nd - 3rd - 4th : autodetection mapping infrastructure for Linux/bsdapp.
3) 5th: iova mode helper API.
4) 6th: Infra to detect iova mode.
5) 7th: make vfio mapping iova aware.
6) 8th - 9th : Check for IOVA_VA mode in below APIs
         - rte_mem_virt2phy
         - rte_malloc_virt2phy
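
The check described in item 6 can be sketched as below. This is a simplified
model and not the EAL implementation: struct memseg here is a hypothetical
two-field stand-in for a memory segment, and virt2iova() is an invented name.
It shows only the arithmetic: in IOVA_VA mode the device-visible address is
the virtual address itself, otherwise it is the segment's physical base plus
the offset of the address within the segment.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as enum rte_iova_mode in this series. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = (1 << 0), IOVA_VA = (1 << 1) };

/*
 * Hypothetical, simplified memory segment: one virtual range backed by
 * one physically contiguous range.
 */
struct memseg {
	uintptr_t va_base; /* start of the segment in virtual memory */
	uint64_t pa_base;  /* start of the segment in physical memory */
};

static uint64_t
virt2iova(enum iova_mode mode, const struct memseg *ms, const void *addr)
{
	/* iova=va: the IOMMU maps VA to PA, so hand the VA to the device */
	if (mode == IOVA_VA)
		return (uintptr_t)addr;

	/* iova=pa: physical base plus the offset inside the segment */
	return ms->pa_base + ((uintptr_t)addr - ms->va_base);
}
```

For a segment with va_base 0x40000000 and pa_base 0x10000000, the address
0x40001000 translates to 0x10001000 in PA mode and stays 0x40001000 in VA
mode.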

Test History:
- Tested for x86/XL710 40G NIC card for both modes (iova_va/pa).
- Tested for arm64/thunderx vNIC Integrated NIC for both modes
- Tested for arm64/Octeontx integrated NICs for only
   Iova_va mode(It supports only one mode.)
- Ran standalone tests like mempool_autotest, mbuf_autotest.
- Verified for Doxygen.

Work History:
For v1, Refer [1].
For v2, Refer [2].
For v3, Refer [9].
For v4, refer [10].
For v6, refer [11].

Checkpatch result:
* Debug message - WARNING: line over 80 characters

Thanks,
[1] https://www.mail-archive.com/dev@dpdk.org/msg67438.html
[2] https://www.mail-archive.com/dev@dpdk.org/msg70674.html
[3] https://www.mail-archive.com/dev@dpdk.org/msg70279.html
[4] https://www.mail-archive.com/dev@dpdk.org/msg70692.html
[5] http://dpdk.org/ml/archives/dev/2017-July/071282.html
[6] http://dpdk.org/ml/archives/dev/2017-July/070951.html
[7] http://dpdk.org/ml/archives/dev/2017-July/070941.html
[8] http://dpdk.org/ml/archives/dev/2017-July/070952.html
[9] http://dpdk.org/ml/archives/dev/2017-July/070918.html
[10] http://dpdk.org/ml/archives/dev/2017-July/071754.html
[11] http://dpdk.org/ml/archives/dev/2017-August/072871.html


Santosh Shukla (9):
  eal/pci: export match function
  eal/pci: get iommu class
  linuxapp/eal_pci: get iommu class
  bus: get iommu class
  eal: introduce iova mode helper api
  eal: auto detect iova mode
  linuxapp/eal_vfio: honor iova mode before mapping
  linuxapp/eal_memory: honor iova mode in virt2phy
  eal/rte_malloc: honor iova mode in virt2phy

 lib/librte_eal/bsdapp/eal/eal.c                 | 33 ++++++---
 lib/librte_eal/bsdapp/eal/eal_pci.c             | 10 +++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   | 10 +++
 lib/librte_eal/common/eal_common_bus.c          | 23 ++++++
 lib/librte_eal/common/eal_common_pci.c          | 11 +--
 lib/librte_eal/common/include/rte_bus.h         | 35 +++++++++
 lib/librte_eal/common/include/rte_eal.h         | 12 ++++
 lib/librte_eal/common/include/rte_pci.h         | 28 ++++++++
 lib/librte_eal/common/rte_malloc.c              |  9 ++-
 lib/librte_eal/linuxapp/eal/eal.c               | 33 ++++++---
 lib/librte_eal/linuxapp/eal/eal_memory.c        |  3 +
 lib/librte_eal/linuxapp/eal/eal_pci.c           | 95 +++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c          | 29 +++++++-
 lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map | 10 +++
 15 files changed, 311 insertions(+), 34 deletions(-)

-- 
2.13.0

^ permalink raw reply	[flat|nested] 248+ messages in thread

* [PATCH v7 1/9] eal/pci: export match function
  2017-08-31  3:26           ` [PATCH v7 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
@ 2017-08-31  3:26             ` Santosh Shukla
  2017-09-04 14:49               ` Burakov, Anatoly
  2017-09-06 15:39               ` Ferruh Yigit
  2017-08-31  3:26             ` [PATCH v7 2/9] eal/pci: get iommu class Santosh Shukla
                               ` (9 subsequent siblings)
  10 siblings, 2 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-08-31  3:26 UTC (permalink / raw)
  To: dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen, aconole,
	Santosh Shukla

Export the rte_pci_match() function, as it is needed in a follow-up patch.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  7 +++++++
 lib/librte_eal/common/eal_common_pci.c          | 10 +---------
 lib/librte_eal/common/include/rte_pci.h         | 15 +++++++++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  7 +++++++
 4 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index aac6fd776..c819e3084 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -237,3 +237,10 @@ EXPERIMENTAL {
 	rte_service_unregister;
 
 } DPDK_17.08;
+
+DPDK_17.11 {
+	global:
+
+	rte_pci_match;
+
+} DPDK_17.08;
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 52fd38cdd..3b7d0a0ee 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -150,16 +150,8 @@ pci_unmap_resource(void *requested_addr, size_t size)
 
 /*
  * Match the PCI Driver and Device using the ID Table
- *
- * @param pci_drv
- *	PCI driver from which ID table would be extracted
- * @param pci_dev
- *	PCI device to match against the driver
- * @return
- *	1 for successful match
- *	0 for unsuccessful match
  */
-static int
+int
 rte_pci_match(const struct rte_pci_driver *pci_drv,
 	      const struct rte_pci_device *pci_dev)
 {
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 8b123391c..eab84c7a4 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -366,6 +366,21 @@ int rte_pci_scan(void);
 int
 rte_pci_probe(void);
 
+/*
+ * Match the PCI Driver and Device using the ID Table
+ *
+ * @param pci_drv
+ *      PCI driver from which ID table would be extracted
+ * @param pci_dev
+ *      PCI device to match against the driver
+ * @return
+ *      1 for successful match
+ *      0 for unsuccessful match
+ */
+int
+rte_pci_match(const struct rte_pci_driver *pci_drv,
+	      const struct rte_pci_device *pci_dev);
+
 /**
  * Map the PCI device resources in user space virtual memory address
  *
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 3a8f15406..a15b382ff 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -242,3 +242,10 @@ EXPERIMENTAL {
 	rte_service_unregister;
 
 } DPDK_17.08;
+
+DPDK_17.11 {
+	global:
+
+	rte_pci_match;
+
+} DPDK_17.08;
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v7 2/9] eal/pci: get iommu class
  2017-08-31  3:26           ` [PATCH v7 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
  2017-08-31  3:26             ` [PATCH v7 1/9] eal/pci: export match function Santosh Shukla
@ 2017-08-31  3:26             ` Santosh Shukla
  2017-09-04 14:53               ` Burakov, Anatoly
  2017-09-04 15:30               ` Burakov, Anatoly
  2017-08-31  3:26             ` [PATCH v7 3/9] linuxapp/eal_pci: " Santosh Shukla
                               ` (8 subsequent siblings)
  10 siblings, 2 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-08-31  3:26 UTC (permalink / raw)
  To: dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen, aconole,
	Santosh Shukla

Introduce the rte_pci_get_iommu_class API, which gets the iommu class
of the PCI devices on the bus and returns the preferred iova mapping mode
for the PCI bus.

The patch also adds the rte_pci_get_iommu_class definition for bsdapp;
in the bsdapp case the api returns the default iova mode.
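
The enum added by this patch uses bit-flag values (DC = 0, PA = 1 << 0,
VA = 1 << 1). One plausible use of that encoding, sketched here as an
assumption and not taken from this patch, is to OR together the per-bus
results and select iova=va only when every bus that cares asked for it.
combine_bus_modes() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Same values as enum rte_iova_mode in this patch. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = (1 << 0), IOVA_VA = (1 << 1) };

/* Hypothetical helper: fold the modes reported by each bus into one. */
static enum iova_mode
combine_bus_modes(const enum iova_mode *per_bus, size_t n)
{
	int mode = IOVA_DC;
	size_t i;

	for (i = 0; i < n; i++)
		mode |= per_bus[i]; /* DC (0) leaves the result unchanged */

	/* Assumption: anything but a unanimous VA falls back to PA. */
	if (mode != IOVA_VA)
		mode = IOVA_PA;

	return (enum iova_mode)mode;
}
```

Under this assumption a bus reporting "don't care" never blocks iova=va,
while a single bus reporting PA forces the default.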

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
v6 --> v7:
- squashed v6 series patch [02/12] & [03/12] (Aaron comment).

 lib/librte_eal/bsdapp/eal/eal_pci.c           | 10 ++++++++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map |  1 +
 lib/librte_eal/common/include/rte_bus.h       | 10 ++++++++++
 lib/librte_eal/common/include/rte_pci.h       | 11 +++++++++++
 4 files changed, 32 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 04eacdcc7..e2c252320 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -403,6 +403,16 @@ rte_pci_scan(void)
 	return -1;
 }
 
+/*
+ * Get iommu class of pci devices on the bus.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void)
+{
+	/* Supports only RTE_KDRV_NIC_UIO */
+	return RTE_IOVA_PA;
+}
+
 int
 pci_update_device(const struct rte_pci_addr *addr)
 {
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index c819e3084..1fdcfb460 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -242,5 +242,6 @@ DPDK_17.11 {
 	global:
 
 	rte_pci_match;
+	rte_pci_get_iommu_class;
 
 } DPDK_17.08;
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index c79368d3c..9e40687e5 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -55,6 +55,16 @@ extern "C" {
 /** Double linked list of buses */
 TAILQ_HEAD(rte_bus_list, rte_bus);
 
+
+/**
+ * IOVA mapping mode.
+ */
+enum rte_iova_mode {
+	RTE_IOVA_DC = 0,	/* Don't care mode */
+	RTE_IOVA_PA = (1 << 0),
+	RTE_IOVA_VA = (1 << 1)
+};
+
 /**
  * Bus specific scan for devices attached on the bus.
  * For each bus object, the scan would be responsible for finding devices and
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index eab84c7a4..0e36de093 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -381,6 +381,17 @@ int
 rte_pci_match(const struct rte_pci_driver *pci_drv,
 	      const struct rte_pci_device *pci_dev);
 
+
+/**
+ * Get iommu class of PCI devices on the bus.
+ * And return their preferred iova mapping mode.
+ *
+ * @return
+ *   - enum rte_iova_mode.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void);
+
 /**
  * Map the PCI device resources in user space virtual memory address
  *
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v7 3/9] linuxapp/eal_pci: get iommu class
  2017-08-31  3:26           ` [PATCH v7 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
  2017-08-31  3:26             ` [PATCH v7 1/9] eal/pci: export match function Santosh Shukla
  2017-08-31  3:26             ` [PATCH v7 2/9] eal/pci: get iommu class Santosh Shukla
@ 2017-08-31  3:26             ` Santosh Shukla
  2017-09-04 15:08               ` Burakov, Anatoly
  2017-09-05  9:01               ` Burakov, Anatoly
  2017-08-31  3:26             ` [PATCH v7 4/9] bus: " Santosh Shukla
                               ` (7 subsequent siblings)
  10 siblings, 2 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-08-31  3:26 UTC (permalink / raw)
  To: dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen, aconole,
	Santosh Shukla

Get the iommu class of the PCI devices on the bus and return the
preferred iova mapping mode for that bus.

The patch also introduces the RTE_PCI_DRV_IOVA_AS_VA driver flag.
A driver sets this flag when it needs to operate in iova=va mode.

Algorithm for iova scheme selection for the PCI bus:
0. If no device is bound, return the RTE_IOVA_DC mapping mode;
otherwise go to 1).
1. Look for a device that is attached to the vfio kdrv and whose driver
has RTE_PCI_DRV_IOVA_AS_VA set in .drv_flags.
2. Look for any device attached to a UIO class of driver.
3. Check whether vfio-noiommu mode is enabled.

If 2) and 3) are false and 1) is true, select RTE_IOVA_VA as the
mapping scheme. Otherwise use the default mapping scheme (RTE_IOVA_PA).
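
To make the rule above concrete, here is a minimal standalone sketch; it is
not the code from this patch. The enum mirrors the rte_iova_mode values added
earlier in the series. struct pci_bus_state and select_pci_iova_mode() are
hypothetical names used only for illustration, and the four booleans stand in
for the results of steps 0 to 3.

```c
#include <stdbool.h>

/* Local stand-in for enum rte_iova_mode (same values as in the series). */
enum iova_mode {
	IOVA_DC = 0,        /* don't care */
	IOVA_PA = (1 << 0),
	IOVA_VA = (1 << 1)
};

/* Hypothetical summary of what a PCI bus scan found. */
struct pci_bus_state {
	bool any_bound;     /* step 0: some device is bound to a kernel driver */
	bool vfio_wants_va; /* step 1: a vfio-bound device has an IOVA_AS_VA driver */
	bool any_uio;       /* step 2: some device is bound to a UIO driver */
	bool vfio_noiommu;  /* step 3: vfio-noiommu mode is enabled */
};

/* Apply the selection rule: VA only if 1) holds and both 2) and 3) are false. */
static enum iova_mode
select_pci_iova_mode(const struct pci_bus_state *s)
{
	if (!s->any_bound)
		return IOVA_DC;
	if (s->vfio_wants_va && !s->any_uio && !s->vfio_noiommu)
		return IOVA_VA;
	return IOVA_PA;
}
```

With any_bound and vfio_wants_va set and the other two fields clear, the
sketch yields IOVA_VA; setting either any_uio or vfio_noiommu drops it back
to IOVA_PA.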

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
v6 --> v7:
- squashed v6 series patches [01/12] & [05/12],
  i.e. moved the RTE_PCI_DRV_IOVA_AS_VA flag into this patch (Aaron's comment).

 lib/librte_eal/common/include/rte_pci.h         |  2 +
 lib/librte_eal/linuxapp/eal/eal_pci.c           | 95 +++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c          | 19 +++++
 lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 5 files changed, 121 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 0e36de093..a67d77f22 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -202,6 +202,8 @@ struct rte_pci_bus {
 #define RTE_PCI_DRV_INTR_RMV 0x0010
 /** Device driver needs to keep mapped resources if unsupported dev detected */
 #define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020
+/** Device driver supports iova as va */
+#define RTE_PCI_DRV_IOVA_AS_VA 0x0040
 
 /**
  * A structure describing a PCI mapping.
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 8951ce742..9725fd493 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -45,6 +45,7 @@
 #include "eal_filesystem.h"
 #include "eal_private.h"
 #include "eal_pci_init.h"
+#include "eal_vfio.h"
 
 /**
  * @file
@@ -487,6 +488,100 @@ rte_pci_scan(void)
 	return -1;
 }
 
+/*
+ * Is any pci device bound to a kdrv
+ */
+static inline int
+pci_device_is_bound(void)
+{
+	struct rte_pci_device *dev = NULL;
+	int ret = 0;
+
+	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		if (dev->kdrv == RTE_KDRV_UNKNOWN ||
+		    dev->kdrv == RTE_KDRV_NONE) {
+			continue;
+		} else {
+			ret = 1;
+			break;
+		}
+	}
+	return ret;
+}
+
+/*
+ * Is any one of the devices bound to uio
+ */
+static inline int
+pci_device_bound_uio(void)
+{
+	struct rte_pci_device *dev = NULL;
+
+	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		if (dev->kdrv == RTE_KDRV_IGB_UIO ||
+		   dev->kdrv == RTE_KDRV_UIO_GENERIC) {
+			return 1;
+		}
+	}
+	return 0;
+}
+
+/*
+ * Is any vfio-bound device driven by an iova-as-va driver
+ */
+static inline int
+pci_device_has_iova_va(void)
+{
+	struct rte_pci_device *dev = NULL;
+	struct rte_pci_driver *drv = NULL;
+
+	FOREACH_DRIVER_ON_PCIBUS(drv) {
+		if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
+			FOREACH_DEVICE_ON_PCIBUS(dev) {
+				if (dev->kdrv == RTE_KDRV_VFIO &&
+				    rte_pci_match(drv, dev))
+					return 1;
+			}
+		}
+	}
+	return 0;
+}
+
+/*
+ * Get iommu class of PCI devices on the bus.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void)
+{
+	bool is_bound;
+	bool is_vfio_noiommu_enabled = true;
+	bool has_iova_va;
+	bool is_bound_uio;
+
+	is_bound = pci_device_is_bound();
+	if (!is_bound)
+		return RTE_IOVA_DC;
+
+	has_iova_va = pci_device_has_iova_va();
+	is_bound_uio = pci_device_bound_uio();
+#ifdef VFIO_PRESENT
+	is_vfio_noiommu_enabled = vfio_noiommu_is_enabled() == 1 ? 1 : 0;
+#endif
+
+	if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled)
+		return RTE_IOVA_VA;
+
+	if (has_iova_va) {
+		RTE_LOG(WARNING, EAL, "Some devices want iova as va but pa will be used because.. ");
+		if (is_vfio_noiommu_enabled)
+			RTE_LOG(WARNING, EAL, "vfio-noiommu mode configured\n");
+		if (is_bound_uio)
+			RTE_LOG(WARNING, EAL, "few device bound to UIO\n");
+	}
+
+	return RTE_IOVA_PA;
+}
+
 /* Read PCI config space. */
 int rte_pci_read_config(const struct rte_pci_device *device,
 		void *buf, size_t len, off_t offset)
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 946df7e31..c8a97b7e7 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -816,4 +816,23 @@ vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
 	return 0;
 }
 
+int
+vfio_noiommu_is_enabled(void)
+{
+	int fd, ret, cnt __rte_unused;
+	char c;
+
+	ret = -1;
+	fd = open(VFIO_NOIOMMU_MODE, O_RDONLY);
+	if (fd < 0)
+		return -1;
+
+	cnt = read(fd, &c, 1);
+	if (c == 'Y')
+		ret = 1;
+
+	close(fd);
+	return ret;
+}
+
 #endif
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h b/lib/librte_eal/linuxapp/eal/eal_vfio.h
index 5ff63e5d7..26ea8e119 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
@@ -150,6 +150,8 @@ struct vfio_config {
 #define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
 #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
 #define VFIO_GET_REGION_IDX(x) (x >> 40)
+#define VFIO_NOIOMMU_MODE      \
+	"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
 
 /* DMA mapping function prototype.
  * Takes VFIO container fd as a parameter.
@@ -210,6 +212,8 @@ int pci_vfio_is_enabled(void);
 
 int vfio_mp_sync_setup(void);
 
+int vfio_noiommu_is_enabled(void);
+
 #define SOCKET_REQ_CONTAINER 0x100
 #define SOCKET_REQ_GROUP 0x200
 #define SOCKET_CLR_GROUP 0x300
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index a15b382ff..40420ded3 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -247,5 +247,6 @@ DPDK_17.11 {
 	global:
 
 	rte_pci_match;
+	rte_pci_get_iommu_class;
 
 } DPDK_17.08;
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v7 4/9] bus: get iommu class
  2017-08-31  3:26           ` [PATCH v7 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
                               ` (2 preceding siblings ...)
  2017-08-31  3:26             ` [PATCH v7 3/9] linuxapp/eal_pci: " Santosh Shukla
@ 2017-08-31  3:26             ` Santosh Shukla
  2017-09-04 15:25               ` Burakov, Anatoly
  2017-08-31  3:26             ` [PATCH v7 5/9] eal: introduce iova mode helper api Santosh Shukla
                               ` (6 subsequent siblings)
  10 siblings, 1 reply; 248+ messages in thread
From: Santosh Shukla @ 2017-08-31  3:26 UTC (permalink / raw)
  To: dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen, aconole,
	Santosh Shukla

The rte_bus_get_iommu_class() API helps to automatically detect and select
the appropriate iova mapping scheme for the iommu capable devices on the buses.

Algorithm for iova scheme selection for the buses:
0. Iterate through bus_list.
1. Collect each bus's iova mode value and OR it into the 'mode' var.
2. Mode selection scheme is:
if mode == 0 then iova mode is _pa,
if mode == 1 then iova mode is _pa,
if mode == 2 then iova mode is _va,
if mode == 3 then iova mode is _pa.

So any mode != 2 falls back to the default iova mode (_pa).
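
As an illustration of the table above (a sketch under stated assumptions, not
the patch code itself): each bus reports one value, the values are OR-ed
together, and only a combined result of exactly 2 selects _va. The enum is a
local stand-in for rte_iova_mode, and combine_bus_modes() is a hypothetical
helper that takes the per-bus results as a plain array instead of walking the
bus list.

```c
#include <stddef.h>

/* Local stand-in for enum rte_iova_mode (same values as in the series). */
enum iova_mode {
	IOVA_DC = 0,        /* don't care */
	IOVA_PA = (1 << 0),
	IOVA_VA = (1 << 1)
};

/*
 * OR together the mode reported by each bus. The result is a bitmask:
 * 0 = every bus said "don't care", 1 = only PA requested,
 * 2 = only VA requested, 3 = both PA and VA requested.
 * Anything other than 2 falls back to the default (PA).
 */
static enum iova_mode
combine_bus_modes(const enum iova_mode *per_bus, size_t n)
{
	int mode = IOVA_DC;
	size_t i;

	for (i = 0; i < n; i++)
		mode |= per_bus[i];

	return (mode == IOVA_VA) ? IOVA_VA : IOVA_PA;
}
```

A "don't care" bus contributes no bits, so it never changes the outcome; a
single bus that asks for PA is enough to force PA for the whole system.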

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
 lib/librte_eal/common/eal_common_bus.c          | 23 +++++++++++++++++++++++
 lib/librte_eal/common/eal_common_pci.c          |  1 +
 lib/librte_eal/common/include/rte_bus.h         | 25 +++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 5 files changed, 51 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 1fdcfb460..9942f47aa 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -243,5 +243,6 @@ DPDK_17.11 {
 
 	rte_pci_match;
 	rte_pci_get_iommu_class;
+	rte_bus_get_iommu_class;
 
 } DPDK_17.08;
diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 08bec2d93..a30a8982e 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -222,3 +222,26 @@ rte_bus_find_by_device_name(const char *str)
 		c[0] = '\0';
 	return rte_bus_find(NULL, bus_can_parse, name);
 }
+
+
+/*
+ * Get iommu class of devices on the bus.
+ */
+enum rte_iova_mode
+rte_bus_get_iommu_class(void)
+{
+	int mode = RTE_IOVA_DC;
+	struct rte_bus *bus;
+
+	TAILQ_FOREACH(bus, &rte_bus_list, next) {
+
+		if (bus->get_iommu_class)
+			mode |= bus->get_iommu_class();
+	}
+
+	if (mode != RTE_IOVA_VA) {
+		/* Use default IOVA mode */
+		mode = RTE_IOVA_PA;
+	}
+	return mode;
+}
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 3b7d0a0ee..0f0e4b93b 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -564,6 +564,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.plug = pci_plug,
 		.unplug = pci_unplug,
 		.parse = pci_parse,
+		.get_iommu_class = rte_pci_get_iommu_class,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 9e40687e5..70a291a4d 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -178,6 +178,20 @@ struct rte_bus_conf {
 	enum rte_bus_scan_mode scan_mode; /**< Scan policy. */
 };
 
+
+/**
+ * Get the common iommu class of all the devices on the bus. The bus may
+ * check that those devices are attached to an iommu driver.
+ * If no devices are attached to the bus, the bus may return the don't care
+ * (_DC) value.
+ * Otherwise, the bus will return the appropriate _pa or _va iova mode.
+ *
+ * @return
+ *      enum rte_iova_mode value.
+ */
+typedef enum rte_iova_mode (*rte_bus_get_iommu_class_t)(void);
+
+
 /**
  * A structure describing a generic bus.
  */
@@ -191,6 +205,7 @@ struct rte_bus {
 	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
 	rte_bus_parse_t parse;       /**< Parse a device name */
 	struct rte_bus_conf conf;    /**< Bus configuration */
+	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 };
 
 /**
@@ -290,6 +305,16 @@ struct rte_bus *rte_bus_find_by_device(const struct rte_device *dev);
  */
 struct rte_bus *rte_bus_find_by_name(const char *busname);
 
+
+/**
+ * Get the common iommu class of devices bound on to buses available in the
+ * system. The default mode is PA.
+ *
+ * @return
+ *     enum rte_iova_mode value.
+ */
+enum rte_iova_mode rte_bus_get_iommu_class(void);
+
 /**
  * Helper for Bus registration.
  * The constructor has higher priority than PMD constructors.
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 40420ded3..f35031746 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -248,5 +248,6 @@ DPDK_17.11 {
 
 	rte_pci_match;
 	rte_pci_get_iommu_class;
+	rte_bus_get_iommu_class;
 
 } DPDK_17.08;
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v7 5/9] eal: introduce iova mode helper api
  2017-08-31  3:26           ` [PATCH v7 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
                               ` (3 preceding siblings ...)
  2017-08-31  3:26             ` [PATCH v7 4/9] bus: " Santosh Shukla
@ 2017-08-31  3:26             ` Santosh Shukla
  2017-08-31  3:26             ` [PATCH v7 6/9] eal: auto detect iova mode Santosh Shukla
                               ` (5 subsequent siblings)
  10 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-08-31  3:26 UTC (permalink / raw)
  To: dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen, aconole,
	Santosh Shukla

Introduce the rte_eal_iova_mode() helper API. This API is used by
non-eal libraries to detect the iova mode.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_eal/bsdapp/eal/eal.c                 |  6 ++++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
 lib/librte_eal/common/include/rte_eal.h         | 12 ++++++++++++
 lib/librte_eal/linuxapp/eal/eal.c               |  6 ++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 5 files changed, 26 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 5fa598842..07e72203f 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -119,6 +119,12 @@ rte_eal_get_configuration(void)
 	return &rte_config;
 }
 
+enum rte_iova_mode
+rte_eal_iova_mode(void)
+{
+	return rte_eal_get_configuration()->iova_mode;
+}
+
 /* parse a sysfs (or other) file containing one integer value */
 int
 eal_parse_sysfs_value(const char *filename, unsigned long *val)
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 9942f47aa..1a63f3f05 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -244,5 +244,6 @@ DPDK_17.11 {
 	rte_pci_match;
 	rte_pci_get_iommu_class;
 	rte_bus_get_iommu_class;
+	rte_eal_iova_mode;
 
 } DPDK_17.08;
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 0e7363d77..932dc1a96 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -45,6 +45,7 @@
 
 #include <rte_per_lcore.h>
 #include <rte_config.h>
+#include <rte_bus.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -87,6 +88,9 @@ struct rte_config {
 	/** Primary or secondary configuration */
 	enum rte_proc_type_t process_type;
 
+	/** PA or VA mapping mode */
+	enum rte_iova_mode iova_mode;
+
 	/**
 	 * Pointer to memory configuration, which may be shared across multiple
 	 * DPDK instances
@@ -287,6 +291,14 @@ static inline int rte_gettid(void)
 	return RTE_PER_LCORE(_thread_id);
 }
 
+/**
+ * Get the iova mode
+ *
+ * @return
+ *   enum rte_iova_mode value.
+ */
+enum rte_iova_mode rte_eal_iova_mode(void);
+
 #define RTE_INIT(func) \
 static void __attribute__((constructor, used)) func(void)
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 48f12f44c..febbafdb3 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -128,6 +128,12 @@ rte_eal_get_configuration(void)
 	return &rte_config;
 }
 
+enum rte_iova_mode
+rte_eal_iova_mode(void)
+{
+	return rte_eal_get_configuration()->iova_mode;
+}
+
 /* parse a sysfs (or other) file containing one integer value */
 int
 eal_parse_sysfs_value(const char *filename, unsigned long *val)
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index f35031746..c99f1ed44 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -249,5 +249,6 @@ DPDK_17.11 {
 	rte_pci_match;
 	rte_pci_get_iommu_class;
 	rte_bus_get_iommu_class;
+	rte_eal_iova_mode;
 
 } DPDK_17.08;
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v7 6/9] eal: auto detect iova mode
  2017-08-31  3:26           ` [PATCH v7 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
                               ` (4 preceding siblings ...)
  2017-08-31  3:26             ` [PATCH v7 5/9] eal: introduce iova mode helper api Santosh Shukla
@ 2017-08-31  3:26             ` Santosh Shukla
  2017-09-04 15:32               ` Burakov, Anatoly
  2017-08-31  3:26             ` [PATCH v7 7/9] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
                               ` (4 subsequent siblings)
  10 siblings, 1 reply; 248+ messages in thread
From: Santosh Shukla @ 2017-08-31  3:26 UTC (permalink / raw)
  To: dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen, aconole,
	Santosh Shukla

For auto detection purposes:
* The calls below are moved up in the eal initialization order:
	- eal_option_device_parse
	- rte_bus_scan

Based on the result of rte_bus_get_iommu_class, select the iova
mapping mode.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
v6 --> v7:
- Moved eal_option_device_parse() up in the order of eal init.
- Added run_once. (aaron suggestion).
- squashed v6 series patch no. [08/12] & [09/12] into one patch (Aaron comment)

 lib/librte_eal/bsdapp/eal/eal.c   | 27 ++++++++++++++++-----------
 lib/librte_eal/linuxapp/eal/eal.c | 27 ++++++++++++++++-----------
 2 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 07e72203f..f003f4c04 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -541,6 +541,22 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	if (eal_option_device_parse()) {
+		rte_errno = ENODEV;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
+
+	if (rte_bus_scan()) {
+		rte_eal_init_alert("Cannot scan the buses for devices\n");
+		rte_errno = ENODEV;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
+
+	/* autodetect the iova mapping mode (default is iova_pa) */
+	rte_eal_get_configuration()->iova_mode = rte_bus_get_iommu_class();
+
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&
 			eal_hugepage_info_init() < 0) {
@@ -620,17 +636,6 @@ rte_eal_init(int argc, char **argv)
 		rte_config.master_lcore, thread_id, cpuset,
 		ret == 0 ? "" : "...");
 
-	if (eal_option_device_parse()) {
-		rte_errno = ENODEV;
-		return -1;
-	}
-
-	if (rte_bus_scan()) {
-		rte_eal_init_alert("Cannot scan the buses for devices\n");
-		rte_errno = ENODEV;
-		return -1;
-	}
-
 	RTE_LCORE_FOREACH_SLAVE(i) {
 
 		/*
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index febbafdb3..f4901ffb6 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -798,6 +798,22 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	if (eal_option_device_parse()) {
+		rte_errno = ENODEV;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
+
+	if (rte_bus_scan()) {
+		rte_eal_init_alert("Cannot scan the buses for devices\n");
+		rte_errno = ENODEV;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
+
+	/* autodetect the iova mapping mode (default is iova_pa) */
+	rte_eal_get_configuration()->iova_mode = rte_bus_get_iommu_class();
+
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&
 			internal_config.xen_dom0_support == 0 &&
@@ -895,17 +911,6 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
-	if (eal_option_device_parse()) {
-		rte_errno = ENODEV;
-		return -1;
-	}
-
-	if (rte_bus_scan()) {
-		rte_eal_init_alert("Cannot scan the buses for devices\n");
-		rte_errno = ENODEV;
-		return -1;
-	}
-
 	RTE_LCORE_FOREACH_SLAVE(i) {
 
 		/*
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v7 7/9] linuxapp/eal_vfio: honor iova mode before mapping
  2017-08-31  3:26           ` [PATCH v7 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
                               ` (5 preceding siblings ...)
  2017-08-31  3:26             ` [PATCH v7 6/9] eal: auto detect iova mode Santosh Shukla
@ 2017-08-31  3:26             ` Santosh Shukla
  2017-09-04 15:40               ` Burakov, Anatoly
  2017-10-26 12:57               ` Jonas Pfefferle1
  2017-08-31  3:26             ` [PATCH v7 8/9] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
                               ` (3 subsequent siblings)
  10 siblings, 2 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-08-31  3:26 UTC (permalink / raw)
  To: dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen, aconole,
	Santosh Shukla

Check the iova mode and map the iova to pa or va accordingly.
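
For illustration only, the choice made before each DMA map call can be
sketched as below. This is not the VFIO code from the patch: struct mem_seg is
a hypothetical, simplified stand-in for a memory segment, the enum is a local
copy of the rte_iova_mode values, and dma_iova_for_segment() is an invented
helper name.

```c
#include <stdint.h>

/* Local stand-in for enum rte_iova_mode (same values as in the series). */
enum iova_mode {
	IOVA_DC = 0,
	IOVA_PA = (1 << 0),
	IOVA_VA = (1 << 1)
};

/* Hypothetical, simplified memory segment descriptor. */
struct mem_seg {
	uint64_t vaddr;     /* process virtual address of the segment */
	uint64_t phys_addr; /* physical address of the segment */
	uint64_t len;       /* segment length in bytes */
};

/*
 * Pick the IO virtual address to program into the IOMMU for a segment:
 * the process virtual address in iova=va mode, the physical address
 * in every other mode.
 */
static uint64_t
dma_iova_for_segment(const struct mem_seg *ms, enum iova_mode mode)
{
	return (mode == IOVA_VA) ? ms->vaddr : ms->phys_addr;
}
```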

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c8a97b7e7..b32cd09a2 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd)
 		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
 		dma_map.vaddr = ms[i].addr_64;
 		dma_map.size = ms[i].len;
-		dma_map.iova = ms[i].phys_addr;
+		if (rte_eal_iova_mode() == RTE_IOVA_VA)
+			dma_map.iova = dma_map.vaddr;
+		else
+			dma_map.iova = ms[i].phys_addr;
 		dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
 
 		ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
@@ -792,7 +795,10 @@ vfio_spapr_dma_map(int vfio_container_fd)
 		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
 		dma_map.vaddr = ms[i].addr_64;
 		dma_map.size = ms[i].len;
-		dma_map.iova = ms[i].phys_addr;
+		if (rte_eal_iova_mode() == RTE_IOVA_VA)
+			dma_map.iova = dma_map.vaddr;
+		else
+			dma_map.iova = ms[i].phys_addr;
 		dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
 				 VFIO_DMA_MAP_FLAG_WRITE;
 
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v7 8/9] linuxapp/eal_memory: honor iova mode in virt2phy
  2017-08-31  3:26           ` [PATCH v7 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
                               ` (6 preceding siblings ...)
  2017-08-31  3:26             ` [PATCH v7 7/9] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
@ 2017-08-31  3:26             ` Santosh Shukla
  2017-09-04 15:42               ` Burakov, Anatoly
  2017-08-31  3:26             ` [PATCH v7 9/9] eal/rte_malloc: " Santosh Shukla
                               ` (2 subsequent siblings)
  10 siblings, 1 reply; 248+ messages in thread
From: Santosh Shukla @ 2017-08-31  3:26 UTC (permalink / raw)
  To: dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen, aconole,
	Santosh Shukla

Check the iova mode and return the address accordingly: the virtual
address in iova=va mode, the physical address otherwise.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 52791282f..2d9d7c2dc 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -139,6 +139,9 @@ rte_mem_virt2phy(const void *virtaddr)
 	int page_size;
 	off_t offset;
 
+	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+		return (uintptr_t)virtaddr;
+
 	/* when using dom0, /proc/self/pagemap always returns 0, check in
 	 * dpdk memory by browsing the memsegs */
 	if (rte_xen_dom0_supported()) {
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v7 9/9] eal/rte_malloc: honor iova mode in virt2phy
  2017-08-31  3:26           ` [PATCH v7 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
                               ` (7 preceding siblings ...)
  2017-08-31  3:26             ` [PATCH v7 8/9] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
@ 2017-08-31  3:26             ` Santosh Shukla
  2017-09-04 15:44               ` Burakov, Anatoly
  2017-09-05 12:28             ` [PATCH v7 0/9] Infrastructure to detect iova mapping on the bus Hemant Agrawal
  2017-09-18 10:42             ` [PATCH v8 " Santosh Shukla
  10 siblings, 1 reply; 248+ messages in thread
From: Santosh Shukla @ 2017-08-31  3:26 UTC (permalink / raw)
  To: dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen, aconole,
	Santosh Shukla

Check the iova mode and return the address accordingly: the virtual
address in iova=va mode, the physical address otherwise.
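
A minimal sketch of the translation, assuming a simplified segment
descriptor; it is not the rte_malloc code from this patch. struct mem_seg,
BAD_PHYS_ADDR and virt2iova() are hypothetical names chosen for the example,
and the enum is a local copy of the rte_iova_mode values.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for enum rte_iova_mode (same values as in the series). */
enum iova_mode {
	IOVA_DC = 0,
	IOVA_PA = (1 << 0),
	IOVA_VA = (1 << 1)
};

/* Hypothetical marker for "no valid address". */
#define BAD_PHYS_ADDR UINT64_MAX

/* Hypothetical, simplified memory segment descriptor. */
struct mem_seg {
	void *addr;         /* virtual start address of the segment */
	uint64_t phys_addr; /* physical start address of the segment */
};

/*
 * Translate a virtual address inside a segment into the address a device
 * should use. In iova=va mode the virtual address is returned unchanged;
 * otherwise the offset within the segment is added to the segment's
 * physical base.
 */
static uint64_t
virt2iova(const struct mem_seg *ms, const void *addr, enum iova_mode mode)
{
	if (ms == NULL || ms->phys_addr == BAD_PHYS_ADDR)
		return BAD_PHYS_ADDR;

	if (mode == IOVA_VA)
		return (uintptr_t)addr;

	return ms->phys_addr + ((uintptr_t)addr - (uintptr_t)ms->addr);
}
```

For a segment whose physical base is 0x1000, an address 16 bytes into the
segment translates to 0x1010 in PA mode and to the pointer value itself in
VA mode.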

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_eal/common/rte_malloc.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 5c0627bf4..d65c05a4d 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -251,10 +251,17 @@ rte_malloc_set_limit(__rte_unused const char *type,
 phys_addr_t
 rte_malloc_virt2phy(const void *addr)
 {
+	phys_addr_t paddr;
 	const struct malloc_elem *elem = malloc_elem_from_data(addr);
 	if (elem == NULL)
 		return RTE_BAD_PHYS_ADDR;
 	if (elem->ms->phys_addr == RTE_BAD_PHYS_ADDR)
 		return RTE_BAD_PHYS_ADDR;
-	return elem->ms->phys_addr + ((uintptr_t)addr - (uintptr_t)elem->ms->addr);
+
+	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+		paddr = (uintptr_t)addr;
+	else
+		paddr = elem->ms->phys_addr +
+			((uintptr_t)addr - (uintptr_t)elem->ms->addr);
+	return paddr;
 }
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* Re: [PATCH v7 1/9] eal/pci: export match function
  2017-08-31  3:26             ` [PATCH v7 1/9] eal/pci: export match function Santosh Shukla
@ 2017-09-04 14:49               ` Burakov, Anatoly
  2017-09-06 15:39               ` Ferruh Yigit
  1 sibling, 0 replies; 248+ messages in thread
From: Burakov, Anatoly @ 2017-09-04 14:49 UTC (permalink / raw)
  To: Santosh Shukla, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, Gonzalez Monroy, Sergio, Richardson, Bruce,
	shreyansh.jain, gaetan.rivet, stephen, aconole

> From: Santosh Shukla [mailto:santosh.shukla@caviumnetworks.com]
> Sent: Thursday, August 31, 2017 4:26 AM
> To: dev@dpdk.org
> Cc: thomas@monjalon.net; jerin.jacob@caviumnetworks.com;
> hemant.agrawal@nxp.com; olivier.matz@6wind.com;
> maxime.coquelin@redhat.com; Gonzalez Monroy, Sergio
> <sergio.gonzalez.monroy@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; shreyansh.jain@nxp.com;
> gaetan.rivet@6wind.com; Burakov, Anatoly <anatoly.burakov@intel.com>;
> stephen@networkplumber.org; aconole@redhat.com; Santosh Shukla
> <santosh.shukla@caviumnetworks.com>
> Subject: [PATCH v7 1/9] eal/pci: export match function
> 
> Export rte_pci_match() function as it needed in the followup patch.
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---

Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v7 2/9] eal/pci: get iommu class
  2017-08-31  3:26             ` [PATCH v7 2/9] eal/pci: get iommu class Santosh Shukla
@ 2017-09-04 14:53               ` Burakov, Anatoly
  2017-09-04 15:13                 ` santosh
  2017-09-04 15:30               ` Burakov, Anatoly
  1 sibling, 1 reply; 248+ messages in thread
From: Burakov, Anatoly @ 2017-09-04 14:53 UTC (permalink / raw)
  To: Santosh Shukla, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, Gonzalez Monroy, Sergio, Richardson, Bruce,
	shreyansh.jain, gaetan.rivet, stephen, aconole

> From: Santosh Shukla [mailto:santosh.shukla@caviumnetworks.com]
> Sent: Thursday, August 31, 2017 4:26 AM
> To: dev@dpdk.org
> Cc: thomas@monjalon.net; jerin.jacob@caviumnetworks.com;
> hemant.agrawal@nxp.com; olivier.matz@6wind.com;
> maxime.coquelin@redhat.com; Gonzalez Monroy, Sergio
> <sergio.gonzalez.monroy@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; shreyansh.jain@nxp.com;
> gaetan.rivet@6wind.com; Burakov, Anatoly <anatoly.burakov@intel.com>;
> stephen@networkplumber.org; aconole@redhat.com; Santosh Shukla
> <santosh.shukla@caviumnetworks.com>
> Subject: [PATCH v7 2/9] eal/pci: get iommu class
> 
> Introducing rte_pci_get_iommu_class API which helps to get iommu class of
> PCI device on the bus and returns preferred iova mapping mode for PCI bus.
> 
> Patch also add rte_pci_get_iommu_class definition for bsdapp, in bsdapp
> case - api returns default iova mode.
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> v6 --> v7:
> - squashed v6 series patch [02/12] & [03/12] (Aaron comment).
> 
>  lib/librte_eal/bsdapp/eal/eal_pci.c           | 10 ++++++++++
>  lib/librte_eal/bsdapp/eal/rte_eal_version.map |  1 +
>  lib/librte_eal/common/include/rte_bus.h       | 10 ++++++++++
>  lib/librte_eal/common/include/rte_pci.h       | 11 +++++++++++
>  4 files changed, 32 insertions(+)
> 
> diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
> index 04eacdcc7..e2c252320 100644
> --- a/lib/librte_eal/bsdapp/eal/eal_pci.c
> +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
> @@ -403,6 +403,16 @@ rte_pci_scan(void)
>  	return -1;
>  }
> 
> +/*
> + * Get iommu class of pci devices on the bus.
> + */
> +enum rte_iova_mode
> +rte_pci_get_iommu_class(void)
> +{
> +	/* Supports only RTE_KDRV_NIC_UIO */
> +	return RTE_IOVA_PA;
> +}
> +
>  int
>  pci_update_device(const struct rte_pci_addr *addr)
>  {
> diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> index c819e3084..1fdcfb460 100644
> --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> @@ -242,5 +242,6 @@ DPDK_17.11 {
>  	global:
> 
>  	rte_pci_match;
> +	rte_pci_get_iommu_class;
> 
>  } DPDK_17.08;
> diff --git a/lib/librte_eal/common/include/rte_bus.h
> b/lib/librte_eal/common/include/rte_bus.h
> index c79368d3c..9e40687e5 100644
> --- a/lib/librte_eal/common/include/rte_bus.h
> +++ b/lib/librte_eal/common/include/rte_bus.h
> @@ -55,6 +55,16 @@ extern "C" {
>  /** Double linked list of buses */
>  TAILQ_HEAD(rte_bus_list, rte_bus);
> 
> +
> +/**
> + * IOVA mapping mode.
> + */
> +enum rte_iova_mode {
> +	RTE_IOVA_DC = 0,	/* Don't care mode */
> +	RTE_IOVA_PA = (1 << 0),
> +	RTE_IOVA_VA = (1 << 1)

Hi Santosh,

No need to set values explicitly, standard C will take care of it.

I wonder about the purpose of the "don't care" mode. It's not used for anything except the case where no driver is bound. All the libraries (e.g. rte_malloc) will only check for IOVA_VA mode. Can't we just use PA in all cases where IOVA_DC would be applicable?

Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v7 3/9] linuxapp/eal_pci: get iommu class
  2017-08-31  3:26             ` [PATCH v7 3/9] linuxapp/eal_pci: " Santosh Shukla
@ 2017-09-04 15:08               ` Burakov, Anatoly
  2017-09-05  8:47                 ` santosh
  2017-09-05  9:01               ` Burakov, Anatoly
  1 sibling, 1 reply; 248+ messages in thread
From: Burakov, Anatoly @ 2017-09-04 15:08 UTC (permalink / raw)
  To: Santosh Shukla, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, Gonzalez Monroy, Sergio, Richardson, Bruce,
	shreyansh.jain, gaetan.rivet, stephen, aconole

> From: Santosh Shukla [mailto:santosh.shukla@caviumnetworks.com]
> Sent: Thursday, August 31, 2017 4:26 AM
> To: dev@dpdk.org
> Cc: thomas@monjalon.net; jerin.jacob@caviumnetworks.com;
> hemant.agrawal@nxp.com; olivier.matz@6wind.com;
> maxime.coquelin@redhat.com; Gonzalez Monroy, Sergio
> <sergio.gonzalez.monroy@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; shreyansh.jain@nxp.com;
> gaetan.rivet@6wind.com; Burakov, Anatoly <anatoly.burakov@intel.com>;
> stephen@networkplumber.org; aconole@redhat.com; Santosh Shukla
> <santosh.shukla@caviumnetworks.com>
> Subject: [PATCH v7 3/9] linuxapp/eal_pci: get iommu class
> 
> Get iommu class of PCI device on the bus and returns preferred iova
> mapping mode for that bus.
> 
> Patch also introduces RTE_PCI_DRV_IOVA_AS_VA drv flag.
> Flag used when driver needs to operate in iova=va mode.
> 
> Algorithm for iova scheme selection for PCI bus:
> 0. If no device bound then return with RTE_IOVA_DC mapping mode, else
> goto 1).
> 1. Look for device attached to vfio kdrv and has .drv_flag set to
> RTE_PCI_DRV_IOVA_AS_VA.
> 2. Look for any device attached to UIO class of driver.
> 3. Check for vfio-noiommu mode enabled.
> 
> If 2) & 3) is false and 1) is true then select mapping scheme as RTE_IOVA_VA.
> Otherwise use default mapping scheme (RTE_IOVA_PA).
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
> ---
> v6 --> v7:
> - squashed v6 series patch no [01/12] & [05/12]..
>   i.e.. moved RTE_PCI_DRV_IOVA_AS_VA flag into this patch. (Aaron
> comment).
> 
>  lib/librte_eal/common/include/rte_pci.h         |  2 +
>  lib/librte_eal/linuxapp/eal/eal_pci.c           | 95 +++++++++++++++++++++++++
>  lib/librte_eal/linuxapp/eal/eal_vfio.c          | 19 +++++
>  lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>  5 files changed, 121 insertions(+)
> 
> diff --git a/lib/librte_eal/common/include/rte_pci.h
> b/lib/librte_eal/common/include/rte_pci.h
> index 0e36de093..a67d77f22 100644
> --- a/lib/librte_eal/common/include/rte_pci.h
> +++ b/lib/librte_eal/common/include/rte_pci.h
> @@ -202,6 +202,8 @@ struct rte_pci_bus {
>  #define RTE_PCI_DRV_INTR_RMV 0x0010
>  /** Device driver needs to keep mapped resources if unsupported dev detected */
>  #define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020
> +/** Device driver supports iova as va */
> +#define RTE_PCI_DRV_IOVA_AS_VA 0X0040
> 
>  /**
>   * A structure describing a PCI mapping.
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c
> b/lib/librte_eal/linuxapp/eal/eal_pci.c
> index 8951ce742..9725fd493 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
> @@ -45,6 +45,7 @@
>  #include "eal_filesystem.h"
>  #include "eal_private.h"
>  #include "eal_pci_init.h"
> +#include "eal_vfio.h"
> 
>  /**
>   * @file
> @@ -487,6 +488,100 @@ rte_pci_scan(void)
>  	return -1;
>  }
> 
> +/*
> + * Is pci device bound to any kdrv
> + */
> +static inline int
> +pci_device_is_bound(void)
> +{
> +	struct rte_pci_device *dev = NULL;
> +	int ret = 0;
> +
> +	FOREACH_DEVICE_ON_PCIBUS(dev) {
> +		if (dev->kdrv == RTE_KDRV_UNKNOWN ||
> +		    dev->kdrv == RTE_KDRV_NONE) {
> +			continue;
> +		} else {
> +			ret = 1;
> +			break;
> +		}
> +	}
> +	return ret;
> +}
> +
> +/*
> + * Any one of the device bound to uio
> + */
> +static inline int
> +pci_device_bound_uio(void)
> +{
> +	struct rte_pci_device *dev = NULL;
> +
> +	FOREACH_DEVICE_ON_PCIBUS(dev) {
> +		if (dev->kdrv == RTE_KDRV_IGB_UIO ||
> +		   dev->kdrv == RTE_KDRV_UIO_GENERIC) {
> +			return 1;
> +		}
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Any one of the device has iova as va
> + */
> +static inline int
> +pci_device_has_iova_va(void)
> +{
> +	struct rte_pci_device *dev = NULL;
> +	struct rte_pci_driver *drv = NULL;
> +
> +	FOREACH_DRIVER_ON_PCIBUS(drv) {
> +		if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
> +			FOREACH_DEVICE_ON_PCIBUS(dev) {
> +				if (dev->kdrv == RTE_KDRV_VFIO &&
> +				    rte_pci_match(drv, dev))
> +					return 1;
> +			}
> +		}
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Get iommu class of PCI devices on the bus.
> + */
> +enum rte_iova_mode
> +rte_pci_get_iommu_class(void)
> +{
> +	bool is_bound;
> +	bool is_vfio_noiommu_enabled = true;
> +	bool has_iova_va;
> +	bool is_bound_uio;
> +
> +	is_bound = pci_device_is_bound();
> +	if (!is_bound)
> +		return RTE_IOVA_DC;
> +
> +	has_iova_va = pci_device_has_iova_va();
> +	is_bound_uio = pci_device_bound_uio();
> +#ifdef VFIO_PRESENT
> +	is_vfio_noiommu_enabled = vfio_noiommu_is_enabled() == 1 ? 1 : 0;

If you declare is_vfio_noiommu_enabled as bool, you should probably treat it as such and assign true/false.

Other than that, I'm curious why it is always set to "true" by default. If VFIO is not compiled in, it seems the error message would always complain about vfio-noiommu mode being enabled, which is confusing.
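
To make both points concrete, here is a minimal sketch of the selection logic with the flag handled as a real bool and defaulting to false. It is not the patch code: struct pci_probe, pci_select_iova_mode() and the local enum are hypothetical stand-ins, and the probe results are passed in as plain fields so the snippet compiles without DPDK. Whether false is the right default for the real function is for the author to confirm.

```c
#include <stdbool.h>

/* Local stand-in for the enum quoted above; not the DPDK header. */
enum iova_mode {
	IOVA_DC = 0,		/* no device bound, no preference */
	IOVA_PA = (1 << 0),
	IOVA_VA = (1 << 1)
};

/* Hypothetical probe results, passed in so the sketch runs anywhere. */
struct pci_probe {
	bool is_bound;		/* any device bound to a kernel driver */
	bool has_iova_va;	/* vfio-bound device whose driver sets IOVA_AS_VA */
	bool is_bound_uio;	/* any device bound to a UIO driver */
	int vfio_noiommu;	/* raw result of a noiommu check: 1 means enabled */
	bool vfio_present;	/* models the VFIO_PRESENT build option */
};

static enum iova_mode
pci_select_iova_mode(const struct pci_probe *p)
{
	/* Default to false: without VFIO there is no noiommu mode to report. */
	bool is_vfio_noiommu_enabled = false;

	if (!p->is_bound)
		return IOVA_DC;

	if (p->vfio_present)
		is_vfio_noiommu_enabled = (p->vfio_noiommu == 1);

	/* VA only when step 1 holds and steps 2 and 3 do not. */
	if (p->has_iova_va && !p->is_bound_uio && !is_vfio_noiommu_enabled)
		return IOVA_VA;

	return IOVA_PA;
}
```

With this shape, a build without VFIO never reports noiommu as enabled, and the comparison result is assigned directly instead of going through "? 1 : 0".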

Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v7 2/9] eal/pci: get iommu class
  2017-09-04 14:53               ` Burakov, Anatoly
@ 2017-09-04 15:13                 ` santosh
  2017-09-04 15:16                   ` Burakov, Anatoly
  0 siblings, 1 reply; 248+ messages in thread
From: santosh @ 2017-09-04 15:13 UTC (permalink / raw)
  To: Burakov, Anatoly, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, Gonzalez Monroy, Sergio, Richardson, Bruce,
	shreyansh.jain, gaetan.rivet, stephen, aconole

Hi Anatoly,


On Monday 04 September 2017 08:23 PM, Burakov, Anatoly wrote:
>> From: Santosh Shukla [mailto:santosh.shukla@caviumnetworks.com]
>> Sent: Thursday, August 31, 2017 4:26 AM
>> To: dev@dpdk.org
>> Cc: thomas@monjalon.net; jerin.jacob@caviumnetworks.com;
>> hemant.agrawal@nxp.com; olivier.matz@6wind.com;
>> maxime.coquelin@redhat.com; Gonzalez Monroy, Sergio
>> <sergio.gonzalez.monroy@intel.com>; Richardson, Bruce
>> <bruce.richardson@intel.com>; shreyansh.jain@nxp.com;
>> gaetan.rivet@6wind.com; Burakov, Anatoly <anatoly.burakov@intel.com>;
>> stephen@networkplumber.org; aconole@redhat.com; Santosh Shukla
>> <santosh.shukla@caviumnetworks.com>
>> Subject: [PATCH v7 2/9] eal/pci: get iommu class
>>
>> Introducing rte_pci_get_iommu_class API which helps to get iommu class of
>> PCI device on the bus and returns preferred iova mapping mode for PCI bus.
>>
>> Patch also add rte_pci_get_iommu_class definition for bsdapp, in bsdapp
>> case - api returns default iova mode.
>>
>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>> v6 --> v7:
>> - squashed v6 series patch [02/12] & [03/12] (Aaron comment).
>>
>>  lib/librte_eal/bsdapp/eal/eal_pci.c           | 10 ++++++++++
>>  lib/librte_eal/bsdapp/eal/rte_eal_version.map |  1 +
>>  lib/librte_eal/common/include/rte_bus.h       | 10 ++++++++++
>>  lib/librte_eal/common/include/rte_pci.h       | 11 +++++++++++
>>  4 files changed, 32 insertions(+)
>>
>> diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c
>> b/lib/librte_eal/bsdapp/eal/eal_pci.c
>> index 04eacdcc7..e2c252320 100644
>> --- a/lib/librte_eal/bsdapp/eal/eal_pci.c
>> +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
>> @@ -403,6 +403,16 @@ rte_pci_scan(void)
>>  	return -1;
>>  }
>>
>> +/*
>> + * Get iommu class of pci devices on the bus.
>> + */
>> +enum rte_iova_mode
>> +rte_pci_get_iommu_class(void)
>> +{
>> +	/* Supports only RTE_KDRV_NIC_UIO */
>> +	return RTE_IOVA_PA;
>> +}
>> +
>>  int
>>  pci_update_device(const struct rte_pci_addr *addr)
>>  {
>> diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>> b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>> index c819e3084..1fdcfb460 100644
>> --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>> +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>> @@ -242,5 +242,6 @@ DPDK_17.11 {
>>  	global:
>>
>>  	rte_pci_match;
>> +	rte_pci_get_iommu_class;
>>
>>  } DPDK_17.08;
>> diff --git a/lib/librte_eal/common/include/rte_bus.h
>> b/lib/librte_eal/common/include/rte_bus.h
>> index c79368d3c..9e40687e5 100644
>> --- a/lib/librte_eal/common/include/rte_bus.h
>> +++ b/lib/librte_eal/common/include/rte_bus.h
>> @@ -55,6 +55,16 @@ extern "C" {
>>  /** Double linked list of buses */
>>  TAILQ_HEAD(rte_bus_list, rte_bus);
>>
>> +
>> +/**
>> + * IOVA mapping mode.
>> + */
>> +enum rte_iova_mode {
>> +	RTE_IOVA_DC = 0,	/* Don't care mode */
>> +	RTE_IOVA_PA = (1 << 0),
>> +	RTE_IOVA_VA = (1 << 1)
> Hi Santosh,
>
> No need to set values explicitly, standard C will take care of it.

No strong opinion; change queued for v8.

> I wonder about the purpose of the "don't care" mode. It's not used for anything except the case where no driver is bound. All the libraries (e.g. rte_malloc) will only check for IOVA_VA mode. Can't we just use PA in all cases where IOVA_DC would be applicable?

Indeed, the policy is to use iova_pa for the _dc case,
but we need a way to distinguish between no device found and a device attached
(if attached, then decide based on its iova scheme).

For more detail, please refer to [1].

[1] http://dpdk.org/dev/patchwork/patch/26762/
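
As a rough illustration of why the distinction matters once several buses are combined, here is a sketch under my reading of the scheme in patch 4/9 (per-bus values are OR-ed together and only a pure VA result selects VA). The names combine_bus_modes() and enum iova_mode are local to this example and are not the EAL API.

```c
#include <stddef.h>

/* Local stand-in for the enum in this series; not the DPDK header. */
enum iova_mode {
	IOVA_DC = 0,		/* bus has no bound device */
	IOVA_PA = (1 << 0),
	IOVA_VA = (1 << 1)
};

/* Combine per-bus answers; an empty bus reports IOVA_DC and adds no bits. */
static enum iova_mode
combine_bus_modes(const enum iova_mode *per_bus, size_t nb_bus)
{
	int mode = IOVA_DC;
	size_t i;

	for (i = 0; i < nb_bus; i++)
		mode |= per_bus[i];

	/* Only a pure VA vote selects VA; 0, PA and PA|VA fall back to PA. */
	return mode == IOVA_VA ? IOVA_VA : IOVA_PA;
}
```

If a bus with no device reported _pa instead of _dc, then { _pa, _va } would combine to 3 and force _pa even though the only bus with a device asked for _va. With _dc the empty bus contributes nothing and the result stays _va.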

> Thanks,
> Anatoly

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v7 2/9] eal/pci: get iommu class
  2017-09-04 15:13                 ` santosh
@ 2017-09-04 15:16                   ` Burakov, Anatoly
  2017-09-04 15:31                     ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Burakov, Anatoly @ 2017-09-04 15:16 UTC (permalink / raw)
  To: santosh, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, Gonzalez Monroy, Sergio, Richardson, Bruce,
	shreyansh.jain, gaetan.rivet, stephen, aconole

Hi Santosh,

> From: santosh [mailto:santosh.shukla@caviumnetworks.com]
> Sent: Monday, September 4, 2017 4:14 PM
> To: Burakov, Anatoly <anatoly.burakov@intel.com>; dev@dpdk.org
> Cc: thomas@monjalon.net; jerin.jacob@caviumnetworks.com;
> hemant.agrawal@nxp.com; olivier.matz@6wind.com;
> maxime.coquelin@redhat.com; Gonzalez Monroy, Sergio
> <sergio.gonzalez.monroy@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; shreyansh.jain@nxp.com;
> gaetan.rivet@6wind.com; stephen@networkplumber.org;
> aconole@redhat.com
> Subject: Re: [PATCH v7 2/9] eal/pci: get iommu class
> 
> Hi Anatoly,
> 
> 
> On Monday 04 September 2017 08:23 PM, Burakov, Anatoly wrote:
> >> From: Santosh Shukla [mailto:santosh.shukla@caviumnetworks.com]
> >> Sent: Thursday, August 31, 2017 4:26 AM
> >> To: dev@dpdk.org
> >> Cc: thomas@monjalon.net; jerin.jacob@caviumnetworks.com;
> >> hemant.agrawal@nxp.com; olivier.matz@6wind.com;
> >> maxime.coquelin@redhat.com; Gonzalez Monroy, Sergio
> >> <sergio.gonzalez.monroy@intel.com>; Richardson, Bruce
> >> <bruce.richardson@intel.com>; shreyansh.jain@nxp.com;
> >> gaetan.rivet@6wind.com; Burakov, Anatoly
> <anatoly.burakov@intel.com>;
> >> stephen@networkplumber.org; aconole@redhat.com; Santosh Shukla
> >> <santosh.shukla@caviumnetworks.com>
> >> Subject: [PATCH v7 2/9] eal/pci: get iommu class
> >>
> >> Introducing rte_pci_get_iommu_class API which helps to get iommu
> >> class of PCI device on the bus and returns preferred iova mapping mode
> for PCI bus.
> >>
> >> Patch also add rte_pci_get_iommu_class definition for bsdapp, in
> >> bsdapp case - api returns default iova mode.
> >>
> >> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> >> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> >> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> ---
> >> v6 --> v7:
> >> - squashed v6 series patch [02/12] & [03/12] (Aaron comment).
> >>
> >>  lib/librte_eal/bsdapp/eal/eal_pci.c           | 10 ++++++++++
> >>  lib/librte_eal/bsdapp/eal/rte_eal_version.map |  1 +
> >>  lib/librte_eal/common/include/rte_bus.h       | 10 ++++++++++
> >>  lib/librte_eal/common/include/rte_pci.h       | 11 +++++++++++
> >>  4 files changed, 32 insertions(+)
> >>
> >> diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c
> >> b/lib/librte_eal/bsdapp/eal/eal_pci.c
> >> index 04eacdcc7..e2c252320 100644
> >> --- a/lib/librte_eal/bsdapp/eal/eal_pci.c
> >> +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
> >> @@ -403,6 +403,16 @@ rte_pci_scan(void)
> >>  	return -1;
> >>  }
> >>
> >> +/*
> >> + * Get iommu class of pci devices on the bus.
> >> + */
> >> +enum rte_iova_mode
> >> +rte_pci_get_iommu_class(void)
> >> +{
> >> +	/* Supports only RTE_KDRV_NIC_UIO */
> >> +	return RTE_IOVA_PA;
> >> +}
> >> +
> >>  int
> >>  pci_update_device(const struct rte_pci_addr *addr)
> >>  {
> >> diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> >> b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> >> index c819e3084..1fdcfb460 100644
> >> --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> >> +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> >> @@ -242,5 +242,6 @@ DPDK_17.11 {
> >>  	global:
> >>
> >>  	rte_pci_match;
> >> +	rte_pci_get_iommu_class;
> >>
> >>  } DPDK_17.08;
> >> diff --git a/lib/librte_eal/common/include/rte_bus.h
> >> b/lib/librte_eal/common/include/rte_bus.h
> >> index c79368d3c..9e40687e5 100644
> >> --- a/lib/librte_eal/common/include/rte_bus.h
> >> +++ b/lib/librte_eal/common/include/rte_bus.h
> >> @@ -55,6 +55,16 @@ extern "C" {
> >>  /** Double linked list of buses */
> >>  TAILQ_HEAD(rte_bus_list, rte_bus);
> >>
> >> +
> >> +/**
> >> + * IOVA mapping mode.
> >> + */
> >> +enum rte_iova_mode {
> >> +	RTE_IOVA_DC = 0,	/* Don't care mode */
> >> +	RTE_IOVA_PA = (1 << 0),
> >> +	RTE_IOVA_VA = (1 << 1)
> > Hi Santosh,
> >
> > No need to set values explicitly, standard C will take care of it.
> 
> no strong opinion, change queued for v8.
> 
> > I wonder the purpose of "don't care" mode. It's not used for anything but
> cases when no driver is bound. All the libraries (e.g. rte_malloc) will only
> check for IOVA_VA mode. Can't we just used PA in all cases where IOVA_DC
> would be applicable?
> 
> Indeed policy is to use iova_pa for _dc case, but we need a way to distinguish
> between no device found vs device attached (if attached then decide upon
> its iova scheme).
> 
> For more detailed info pl. refer [1].
> 
> [1] http://dpdk.org/dev/patchwork/patch/26762/
> 

Maybe make your intentions more explicit then? I.e. instead of "don't care" use "no device" or some such. No strong opinion either way though, I'm fine with leaving it as is.

Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v7 4/9] bus: get iommu class
  2017-08-31  3:26             ` [PATCH v7 4/9] bus: " Santosh Shukla
@ 2017-09-04 15:25               ` Burakov, Anatoly
  0 siblings, 0 replies; 248+ messages in thread
From: Burakov, Anatoly @ 2017-09-04 15:25 UTC (permalink / raw)
  To: Santosh Shukla, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, Gonzalez Monroy, Sergio, Richardson, Bruce,
	shreyansh.jain, gaetan.rivet, stephen, aconole

> From: Santosh Shukla [mailto:santosh.shukla@caviumnetworks.com]
> Sent: Thursday, August 31, 2017 4:26 AM
> To: dev@dpdk.org
> Cc: thomas@monjalon.net; jerin.jacob@caviumnetworks.com;
> hemant.agrawal@nxp.com; olivier.matz@6wind.com;
> maxime.coquelin@redhat.com; Gonzalez Monroy, Sergio
> <sergio.gonzalez.monroy@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; shreyansh.jain@nxp.com;
> gaetan.rivet@6wind.com; Burakov, Anatoly <anatoly.burakov@intel.com>;
> stephen@networkplumber.org; aconole@redhat.com; Santosh Shukla
> <santosh.shukla@caviumnetworks.com>
> Subject: [PATCH v7 4/9] bus: get iommu class
> 
> API(rte_bus_get_iommu_class) helps to automatically detect and select
> appropriate iova mapping scheme for iommu capable device on that bus.
> 
> Algorithm for iova scheme selection for bus:
> 0. Iterate through bus_list.
> 1. Collect each bus iova mode value and update into 'mode' var.
> 2. Mode selection scheme is:
> if mode == 0 then iova mode is _pa,
> if mode == 1 then iova mode is _pa,
> if mode == 2 then iova mode is _va,
> if mode == 3 then iova mode ia _pa.
> 
> So mode !=2  will be default iova mode (_pa).
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
> ---

Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v7 2/9] eal/pci: get iommu class
  2017-08-31  3:26             ` [PATCH v7 2/9] eal/pci: get iommu class Santosh Shukla
  2017-09-04 14:53               ` Burakov, Anatoly
@ 2017-09-04 15:30               ` Burakov, Anatoly
  1 sibling, 0 replies; 248+ messages in thread
From: Burakov, Anatoly @ 2017-09-04 15:30 UTC (permalink / raw)
  To: Santosh Shukla, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, Gonzalez Monroy, Sergio, Richardson, Bruce,
	shreyansh.jain, gaetan.rivet, stephen, aconole

> From: Santosh Shukla [mailto:santosh.shukla@caviumnetworks.com]
> Sent: Thursday, August 31, 2017 4:26 AM
> To: dev@dpdk.org
> Cc: thomas@monjalon.net; jerin.jacob@caviumnetworks.com;
> hemant.agrawal@nxp.com; olivier.matz@6wind.com;
> maxime.coquelin@redhat.com; Gonzalez Monroy, Sergio
> <sergio.gonzalez.monroy@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; shreyansh.jain@nxp.com;
> gaetan.rivet@6wind.com; Burakov, Anatoly <anatoly.burakov@intel.com>;
> stephen@networkplumber.org; aconole@redhat.com; Santosh Shukla
> <santosh.shukla@caviumnetworks.com>
> Subject: [PATCH v7 2/9] eal/pci: get iommu class
> 
> Introducing rte_pci_get_iommu_class API which helps to get iommu class of
> PCI device on the bus and returns preferred iova mapping mode for PCI bus.
> 
> Patch also add rte_pci_get_iommu_class definition for bsdapp, in bsdapp
> case - api returns default iova mode.
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> 
> +/*
> + * Get iommu class of pci devices on the bus.
> + */
> +enum rte_iova_mode
> +rte_pci_get_iommu_class(void)
> +{
> +	/* Supports only RTE_KDRV_NIC_UIO */
> +	return RTE_IOVA_PA;
> +}
> +

Hi Santosh,

I think you should add a linuxapp stub in this commit as well.

Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v7 2/9] eal/pci: get iommu class
  2017-09-04 15:16                   ` Burakov, Anatoly
@ 2017-09-04 15:31                     ` santosh
  2017-09-04 15:35                       ` Burakov, Anatoly
  0 siblings, 1 reply; 248+ messages in thread
From: santosh @ 2017-09-04 15:31 UTC (permalink / raw)
  To: Burakov, Anatoly, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, Gonzalez Monroy, Sergio, Richardson, Bruce,
	shreyansh.jain, gaetan.rivet, stephen, aconole

Hi Anatoly,


On Monday 04 September 2017 08:46 PM, Burakov, Anatoly wrote:
> Hi Santosh,
>
>> From: santosh [mailto:santosh.shukla@caviumnetworks.com]
>> Sent: Monday, September 4, 2017 4:14 PM
>> To: Burakov, Anatoly <anatoly.burakov@intel.com>; dev@dpdk.org
>> Cc: thomas@monjalon.net; jerin.jacob@caviumnetworks.com;
>> hemant.agrawal@nxp.com; olivier.matz@6wind.com;
>> maxime.coquelin@redhat.com; Gonzalez Monroy, Sergio
>> <sergio.gonzalez.monroy@intel.com>; Richardson, Bruce
>> <bruce.richardson@intel.com>; shreyansh.jain@nxp.com;
>> gaetan.rivet@6wind.com; stephen@networkplumber.org;
>> aconole@redhat.com
>> Subject: Re: [PATCH v7 2/9] eal/pci: get iommu class
>>
>> Hi Anatoly,
>>
>>
>> On Monday 04 September 2017 08:23 PM, Burakov, Anatoly wrote:
>>>> From: Santosh Shukla [mailto:santosh.shukla@caviumnetworks.com]
>>>> Sent: Thursday, August 31, 2017 4:26 AM
>>>> To: dev@dpdk.org
>>>> Cc: thomas@monjalon.net; jerin.jacob@caviumnetworks.com;
>>>> hemant.agrawal@nxp.com; olivier.matz@6wind.com;
>>>> maxime.coquelin@redhat.com; Gonzalez Monroy, Sergio
>>>> <sergio.gonzalez.monroy@intel.com>; Richardson, Bruce
>>>> <bruce.richardson@intel.com>; shreyansh.jain@nxp.com;
>>>> gaetan.rivet@6wind.com; Burakov, Anatoly
>> <anatoly.burakov@intel.com>;
>>>> stephen@networkplumber.org; aconole@redhat.com; Santosh Shukla
>>>> <santosh.shukla@caviumnetworks.com>
>>>> Subject: [PATCH v7 2/9] eal/pci: get iommu class
>>>>
>>>> Introducing rte_pci_get_iommu_class API which helps to get iommu
>>>> class of PCI device on the bus and returns preferred iova mapping mode
>> for PCI bus.
>>>> Patch also add rte_pci_get_iommu_class definition for bsdapp, in
>>>> bsdapp case - api returns default iova mode.
>>>>
>>>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>>>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>>> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>> ---
>>>> v6 --> v7:
>>>> - squashed v6 series patch [02/12] & [03/12] (Aaron comment).
>>>>
>>>>  lib/librte_eal/bsdapp/eal/eal_pci.c           | 10 ++++++++++
>>>>  lib/librte_eal/bsdapp/eal/rte_eal_version.map |  1 +
>>>>  lib/librte_eal/common/include/rte_bus.h       | 10 ++++++++++
>>>>  lib/librte_eal/common/include/rte_pci.h       | 11 +++++++++++
>>>>  4 files changed, 32 insertions(+)
>>>>
>>>> diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c
>>>> b/lib/librte_eal/bsdapp/eal/eal_pci.c
>>>> index 04eacdcc7..e2c252320 100644
>>>> --- a/lib/librte_eal/bsdapp/eal/eal_pci.c
>>>> +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
>>>> @@ -403,6 +403,16 @@ rte_pci_scan(void)
>>>>  	return -1;
>>>>  }
>>>>
>>>> +/*
>>>> + * Get iommu class of pci devices on the bus.
>>>> + */
>>>> +enum rte_iova_mode
>>>> +rte_pci_get_iommu_class(void)
>>>> +{
>>>> +	/* Supports only RTE_KDRV_NIC_UIO */
>>>> +	return RTE_IOVA_PA;
>>>> +}
>>>> +
>>>>  int
>>>>  pci_update_device(const struct rte_pci_addr *addr)
>>>>  {
>>>> diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>>>> b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>>>> index c819e3084..1fdcfb460 100644
>>>> --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>>>> +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>>>> @@ -242,5 +242,6 @@ DPDK_17.11 {
>>>>  	global:
>>>>
>>>>  	rte_pci_match;
>>>> +	rte_pci_get_iommu_class;
>>>>
>>>>  } DPDK_17.08;
>>>> diff --git a/lib/librte_eal/common/include/rte_bus.h
>>>> b/lib/librte_eal/common/include/rte_bus.h
>>>> index c79368d3c..9e40687e5 100644
>>>> --- a/lib/librte_eal/common/include/rte_bus.h
>>>> +++ b/lib/librte_eal/common/include/rte_bus.h
>>>> @@ -55,6 +55,16 @@ extern "C" {
>>>>  /** Double linked list of buses */
>>>>  TAILQ_HEAD(rte_bus_list, rte_bus);
>>>>
>>>> +
>>>> +/**
>>>> + * IOVA mapping mode.
>>>> + */
>>>> +enum rte_iova_mode {
>>>> +	RTE_IOVA_DC = 0,	/* Don't care mode */
>>>> +	RTE_IOVA_PA = (1 << 0),
>>>> +	RTE_IOVA_VA = (1 << 1)
>>> Hi Santosh,
>>>
>>> No need to set values explicitly, standard C will take care of it.
>> no strong opinion, change queued for v8.

Recalling why I expressed RTE_IOVA_PA/_VA as 1 << 0 and 1 << 1:
if someone later adds a new entry by mistake as, for example, RTE_IOVA_XX = 3, it would
enable both _pa and _va. To avoid that programming error, I deliberately
kept _pa = 1 << 0 and _va = 1 << 1; if a new entry comes (highly unlikely), it
should be defined as _xx = 1 << 2.

If you are convinced, then I think I don't need to spin this change for v8.
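
A small sketch of the point, assuming a C11 compiler for _Static_assert. The enum is a local copy for illustration, IOVA_XX is a purely hypothetical future entry, and mode_has() is a helper made up for this example.

```c
/* Local copy for illustration; IOVA_XX is hypothetical. */
enum iova_mode {
	IOVA_DC = 0,
	IOVA_PA = (1 << 0),
	IOVA_VA = (1 << 1),
	IOVA_XX = (1 << 2)	/* next free bit, not the next integer */
};

/* With one bit per mode, a combined value can be tested mode by mode. */
static int
mode_has(int combined, enum iova_mode m)
{
	return (combined & m) != 0;
}

/* A sequentially numbered fourth entry would be 3, the same as PA|VA. */
_Static_assert((IOVA_PA | IOVA_VA) == 3, "PA|VA occupies the value 3");
_Static_assert(IOVA_XX != (IOVA_PA | IOVA_VA), "XX stays distinct from PA|VA");
```

For the three current entries, implicit numbering would also give 0, 1 and 2, so the explicit shifts change nothing today; they only document that each value is a separate bit.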

>>> I wonder the purpose of "don't care" mode. It's not used for anything but
>> cases when no driver is bound. All the libraries (e.g. rte_malloc) will only
>> check for IOVA_VA mode. Can't we just used PA in all cases where IOVA_DC
>> would be applicable?
>>
>> Indeed policy is to use iova_pa for _dc case, but we need a way to distinguish
>> between no device found vs device attached (if attached then decide upon
>> its iova scheme).
>>
>> For more detailed info pl. refer [1].
>>
>> [1] http://dpdk.org/dev/patchwork/patch/26762/
>>
> Maybe make your intentions more explicit then? I.e. instead of "don't care" use "no device" or some such. No strong opinion either way though, I'm fine with leaving it as is.

I'd prefer keeping _DC, if you don't mind; it sounds more appropriate to me.

> Thanks,
> Anatoly

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v7 6/9] eal: auto detect iova mode
  2017-08-31  3:26             ` [PATCH v7 6/9] eal: auto detect iova mode Santosh Shukla
@ 2017-09-04 15:32               ` Burakov, Anatoly
  0 siblings, 0 replies; 248+ messages in thread
From: Burakov, Anatoly @ 2017-09-04 15:32 UTC (permalink / raw)
  To: Santosh Shukla, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, Gonzalez Monroy, Sergio, Richardson, Bruce,
	shreyansh.jain, gaetan.rivet, stephen, aconole

> From: Santosh Shukla [mailto:santosh.shukla@caviumnetworks.com]
> Sent: Thursday, August 31, 2017 4:26 AM
> To: dev@dpdk.org
> Cc: thomas@monjalon.net; jerin.jacob@caviumnetworks.com;
> hemant.agrawal@nxp.com; olivier.matz@6wind.com;
> maxime.coquelin@redhat.com; Gonzalez Monroy, Sergio
> <sergio.gonzalez.monroy@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; shreyansh.jain@nxp.com;
> gaetan.rivet@6wind.com; Burakov, Anatoly <anatoly.burakov@intel.com>;
> stephen@networkplumber.org; aconole@redhat.com; Santosh Shukla
> <santosh.shukla@caviumnetworks.com>
> Subject: [PATCH v7 6/9] eal: auto detect iova mode
> 
> For auto detection purpose:
> * Below calls moved up in the eal initialization order:
> 	- eal_option_device_parse
> 	- rte_bus_scan
> 
> Based on the result of rte_bus_scan_iommu_class - select iova mapping
> mode.
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---

Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v7 2/9] eal/pci: get iommu class
  2017-09-04 15:31                     ` santosh
@ 2017-09-04 15:35                       ` Burakov, Anatoly
  0 siblings, 0 replies; 248+ messages in thread
From: Burakov, Anatoly @ 2017-09-04 15:35 UTC (permalink / raw)
  To: santosh, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, Gonzalez Monroy, Sergio, Richardson, Bruce,
	shreyansh.jain, gaetan.rivet, stephen, aconole

> From: santosh [mailto:santosh.shukla@caviumnetworks.com]
> Sent: Monday, September 4, 2017 4:32 PM
> To: Burakov, Anatoly <anatoly.burakov@intel.com>; dev@dpdk.org
> Cc: thomas@monjalon.net; jerin.jacob@caviumnetworks.com;
> hemant.agrawal@nxp.com; olivier.matz@6wind.com;
> maxime.coquelin@redhat.com; Gonzalez Monroy, Sergio
> <sergio.gonzalez.monroy@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; shreyansh.jain@nxp.com;
> gaetan.rivet@6wind.com; stephen@networkplumber.org;
> aconole@redhat.com
> Subject: Re: [PATCH v7 2/9] eal/pci: get iommu class
> 
> Hi Anatoly,
> 
> 
> On Monday 04 September 2017 08:46 PM, Burakov, Anatoly wrote:
> > Hi Santosh,
> >
> >> From: santosh [mailto:santosh.shukla@caviumnetworks.com]
> >> Sent: Monday, September 4, 2017 4:14 PM
> >> To: Burakov, Anatoly <anatoly.burakov@intel.com>; dev@dpdk.org
> >> Cc: thomas@monjalon.net; jerin.jacob@caviumnetworks.com;
> >> hemant.agrawal@nxp.com; olivier.matz@6wind.com;
> >> maxime.coquelin@redhat.com; Gonzalez Monroy, Sergio
> >> <sergio.gonzalez.monroy@intel.com>; Richardson, Bruce
> >> <bruce.richardson@intel.com>; shreyansh.jain@nxp.com;
> >> gaetan.rivet@6wind.com; stephen@networkplumber.org;
> >> aconole@redhat.com
> >> Subject: Re: [PATCH v7 2/9] eal/pci: get iommu class
> >>
> >> Hi Anatoly,
> >>
> >>
> >> On Monday 04 September 2017 08:23 PM, Burakov, Anatoly wrote:
> >>>> From: Santosh Shukla [mailto:santosh.shukla@caviumnetworks.com]
> >>>> Sent: Thursday, August 31, 2017 4:26 AM
> >>>> To: dev@dpdk.org
> >>>> Cc: thomas@monjalon.net; jerin.jacob@caviumnetworks.com;
> >>>> hemant.agrawal@nxp.com; olivier.matz@6wind.com;
> >>>> maxime.coquelin@redhat.com; Gonzalez Monroy, Sergio
> >>>> <sergio.gonzalez.monroy@intel.com>; Richardson, Bruce
> >>>> <bruce.richardson@intel.com>; shreyansh.jain@nxp.com;
> >>>> gaetan.rivet@6wind.com; Burakov, Anatoly
> >> <anatoly.burakov@intel.com>;
> >>>> stephen@networkplumber.org; aconole@redhat.com; Santosh Shukla
> >>>> <santosh.shukla@caviumnetworks.com>
> >>>> Subject: [PATCH v7 2/9] eal/pci: get iommu class
> >>>>
> >>>> Introducing rte_pci_get_iommu_class API which helps to get iommu
> >>>> class of PCI device on the bus and returns preferred iova mapping
> >>>> mode
> >> for PCI bus.
> >>>> Patch also add rte_pci_get_iommu_class definition for bsdapp, in
> >>>> bsdapp case - api returns default iova mode.
> >>>>
> >>>> Signed-off-by: Santosh Shukla
> <santosh.shukla@caviumnetworks.com>
> >>>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> >>>> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> >>>> ---
> >>>> v6 --> v7:
> >>>> - squashed v6 series patch [02/12] & [03/12] (Aaron comment).
> >>>>
> >>>>  lib/librte_eal/bsdapp/eal/eal_pci.c           | 10 ++++++++++
> >>>>  lib/librte_eal/bsdapp/eal/rte_eal_version.map |  1 +
> >>>>  lib/librte_eal/common/include/rte_bus.h       | 10 ++++++++++
> >>>>  lib/librte_eal/common/include/rte_pci.h       | 11 +++++++++++
> >>>>  4 files changed, 32 insertions(+)
> >>>>
> >>>> diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c
> >>>> b/lib/librte_eal/bsdapp/eal/eal_pci.c
> >>>> index 04eacdcc7..e2c252320 100644
> >>>> --- a/lib/librte_eal/bsdapp/eal/eal_pci.c
> >>>> +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
> >>>> @@ -403,6 +403,16 @@ rte_pci_scan(void)
> >>>>  	return -1;
> >>>>  }
> >>>>
> >>>> +/*
> >>>> + * Get iommu class of pci devices on the bus.
> >>>> + */
> >>>> +enum rte_iova_mode
> >>>> +rte_pci_get_iommu_class(void)
> >>>> +{
> >>>> +	/* Supports only RTE_KDRV_NIC_UIO */
> >>>> +	return RTE_IOVA_PA;
> >>>> +}
> >>>> +
> >>>>  int
> >>>>  pci_update_device(const struct rte_pci_addr *addr)
> >>>>  {
> >>>> diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> >>>> b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> >>>> index c819e3084..1fdcfb460 100644
> >>>> --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> >>>> +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> >>>> @@ -242,5 +242,6 @@ DPDK_17.11 {
> >>>>  	global:
> >>>>
> >>>>  	rte_pci_match;
> >>>> +	rte_pci_get_iommu_class;
> >>>>
> >>>>  } DPDK_17.08;
> >>>> diff --git a/lib/librte_eal/common/include/rte_bus.h
> >>>> b/lib/librte_eal/common/include/rte_bus.h
> >>>> index c79368d3c..9e40687e5 100644
> >>>> --- a/lib/librte_eal/common/include/rte_bus.h
> >>>> +++ b/lib/librte_eal/common/include/rte_bus.h
> >>>> @@ -55,6 +55,16 @@ extern "C" {
> >>>>  /** Double linked list of buses */
> >>>>  TAILQ_HEAD(rte_bus_list, rte_bus);
> >>>>
> >>>> +
> >>>> +/**
> >>>> + * IOVA mapping mode.
> >>>> + */
> >>>> +enum rte_iova_mode {
> >>>> +	RTE_IOVA_DC = 0,	/* Don't care mode */
> >>>> +	RTE_IOVA_PA = (1 << 0),
> >>>> +	RTE_IOVA_VA = (1 << 1)
> >>> Hi Santosh,
> >>>
> >>> No need to set values explicitly, standard C will take care of it.
> >> no strong opinion, change queued for v8.
> 
> To recall why I expressed RTE_IOVA_PA/_VA as 1 << 0 and 1 << 1:
> if someone later adds a new entry by mistake as, for example,
> RTE_IOVA_XX = 3, it would enable both the _pa and _va bits. To avoid
> that programming error, I deliberately kept _pa = 1 << 0 and
> _va = 1 << 1; a new entry (highly unlikely) should then be defined as
> _xx = 1 << 2.
> 
> If you are convinced, then I think I don't need to spin this change for v8.

Hi Santosh,

Fair enough (on both issues).
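
A minimal sketch of the aliasing concern described above, assuming callers
test modes with a bitwise AND. The enum mirrors the one in the patch;
RTE_IOVA_XX_BAD, RTE_IOVA_XX_OK and mode_has() are hypothetical names used
only for illustration:

```c
#include <assert.h>

/* Mirrors the enum from the patch: one bit per mode. */
enum rte_iova_mode {
	RTE_IOVA_DC = 0,	/* Don't care mode */
	RTE_IOVA_PA = (1 << 0),
	RTE_IOVA_VA = (1 << 1)
};

/* The two real modes must never share a bit. */
static_assert((RTE_IOVA_PA & RTE_IOVA_VA) == 0, "modes must not overlap");

/* Hypothetical future entries, for illustration only. */
#define RTE_IOVA_XX_BAD 3		/* overlaps the PA and VA bits */
#define RTE_IOVA_XX_OK  (1 << 2)	/* occupies its own bit */

/* Hypothetical helper: nonzero if any 'flag' bit is set in 'mode'. */
static int
mode_has(int mode, int flag)
{
	return (mode & flag) != 0;
}
```

With these definitions, a value of 3 tests positive for both _pa and _va,
while 1 << 2 matches neither.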

> 
> >>> I wonder about the purpose of "don't care" mode. It's not used for
> >>> anything but cases when no driver is bound. All the libraries (e.g.
> >>> rte_malloc) will only check for IOVA_VA mode. Can't we just use PA in
> >>> all cases where IOVA_DC would be applicable?
> >>
> >> Indeed, the policy is to use iova_pa for the _dc case, but we need a
> >> way to distinguish between no device found vs. device attached (if
> >> attached, then decide upon its iova scheme).
> >>
> >> For more detailed info, please refer to [1].
> >>
> >> [1] http://dpdk.org/dev/patchwork/patch/26762/
> >>
> > Maybe make your intentions more explicit then? I.e. instead of "don't
> > care" use "no device" or some such. No strong opinion either way though,
> > I'm fine with leaving it as is.
> 
> I'd prefer keeping _DC, if you don't mind; it sounds more appropriate to me.
> 
> > Thanks,
> > Anatoly

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v7 7/9] linuxapp/eal_vfio: honor iova mode before mapping
  2017-08-31  3:26             ` [PATCH v7 7/9] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
@ 2017-09-04 15:40               ` Burakov, Anatoly
  2017-10-26 12:57               ` Jonas Pfefferle1
  1 sibling, 0 replies; 248+ messages in thread
From: Burakov, Anatoly @ 2017-09-04 15:40 UTC (permalink / raw)
  To: Santosh Shukla, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, Gonzalez Monroy, Sergio, Richardson, Bruce,
	shreyansh.jain, gaetan.rivet, stephen, aconole

> From: Santosh Shukla [mailto:santosh.shukla@caviumnetworks.com]
> Sent: Thursday, August 31, 2017 4:26 AM
> To: dev@dpdk.org
> Cc: thomas@monjalon.net; jerin.jacob@caviumnetworks.com;
> hemant.agrawal@nxp.com; olivier.matz@6wind.com;
> maxime.coquelin@redhat.com; Gonzalez Monroy, Sergio
> <sergio.gonzalez.monroy@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; shreyansh.jain@nxp.com;
> gaetan.rivet@6wind.com; Burakov, Anatoly <anatoly.burakov@intel.com>;
> stephen@networkplumber.org; aconole@redhat.com; Santosh Shukla
> <santosh.shukla@caviumnetworks.com>
> Subject: [PATCH v7 7/9] linuxapp/eal_vfio: honor iova mode before mapping
> 
> Check iova mode and accordingly map iova to pa or va.
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v7 8/9] linuxapp/eal_memory: honor iova mode in virt2phy
  2017-08-31  3:26             ` [PATCH v7 8/9] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
@ 2017-09-04 15:42               ` Burakov, Anatoly
  0 siblings, 0 replies; 248+ messages in thread
From: Burakov, Anatoly @ 2017-09-04 15:42 UTC (permalink / raw)
  To: Santosh Shukla, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, Gonzalez Monroy, Sergio, Richardson, Bruce,
	shreyansh.jain, gaetan.rivet, stephen, aconole

> From: Santosh Shukla [mailto:santosh.shukla@caviumnetworks.com]
> Sent: Thursday, August 31, 2017 4:26 AM
> To: dev@dpdk.org
> Cc: thomas@monjalon.net; jerin.jacob@caviumnetworks.com;
> hemant.agrawal@nxp.com; olivier.matz@6wind.com;
> maxime.coquelin@redhat.com; Gonzalez Monroy, Sergio
> <sergio.gonzalez.monroy@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; shreyansh.jain@nxp.com;
> gaetan.rivet@6wind.com; Burakov, Anatoly <anatoly.burakov@intel.com>;
> stephen@networkplumber.org; aconole@redhat.com; Santosh Shukla
> <santosh.shukla@caviumnetworks.com>
> Subject: [PATCH v7 8/9] linuxapp/eal_memory: honor iova mode in virt2phy
> 
> Check iova mode and accordingly return phy addr.
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---

Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v7 9/9] eal/rte_malloc: honor iova mode in virt2phy
  2017-08-31  3:26             ` [PATCH v7 9/9] eal/rte_malloc: " Santosh Shukla
@ 2017-09-04 15:44               ` Burakov, Anatoly
  0 siblings, 0 replies; 248+ messages in thread
From: Burakov, Anatoly @ 2017-09-04 15:44 UTC (permalink / raw)
  To: Santosh Shukla, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, Gonzalez Monroy, Sergio, Richardson, Bruce,
	shreyansh.jain, gaetan.rivet, stephen, aconole

> From: Santosh Shukla [mailto:santosh.shukla@caviumnetworks.com]
> Sent: Thursday, August 31, 2017 4:26 AM
> To: dev@dpdk.org
> Cc: thomas@monjalon.net; jerin.jacob@caviumnetworks.com;
> hemant.agrawal@nxp.com; olivier.matz@6wind.com;
> maxime.coquelin@redhat.com; Gonzalez Monroy, Sergio
> <sergio.gonzalez.monroy@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; shreyansh.jain@nxp.com;
> gaetan.rivet@6wind.com; Burakov, Anatoly <anatoly.burakov@intel.com>;
> stephen@networkplumber.org; aconole@redhat.com; Santosh Shukla
> <santosh.shukla@caviumnetworks.com>
> Subject: [PATCH v7 9/9] eal/rte_malloc: honor iova mode in virt2phy
> 
> Check iova mode and accordingly return phy addr.
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/librte_eal/common/rte_malloc.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_eal/common/rte_malloc.c
> b/lib/librte_eal/common/rte_malloc.c
> index 5c0627bf4..d65c05a4d 100644
> --- a/lib/librte_eal/common/rte_malloc.c
> +++ b/lib/librte_eal/common/rte_malloc.c
> @@ -251,10 +251,17 @@ rte_malloc_set_limit(__rte_unused const char *type,
>  phys_addr_t
>  rte_malloc_virt2phy(const void *addr)
>  {
> +	phys_addr_t paddr;
>  	const struct malloc_elem *elem = malloc_elem_from_data(addr);
>  	if (elem == NULL)
>  		return RTE_BAD_PHYS_ADDR;
>  	if (elem->ms->phys_addr == RTE_BAD_PHYS_ADDR)
>  		return RTE_BAD_PHYS_ADDR;
> -	return elem->ms->phys_addr + ((uintptr_t)addr - (uintptr_t)elem->ms->addr);
> +
> +	if (rte_eal_iova_mode() == RTE_IOVA_VA)
> +		paddr = (uintptr_t)addr;
> +	else
> +		paddr = elem->ms->phys_addr +
> +			((uintptr_t)addr - (uintptr_t)elem->ms->addr);
> +	return paddr;
>  }

Hi Santosh,

I think there's a RTE_PTR_DIFF macro for stuff like this, but otherwise

Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
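
A sketch of how the PA branch might look with RTE_PTR_DIFF, assuming the
macro from rte_common.h yields the byte difference between two pointers. The
macro is redefined locally so the snippet builds without DPDK; struct
memseg_sketch, virt2iova_sketch() and the iova_as_va parameter are
hypothetical stand-ins for the memseg, rte_malloc_virt2phy() and the
rte_eal_iova_mode() == RTE_IOVA_VA check:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for DPDK's RTE_PTR_DIFF (rte_common.h). */
#ifndef RTE_PTR_DIFF
#define RTE_PTR_DIFF(ptr1, ptr2) ((uintptr_t)(ptr1) - (uintptr_t)(ptr2))
#endif

/* Hypothetical, reduced stand-in for the memseg fields used here. */
struct memseg_sketch {
	uint64_t phys_addr;	/* physical address of segment start */
	void *addr;		/* virtual address of segment start */
};

/*
 * Return the IOVA for 'addr', which must lie inside segment 'ms'.
 * iova_as_va != 0 models rte_eal_iova_mode() == RTE_IOVA_VA.
 */
static uint64_t
virt2iova_sketch(const struct memseg_sketch *ms, const void *addr,
		 int iova_as_va)
{
	assert(ms != NULL && addr != NULL);

	if (iova_as_va)
		return (uint64_t)(uintptr_t)addr;	/* IOVA == VA */
	/* IOVA == PA: segment base plus offset into the segment */
	return ms->phys_addr + RTE_PTR_DIFF(addr, ms->addr);
}
```

The macro only replaces the open-coded pointer subtraction; the mode check
itself is unchanged.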

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v7 3/9] linuxapp/eal_pci: get iommu class
  2017-09-04 15:08               ` Burakov, Anatoly
@ 2017-09-05  8:47                 ` santosh
  2017-09-05  8:55                   ` Burakov, Anatoly
  0 siblings, 1 reply; 248+ messages in thread
From: santosh @ 2017-09-05  8:47 UTC (permalink / raw)
  To: Burakov, Anatoly, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, Gonzalez Monroy, Sergio, Richardson, Bruce,
	shreyansh.jain, gaetan.rivet, stephen, aconole

Hi Anatoly,


On Monday 04 September 2017 08:38 PM, Burakov, Anatoly wrote:
>> From: Santosh Shukla [mailto:santosh.shukla@caviumnetworks.com]
>> Sent: Thursday, August 31, 2017 4:26 AM
>> To: dev@dpdk.org
>> Cc: thomas@monjalon.net; jerin.jacob@caviumnetworks.com;
>> hemant.agrawal@nxp.com; olivier.matz@6wind.com;
>> maxime.coquelin@redhat.com; Gonzalez Monroy, Sergio
>> <sergio.gonzalez.monroy@intel.com>; Richardson, Bruce
>> <bruce.richardson@intel.com>; shreyansh.jain@nxp.com;
>> gaetan.rivet@6wind.com; Burakov, Anatoly <anatoly.burakov@intel.com>;
>> stephen@networkplumber.org; aconole@redhat.com; Santosh Shukla
>> <santosh.shukla@caviumnetworks.com>
>> Subject: [PATCH v7 3/9] linuxapp/eal_pci: get iommu class
>>
>> Get iommu class of PCI device on the bus and returns preferred iova
>> mapping mode for that bus.
>>
>> Patch also introduces RTE_PCI_DRV_IOVA_AS_VA drv flag.
>> Flag used when driver needs to operate in iova=va mode.
>>
>> Algorithm for iova scheme selection for PCI bus:
>> 0. If no device bound then return with RTE_IOVA_DC mapping mode, else
>> goto 1).
>> 1. Look for device attached to vfio kdrv and has .drv_flag set to
>> RTE_PCI_DRV_IOVA_AS_VA.
>> 2. Look for any device attached to UIO class of driver.
>> 3. Check for vfio-noiommu mode enabled.
>>
>> If 2) & 3) is false and 1) is true then select mapping scheme as RTE_IOVA_VA.
>> Otherwise use default mapping scheme (RTE_IOVA_PA).
>>
>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
>> ---
>> v6 --> v7:
>> - squashed v6 series patch no [01/12] & [05/12]..
>>   i.e.. moved RTE_PCI_DRV_IOVA_AS_VA flag into this patch. (Aaron
>> comment).
>>
>>  lib/librte_eal/common/include/rte_pci.h         |  2 +
>>  lib/librte_eal/linuxapp/eal/eal_pci.c           | 95
>> +++++++++++++++++++++++++
>>  lib/librte_eal/linuxapp/eal/eal_vfio.c          | 19 +++++
>>  lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
>>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>>  5 files changed, 121 insertions(+)
>>
>> diff --git a/lib/librte_eal/common/include/rte_pci.h
>> b/lib/librte_eal/common/include/rte_pci.h
>> index 0e36de093..a67d77f22 100644
>> --- a/lib/librte_eal/common/include/rte_pci.h
>> +++ b/lib/librte_eal/common/include/rte_pci.h
>> @@ -202,6 +202,8 @@ struct rte_pci_bus {
>>  #define RTE_PCI_DRV_INTR_RMV 0x0010
>>  /** Device driver needs to keep mapped resources if unsupported dev detected */
>>  #define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020
>> +/** Device driver supports iova as va */
>> +#define RTE_PCI_DRV_IOVA_AS_VA 0X0040
>>
>>  /**
>>   * A structure describing a PCI mapping.
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c
>> b/lib/librte_eal/linuxapp/eal/eal_pci.c
>> index 8951ce742..9725fd493 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
>> @@ -45,6 +45,7 @@
>>  #include "eal_filesystem.h"
>>  #include "eal_private.h"
>>  #include "eal_pci_init.h"
>> +#include "eal_vfio.h"
>>
>>  /**
>>   * @file
>> @@ -487,6 +488,100 @@ rte_pci_scan(void)
>>  	return -1;
>>  }
>>
>> +/*
>> + * Is pci device bound to any kdrv
>> + */
>> +static inline int
>> +pci_device_is_bound(void)
>> +{
>> +	struct rte_pci_device *dev = NULL;
>> +	int ret = 0;
>> +
>> +	FOREACH_DEVICE_ON_PCIBUS(dev) {
>> +		if (dev->kdrv == RTE_KDRV_UNKNOWN ||
>> +		    dev->kdrv == RTE_KDRV_NONE) {
>> +			continue;
>> +		} else {
>> +			ret = 1;
>> +			break;
>> +		}
>> +	}
>> +	return ret;
>> +}
>> +
>> +/*
>> + * Any one of the device bound to uio
>> + */
>> +static inline int
>> +pci_device_bound_uio(void)
>> +{
>> +	struct rte_pci_device *dev = NULL;
>> +
>> +	FOREACH_DEVICE_ON_PCIBUS(dev) {
>> +		if (dev->kdrv == RTE_KDRV_IGB_UIO ||
>> +		   dev->kdrv == RTE_KDRV_UIO_GENERIC) {
>> +			return 1;
>> +		}
>> +	}
>> +	return 0;
>> +}
>> +
>> +/*
>> + * Any one of the device has iova as va
>> + */
>> +static inline int
>> +pci_device_has_iova_va(void)
>> +{
>> +	struct rte_pci_device *dev = NULL;
>> +	struct rte_pci_driver *drv = NULL;
>> +
>> +	FOREACH_DRIVER_ON_PCIBUS(drv) {
>> +		if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
>> +			FOREACH_DEVICE_ON_PCIBUS(dev) {
>> +				if (dev->kdrv == RTE_KDRV_VFIO &&
>> +				    rte_pci_match(drv, dev))
>> +					return 1;
>> +			}
>> +		}
>> +	}
>> +	return 0;
>> +}
>> +
>> +/*
>> + * Get iommu class of PCI devices on the bus.
>> + */
>> +enum rte_iova_mode
>> +rte_pci_get_iommu_class(void)
>> +{
>> +	bool is_bound;
>> +	bool is_vfio_noiommu_enabled = true;
>> +	bool has_iova_va;
>> +	bool is_bound_uio;
>> +
>> +	is_bound = pci_device_is_bound();
>> +	if (!is_bound)
>> +		return RTE_IOVA_DC;
>> +
>> +	has_iova_va = pci_device_has_iova_va();
>> +	is_bound_uio = pci_device_bound_uio();
>> +#ifdef VFIO_PRESENT
>> +	is_vfio_noiommu_enabled = vfio_noiommu_is_enabled() == 1 ? 1 : 0;
> If you specify is_vfio_noiommu_enabled as bool, you should probably treat it as such, and assign true/false.

queued for v8.

> Other than that, I'm curious why is it always set to "true" by default? If we don't have VFIO compiled, it seems like the error message would always complain about vfio-noiommu mode being enabled, which is confusing.

It is set to 'true' for the case when VFIO_PRESENT is unset, meaning the
platform doesn't support VFIO (Linux version < 3.6), i.e. it is using UIO.
In that case, the flag makes sure the _pa policy is selected.

On the error message: it won't appear in the non-vfio case, as 'has_iova_va'
will be set to 0. The error message will show for the case where a few
devices out of many are bound to uio; the message will then pop up and the
iova policy would be _pa.
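
The selection logic under discussion can be sketched as a pure function of
the four flags. This is a simplified model, not the patch code:
pick_iova_mode() and its parameters are hypothetical, and the enum is copied
from patch 2/9. It shows why leaving is_vfio_noiommu_enabled at true forces
_pa when VFIO is not compiled in:

```c
#include <assert.h>
#include <stdbool.h>

enum rte_iova_mode {
	RTE_IOVA_DC = 0,	/* Don't care mode */
	RTE_IOVA_PA = (1 << 0),
	RTE_IOVA_VA = (1 << 1)
};

static_assert(RTE_IOVA_DC == 0, "DC is expected to be the zero value");

/* Hypothetical model of the algorithm from the commit message. */
static enum rte_iova_mode
pick_iova_mode(bool is_bound, bool has_iova_va, bool is_bound_uio,
	       bool is_vfio_noiommu_enabled)
{
	/* Step 0: no device bound, so the bus has no preference. */
	if (!is_bound)
		return RTE_IOVA_DC;

	/* Step 1 true, steps 2 and 3 false: IOVA as VA is safe. */
	if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled)
		return RTE_IOVA_VA;

	/* Everything else falls back to the default scheme. */
	return RTE_IOVA_PA;
}
```

In this model, a true value in the last argument can never yield
RTE_IOVA_VA, which is the behaviour wanted for a build without VFIO.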

Thanks.

> Thanks,
> Anatoly

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v7 3/9] linuxapp/eal_pci: get iommu class
  2017-09-05  8:47                 ` santosh
@ 2017-09-05  8:55                   ` Burakov, Anatoly
  2017-09-05  8:59                     ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Burakov, Anatoly @ 2017-09-05  8:55 UTC (permalink / raw)
  To: santosh, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, Gonzalez Monroy, Sergio, Richardson, Bruce,
	shreyansh.jain, gaetan.rivet, stephen, aconole

> From: santosh [mailto:santosh.shukla@caviumnetworks.com]
> Sent: Tuesday, September 5, 2017 9:48 AM
> To: Burakov, Anatoly <anatoly.burakov@intel.com>; dev@dpdk.org
> Cc: thomas@monjalon.net; jerin.jacob@caviumnetworks.com;
> hemant.agrawal@nxp.com; olivier.matz@6wind.com;
> maxime.coquelin@redhat.com; Gonzalez Monroy, Sergio
> <sergio.gonzalez.monroy@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; shreyansh.jain@nxp.com;
> gaetan.rivet@6wind.com; stephen@networkplumber.org;
> aconole@redhat.com
> Subject: Re: [PATCH v7 3/9] linuxapp/eal_pci: get iommu class
> 
> Hi Anatoly,
> 
> 
> On Monday 04 September 2017 08:38 PM, Burakov, Anatoly wrote:
> >> From: Santosh Shukla [mailto:santosh.shukla@caviumnetworks.com]
> >> Sent: Thursday, August 31, 2017 4:26 AM
> >> To: dev@dpdk.org
> >> Cc: thomas@monjalon.net; jerin.jacob@caviumnetworks.com;
> >> hemant.agrawal@nxp.com; olivier.matz@6wind.com;
> >> maxime.coquelin@redhat.com; Gonzalez Monroy, Sergio
> >> <sergio.gonzalez.monroy@intel.com>; Richardson, Bruce
> >> <bruce.richardson@intel.com>; shreyansh.jain@nxp.com;
> >> gaetan.rivet@6wind.com; Burakov, Anatoly
> <anatoly.burakov@intel.com>;
> >> stephen@networkplumber.org; aconole@redhat.com; Santosh Shukla
> >> <santosh.shukla@caviumnetworks.com>
> >> Subject: [PATCH v7 3/9] linuxapp/eal_pci: get iommu class
> >>
> >> Get iommu class of PCI device on the bus and returns preferred iova
> >> mapping mode for that bus.
> >>
> >> Patch also introduces RTE_PCI_DRV_IOVA_AS_VA drv flag.
> >> Flag used when driver needs to operate in iova=va mode.
> >>
> >> Algorithm for iova scheme selection for PCI bus:
> >> 0. If no device bound then return with RTE_IOVA_DC mapping mode, else
> >> goto 1).
> >> 1. Look for device attached to vfio kdrv and has .drv_flag set to
> >> RTE_PCI_DRV_IOVA_AS_VA.
> >> 2. Look for any device attached to UIO class of driver.
> >> 3. Check for vfio-noiommu mode enabled.
> >>
> >> If 2) & 3) is false and 1) is true then select mapping scheme as
> >> RTE_IOVA_VA.
> >> Otherwise use default mapping scheme (RTE_IOVA_PA).
> >>
> >> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> >> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> >> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
> >> ---
> >> v6 --> v7:
> >> - squashed v6 series patch no [01/12] & [05/12]..
> >>   i.e.. moved RTE_PCI_DRV_IOVA_AS_VA flag into this patch. (Aaron
> >> comment).
> >>
> >>  lib/librte_eal/common/include/rte_pci.h         |  2 +
> >>  lib/librte_eal/linuxapp/eal/eal_pci.c           | 95
> >> +++++++++++++++++++++++++
> >>  lib/librte_eal/linuxapp/eal/eal_vfio.c          | 19 +++++
> >>  lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
> >>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
> >>  5 files changed, 121 insertions(+)
> >>
> >> diff --git a/lib/librte_eal/common/include/rte_pci.h
> >> b/lib/librte_eal/common/include/rte_pci.h
> >> index 0e36de093..a67d77f22 100644
> >> --- a/lib/librte_eal/common/include/rte_pci.h
> >> +++ b/lib/librte_eal/common/include/rte_pci.h
> >> @@ -202,6 +202,8 @@ struct rte_pci_bus {
> >>  #define RTE_PCI_DRV_INTR_RMV 0x0010
> >>  /** Device driver needs to keep mapped resources if unsupported dev detected */
> >>  #define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020
> >> +/** Device driver supports iova as va */
> >> +#define RTE_PCI_DRV_IOVA_AS_VA 0X0040
> >>
> >>  /**
> >>   * A structure describing a PCI mapping.
> >> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c
> >> b/lib/librte_eal/linuxapp/eal/eal_pci.c
> >> index 8951ce742..9725fd493 100644
> >> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
> >> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
> >> @@ -45,6 +45,7 @@
> >>  #include "eal_filesystem.h"
> >>  #include "eal_private.h"
> >>  #include "eal_pci_init.h"
> >> +#include "eal_vfio.h"
> >>
> >>  /**
> >>   * @file
> >> @@ -487,6 +488,100 @@ rte_pci_scan(void)
> >>  	return -1;
> >>  }
> >>
> >> +/*
> >> + * Is pci device bound to any kdrv
> >> + */
> >> +static inline int
> >> +pci_device_is_bound(void)
> >> +{
> >> +	struct rte_pci_device *dev = NULL;
> >> +	int ret = 0;
> >> +
> >> +	FOREACH_DEVICE_ON_PCIBUS(dev) {
> >> +		if (dev->kdrv == RTE_KDRV_UNKNOWN ||
> >> +		    dev->kdrv == RTE_KDRV_NONE) {
> >> +			continue;
> >> +		} else {
> >> +			ret = 1;
> >> +			break;
> >> +		}
> >> +	}
> >> +	return ret;
> >> +}
> >> +
> >> +/*
> >> + * Any one of the device bound to uio
> >> + */
> >> +static inline int
> >> +pci_device_bound_uio(void)
> >> +{
> >> +	struct rte_pci_device *dev = NULL;
> >> +
> >> +	FOREACH_DEVICE_ON_PCIBUS(dev) {
> >> +		if (dev->kdrv == RTE_KDRV_IGB_UIO ||
> >> +		   dev->kdrv == RTE_KDRV_UIO_GENERIC) {
> >> +			return 1;
> >> +		}
> >> +	}
> >> +	return 0;
> >> +}
> >> +
> >> +/*
> >> + * Any one of the device has iova as va
> >> + */
> >> +static inline int
> >> +pci_device_has_iova_va(void)
> >> +{
> >> +	struct rte_pci_device *dev = NULL;
> >> +	struct rte_pci_driver *drv = NULL;
> >> +
> >> +	FOREACH_DRIVER_ON_PCIBUS(drv) {
> >> +		if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
> >> +			FOREACH_DEVICE_ON_PCIBUS(dev) {
> >> +				if (dev->kdrv == RTE_KDRV_VFIO &&
> >> +				    rte_pci_match(drv, dev))
> >> +					return 1;
> >> +			}
> >> +		}
> >> +	}
> >> +	return 0;
> >> +}
> >> +
> >> +/*
> >> + * Get iommu class of PCI devices on the bus.
> >> + */
> >> +enum rte_iova_mode
> >> +rte_pci_get_iommu_class(void)
> >> +{
> >> +	bool is_bound;
> >> +	bool is_vfio_noiommu_enabled = true;
> >> +	bool has_iova_va;
> >> +	bool is_bound_uio;
> >> +
> >> +	is_bound = pci_device_is_bound();
> >> +	if (!is_bound)
> >> +		return RTE_IOVA_DC;
> >> +
> >> +	has_iova_va = pci_device_has_iova_va();
> >> +	is_bound_uio = pci_device_bound_uio();
> >> +#ifdef VFIO_PRESENT
> >> +	is_vfio_noiommu_enabled = vfio_noiommu_is_enabled() == 1 ? 1 : 0;
> > If you specify is_vfio_noiommu_enabled as bool, you should probably treat
> > it as such, and assign true/false.
> 
> queued for v8.
> 
> > Other than that, I'm curious why is it always set to "true" by default? If we
> > don't have VFIO compiled, it seems like the error message would always
> > complain about vfio-noiommu mode being enabled, which is confusing.
> 
> Set to 'true' for case when VFIO_PRESENT unset.. meaning platform doesn't
> support VFIO (linux versioned < 3.6) i.e.. using UIO - In that case, flag makes
> sure _pa policy selected.
> 
> On error message: It won't come in non-vfio case, as 'has_iova_va' will set to
> 0.
> Error message will show for those case where few device out of many bind
> to uio, so message will pop-up and iova policy would be _pa in that case.
> 
> Thanks.

Right. My apologies, I misunderstood the meaning of "has_iova_va" flag.

Thanks,
Anatoly

> 
> > Thanks,
> > Anatoly

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v7 3/9] linuxapp/eal_pci: get iommu class
  2017-09-05  8:55                   ` Burakov, Anatoly
@ 2017-09-05  8:59                     ` santosh
  0 siblings, 0 replies; 248+ messages in thread
From: santosh @ 2017-09-05  8:59 UTC (permalink / raw)
  To: Burakov, Anatoly, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, Gonzalez Monroy, Sergio, Richardson, Bruce,
	shreyansh.jain, gaetan.rivet, stephen, aconole

Hi Anatoly,


On Tuesday 05 September 2017 02:25 PM, Burakov, Anatoly wrote:
>> From: santosh [mailto:santosh.shukla@caviumnetworks.com]
>> Sent: Tuesday, September 5, 2017 9:48 AM
>> To: Burakov, Anatoly <anatoly.burakov@intel.com>; dev@dpdk.org
>> Cc: thomas@monjalon.net; jerin.jacob@caviumnetworks.com;
>> hemant.agrawal@nxp.com; olivier.matz@6wind.com;
>> maxime.coquelin@redhat.com; Gonzalez Monroy, Sergio
>> <sergio.gonzalez.monroy@intel.com>; Richardson, Bruce
>> <bruce.richardson@intel.com>; shreyansh.jain@nxp.com;
>> gaetan.rivet@6wind.com; stephen@networkplumber.org;
>> aconole@redhat.com
>> Subject: Re: [PATCH v7 3/9] linuxapp/eal_pci: get iommu class
>>
>> Hi Anatoly,
>>
>>
>> On Monday 04 September 2017 08:38 PM, Burakov, Anatoly wrote:
>>>> From: Santosh Shukla [mailto:santosh.shukla@caviumnetworks.com]
>>>> Sent: Thursday, August 31, 2017 4:26 AM
>>>> To: dev@dpdk.org
>>>> Cc: thomas@monjalon.net; jerin.jacob@caviumnetworks.com;
>>>> hemant.agrawal@nxp.com; olivier.matz@6wind.com;
>>>> maxime.coquelin@redhat.com; Gonzalez Monroy, Sergio
>>>> <sergio.gonzalez.monroy@intel.com>; Richardson, Bruce
>>>> <bruce.richardson@intel.com>; shreyansh.jain@nxp.com;
>>>> gaetan.rivet@6wind.com; Burakov, Anatoly
>> <anatoly.burakov@intel.com>;
>>>> stephen@networkplumber.org; aconole@redhat.com; Santosh Shukla
>>>> <santosh.shukla@caviumnetworks.com>
>>>> Subject: [PATCH v7 3/9] linuxapp/eal_pci: get iommu class
>>>>
>>>> Get iommu class of PCI device on the bus and returns preferred iova
>>>> mapping mode for that bus.
>>>>
>>>> Patch also introduces RTE_PCI_DRV_IOVA_AS_VA drv flag.
>>>> Flag used when driver needs to operate in iova=va mode.
>>>>
>>>> Algorithm for iova scheme selection for PCI bus:
>>>> 0. If no device bound then return with RTE_IOVA_DC mapping mode, else
>>>> goto 1).
>>>> 1. Look for device attached to vfio kdrv and has .drv_flag set to
>>>> RTE_PCI_DRV_IOVA_AS_VA.
>>>> 2. Look for any device attached to UIO class of driver.
>>>> 3. Check for vfio-noiommu mode enabled.
>>>>
>>>> If 2) & 3) is false and 1) is true then select mapping scheme as
>>>> RTE_IOVA_VA.
>>>> Otherwise use default mapping scheme (RTE_IOVA_PA).
>>>>
>>>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>>>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>>> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
>>>> ---
>>>> v6 --> v7:
>>>> - squashed v6 series patch no [01/12] & [05/12]..
>>>>   i.e.. moved RTE_PCI_DRV_IOVA_AS_VA flag into this patch. (Aaron
>>>> comment).
>>>>
>>>>  lib/librte_eal/common/include/rte_pci.h         |  2 +
>>>>  lib/librte_eal/linuxapp/eal/eal_pci.c           | 95
>>>> +++++++++++++++++++++++++
>>>>  lib/librte_eal/linuxapp/eal/eal_vfio.c          | 19 +++++
>>>>  lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
>>>>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>>>>  5 files changed, 121 insertions(+)
>>>>
>>>> diff --git a/lib/librte_eal/common/include/rte_pci.h
>>>> b/lib/librte_eal/common/include/rte_pci.h
>>>> index 0e36de093..a67d77f22 100644
>>>> --- a/lib/librte_eal/common/include/rte_pci.h
>>>> +++ b/lib/librte_eal/common/include/rte_pci.h
>>>> @@ -202,6 +202,8 @@ struct rte_pci_bus {
>>>>  #define RTE_PCI_DRV_INTR_RMV 0x0010
>>>>  /** Device driver needs to keep mapped resources if unsupported dev detected */
>>>>  #define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020
>>>> +/** Device driver supports iova as va */
>>>> +#define RTE_PCI_DRV_IOVA_AS_VA 0X0040
>>>>
>>>>  /**
>>>>   * A structure describing a PCI mapping.
>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c
>>>> b/lib/librte_eal/linuxapp/eal/eal_pci.c
>>>> index 8951ce742..9725fd493 100644
>>>> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
>>>> @@ -45,6 +45,7 @@
>>>>  #include "eal_filesystem.h"
>>>>  #include "eal_private.h"
>>>>  #include "eal_pci_init.h"
>>>> +#include "eal_vfio.h"
>>>>
>>>>  /**
>>>>   * @file
>>>> @@ -487,6 +488,100 @@ rte_pci_scan(void)
>>>>  	return -1;
>>>>  }
>>>>
>>>> +/*
>>>> + * Is pci device bound to any kdrv
>>>> + */
>>>> +static inline int
>>>> +pci_device_is_bound(void)
>>>> +{
>>>> +	struct rte_pci_device *dev = NULL;
>>>> +	int ret = 0;
>>>> +
>>>> +	FOREACH_DEVICE_ON_PCIBUS(dev) {
>>>> +		if (dev->kdrv == RTE_KDRV_UNKNOWN ||
>>>> +		    dev->kdrv == RTE_KDRV_NONE) {
>>>> +			continue;
>>>> +		} else {
>>>> +			ret = 1;
>>>> +			break;
>>>> +		}
>>>> +	}
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +/*
>>>> + * Any one of the device bound to uio
>>>> + */
>>>> +static inline int
>>>> +pci_device_bound_uio(void)
>>>> +{
>>>> +	struct rte_pci_device *dev = NULL;
>>>> +
>>>> +	FOREACH_DEVICE_ON_PCIBUS(dev) {
>>>> +		if (dev->kdrv == RTE_KDRV_IGB_UIO ||
>>>> +		   dev->kdrv == RTE_KDRV_UIO_GENERIC) {
>>>> +			return 1;
>>>> +		}
>>>> +	}
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +/*
>>>> + * Any one of the device has iova as va
>>>> + */
>>>> +static inline int
>>>> +pci_device_has_iova_va(void)
>>>> +{
>>>> +	struct rte_pci_device *dev = NULL;
>>>> +	struct rte_pci_driver *drv = NULL;
>>>> +
>>>> +	FOREACH_DRIVER_ON_PCIBUS(drv) {
>>>> +		if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
>>>> +			FOREACH_DEVICE_ON_PCIBUS(dev) {
>>>> +				if (dev->kdrv == RTE_KDRV_VFIO &&
>>>> +				    rte_pci_match(drv, dev))
>>>> +					return 1;
>>>> +			}
>>>> +		}
>>>> +	}
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +/*
>>>> + * Get iommu class of PCI devices on the bus.
>>>> + */
>>>> +enum rte_iova_mode
>>>> +rte_pci_get_iommu_class(void)
>>>> +{
>>>> +	bool is_bound;
>>>> +	bool is_vfio_noiommu_enabled = true;
>>>> +	bool has_iova_va;
>>>> +	bool is_bound_uio;
>>>> +
>>>> +	is_bound = pci_device_is_bound();
>>>> +	if (!is_bound)
>>>> +		return RTE_IOVA_DC;
>>>> +
>>>> +	has_iova_va = pci_device_has_iova_va();
>>>> +	is_bound_uio = pci_device_bound_uio();
>>>> +#ifdef VFIO_PRESENT
>>>> +	is_vfio_noiommu_enabled = vfio_noiommu_is_enabled() == 1 ? 1 : 0;
>>> If you specify is_vfio_noiommu_enabled as bool, you should probably treat
>>> it as such, and assign true/false.
>>
>> queued for v8.
>>
>>> Other than that, I'm curious why is it always set to "true" by default? If we
>>> don't have VFIO compiled, it seems like the error message would always
>>> complain about vfio-noiommu mode being enabled, which is confusing.
>>
>> Set to 'true' for case when VFIO_PRESENT unset.. meaning platform doesn't
>> support VFIO (linux versioned < 3.6) i.e.. using UIO - In that case, flag makes
>> sure _pa policy selected.
>>
>> On error message: It won't come in non-vfio case, as 'has_iova_va' will set to
>> 0.
>> Error message will show for those case where few device out of many bind
>> to uio, so message will pop-up and iova policy would be _pa in that case.
>>
>> Thanks.
> Right. My apologies, I misunderstood the meaning of "has_iova_va" flag.

No worries ;). Thanks for the review feedback and for looking into the v7 series.

Can I collect your Reviewed-by for [3/9]?

Thanks. 

> Thanks,
> Anatoly
>
>>> Thanks,
>>> Anatoly

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v7 3/9] linuxapp/eal_pci: get iommu class
  2017-08-31  3:26             ` [PATCH v7 3/9] linuxapp/eal_pci: " Santosh Shukla
  2017-09-04 15:08               ` Burakov, Anatoly
@ 2017-09-05  9:01               ` Burakov, Anatoly
  1 sibling, 0 replies; 248+ messages in thread
From: Burakov, Anatoly @ 2017-09-05  9:01 UTC (permalink / raw)
  To: Santosh Shukla, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, Gonzalez Monroy, Sergio, Richardson, Bruce,
	shreyansh.jain, gaetan.rivet, stephen, aconole

> From: Santosh Shukla [mailto:santosh.shukla@caviumnetworks.com]
> Sent: Thursday, August 31, 2017 4:26 AM
> To: dev@dpdk.org
> Cc: thomas@monjalon.net; jerin.jacob@caviumnetworks.com;
> hemant.agrawal@nxp.com; olivier.matz@6wind.com;
> maxime.coquelin@redhat.com; Gonzalez Monroy, Sergio
> <sergio.gonzalez.monroy@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; shreyansh.jain@nxp.com;
> gaetan.rivet@6wind.com; Burakov, Anatoly <anatoly.burakov@intel.com>;
> stephen@networkplumber.org; aconole@redhat.com; Santosh Shukla
> <santosh.shukla@caviumnetworks.com>
> Subject: [PATCH v7 3/9] linuxapp/eal_pci: get iommu class
> 
> Get iommu class of PCI device on the bus and returns preferred iova
> mapping mode for that bus.
> 
> Patch also introduces RTE_PCI_DRV_IOVA_AS_VA drv flag.
> Flag used when driver needs to operate in iova=va mode.
> 
> Algorithm for iova scheme selection for PCI bus:
> 0. If no device bound then return with RTE_IOVA_DC mapping mode, else
> goto 1).
> 1. Look for device attached to vfio kdrv and has .drv_flag set to
> RTE_PCI_DRV_IOVA_AS_VA.
> 2. Look for any device attached to UIO class of driver.
> 3. Check for vfio-noiommu mode enabled.
> 
> If 2) & 3) is false and 1) is true then select mapping scheme as RTE_IOVA_VA.
> Otherwise use default mapping scheme (RTE_IOVA_PA).
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
> ---

Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>


* Re: [PATCH v7 0/9] Infrastructure to detect iova mapping on the bus
  2017-08-31  3:26           ` [PATCH v7 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
                               ` (8 preceding siblings ...)
  2017-08-31  3:26             ` [PATCH v7 9/9] eal/rte_malloc: " Santosh Shukla
@ 2017-09-05 12:28             ` Hemant Agrawal
  2017-09-05 12:30               ` Hemant Agrawal
  2017-09-18 10:42             ` [PATCH v8 " Santosh Shukla
  10 siblings, 1 reply; 248+ messages in thread
From: Hemant Agrawal @ 2017-09-05 12:28 UTC (permalink / raw)
  To: Santosh Shukla, dev
  Cc: thomas, jerin.jacob, olivier.matz, maxime.coquelin,
	sergio.gonzalez.monroy, bruce.richardson, shreyansh.jain,
	gaetan.rivet, anatoly.burakov, stephen, aconole

Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>

On 8/31/2017 8:56 AM, Santosh Shukla wrote:
> v7:
> Includes no major change, minor change detailing:
> - patch sqashing (Aaron suggestion)
> - added run_once for device_parse() and bus_scan() in eal init
> 	(Aaron suggestion)
> - Moved rte_eal_device_parse() up in eal initialization order.
> - Patches rebased on top of version: 17.11-rc0
> For v6 info refer [11].
>
> v6:
> Sending v5 series rebased on top of version: 17.11-rc0.
>
> v5:
> Introducing RTE_PCI_DRV_IOVA_AS_VA flag for autodetection of iova va mapping.
> If a PCI driver demand for IOVA as VA scheme then the driver can add it in the
> PCI driver registration function.
>
> Algorithm to select IOVA as VA for PCI bus case:
>      0. If no device bound then return with RTE_IOVA_DC mapping mode,
>      else goto 1).
>      1. Look for device attached to vfio kdrv and has .drv_flag set
>      to RTE_PCI_DRV_IOVA_AS_VA.
>      2. Look for any device attached to UIO class of driver.
>      3. Check for vfio-noiommu mode enabled.
>
>      If 2) & 3) is false and 1) is true then select
>      mapping scheme as RTE_IOVA_VA. Otherwise use default
>      mapping scheme (RTE_IOVA_PA).
>
> That way, Bus can truly autodetect the iova mapping mode for
> a device Or a set of the device.
>
> v6 --> v7:
> - Patches squashed per v6.
> - Added run_once in eal per v6.
> - Moved rte_eal_device_parse() up in eal init oder.
>
> v5 --> v6:
> - Added api info in eal's versiom.map (release DPDK_v17.11).
>
> v4 --> v5:
> - Change DPDK_17.08 to DPDK_17.11 in _version.map.
> - Reworded bus api description (suggested by Hemant).
> - Added reviewed-by from Maxime in v5.
> - Added acked-by from Hemant for pci and bus patches.
>
> v3 --> v4:
> - Re-introduced RTE_IOVA_DEC mode (Suggested by Hemant [5]).
> - Renamed flag to RTE_PCI_DRV_IOVA_AS_VA (Suggested by Maxime).
> - Reworded WARNING message(suggested by Maxime[7]).
> - Created a separate patch for rte_pci_get_iommu_class (suggested by Maxime[]).
> - Added VFIO_PRESENT ifdef build fix.
>
> v2 --> v3:
> - Removed rte_mempool_virt2phy (suggested by Olivier [4])
>
> v1 --> v2:
> - Removed override eal option i.e. (--iova-mode=<>) Because we have means to
>    truly autodetect the iova mode.
> - Introduced RTE_PCI_DRV_NEED_IOVA_VA drv_flag (Suggested by Maxime [3]).
> - Using NEED_IOVA_VA drv_flag in autodetection logic.
> - Removed Linux version check macro in vfio code, As per Maxime feedback.
> - Moved rte_pci_match API from local to global.
>
> Patch Summary:
> 1) 1nd: declare rte_pci_match api in pci header. Required for autodetection in
> follow up patches.
> 2) 2nd - 3rd - 4th : autodetection mapping infrastructure for Linux/bsdapp.
> 3) 5th: iova mode helper API.
> 4) 6th: Infra to detect iova mode.
> 5) 7th: make vfio mapping iova aware.
> 6) 8th - 9th : Check for IOVA_VA mode in below APIs
>          - rte_mem_virt2phy
>          - rte_malloc_virt2phy
>
> Test History:
> - Tested for x86/XL710 40G NIC card for both modes (iova_va/pa).
> - Tested for arm64/thunderx vNIC Integrated NIC for both modes
> - Tested for arm64/Octeontx integrated NICs for only
>    Iova_va mode(It supports only one mode.)
> - Ran standalone tests like mempool_autotest, mbuf_autotest.
> - Verified for Doxygen.
>
> Work History:
> For v1, Refer [1].
> For v2, Refer [2].
> For v3, Refer [9].
> For v4, refer [10].
> for v6, refer [11].
>
> Checkpatch result:
> * Debug message - WARNING: line over 80 characters
>
> Thanks.,
> [1] https://www.mail-archive.com/dev@dpdk.org/msg67438.html
> [2] https://www.mail-archive.com/dev@dpdk.org/msg70674.html
> [3] https://www.mail-archive.com/dev@dpdk.org/msg70279.html
> [4] https://www.mail-archive.com/dev@dpdk.org/msg70692.html
> [5] http://dpdk.org/ml/archives/dev/2017-July/071282.html
> [6] http://dpdk.org/ml/archives/dev/2017-July/070951.html
> [7] http://dpdk.org/ml/archives/dev/2017-July/070941.html
> [8] http://dpdk.org/ml/archives/dev/2017-July/070952.html
> [9] http://dpdk.org/ml/archives/dev/2017-July/070918.html
> [10] http://dpdk.org/ml/archives/dev/2017-July/071754.html
> [11] http://dpdk.org/ml/archives/dev/2017-August/072871.html
>
>
> Santosh Shukla (9):
>   eal/pci: export match function
>   eal/pci: get iommu class
>   linuxapp/eal_pci: get iommu class
>   bus: get iommu class
>   eal: introduce iova mode helper api
>   eal: auto detect iova mode
>   linuxapp/eal_vfio: honor iova mode before mapping
>   linuxapp/eal_memory: honor iova mode in virt2phy
>   eal/rte_malloc: honor iova mode in virt2phy
>
>  lib/librte_eal/bsdapp/eal/eal.c                 | 33 ++++++---
>  lib/librte_eal/bsdapp/eal/eal_pci.c             | 10 +++
>  lib/librte_eal/bsdapp/eal/rte_eal_version.map   | 10 +++
>  lib/librte_eal/common/eal_common_bus.c          | 23 ++++++
>  lib/librte_eal/common/eal_common_pci.c          | 11 +--
>  lib/librte_eal/common/include/rte_bus.h         | 35 +++++++++
>  lib/librte_eal/common/include/rte_eal.h         | 12 ++++
>  lib/librte_eal/common/include/rte_pci.h         | 28 ++++++++
>  lib/librte_eal/common/rte_malloc.c              |  9 ++-
>  lib/librte_eal/linuxapp/eal/eal.c               | 33 ++++++---
>  lib/librte_eal/linuxapp/eal/eal_memory.c        |  3 +
>  lib/librte_eal/linuxapp/eal/eal_pci.c           | 95 +++++++++++++++++++++++++
>  lib/librte_eal/linuxapp/eal/eal_vfio.c          | 29 +++++++-
>  lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
>  lib/librte_eal/linuxapp/eal/rte_eal_version.map | 10 +++
>  15 files changed, 311 insertions(+), 34 deletions(-)
>


* Re: [PATCH v7 0/9] Infrastructure to detect iova mapping on the bus
  2017-09-05 12:28             ` [PATCH v7 0/9] Infrastructure to detect iova mapping on the bus Hemant Agrawal
@ 2017-09-05 12:30               ` Hemant Agrawal
  0 siblings, 0 replies; 248+ messages in thread
From: Hemant Agrawal @ 2017-09-05 12:30 UTC (permalink / raw)
  To: Santosh Shukla, dev
  Cc: thomas, jerin.jacob, olivier.matz, maxime.coquelin,
	sergio.gonzalez.monroy, bruce.richardson, shreyansh.jain,
	gaetan.rivet, anatoly.burakov, stephen, aconole

Please note that this series breaks the DPAA2 bus.
The following patch series (from Shreyansh) is required to keep the DPAA2 bus
working with this patch series:

http://dpdk.org/dev/patchwork/patch/27950/


On 9/5/2017 5:58 PM, Hemant Agrawal wrote:
> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>


* Re: [PATCH v7 1/9] eal/pci: export match function
  2017-08-31  3:26             ` [PATCH v7 1/9] eal/pci: export match function Santosh Shukla
  2017-09-04 14:49               ` Burakov, Anatoly
@ 2017-09-06 15:39               ` Ferruh Yigit
  2017-09-18 10:07                 ` santosh
  1 sibling, 1 reply; 248+ messages in thread
From: Ferruh Yigit @ 2017-09-06 15:39 UTC (permalink / raw)
  To: Santosh Shukla, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen, aconole

On 8/31/2017 4:26 AM, Santosh Shukla wrote:
> Export rte_pci_match() function as it needed in the followup patch.
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  7 +++++++
>  lib/librte_eal/common/eal_common_pci.c          | 10 +---------
>  lib/librte_eal/common/include/rte_pci.h         | 15 +++++++++++++++
>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  7 +++++++
>  4 files changed, 30 insertions(+), 9 deletions(-)
> 
> diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> index aac6fd776..c819e3084 100644
> --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> @@ -237,3 +237,10 @@ EXPERIMENTAL {
>  	rte_service_unregister;
>  
>  } DPDK_17.08;
> +
> +DPDK_17.11 {
> +	global:
> +
> +	rte_pci_match;
> +
> +} DPDK_17.08;

Is updating the .map file required? As far as I can see, the rte_pci_match()
calls are within the same library, so there is no need to expose the API
outside the library.

<...>


* Re: [PATCH v7 1/9] eal/pci: export match function
  2017-09-06 15:39               ` Ferruh Yigit
@ 2017-09-18 10:07                 ` santosh
  0 siblings, 0 replies; 248+ messages in thread
From: santosh @ 2017-09-18 10:07 UTC (permalink / raw)
  To: Ferruh Yigit, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen, aconole

Hi Ferruh,


On Wednesday 06 September 2017 09:09 PM, Ferruh Yigit wrote:
> On 8/31/2017 4:26 AM, Santosh Shukla wrote:
>> Export rte_pci_match() function as it needed in the followup patch.
>>
>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>  lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  7 +++++++
>>  lib/librte_eal/common/eal_common_pci.c          | 10 +---------
>>  lib/librte_eal/common/include/rte_pci.h         | 15 +++++++++++++++
>>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  7 +++++++
>>  4 files changed, 30 insertions(+), 9 deletions(-)
>>
>> diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>> index aac6fd776..c819e3084 100644
>> --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>> +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
>> @@ -237,3 +237,10 @@ EXPERIMENTAL {
>>  	rte_service_unregister;
>>  
>>  } DPDK_17.08;
>> +
>> +DPDK_17.11 {
>> +	global:
>> +
>> +	rte_pci_match;
>> +
>> +} DPDK_17.08;
> Is updating .map file required? As far as I can see rte_pci_match()
> calls are within the same library, and no need to expose the API out of
> library.
>
> <...>
>
It's used in the file eal/eal_pci.c in the following patch.
Thanks. 


* [PATCH v8 0/9] Infrastructure to detect iova mapping on the bus
  2017-08-31  3:26           ` [PATCH v7 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
                               ` (9 preceding siblings ...)
  2017-09-05 12:28             ` [PATCH v7 0/9] Infrastructure to detect iova mapping on the bus Hemant Agrawal
@ 2017-09-18 10:42             ` Santosh Shukla
  2017-09-18 10:42               ` [PATCH v8 1/9] eal/pci: export match function Santosh Shukla
                                 ` (9 more replies)
  10 siblings, 10 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-09-18 10:42 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal, aconole,
	stephen, anatoly.burakov, gaetan.rivet, shreyansh.jain,
	bruce.richardson, sergio.gonzalez.monroy, maxime.coquelin,
	Santosh Shukla

v8:
Includes minor changes per Anatoly's v7 review comments.
Patches rebased on tip commit 3d2e0448eb.

v7:
No major changes; minor changes in detail:
- Patch squashing (Aaron's suggestion).
- Added run_once for device_parse() and bus_scan() in eal init
    (Aaron's suggestion).
- Moved rte_eal_device_parse() up in the eal initialization order.
- Patches rebased on top of version 17.11-rc0.
For v6 info, refer to [11].

v6:
Sending the v5 series rebased on top of version 17.11-rc0.

v5:
Introduces the RTE_PCI_DRV_IOVA_AS_VA flag for autodetection of the iova=va
mapping. If a PCI driver requires the IOVA as VA scheme, the driver can add
the flag in its PCI driver registration function.
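
For illustration only, a minimal sketch of how a driver might advertise the
flag. The struct example_pci_driver type, the example_pmd instance and the
driver_wants_iova_as_va() helper are hypothetical stand-ins and not part of
this series; only the two flag values are taken from the rte_pci.h hunk in
patch 3/9.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as they appear in rte_pci.h in patch 3/9 of this series. */
#define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020
#define RTE_PCI_DRV_IOVA_AS_VA      0x0040

/* Hypothetical stand-in for the real PCI driver structure. */
struct example_pci_driver {
	const char *name;
	uint32_t drv_flags;	/* bitmask of RTE_PCI_DRV_* flags */
};

/* Hypothetical driver that asks for iova=va at registration time. */
static const struct example_pci_driver example_pmd = {
	.name = "net_example",
	.drv_flags = RTE_PCI_DRV_KEEP_MAPPED_RES | RTE_PCI_DRV_IOVA_AS_VA,
};

/* Returns true when the driver advertises the IOVA-as-VA flag. */
static bool
driver_wants_iova_as_va(const struct example_pci_driver *drv)
{
	return (drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) != 0;
}
```

Since the flags occupy distinct bits, a driver could presumably combine
RTE_PCI_DRV_IOVA_AS_VA with its other drv flags using a bitwise OR, as above.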

Algorithm to select IOVA as VA for the PCI bus case:
     0. If no device is bound, return the RTE_IOVA_DC mapping mode;
     otherwise go to 1).
     1. Look for a device that is attached to the vfio kdrv and has
     .drv_flag set to RTE_PCI_DRV_IOVA_AS_VA.
     2. Look for any device attached to a UIO class of driver.
     3. Check whether vfio-noiommu mode is enabled.

     If 2) and 3) are false and 1) is true, select
     RTE_IOVA_VA as the mapping scheme. Otherwise use the default
     mapping scheme (RTE_IOVA_PA).

That way, the bus can truly autodetect the iova mapping mode for
a device or a set of devices.
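
The selection steps above can be sketched as a small decision function. This
is an illustration under stated assumptions, not the code from patch 3/9:
struct pci_bus_summary and select_iova_mode() are hypothetical names, and the
four booleans stand in for the results of walking the PCI device list. Only
the enum is copied from the rte_bus.h hunk in patch 2/9.

```c
#include <stdbool.h>

/* Copied from the rte_bus.h hunk in patch 2/9 of this series. */
enum rte_iova_mode {
	RTE_IOVA_DC = 0,	/* Don't care mode */
	RTE_IOVA_PA = (1 << 0),
	RTE_IOVA_VA = (1 << 1)
};

/* Hypothetical summary of a PCI bus scan; not an EAL type. */
struct pci_bus_summary {
	bool any_bound;		/* step 0: some device is bound to a kdrv */
	bool vfio_wants_va;	/* step 1: vfio device with IOVA_AS_VA set */
	bool any_uio;		/* step 2: some device uses a UIO driver */
	bool vfio_noiommu;	/* step 3: vfio-noiommu mode is enabled */
};

/* Hypothetical helper applying steps 0 to 3 in order. */
static enum rte_iova_mode
select_iova_mode(const struct pci_bus_summary *s)
{
	/* Step 0: nothing bound, so the bus has no preference. */
	if (!s->any_bound)
		return RTE_IOVA_DC;

	/* 1) true while 2) and 3) are false: iova=va is usable. */
	if (s->vfio_wants_va && !s->any_uio && !s->vfio_noiommu)
		return RTE_IOVA_VA;

	/* Every other combination falls back to the default. */
	return RTE_IOVA_PA;
}
```

In this sketch a single UIO-bound device, or vfio-noiommu mode, is enough to
push the whole bus back to RTE_IOVA_PA, which matches the "otherwise use
default" wording above.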

Change History:
v7 --> v8:
- Replace 0 / 1 with true/false boolean values (Suggested by Anatoly).

v6 --> v7:
- Patches squashed per v6.
- Added run_once in eal per v6.
- Moved rte_eal_device_parse() up in eal init order.

v5 --> v6:
- Added api info in eal's version.map (release DPDK_v17.11).

v4 --> v5:
- Change DPDK_17.08 to DPDK_17.11 in _version.map.
- Reworded bus api description (suggested by Hemant).
- Added reviewed-by from Maxime in v5.
- Added acked-by from Hemant for pci and bus patches.

v3 --> v4:
- Re-introduced RTE_IOVA_DC mode (suggested by Hemant [5]).
- Renamed flag to RTE_PCI_DRV_IOVA_AS_VA (suggested by Maxime).
- Reworded WARNING message (suggested by Maxime [7]).
- Created a separate patch for rte_pci_get_iommu_class (suggested by
  Maxime[]).
- Added VFIO_PRESENT ifdef build fix.

v2 --> v3:
- Removed rte_mempool_virt2phy (suggested by Olivier [4])

v1 --> v2:
- Removed the override eal option (--iova-mode=<>) because we have the
  means to truly autodetect the iova mode.
- Introduced RTE_PCI_DRV_NEED_IOVA_VA drv_flag (suggested by Maxime [3]).
- Using NEED_IOVA_VA drv_flag in autodetection logic.
- Removed the Linux version check macro in vfio code, as per Maxime's feedback.
- Moved rte_pci_match API from local to global.

Patch Summary:
1) 1st: declare rte_pci_match api in pci header. Required for
   autodetection in follow-up patches.
2) 2nd - 3rd - 4th: autodetection mapping infrastructure for
   Linux/bsdapp.
3) 5th: iova mode helper API.
4) 6th: Infra to detect iova mode.
5) 7th: make vfio mapping iova aware.
6) 8th - 9th : Check for IOVA_VA mode in below APIs
         - rte_mem_virt2phy
         - rte_malloc_virt2phy

Test History:
- Tested on x86/XL710 40G NIC card in both modes (iova_va/pa).
- Tested on arm64/thunderx vNIC integrated NIC in both modes.
- Tested on arm64/Octeontx integrated NICs in iova_va mode only
   (it supports only one mode).
- Ran standalone tests like mempool_autotest, mbuf_autotest.
- Verified for Doxygen.

Work History:
For v1, refer to [1].
For v2, refer to [2].
For v3, refer to [9].
For v4, refer to [10].
For v6, refer to [11].

Checkpatch result:
* None 

Thanks,
[1] https://www.mail-archive.com/dev@dpdk.org/msg67438.html
[2] https://www.mail-archive.com/dev@dpdk.org/msg70674.html
[3] https://www.mail-archive.com/dev@dpdk.org/msg70279.html
[4] https://www.mail-archive.com/dev@dpdk.org/msg70692.html
[5] http://dpdk.org/ml/archives/dev/2017-July/071282.html
[6] http://dpdk.org/ml/archives/dev/2017-July/070951.html
[7] http://dpdk.org/ml/archives/dev/2017-July/070941.html
[8] http://dpdk.org/ml/archives/dev/2017-July/070952.html
[9] http://dpdk.org/ml/archives/dev/2017-July/070918.html
[10] http://dpdk.org/ml/archives/dev/2017-July/071754.html
[11] http://dpdk.org/ml/archives/dev/2017-August/072871.html


Santosh Shukla (9):
  eal/pci: export match function
  eal/pci: get iommu class
  linuxapp/eal_pci: get iommu class
  bus: get iommu class
  eal: introduce iova mode helper api
  eal: auto detect iova mode
  linuxapp/eal_vfio: honor iova mode before mapping
  linuxapp/eal_memory: honor iova mode in virt2phy
  eal/rte_malloc: honor iova mode in virt2phy

 lib/librte_eal/bsdapp/eal/eal.c                 | 33 ++++++---
 lib/librte_eal/bsdapp/eal/eal_pci.c             | 10 +++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   | 10 +++
 lib/librte_eal/common/eal_common_bus.c          | 23 ++++++
 lib/librte_eal/common/eal_common_pci.c          | 11 +--
 lib/librte_eal/common/include/rte_bus.h         | 35 +++++++++
 lib/librte_eal/common/include/rte_eal.h         | 12 ++++
 lib/librte_eal/common/include/rte_pci.h         | 28 ++++++++
 lib/librte_eal/common/rte_malloc.c              |  9 ++-
 lib/librte_eal/linuxapp/eal/eal.c               | 33 ++++++---
 lib/librte_eal/linuxapp/eal/eal_memory.c        |  3 +
 lib/librte_eal/linuxapp/eal/eal_pci.c           | 96 +++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c          | 29 +++++++-
 lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map | 10 +++
 15 files changed, 312 insertions(+), 34 deletions(-)

-- 
2.14.1


* [PATCH v8 1/9] eal/pci: export match function
  2017-09-18 10:42             ` [PATCH v8 " Santosh Shukla
@ 2017-09-18 10:42               ` Santosh Shukla
  2017-09-18 10:42               ` [PATCH v8 2/9] eal/pci: get iommu class Santosh Shukla
                                 ` (8 subsequent siblings)
  9 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-09-18 10:42 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal, aconole,
	stephen, anatoly.burakov, gaetan.rivet, shreyansh.jain,
	bruce.richardson, sergio.gonzalez.monroy, maxime.coquelin,
	Santosh Shukla

Export the rte_pci_match() function, as it is needed in the follow-up patch.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  7 +++++++
 lib/librte_eal/common/eal_common_pci.c          | 10 +---------
 lib/librte_eal/common/include/rte_pci.h         | 15 +++++++++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  7 +++++++
 4 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 47a09ea7f..cfbf8fbd0 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -238,3 +238,10 @@ EXPERIMENTAL {
 	rte_service_start_with_defaults;
 
 } DPDK_17.08;
+
+DPDK_17.11 {
+	global:
+
+	rte_pci_match;
+
+} DPDK_17.08;
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 52fd38cdd..3b7d0a0ee 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -150,16 +150,8 @@ pci_unmap_resource(void *requested_addr, size_t size)
 
 /*
  * Match the PCI Driver and Device using the ID Table
- *
- * @param pci_drv
- *	PCI driver from which ID table would be extracted
- * @param pci_dev
- *	PCI device to match against the driver
- * @return
- *	1 for successful match
- *	0 for unsuccessful match
  */
-static int
+int
 rte_pci_match(const struct rte_pci_driver *pci_drv,
 	      const struct rte_pci_device *pci_dev)
 {
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 8b123391c..eab84c7a4 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -366,6 +366,21 @@ int rte_pci_scan(void);
 int
 rte_pci_probe(void);
 
+/*
+ * Match the PCI Driver and Device using the ID Table
+ *
+ * @param pci_drv
+ *      PCI driver from which ID table would be extracted
+ * @param pci_dev
+ *      PCI device to match against the driver
+ * @return
+ *      1 for successful match
+ *      0 for unsuccessful match
+ */
+int
+rte_pci_match(const struct rte_pci_driver *pci_drv,
+	      const struct rte_pci_device *pci_dev);
+
 /**
  * Map the PCI device resources in user space virtual memory address
  *
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 8c08b8d1e..287cc75cd 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -243,3 +243,10 @@ EXPERIMENTAL {
 	rte_service_start_with_defaults;
 
 } DPDK_17.08;
+
+DPDK_17.11 {
+	global:
+
+	rte_pci_match;
+
+} DPDK_17.08;
-- 
2.14.1


* [PATCH v8 2/9] eal/pci: get iommu class
  2017-09-18 10:42             ` [PATCH v8 " Santosh Shukla
  2017-09-18 10:42               ` [PATCH v8 1/9] eal/pci: export match function Santosh Shukla
@ 2017-09-18 10:42               ` Santosh Shukla
  2017-09-19 16:37                 ` Burakov, Anatoly
  2017-09-18 10:42               ` [PATCH v8 3/9] linuxapp/eal_pci: " Santosh Shukla
                                 ` (7 subsequent siblings)
  9 siblings, 1 reply; 248+ messages in thread
From: Santosh Shukla @ 2017-09-18 10:42 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal, aconole,
	stephen, anatoly.burakov, gaetan.rivet, shreyansh.jain,
	bruce.richardson, sergio.gonzalez.monroy, maxime.coquelin,
	Santosh Shukla

Introduce the rte_pci_get_iommu_class API, which gets the iommu class
of the PCI devices on the bus and returns the preferred iova mapping mode
for the PCI bus.

The patch also adds an rte_pci_get_iommu_class definition for bsdapp;
in the bsdapp case the api returns the default iova mode.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
v6 --> v7:
- squashed v6 series patch [02/12] & [03/12] (Aaron comment).

 lib/librte_eal/bsdapp/eal/eal_pci.c           | 10 ++++++++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map |  1 +
 lib/librte_eal/common/include/rte_bus.h       | 10 ++++++++++
 lib/librte_eal/common/include/rte_pci.h       | 11 +++++++++++
 4 files changed, 32 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 04eacdcc7..e2c252320 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -403,6 +403,16 @@ rte_pci_scan(void)
 	return -1;
 }
 
+/*
+ * Get iommu class of pci devices on the bus.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void)
+{
+	/* Supports only RTE_KDRV_NIC_UIO */
+	return RTE_IOVA_PA;
+}
+
 int
 pci_update_device(const struct rte_pci_addr *addr)
 {
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index cfbf8fbd0..c6ffd9399 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -243,5 +243,6 @@ DPDK_17.11 {
 	global:
 
 	rte_pci_match;
+	rte_pci_get_iommu_class;
 
 } DPDK_17.08;
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index c79368d3c..9e40687e5 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -55,6 +55,16 @@ extern "C" {
 /** Double linked list of buses */
 TAILQ_HEAD(rte_bus_list, rte_bus);
 
+
+/**
+ * IOVA mapping mode.
+ */
+enum rte_iova_mode {
+	RTE_IOVA_DC = 0,	/* Don't care mode */
+	RTE_IOVA_PA = (1 << 0),
+	RTE_IOVA_VA = (1 << 1)
+};
+
 /**
  * Bus specific scan for devices attached on the bus.
  * For each bus object, the scan would be responsible for finding devices and
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index eab84c7a4..0e36de093 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -381,6 +381,17 @@ int
 rte_pci_match(const struct rte_pci_driver *pci_drv,
 	      const struct rte_pci_device *pci_dev);
 
+
+/**
+ * Get iommu class of PCI devices on the bus.
+ * And return their preferred iova mapping mode.
+ *
+ * @return
+ *   - enum rte_iova_mode.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void);
+
 /**
  * Map the PCI device resources in user space virtual memory address
  *
-- 
2.14.1


* [PATCH v8 3/9] linuxapp/eal_pci: get iommu class
  2017-09-18 10:42             ` [PATCH v8 " Santosh Shukla
  2017-09-18 10:42               ` [PATCH v8 1/9] eal/pci: export match function Santosh Shukla
  2017-09-18 10:42               ` [PATCH v8 2/9] eal/pci: get iommu class Santosh Shukla
@ 2017-09-18 10:42               ` Santosh Shukla
  2017-09-18 10:42               ` [PATCH v8 4/9] bus: " Santosh Shukla
                                 ` (6 subsequent siblings)
  9 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-09-18 10:42 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal, aconole,
	stephen, anatoly.burakov, gaetan.rivet, shreyansh.jain,
	bruce.richardson, sergio.gonzalez.monroy, maxime.coquelin,
	Santosh Shukla

Get the iommu class of the PCI devices on the bus and return the preferred
iova mapping mode for that bus.

The patch also introduces the RTE_PCI_DRV_IOVA_AS_VA drv flag.
The flag is used when a driver needs to operate in iova=va mode.

Algorithm for iova scheme selection for the PCI bus:
0. If no device is bound, return the RTE_IOVA_DC mapping mode;
otherwise go to 1).
1. Look for a device that is attached to the vfio kdrv and has .drv_flag set
to RTE_PCI_DRV_IOVA_AS_VA.
2. Look for any device attached to a UIO class of driver.
3. Check whether vfio-noiommu mode is enabled.

If 2) and 3) are false and 1) is true, select
RTE_IOVA_VA as the mapping scheme. Otherwise use the default
mapping scheme (RTE_IOVA_PA).

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
v7 --> v8:
- Replaced 0/1 with false/true boolean value (Suggested by Anatoly)

v6 --> v7:
- Squashed v6 series patches [01/12] and [05/12],
    i.e. moved the RTE_PCI_DRV_IOVA_AS_VA flag into this patch (Aaron's comment).

 lib/librte_eal/common/include/rte_pci.h         |  2 +
 lib/librte_eal/linuxapp/eal/eal_pci.c           | 96 +++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c          | 19 +++++
 lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 5 files changed, 122 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 0e36de093..a67d77f22 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -202,6 +202,8 @@ struct rte_pci_bus {
 #define RTE_PCI_DRV_INTR_RMV 0x0010
 /** Device driver needs to keep mapped resources if unsupported dev detected */
 #define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020
+/** Device driver supports iova as va */
+#define RTE_PCI_DRV_IOVA_AS_VA 0x0040
 
 /**
  * A structure describing a PCI mapping.
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 8951ce742..2971f1d4f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -45,6 +45,7 @@
 #include "eal_filesystem.h"
 #include "eal_private.h"
 #include "eal_pci_init.h"
+#include "eal_vfio.h"
 
 /**
  * @file
@@ -487,6 +488,101 @@ rte_pci_scan(void)
 	return -1;
 }
 
+/*
+ * Is any PCI device bound to a kernel driver (kdrv)?
+ */
+static inline int
+pci_device_is_bound(void)
+{
+	struct rte_pci_device *dev = NULL;
+	int ret = 0;
+
+	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		if (dev->kdrv == RTE_KDRV_UNKNOWN ||
+		    dev->kdrv == RTE_KDRV_NONE) {
+			continue;
+		} else {
+			ret = 1;
+			break;
+		}
+	}
+	return ret;
+}
+
+/*
+ * Is any device bound to a UIO driver?
+ */
+static inline int
+pci_device_bound_uio(void)
+{
+	struct rte_pci_device *dev = NULL;
+
+	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		if (dev->kdrv == RTE_KDRV_IGB_UIO ||
+		   dev->kdrv == RTE_KDRV_UIO_GENERIC) {
+			return 1;
+		}
+	}
+	return 0;
+}
+
+/*
+ * Is any vfio-bound device matched by a driver that supports iova as va?
+ */
+static inline int
+pci_device_has_iova_va(void)
+{
+	struct rte_pci_device *dev = NULL;
+	struct rte_pci_driver *drv = NULL;
+
+	FOREACH_DRIVER_ON_PCIBUS(drv) {
+		if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
+			FOREACH_DEVICE_ON_PCIBUS(dev) {
+				if (dev->kdrv == RTE_KDRV_VFIO &&
+				    rte_pci_match(drv, dev))
+					return 1;
+			}
+		}
+	}
+	return 0;
+}
+
+/*
+ * Get iommu class of PCI devices on the bus.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void)
+{
+	bool is_bound;
+	bool is_vfio_noiommu_enabled = true;
+	bool has_iova_va;
+	bool is_bound_uio;
+
+	is_bound = pci_device_is_bound();
+	if (!is_bound)
+		return RTE_IOVA_DC;
+
+	has_iova_va = pci_device_has_iova_va();
+	is_bound_uio = pci_device_bound_uio();
+#ifdef VFIO_PRESENT
+	is_vfio_noiommu_enabled = vfio_noiommu_is_enabled() == true ?
+					true : false;
+#endif
+
+	if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled)
+		return RTE_IOVA_VA;
+
+	if (has_iova_va) {
+		RTE_LOG(WARNING, EAL, "Some devices want iova as va but pa will be used because.. ");
+		if (is_vfio_noiommu_enabled)
+			RTE_LOG(WARNING, EAL, "vfio-noiommu mode configured\n");
+		if (is_bound_uio)
+			RTE_LOG(WARNING, EAL, "some devices are bound to UIO\n");
+	}
+
+	return RTE_IOVA_PA;
+}
+
 /* Read PCI config space. */
 int rte_pci_read_config(const struct rte_pci_device *device,
 		void *buf, size_t len, off_t offset)
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 946df7e31..c8a97b7e7 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -816,4 +816,23 @@ vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
 	return 0;
 }
 
+int
+vfio_noiommu_is_enabled(void)
+{
+	int fd, ret, cnt __rte_unused;
+	char c = '\0';
+
+	ret = -1;
+	fd = open(VFIO_NOIOMMU_MODE, O_RDONLY);
+	if (fd < 0)
+		return -1;
+
+	cnt = read(fd, &c, 1);
+	if (c == 'Y')
+		ret = 1;
+
+	close(fd);
+	return ret;
+}
+
 #endif
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h b/lib/librte_eal/linuxapp/eal/eal_vfio.h
index 5ff63e5d7..26ea8e119 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
@@ -150,6 +150,8 @@ struct vfio_config {
 #define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
 #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
 #define VFIO_GET_REGION_IDX(x) (x >> 40)
+#define VFIO_NOIOMMU_MODE      \
+	"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
 
 /* DMA mapping function prototype.
  * Takes VFIO container fd as a parameter.
@@ -210,6 +212,8 @@ int pci_vfio_is_enabled(void);
 
 int vfio_mp_sync_setup(void);
 
+int vfio_noiommu_is_enabled(void);
+
 #define SOCKET_REQ_CONTAINER 0x100
 #define SOCKET_REQ_GROUP 0x200
 #define SOCKET_CLR_GROUP 0x300
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 287cc75cd..a8c8ea4f4 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -248,5 +248,6 @@ DPDK_17.11 {
 	global:
 
 	rte_pci_match;
+	rte_pci_get_iommu_class;
 
 } DPDK_17.08;
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v8 4/9] bus: get iommu class
  2017-09-18 10:42             ` [PATCH v8 " Santosh Shukla
                                 ` (2 preceding siblings ...)
  2017-09-18 10:42               ` [PATCH v8 3/9] linuxapp/eal_pci: " Santosh Shukla
@ 2017-09-18 10:42               ` Santosh Shukla
  2017-09-18 10:42               ` [PATCH v8 5/9] eal: introduce iova mode helper api Santosh Shukla
                                 ` (5 subsequent siblings)
  9 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-09-18 10:42 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal, aconole,
	stephen, anatoly.burakov, gaetan.rivet, shreyansh.jain,
	bruce.richardson, sergio.gonzalez.monroy, maxime.coquelin,
	Santosh Shukla

The rte_bus_get_iommu_class() API automatically detects and selects the
appropriate iova mapping scheme for the iommu-capable devices on the buses.

Algorithm for iova scheme selection across buses:
0. Iterate through bus_list.
1. Collect each bus's iova mode value and OR it into the 'mode' variable.
2. The mode selection scheme is:
if mode == 0 then iova mode is _pa,
if mode == 1 then iova mode is _pa,
if mode == 2 then iova mode is _va,
if mode == 3 then iova mode is _pa.

So any mode != 2 results in the default iova mode (_pa).
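
A minimal sketch of the combination step, assuming each bus has already
reported its mode into an array. combine_bus_modes() and the enum names
are illustrative only; the function in this patch walks rte_bus_list and
calls each bus's get_iommu_class callback instead.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values: DC=0, PA=1, VA=2, so that modes can be OR-ed together. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1, IOVA_VA = 2 };

enum iova_mode
combine_bus_modes(const enum iova_mode *modes, size_t n)
{
	int mode = IOVA_DC;
	size_t i;

	assert(n == 0 || modes != NULL);

	/* OR together what every bus reported. */
	for (i = 0; i < n; i++)
		mode |= modes[i];

	/*
	 * 0 (all don't-care), 1 (PA only) and 3 (PA and VA mixed)
	 * all fall back to the default PA mode; only a pure 2 gives VA.
	 */
	return mode == IOVA_VA ? IOVA_VA : IOVA_PA;
}
```

Because don't-care is 0, a bus without devices never changes the
outcome, while a single PA bus turns a VA result into 3 and therefore
into PA.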

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
 lib/librte_eal/common/eal_common_bus.c          | 23 +++++++++++++++++++++++
 lib/librte_eal/common/eal_common_pci.c          |  1 +
 lib/librte_eal/common/include/rte_bus.h         | 25 +++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 5 files changed, 51 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index c6ffd9399..3466eaf20 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -244,5 +244,6 @@ DPDK_17.11 {
 
 	rte_pci_match;
 	rte_pci_get_iommu_class;
+	rte_bus_get_iommu_class;
 
 } DPDK_17.08;
diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 08bec2d93..a30a8982e 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -222,3 +222,26 @@ rte_bus_find_by_device_name(const char *str)
 		c[0] = '\0';
 	return rte_bus_find(NULL, bus_can_parse, name);
 }
+
+
+/*
+ * Get iommu class of devices on the bus.
+ */
+enum rte_iova_mode
+rte_bus_get_iommu_class(void)
+{
+	int mode = RTE_IOVA_DC;
+	struct rte_bus *bus;
+
+	TAILQ_FOREACH(bus, &rte_bus_list, next) {
+
+		if (bus->get_iommu_class)
+			mode |= bus->get_iommu_class();
+	}
+
+	if (mode != RTE_IOVA_VA) {
+		/* Use default IOVA mode */
+		mode = RTE_IOVA_PA;
+	}
+	return mode;
+}
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 3b7d0a0ee..0f0e4b93b 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -564,6 +564,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.plug = pci_plug,
 		.unplug = pci_unplug,
 		.parse = pci_parse,
+		.get_iommu_class = rte_pci_get_iommu_class,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 9e40687e5..70a291a4d 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -178,6 +178,20 @@ struct rte_bus_conf {
 	enum rte_bus_scan_mode scan_mode; /**< Scan policy. */
 };
 
+
+/**
+ * Get the common iommu class of all the devices on the bus. The bus may
+ * check that those devices are attached to an iommu driver.
+ * If no devices are attached to the bus, the bus may return the don't care
+ * (_DC) value.
+ * Otherwise, the bus will return the appropriate _pa or _va iova mode.
+ *
+ * @return
+ *      enum rte_iova_mode value.
+ */
+typedef enum rte_iova_mode (*rte_bus_get_iommu_class_t)(void);
+
+
 /**
  * A structure describing a generic bus.
  */
@@ -191,6 +205,7 @@ struct rte_bus {
 	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
 	rte_bus_parse_t parse;       /**< Parse a device name */
 	struct rte_bus_conf conf;    /**< Bus configuration */
+	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 };
 
 /**
@@ -290,6 +305,16 @@ struct rte_bus *rte_bus_find_by_device(const struct rte_device *dev);
  */
 struct rte_bus *rte_bus_find_by_name(const char *busname);
 
+
+/**
+ * Get the common iommu class of devices bound on to buses available in the
+ * system. The default mode is PA.
+ *
+ * @return
+ *     enum rte_iova_mode value.
+ */
+enum rte_iova_mode rte_bus_get_iommu_class(void);
+
 /**
  * Helper for Bus registration.
  * The constructor has higher priority than PMD constructors.
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index a8c8ea4f4..9115aa3e9 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -249,5 +249,6 @@ DPDK_17.11 {
 
 	rte_pci_match;
 	rte_pci_get_iommu_class;
+	rte_bus_get_iommu_class;
 
 } DPDK_17.08;
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v8 5/9] eal: introduce iova mode helper api
  2017-09-18 10:42             ` [PATCH v8 " Santosh Shukla
                                 ` (3 preceding siblings ...)
  2017-09-18 10:42               ` [PATCH v8 4/9] bus: " Santosh Shukla
@ 2017-09-18 10:42               ` Santosh Shukla
  2017-09-18 10:42               ` [PATCH v8 6/9] eal: auto detect iova mode Santosh Shukla
                                 ` (4 subsequent siblings)
  9 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-09-18 10:42 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal, aconole,
	stephen, anatoly.burakov, gaetan.rivet, shreyansh.jain,
	bruce.richardson, sergio.gonzalez.monroy, maxime.coquelin,
	Santosh Shukla

Introduce the rte_eal_iova_mode() helper API. It is used by
non-EAL libraries to detect the iova mode.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_eal/bsdapp/eal/eal.c                 |  6 ++++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
 lib/librte_eal/common/include/rte_eal.h         | 12 ++++++++++++
 lib/librte_eal/linuxapp/eal/eal.c               |  6 ++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 5 files changed, 26 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 5fa598842..07e72203f 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -119,6 +119,12 @@ rte_eal_get_configuration(void)
 	return &rte_config;
 }
 
+enum rte_iova_mode
+rte_eal_iova_mode(void)
+{
+	return rte_eal_get_configuration()->iova_mode;
+}
+
 /* parse a sysfs (or other) file containing one integer value */
 int
 eal_parse_sysfs_value(const char *filename, unsigned long *val)
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 3466eaf20..6bed74dff 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -245,5 +245,6 @@ DPDK_17.11 {
 	rte_pci_match;
 	rte_pci_get_iommu_class;
 	rte_bus_get_iommu_class;
+	rte_eal_iova_mode;
 
 } DPDK_17.08;
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 0e7363d77..932dc1a96 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -45,6 +45,7 @@
 
 #include <rte_per_lcore.h>
 #include <rte_config.h>
+#include <rte_bus.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -87,6 +88,9 @@ struct rte_config {
 	/** Primary or secondary configuration */
 	enum rte_proc_type_t process_type;
 
+	/** PA or VA mapping mode */
+	enum rte_iova_mode iova_mode;
+
 	/**
 	 * Pointer to memory configuration, which may be shared across multiple
 	 * DPDK instances
@@ -287,6 +291,14 @@ static inline int rte_gettid(void)
 	return RTE_PER_LCORE(_thread_id);
 }
 
+/**
+ * Get the iova mode
+ *
+ * @return
+ *   enum rte_iova_mode value.
+ */
+enum rte_iova_mode rte_eal_iova_mode(void);
+
 #define RTE_INIT(func) \
 static void __attribute__((constructor, used)) func(void)
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 48f12f44c..febbafdb3 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -128,6 +128,12 @@ rte_eal_get_configuration(void)
 	return &rte_config;
 }
 
+enum rte_iova_mode
+rte_eal_iova_mode(void)
+{
+	return rte_eal_get_configuration()->iova_mode;
+}
+
 /* parse a sysfs (or other) file containing one integer value */
 int
 eal_parse_sysfs_value(const char *filename, unsigned long *val)
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 9115aa3e9..8e49bf5fa 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -250,5 +250,6 @@ DPDK_17.11 {
 	rte_pci_match;
 	rte_pci_get_iommu_class;
 	rte_bus_get_iommu_class;
+	rte_eal_iova_mode;
 
 } DPDK_17.08;
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v8 6/9] eal: auto detect iova mode
  2017-09-18 10:42             ` [PATCH v8 " Santosh Shukla
                                 ` (4 preceding siblings ...)
  2017-09-18 10:42               ` [PATCH v8 5/9] eal: introduce iova mode helper api Santosh Shukla
@ 2017-09-18 10:42               ` Santosh Shukla
  2017-09-18 10:42               ` [PATCH v8 7/9] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
                                 ` (3 subsequent siblings)
  9 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-09-18 10:42 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal, aconole,
	stephen, anatoly.burakov, gaetan.rivet, shreyansh.jain,
	bruce.richardson, sergio.gonzalez.monroy, maxime.coquelin,
	Santosh Shukla

For auto-detection, the following calls are moved up in the eal
initialization order:
	- eal_option_device_parse
	- rte_bus_scan

The iova mapping mode is then selected based on the result of
rte_bus_get_iommu_class.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
v6 --> v7:
- Moved eal_option_device_parse() up in the order of eal init.
- Added run_once (Aaron's suggestion).
- Squashed v6 series patches [08/12] and [09/12] into one patch (Aaron's
      comment).

 lib/librte_eal/bsdapp/eal/eal.c   | 27 ++++++++++++++++-----------
 lib/librte_eal/linuxapp/eal/eal.c | 27 ++++++++++++++++-----------
 2 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 07e72203f..f003f4c04 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -541,6 +541,22 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	if (eal_option_device_parse()) {
+		rte_errno = ENODEV;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
+
+	if (rte_bus_scan()) {
+		rte_eal_init_alert("Cannot scan the buses for devices\n");
+		rte_errno = ENODEV;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
+
+	/* autodetect the iova mapping mode (default is iova_pa) */
+	rte_eal_get_configuration()->iova_mode = rte_bus_get_iommu_class();
+
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&
 			eal_hugepage_info_init() < 0) {
@@ -620,17 +636,6 @@ rte_eal_init(int argc, char **argv)
 		rte_config.master_lcore, thread_id, cpuset,
 		ret == 0 ? "" : "...");
 
-	if (eal_option_device_parse()) {
-		rte_errno = ENODEV;
-		return -1;
-	}
-
-	if (rte_bus_scan()) {
-		rte_eal_init_alert("Cannot scan the buses for devices\n");
-		rte_errno = ENODEV;
-		return -1;
-	}
-
 	RTE_LCORE_FOREACH_SLAVE(i) {
 
 		/*
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index febbafdb3..f4901ffb6 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -798,6 +798,22 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	if (eal_option_device_parse()) {
+		rte_errno = ENODEV;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
+
+	if (rte_bus_scan()) {
+		rte_eal_init_alert("Cannot scan the buses for devices\n");
+		rte_errno = ENODEV;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
+
+	/* autodetect the iova mapping mode (default is iova_pa) */
+	rte_eal_get_configuration()->iova_mode = rte_bus_get_iommu_class();
+
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&
 			internal_config.xen_dom0_support == 0 &&
@@ -895,17 +911,6 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
-	if (eal_option_device_parse()) {
-		rte_errno = ENODEV;
-		return -1;
-	}
-
-	if (rte_bus_scan()) {
-		rte_eal_init_alert("Cannot scan the buses for devices\n");
-		rte_errno = ENODEV;
-		return -1;
-	}
-
 	RTE_LCORE_FOREACH_SLAVE(i) {
 
 		/*
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v8 7/9] linuxapp/eal_vfio: honor iova mode before mapping
  2017-09-18 10:42             ` [PATCH v8 " Santosh Shukla
                                 ` (5 preceding siblings ...)
  2017-09-18 10:42               ` [PATCH v8 6/9] eal: auto detect iova mode Santosh Shukla
@ 2017-09-18 10:42               ` Santosh Shukla
  2017-09-18 10:42               ` [PATCH v8 8/9] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
                                 ` (2 subsequent siblings)
  9 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-09-18 10:42 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal, aconole,
	stephen, anatoly.burakov, gaetan.rivet, shreyansh.jain,
	bruce.richardson, sergio.gonzalez.monroy, maxime.coquelin,
	Santosh Shukla

Check the iova mode and map the iova to pa or va accordingly.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c8a97b7e7..b32cd09a2 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd)
 		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
 		dma_map.vaddr = ms[i].addr_64;
 		dma_map.size = ms[i].len;
-		dma_map.iova = ms[i].phys_addr;
+		if (rte_eal_iova_mode() == RTE_IOVA_VA)
+			dma_map.iova = dma_map.vaddr;
+		else
+			dma_map.iova = ms[i].phys_addr;
 		dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
 
 		ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
@@ -792,7 +795,10 @@ vfio_spapr_dma_map(int vfio_container_fd)
 		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
 		dma_map.vaddr = ms[i].addr_64;
 		dma_map.size = ms[i].len;
-		dma_map.iova = ms[i].phys_addr;
+		if (rte_eal_iova_mode() == RTE_IOVA_VA)
+			dma_map.iova = dma_map.vaddr;
+		else
+			dma_map.iova = ms[i].phys_addr;
 		dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
 				 VFIO_DMA_MAP_FLAG_WRITE;
 
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v8 8/9] linuxapp/eal_memory: honor iova mode in virt2phy
  2017-09-18 10:42             ` [PATCH v8 " Santosh Shukla
                                 ` (6 preceding siblings ...)
  2017-09-18 10:42               ` [PATCH v8 7/9] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
@ 2017-09-18 10:42               ` Santosh Shukla
  2017-09-18 10:42               ` [PATCH v8 9/9] eal/rte_malloc: " Santosh Shukla
  2017-09-20 11:23               ` [PATCH v9 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
  9 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-09-18 10:42 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal, aconole,
	stephen, anatoly.burakov, gaetan.rivet, shreyansh.jain,
	bruce.richardson, sergio.gonzalez.monroy, maxime.coquelin,
	Santosh Shukla

Check the iova mode: in iova=va mode return the virtual address,
otherwise return the physical address.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 52791282f..2d9d7c2dc 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -139,6 +139,9 @@ rte_mem_virt2phy(const void *virtaddr)
 	int page_size;
 	off_t offset;
 
+	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+		return (uintptr_t)virtaddr;
+
 	/* when using dom0, /proc/self/pagemap always returns 0, check in
 	 * dpdk memory by browsing the memsegs */
 	if (rte_xen_dom0_supported()) {
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v8 9/9] eal/rte_malloc: honor iova mode in virt2phy
  2017-09-18 10:42             ` [PATCH v8 " Santosh Shukla
                                 ` (7 preceding siblings ...)
  2017-09-18 10:42               ` [PATCH v8 8/9] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
@ 2017-09-18 10:42               ` Santosh Shukla
  2017-09-20 11:23               ` [PATCH v9 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
  9 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-09-18 10:42 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal, aconole,
	stephen, anatoly.burakov, gaetan.rivet, shreyansh.jain,
	bruce.richardson, sergio.gonzalez.monroy, maxime.coquelin,
	Santosh Shukla

Check the iova mode: in iova=va mode return the virtual address,
otherwise return the physical address.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/rte_malloc.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 5c0627bf4..d65c05a4d 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -251,10 +251,17 @@ rte_malloc_set_limit(__rte_unused const char *type,
 phys_addr_t
 rte_malloc_virt2phy(const void *addr)
 {
+	phys_addr_t paddr;
 	const struct malloc_elem *elem = malloc_elem_from_data(addr);
 	if (elem == NULL)
 		return RTE_BAD_PHYS_ADDR;
 	if (elem->ms->phys_addr == RTE_BAD_PHYS_ADDR)
 		return RTE_BAD_PHYS_ADDR;
-	return elem->ms->phys_addr + ((uintptr_t)addr - (uintptr_t)elem->ms->addr);
+
+	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+		paddr = (uintptr_t)addr;
+	else
+		paddr = elem->ms->phys_addr +
+			((uintptr_t)addr - (uintptr_t)elem->ms->addr);
+	return paddr;
 }
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* Re: [PATCH v8 2/9] eal/pci: get iommu class
  2017-09-18 10:42               ` [PATCH v8 2/9] eal/pci: get iommu class Santosh Shukla
@ 2017-09-19 16:37                 ` Burakov, Anatoly
  2017-09-19 17:29                   ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Burakov, Anatoly @ 2017-09-19 16:37 UTC (permalink / raw)
  To: Santosh Shukla, dev

On 18-Sep-17 11:42 AM, Santosh Shukla wrote:
> Introducing rte_pci_get_iommu_class API which helps to get iommu class
> of PCI device on the bus and returns preferred iova mapping mode for
> PCI bus.
> 
> Patch also add rte_pci_get_iommu_class definition for bsdapp,
> in bsdapp case - api returns default iova mode.
> 
> Signed-off-by: Santosh Shukla <santosh.shukla at caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob at caviumnetworks.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin at redhat.com>
> ---

Hi Santosh,

You have probably missed my comment on previous version of this patch, 
but for commit history reasons i really think you should add a linuxapp 
stub in this commit as well as a FreeBSD stub, even though you are 
adding a linuxapp function in the next commit. Any linuxapp application 
using that function will fail to compile with this commit, despite this 
API being already present and declared as public.

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v8 2/9] eal/pci: get iommu class
  2017-09-19 16:37                 ` Burakov, Anatoly
@ 2017-09-19 17:29                   ` santosh
  2017-09-20  9:09                     ` Burakov, Anatoly
  0 siblings, 1 reply; 248+ messages in thread
From: santosh @ 2017-09-19 17:29 UTC (permalink / raw)
  To: Burakov, Anatoly, dev

Hi Anatoly,


On Tuesday 19 September 2017 10:07 PM, Burakov, Anatoly wrote:
> On 18-Sep-17 11:42 AM, Santosh Shukla wrote:
>> Introducing rte_pci_get_iommu_class API which helps to get iommu class
>> of PCI device on the bus and returns preferred iova mapping mode for
>> PCI bus.
>>
>> Patch also add rte_pci_get_iommu_class definition for bsdapp,
>> in bsdapp case - api returns default iova mode.
>>
>> Signed-off-by: Santosh Shukla <santosh.shukla at caviumnetworks.com>
>> Signed-off-by: Jerin Jacob <jerin.jacob at caviumnetworks.com>
>> Reviewed-by: Maxime Coquelin <maxime.coquelin at redhat.com>
>> ---
>
> Hi Santosh,
>
> You have probably missed my comment on previous version of this patch, but for commit history reasons i really think you should add a linuxapp stub in this commit as well as a FreeBSD stub, even though you are adding a linuxapp function in the next commit. Any linuxapp application using that function will fail to compile with this commit, despite this API being already present and declared as public.
>
First, apologies for not following up on your note.

I prefer to keep less context in each patch, and [03/9] already has
the _IOVA_AS_VA flag plus the whole autodetection algo inside
(squashed per Aaron's suggestion).

Now if I squash [2/9] into [3/9], it would be too much info
for a future reader to digest (imo). It's a kind of trade-off.

On any linuxapp application breaking with this commit:
This series exposes an eal api for applications to use to identify the iova mode.

If you are still not convinced by my explanation, then I'll spin v9 and squash
[02/09] and [03/09] in v9.

Thanks. 

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v8 2/9] eal/pci: get iommu class
  2017-09-19 17:29                   ` santosh
@ 2017-09-20  9:09                     ` Burakov, Anatoly
  2017-09-20 10:24                       ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Burakov, Anatoly @ 2017-09-20  9:09 UTC (permalink / raw)
  To: santosh, dev

Hi Santosh,

On 19-Sep-17 6:29 PM, santosh wrote:
> Hi Anatoly,
> 
> 
> On Tuesday 19 September 2017 10:07 PM, Burakov, Anatoly wrote:
>> On 18-Sep-17 11:42 AM, Santosh Shukla wrote:
>>> Introducing rte_pci_get_iommu_class API which helps to get iommu class
>>> of PCI device on the bus and returns preferred iova mapping mode for
>>> PCI bus.
>>>
>>> Patch also add rte_pci_get_iommu_class definition for bsdapp,
>>> in bsdapp case - api returns default iova mode.
>>>
>>> Signed-off-by: Santosh Shukla <santosh.shukla at caviumnetworks.com>
>>> Signed-off-by: Jerin Jacob <jerin.jacob at caviumnetworks.com>
>>> Reviewed-by: Maxime Coquelin <maxime.coquelin at redhat.com>
>>> ---
>>
>> Hi Santosh,
>>
>> You have probably missed my comment on previous version of this patch, but for commit history reasons i really think you should add a linuxapp stub in this commit as well as a FreeBSD stub, even though you are adding a linuxapp function in the next commit. Any linuxapp application using that function will fail to compile with this commit, despite this API being already present and declared as public.
>>
> First, apologies for not following up on your note.
> 
> I prefer to keep less context in each patch, and [03/9] already has
> the _IOVA_AS_VA flag plus the whole autodetection algo inside
> (squashed per Aaron's suggestion).
> 
> Now if I squash [2/9] into [3/9], it would be too much info
> for a future reader to digest (imo). It's a kind of trade-off.
> 
> On any linuxapp application breaking with this commit:
> This series exposes an eal api for applications to use to identify the iova mode.
> 
> If you are still not convinced by my explanation, then I'll spin v9 and squash
> [02/09] and [03/09] in v9.

No, i don't mean squashing these two patches into one. I mean, provide a 
stub like for FreeBSD, and then edit it to be a proper implementation in 
the next commit.

I.e. in this commit, add a stub that just returns 0, like for FreeBSD. 
Next commit, instead of starting from scratch, start from this stub.

Thanks,
Anatoly

> 
> Thanks.
> 
> 
> 


-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v8 2/9] eal/pci: get iommu class
  2017-09-20  9:09                     ` Burakov, Anatoly
@ 2017-09-20 10:24                       ` santosh
  0 siblings, 0 replies; 248+ messages in thread
From: santosh @ 2017-09-20 10:24 UTC (permalink / raw)
  To: Burakov, Anatoly, dev

Hi Anatoly,


On Wednesday 20 September 2017 02:39 PM, Burakov, Anatoly wrote:
> Hi Santosh,
>
> On 19-Sep-17 6:29 PM, santosh wrote:
>> Hi Anatoly,
>>
>>
>> On Tuesday 19 September 2017 10:07 PM, Burakov, Anatoly wrote:
>>> On 18-Sep-17 11:42 AM, Santosh Shukla wrote:
>>>> Introducing rte_pci_get_iommu_class API which helps to get iommu class
>>>> of PCI device on the bus and returns preferred iova mapping mode for
>>>> PCI bus.
>>>>
>>>> Patch also add rte_pci_get_iommu_class definition for bsdapp,
>>>> in bsdapp case - api returns default iova mode.
>>>>
>>>> Signed-off-by: Santosh Shukla <santosh.shukla at caviumnetworks.com>
>>>> Signed-off-by: Jerin Jacob <jerin.jacob at caviumnetworks.com>
>>>> Reviewed-by: Maxime Coquelin <maxime.coquelin at redhat.com>
>>>> ---
>>>
>>> Hi Santosh,
>>>
>>> You have probably missed my comment on previous version of this patch, but for commit history reasons i really think you should add a linuxapp stub in this commit as well as a FreeBSD stub, even though you are adding a linuxapp function in the next commit. Any linuxapp application using that function will fail to compile with this commit, despite this API being already present and declared as public.
>>>
>> First, apologies for not following up on your note:
>>
>> I prefer to keep less context in each patch and
>> for [03/9], its already has _IOVA_AS_VA flag + whole autodetection
>> algo inside (squashed per Aron suggestion).
>>   Now if I squash [2/9] into [3/9], then would be too much info
>> for future reader to digest for (imo). Its a kind of trade-off.
>>
>> On any linuxapp appl breaking with this commit:
>> This series exposes eal api for application to use and identify iova mode.
>>
>> If you still feel not convinced with my explanation then I'll spin v9 and squash
>> [02/09], [03/09] in v9.
>
> No, i don't mean squashing these two patches into one. I mean, provide a stub like for FreeBSD, and then edit it to be a proper implementation in the next commit.
>
> I.e. in this commit, add a stub that just returns 0, like for FreeBSD. Next commit, instead of starting from scratch, start from this stub.
>
+1, Sending v9.

Thanks.

> Thanks,
> Anatoly
>
>>
>> Thanks.
>>
>>
>>
>
>

^ permalink raw reply	[flat|nested] 248+ messages in thread

* [PATCH v9 0/9] Infrastructure to detect iova mapping on the bus
  2017-09-18 10:42             ` [PATCH v8 " Santosh Shukla
                                 ` (8 preceding siblings ...)
  2017-09-18 10:42               ` [PATCH v8 9/9] eal/rte_malloc: " Santosh Shukla
@ 2017-09-20 11:23               ` Santosh Shukla
  2017-09-20 11:23                 ` [PATCH v9 1/9] eal/pci: export match function Santosh Shukla
                                   ` (10 more replies)
  9 siblings, 11 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-09-20 11:23 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal, aconole,
	stephen, anatoly.burakov, gaetan.rivet, shreyansh.jain,
	bruce.richardson, sergio.gonzalez.monroy, maxime.coquelin,
	Santosh Shukla

v9:
- Added Tested-By: to series.
- Includes minor changes related to linuxapp api stub in [02/09]
  (Suggested by Anatoly)
- Series rebased on tip commit : aee62e90

v8:
Includes minor review changes per v7 review comment from Anatoly.
Patches rebased on Tip commit:3d2e0448eb.

v7:
Includes no major change, minor change detailing:
- patch squashing (Aaron suggestion)
- added run_once for device_parse() and bus_scan() in eal init
    (Aaron suggestion)
- Moved rte_eal_device_parse() up in eal initialization order.
- Patches rebased on top of version: 17.11-rc0
For v6 info refer [11].

v6:
Sending v5 series rebased on top of version: 17.11-rc0.

v5:
Introduces the RTE_PCI_DRV_IOVA_AS_VA flag for autodetection of the
IOVA-as-VA mapping. If a PCI driver requires the IOVA-as-VA scheme, the
driver can set this flag at PCI driver registration.
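
A minimal sketch of how a driver might request the scheme. The flag value
is the one defined in patch 3/9; the struct, the driver name and the helper
are hypothetical stand-ins for illustration, not the real rte_pci_driver
definition.

```c
#include <stdint.h>

/* Flag value as defined in patch 3/9 of this series. */
#define RTE_PCI_DRV_IOVA_AS_VA 0x0040

/* Hypothetical stand-in for struct rte_pci_driver; only two fields modelled. */
struct example_pci_driver {
	const char *name;
	uint32_t drv_flags;
};

/* A driver that wants IOVA as VA sets the flag in its registration data. */
static struct example_pci_driver example_pmd = {
	.name = "example_pmd",
	.drv_flags = RTE_PCI_DRV_IOVA_AS_VA,
};

/* Returns 1 if the driver asked for IOVA as VA, 0 otherwise. */
static int
example_driver_wants_iova_va(const struct example_pci_driver *drv)
{
	return (drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) != 0;
}
```

The flag is only a request: the bus still falls back to PA when other
devices on the bus cannot work with VA.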

Algorithm to select IOVA as VA for PCI bus case:
     0. If no device is bound, return the RTE_IOVA_DC mapping mode;
     otherwise go to 1).
     1. Look for a device that is attached to the vfio kdrv and whose
     driver has RTE_PCI_DRV_IOVA_AS_VA set in .drv_flags.
     2. Look for any device attached to a UIO class of driver.
     3. Check whether vfio-noiommu mode is enabled.

     If 2) and 3) are false and 1) is true, select RTE_IOVA_VA as the
     mapping scheme. Otherwise use the default mapping scheme
     (RTE_IOVA_PA).

That way, the bus can autodetect the iova mapping mode for
a device or a set of devices.
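
The selection steps above can be sketched as a pure decision function. The
enum mirrors the values used in this series; the function name and its
boolean inputs are assumptions made for illustration, standing in for the
device and driver scans done in the actual patch.

```c
#include <stdbool.h>

/* Mirrors enum rte_iova_mode from the series; names are illustrative. */
enum sketch_pci_iova {
	SKETCH_PCI_IOVA_DC = 0,		/* Don't care */
	SKETCH_PCI_IOVA_PA = 1 << 0,
	SKETCH_PCI_IOVA_VA = 1 << 1
};

/*
 * any_bound:         step 0, at least one device bound to a kernel driver
 * vfio_dev_wants_va: step 1, a vfio-bound device whose driver sets the flag
 * any_uio:           step 2, at least one device bound to a UIO driver
 * vfio_noiommu:      step 3, vfio-noiommu mode is enabled
 */
static enum sketch_pci_iova
sketch_pci_select_iova(bool any_bound, bool vfio_dev_wants_va,
		       bool any_uio, bool vfio_noiommu)
{
	if (!any_bound)
		return SKETCH_PCI_IOVA_DC;

	if (vfio_dev_wants_va && !any_uio && !vfio_noiommu)
		return SKETCH_PCI_IOVA_VA;

	/* Default mapping scheme. */
	return SKETCH_PCI_IOVA_PA;
}
```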

Change History:
v8 --> v9:
- Added Tested-by: signature of Hemant.
- Added linuxapp stub api definition in [02/09] (Suggested by Anatoly)

v7 --> v8:
- Replace 0 / 1 with true/false boolean values (Suggested by Anatoly).

v6 --> v7:
- Patches squashed per v6.
- Added run_once in eal per v6.
- Moved rte_eal_device_parse() up in eal init order.

v5 --> v6:
- Added api info in eal's version.map (release DPDK_v17.11).

v4 --> v5:
- Change DPDK_17.08 to DPDK_17.11 in _version.map.
- Reworded bus api description (suggested by Hemant).
- Added reviewed-by from Maxime in v5.
- Added acked-by from Hemant for pci and bus patches.

v3 --> v4:
- Re-introduced RTE_IOVA_DC mode (Suggested by Hemant [5]).
- Renamed flag to RTE_PCI_DRV_IOVA_AS_VA (Suggested by Maxime).
- Reworded WARNING message (suggested by Maxime [7]).
- Created a separate patch for rte_pci_get_iommu_class (suggested by
Maxime[]).
- Added VFIO_PRESENT ifdef build fix.

v2 --> v3:
- Removed rte_mempool_virt2phy (suggested by Olivier [4])

v1 --> v2:
- Removed the eal override option, i.e. (--iova-mode=<>), because we now
   have the means to truly autodetect the iova mode.
- Introduced RTE_PCI_DRV_NEED_IOVA_VA drv_flag (Suggested by Maxime [3]).
- Using NEED_IOVA_VA drv_flag in autodetection logic.
- Removed Linux version check macro in vfio code, as per Maxime's feedback.
- Moved rte_pci_match API from local to global.

Patch Summary:
1) 1st: declare rte_pci_match api in pci header. Required for
autodetection in follow-up patches.
2) 2nd - 4th: autodetection mapping infrastructure for Linux/bsdapp.
3) 5th: iova mode helper API.
4) 6th: Infra to detect iova mode.
5) 7th: make vfio mapping iova aware.
6) 8th - 9th : Check for IOVA_VA mode in below APIs
         - rte_mem_virt2phy
         - rte_malloc_virt2phy

Test History:
- Tested for x86/XL710 40G NIC card for both modes (iova_va/pa).
- Tested for arm64/thunderx vNIC integrated NIC for both modes.
- Tested for arm64/Octeontx integrated NICs in iova_va mode only
   (it supports only that mode).
- Ran standalone tests like mempool_autotest, mbuf_autotest.
- Verified for Doxygen.

Work History:
For v1, Refer [1].
For v2, Refer [2].
For v3, Refer [9].
For v4, Refer [10].
For v6, Refer [11].

Checkpatch result:
* None 

Thanks,
[1] https://www.mail-archive.com/dev@dpdk.org/msg67438.html
[2] https://www.mail-archive.com/dev@dpdk.org/msg70674.html
[3] https://www.mail-archive.com/dev@dpdk.org/msg70279.html
[4] https://www.mail-archive.com/dev@dpdk.org/msg70692.html
[5] http://dpdk.org/ml/archives/dev/2017-July/071282.html
[6] http://dpdk.org/ml/archives/dev/2017-July/070951.html
[7] http://dpdk.org/ml/archives/dev/2017-July/070941.html
[8] http://dpdk.org/ml/archives/dev/2017-July/070952.html
[9] http://dpdk.org/ml/archives/dev/2017-July/070918.html
[10] http://dpdk.org/ml/archives/dev/2017-July/071754.html
[11] http://dpdk.org/ml/archives/dev/2017-August/072871.html



Santosh Shukla (9):
  eal/pci: export match function
  eal/pci: get iommu class
  linuxapp/eal_pci: get iommu class
  bus: get iommu class
  eal: introduce helper API for iova mode
  eal: auto detect iova mode
  linuxapp/eal_vfio: honor iova mode before mapping
  linuxapp/eal_memory: honor iova mode in virt2phy
  eal/rte_malloc: honor iova mode in virt2phy

 lib/librte_eal/bsdapp/eal/eal.c                 | 33 ++++++---
 lib/librte_eal/bsdapp/eal/eal_pci.c             | 10 +++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   | 10 +++
 lib/librte_eal/common/eal_common_bus.c          | 23 ++++++
 lib/librte_eal/common/eal_common_pci.c          | 11 +--
 lib/librte_eal/common/include/rte_bus.h         | 35 +++++++++
 lib/librte_eal/common/include/rte_eal.h         | 12 ++++
 lib/librte_eal/common/include/rte_pci.h         | 28 ++++++++
 lib/librte_eal/common/rte_malloc.c              |  9 ++-
 lib/librte_eal/linuxapp/eal/eal.c               | 33 ++++++---
 lib/librte_eal/linuxapp/eal/eal_memory.c        |  3 +
 lib/librte_eal/linuxapp/eal/eal_pci.c           | 96 +++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c          | 29 +++++++-
 lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map | 10 +++
 15 files changed, 312 insertions(+), 34 deletions(-)

-- 
2.14.1

^ permalink raw reply	[flat|nested] 248+ messages in thread

* [PATCH v9 1/9] eal/pci: export match function
  2017-09-20 11:23               ` [PATCH v9 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
@ 2017-09-20 11:23                 ` Santosh Shukla
  2017-09-20 11:23                 ` [PATCH v9 2/9] eal/pci: get iommu class Santosh Shukla
                                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-09-20 11:23 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal, aconole,
	stephen, anatoly.burakov, gaetan.rivet, shreyansh.jain,
	bruce.richardson, sergio.gonzalez.monroy, maxime.coquelin,
	Santosh Shukla

Export the rte_pci_match() function, as it is needed in a follow-up patch.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  7 +++++++
 lib/librte_eal/common/eal_common_pci.c          | 10 +---------
 lib/librte_eal/common/include/rte_pci.h         | 15 +++++++++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  7 +++++++
 4 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 47a09ea7f..cfbf8fbd0 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -238,3 +238,10 @@ EXPERIMENTAL {
 	rte_service_start_with_defaults;
 
 } DPDK_17.08;
+
+DPDK_17.11 {
+	global:
+
+	rte_pci_match;
+
+} DPDK_17.08;
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 52fd38cdd..3b7d0a0ee 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -150,16 +150,8 @@ pci_unmap_resource(void *requested_addr, size_t size)
 
 /*
  * Match the PCI Driver and Device using the ID Table
- *
- * @param pci_drv
- *	PCI driver from which ID table would be extracted
- * @param pci_dev
- *	PCI device to match against the driver
- * @return
- *	1 for successful match
- *	0 for unsuccessful match
  */
-static int
+int
 rte_pci_match(const struct rte_pci_driver *pci_drv,
 	      const struct rte_pci_device *pci_dev)
 {
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 8b123391c..eab84c7a4 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -366,6 +366,21 @@ int rte_pci_scan(void);
 int
 rte_pci_probe(void);
 
+/*
+ * Match the PCI Driver and Device using the ID Table
+ *
+ * @param pci_drv
+ *      PCI driver from which ID table would be extracted
+ * @param pci_dev
+ *      PCI device to match against the driver
+ * @return
+ *      1 for successful match
+ *      0 for unsuccessful match
+ */
+int
+rte_pci_match(const struct rte_pci_driver *pci_drv,
+	      const struct rte_pci_device *pci_dev);
+
 /**
  * Map the PCI device resources in user space virtual memory address
  *
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 8c08b8d1e..287cc75cd 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -243,3 +243,10 @@ EXPERIMENTAL {
 	rte_service_start_with_defaults;
 
 } DPDK_17.08;
+
+DPDK_17.11 {
+	global:
+
+	rte_pci_match;
+
+} DPDK_17.08;
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v9 2/9] eal/pci: get iommu class
  2017-09-20 11:23               ` [PATCH v9 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
  2017-09-20 11:23                 ` [PATCH v9 1/9] eal/pci: export match function Santosh Shukla
@ 2017-09-20 11:23                 ` Santosh Shukla
  2017-09-20 11:39                   ` Burakov, Anatoly
  2017-10-05 23:58                   ` Thomas Monjalon
  2017-09-20 11:23                 ` [PATCH v9 3/9] linuxapp/eal_pci: " Santosh Shukla
                                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-09-20 11:23 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal, aconole,
	stephen, anatoly.burakov, gaetan.rivet, shreyansh.jain,
	bruce.richardson, sergio.gonzalez.monroy, maxime.coquelin,
	Santosh Shukla

Introduce the rte_pci_get_iommu_class API, which gets the iommu class of
the PCI devices on the bus and returns the preferred iova mapping mode for
the PCI bus.

The patch also adds a rte_pci_get_iommu_class definition for:
- bsdapp: the api returns the default iova mode.
- linuxapp: a stub implementation; a follow-up patch has the complete
  implementation.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
v8 --> v9:
- Added linuxapp iova stub definition (Suggested by Anatoly)

v6 --> v7:
- squashed v6 series patch [02/12] & [03/12] (Aaron comment).

 lib/librte_eal/bsdapp/eal/eal_pci.c             | 10 ++++++++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
 lib/librte_eal/common/include/rte_bus.h         | 10 ++++++++++
 lib/librte_eal/common/include/rte_pci.h         | 11 +++++++++++
 lib/librte_eal/linuxapp/eal/eal_pci.c           |  9 +++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 6 files changed, 42 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 04eacdcc7..e2c252320 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -403,6 +403,16 @@ rte_pci_scan(void)
 	return -1;
 }
 
+/*
+ * Get iommu class of pci devices on the bus.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void)
+{
+	/* Supports only RTE_KDRV_NIC_UIO */
+	return RTE_IOVA_PA;
+}
+
 int
 pci_update_device(const struct rte_pci_addr *addr)
 {
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index cfbf8fbd0..c6ffd9399 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -243,5 +243,6 @@ DPDK_17.11 {
 	global:
 
 	rte_pci_match;
+	rte_pci_get_iommu_class;
 
 } DPDK_17.08;
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index c79368d3c..9e40687e5 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -55,6 +55,16 @@ extern "C" {
 /** Double linked list of buses */
 TAILQ_HEAD(rte_bus_list, rte_bus);
 
+
+/**
+ * IOVA mapping mode.
+ */
+enum rte_iova_mode {
+	RTE_IOVA_DC = 0,	/* Don't care mode */
+	RTE_IOVA_PA = (1 << 0),
+	RTE_IOVA_VA = (1 << 1)
+};
+
 /**
  * Bus specific scan for devices attached on the bus.
  * For each bus object, the scan would be responsible for finding devices and
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index eab84c7a4..0e36de093 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -381,6 +381,17 @@ int
 rte_pci_match(const struct rte_pci_driver *pci_drv,
 	      const struct rte_pci_device *pci_dev);
 
+
+/**
+ * Get iommu class of PCI devices on the bus.
+ * And return their preferred iova mapping mode.
+ *
+ * @return
+ *   - enum rte_iova_mode.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void);
+
 /**
  * Map the PCI device resources in user space virtual memory address
  *
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 8951ce742..26f2be822 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -487,6 +487,15 @@ rte_pci_scan(void)
 	return -1;
 }
 
+/*
+ * Get iommu class of pci devices on the bus.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void)
+{
+	return RTE_IOVA_PA;
+}
+
 /* Read PCI config space. */
 int rte_pci_read_config(const struct rte_pci_device *device,
 		void *buf, size_t len, off_t offset)
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 287cc75cd..a8c8ea4f4 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -248,5 +248,6 @@ DPDK_17.11 {
 	global:
 
 	rte_pci_match;
+	rte_pci_get_iommu_class;
 
 } DPDK_17.08;
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v9 3/9] linuxapp/eal_pci: get iommu class
  2017-09-20 11:23               ` [PATCH v9 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
  2017-09-20 11:23                 ` [PATCH v9 1/9] eal/pci: export match function Santosh Shukla
  2017-09-20 11:23                 ` [PATCH v9 2/9] eal/pci: get iommu class Santosh Shukla
@ 2017-09-20 11:23                 ` Santosh Shukla
  2017-10-06  0:17                   ` Thomas Monjalon
  2017-09-20 11:23                 ` [PATCH v9 4/9] bus: " Santosh Shukla
                                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 248+ messages in thread
From: Santosh Shukla @ 2017-09-20 11:23 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal, aconole,
	stephen, anatoly.burakov, gaetan.rivet, shreyansh.jain,
	bruce.richardson, sergio.gonzalez.monroy, maxime.coquelin,
	Santosh Shukla

Get the iommu class of the PCI devices on the bus and return the preferred
iova mapping mode for that bus.

The patch also introduces the RTE_PCI_DRV_IOVA_AS_VA drv flag.
The flag is used when a driver needs to operate in iova=va mode.
Algorithm for iova scheme selection for PCI bus:
0. If no device bound then return with RTE_IOVA_DC mapping mode,
else goto 1).
1. Look for device attached to vfio kdrv and has .drv_flag set
to RTE_PCI_DRV_IOVA_AS_VA.
2. Look for any device attached to UIO class of driver.
3. Check for vfio-noiommu mode enabled.

If 2) and 3) are false and 1) is true, select RTE_IOVA_VA as the
mapping scheme. Otherwise use the default
mapping scheme (RTE_IOVA_PA).
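
Step 3 relies on reading the vfio module parameter from sysfs. A minimal
sketch of such a check follows; it is not the function from this patch.
The helper name is hypothetical, the path is a parameter so the logic can
be exercised on any file, and unlike the patch it reports a failed read
separately from a value other than 'Y'.

```c
#include <fcntl.h>
#include <unistd.h>

/* sysfs path used by this patch for the vfio-noiommu parameter. */
#define EXAMPLE_VFIO_NOIOMMU_PARAM \
	"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"

/*
 * Returns 1 if the first byte of the file is 'Y', 0 if it is anything
 * else, and -1 if the file cannot be opened or read.
 */
static int
example_noiommu_enabled(const char *path)
{
	char c = 'N';
	ssize_t n;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	n = read(fd, &c, 1);
	close(fd);
	if (n != 1)
		return -1;

	return c == 'Y';
}
```

A caller would pass EXAMPLE_VFIO_NOIOMMU_PARAM and treat only a return
value of 1 as "noiommu enabled".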

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
v7 --> v8:
- Replaced 0/1 with false/true boolean value (Suggested by Anatoly)

v6 --> v7:
- squashed v6 series patches [01/12] & [05/12],
    i.e. moved RTE_PCI_DRV_IOVA_AS_VA flag into this patch. (Aaron comment).

 lib/librte_eal/common/include/rte_pci.h |  2 +
 lib/librte_eal/linuxapp/eal/eal_pci.c   | 89 ++++++++++++++++++++++++++++++++-
 lib/librte_eal/linuxapp/eal/eal_vfio.c  | 19 +++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.h  |  4 ++
 4 files changed, 113 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 0e36de093..a67d77f22 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -202,6 +202,8 @@ struct rte_pci_bus {
 #define RTE_PCI_DRV_INTR_RMV 0x0010
 /** Device driver needs to keep mapped resources if unsupported dev detected */
 #define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020
+/** Device driver supports iova as va */
+#define RTE_PCI_DRV_IOVA_AS_VA 0X0040
 
 /**
  * A structure describing a PCI mapping.
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 26f2be822..2971f1d4f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -45,6 +45,7 @@
 #include "eal_filesystem.h"
 #include "eal_private.h"
 #include "eal_pci_init.h"
+#include "eal_vfio.h"
 
 /**
  * @file
@@ -488,11 +489,97 @@ rte_pci_scan(void)
 }
 
 /*
- * Get iommu class of pci devices on the bus.
+ * Is pci device bound to any kdrv
+ */
+static inline int
+pci_device_is_bound(void)
+{
+	struct rte_pci_device *dev = NULL;
+	int ret = 0;
+
+	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		if (dev->kdrv == RTE_KDRV_UNKNOWN ||
+		    dev->kdrv == RTE_KDRV_NONE) {
+			continue;
+		} else {
+			ret = 1;
+			break;
+		}
+	}
+	return ret;
+}
+
+/*
+ * Any one of the device bound to uio
+ */
+static inline int
+pci_device_bound_uio(void)
+{
+	struct rte_pci_device *dev = NULL;
+
+	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		if (dev->kdrv == RTE_KDRV_IGB_UIO ||
+		   dev->kdrv == RTE_KDRV_UIO_GENERIC) {
+			return 1;
+		}
+	}
+	return 0;
+}
+
+/*
+ * Any one of the device has iova as va
+ */
+static inline int
+pci_device_has_iova_va(void)
+{
+	struct rte_pci_device *dev = NULL;
+	struct rte_pci_driver *drv = NULL;
+
+	FOREACH_DRIVER_ON_PCIBUS(drv) {
+		if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
+			FOREACH_DEVICE_ON_PCIBUS(dev) {
+				if (dev->kdrv == RTE_KDRV_VFIO &&
+				    rte_pci_match(drv, dev))
+					return 1;
+			}
+		}
+	}
+	return 0;
+}
+
+/*
+ * Get iommu class of PCI devices on the bus.
  */
 enum rte_iova_mode
 rte_pci_get_iommu_class(void)
 {
+	bool is_bound;
+	bool is_vfio_noiommu_enabled = true;
+	bool has_iova_va;
+	bool is_bound_uio;
+
+	is_bound = pci_device_is_bound();
+	if (!is_bound)
+		return RTE_IOVA_DC;
+
+	has_iova_va = pci_device_has_iova_va();
+	is_bound_uio = pci_device_bound_uio();
+#ifdef VFIO_PRESENT
+	is_vfio_noiommu_enabled = vfio_noiommu_is_enabled() == true ?
+					true : false;
+#endif
+
+	if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled)
+		return RTE_IOVA_VA;
+
+	if (has_iova_va) {
+		RTE_LOG(WARNING, EAL, "Some devices want iova as va but pa will be used because.. ");
+		if (is_vfio_noiommu_enabled)
+			RTE_LOG(WARNING, EAL, "vfio-noiommu mode configured\n");
+		if (is_bound_uio)
+			RTE_LOG(WARNING, EAL, "few device bound to UIO\n");
+	}
+
 	return RTE_IOVA_PA;
 }
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 946df7e31..c8a97b7e7 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -816,4 +816,23 @@ vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
 	return 0;
 }
 
+int
+vfio_noiommu_is_enabled(void)
+{
+	int fd, ret, cnt __rte_unused;
+	char c;
+
+	ret = -1;
+	fd = open(VFIO_NOIOMMU_MODE, O_RDONLY);
+	if (fd < 0)
+		return -1;
+
+	cnt = read(fd, &c, 1);
+	if (c == 'Y')
+		ret = 1;
+
+	close(fd);
+	return ret;
+}
+
 #endif
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h b/lib/librte_eal/linuxapp/eal/eal_vfio.h
index 5ff63e5d7..26ea8e119 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
@@ -150,6 +150,8 @@ struct vfio_config {
 #define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
 #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
 #define VFIO_GET_REGION_IDX(x) (x >> 40)
+#define VFIO_NOIOMMU_MODE      \
+	"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
 
 /* DMA mapping function prototype.
  * Takes VFIO container fd as a parameter.
@@ -210,6 +212,8 @@ int pci_vfio_is_enabled(void);
 
 int vfio_mp_sync_setup(void);
 
+int vfio_noiommu_is_enabled(void);
+
 #define SOCKET_REQ_CONTAINER 0x100
 #define SOCKET_REQ_GROUP 0x200
 #define SOCKET_CLR_GROUP 0x300
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v9 4/9] bus: get iommu class
  2017-09-20 11:23               ` [PATCH v9 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
                                   ` (2 preceding siblings ...)
  2017-09-20 11:23                 ` [PATCH v9 3/9] linuxapp/eal_pci: " Santosh Shukla
@ 2017-09-20 11:23                 ` Santosh Shukla
  2017-09-20 11:23                 ` [PATCH v9 5/9] eal: introduce helper API for iova mode Santosh Shukla
                                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-09-20 11:23 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal, aconole,
	stephen, anatoly.burakov, gaetan.rivet, shreyansh.jain,
	bruce.richardson, sergio.gonzalez.monroy, maxime.coquelin,
	Santosh Shukla

The rte_bus_get_iommu_class() API automatically detects and selects the
appropriate iova mapping scheme for iommu capable devices on the buses.

Algorithm for iova scheme selection for bus:
0. Iterate through bus_list.
1. Collect each bus iova mode value and OR it into the 'mode' var.
2. Mode selection scheme is:
if mode == 0 then iova mode is _pa,
if mode == 1 then iova mode is _pa,
if mode == 2 then iova mode is _va,
if mode == 3 then iova mode is _pa.

So any mode != 2 selects the default iova mode (_pa).
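
The table above follows from OR-ing the per-bus bit values. A minimal
sketch, assuming the per-bus results are already collected in an array
(the array, the enum and the function name are illustrative and not part
of the patch):

```c
/* Mirrors enum rte_iova_mode from the series; names are illustrative. */
enum sketch_bus_iova {
	SKETCH_BUS_IOVA_DC = 0,		/* Don't care */
	SKETCH_BUS_IOVA_PA = 1 << 0,
	SKETCH_BUS_IOVA_VA = 1 << 1
};

/* Combine the mode reported by each bus into a single system-wide mode. */
static enum sketch_bus_iova
sketch_combine_bus_modes(const enum sketch_bus_iova *bus_modes, int nb_buses)
{
	int mode = SKETCH_BUS_IOVA_DC;
	int i;

	for (i = 0; i < nb_buses; i++)
		mode |= bus_modes[i];

	/* Only "every reporting bus wants VA" (mode == 2) selects VA. */
	if (mode != SKETCH_BUS_IOVA_VA)
		mode = SKETCH_BUS_IOVA_PA;

	return (enum sketch_bus_iova)mode;
}
```

A bus that reports don't care (0) leaves the result unchanged, while a
single bus reporting PA sets bit 0 and forces the default.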

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
 lib/librte_eal/common/eal_common_bus.c          | 23 +++++++++++++++++++++++
 lib/librte_eal/common/eal_common_pci.c          |  1 +
 lib/librte_eal/common/include/rte_bus.h         | 25 +++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 5 files changed, 51 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index c6ffd9399..3466eaf20 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -244,5 +244,6 @@ DPDK_17.11 {
 
 	rte_pci_match;
 	rte_pci_get_iommu_class;
+	rte_bus_get_iommu_class;
 
 } DPDK_17.08;
diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 08bec2d93..a30a8982e 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -222,3 +222,26 @@ rte_bus_find_by_device_name(const char *str)
 		c[0] = '\0';
 	return rte_bus_find(NULL, bus_can_parse, name);
 }
+
+
+/*
+ * Get iommu class of devices on the bus.
+ */
+enum rte_iova_mode
+rte_bus_get_iommu_class(void)
+{
+	int mode = RTE_IOVA_DC;
+	struct rte_bus *bus;
+
+	TAILQ_FOREACH(bus, &rte_bus_list, next) {
+
+		if (bus->get_iommu_class)
+			mode |= bus->get_iommu_class();
+	}
+
+	if (mode != RTE_IOVA_VA) {
+		/* Use default IOVA mode */
+		mode = RTE_IOVA_PA;
+	}
+	return mode;
+}
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 3b7d0a0ee..0f0e4b93b 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -564,6 +564,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.plug = pci_plug,
 		.unplug = pci_unplug,
 		.parse = pci_parse,
+		.get_iommu_class = rte_pci_get_iommu_class,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 9e40687e5..70a291a4d 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -178,6 +178,20 @@ struct rte_bus_conf {
 	enum rte_bus_scan_mode scan_mode; /**< Scan policy. */
 };
 
+
+/**
+ * Get the common iommu class of all the devices on the bus. The bus may
+ * check that those devices are attached to an iommu driver.
+ * If no devices are attached to the bus, the bus may return the don't care
+ * (_DC) value.
+ * Otherwise, the bus will return the appropriate _pa or _va iova mode.
+ *
+ * @return
+ *      enum rte_iova_mode value.
+ */
+typedef enum rte_iova_mode (*rte_bus_get_iommu_class_t)(void);
+
+
 /**
  * A structure describing a generic bus.
  */
@@ -191,6 +205,7 @@ struct rte_bus {
 	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
 	rte_bus_parse_t parse;       /**< Parse a device name */
 	struct rte_bus_conf conf;    /**< Bus configuration */
+	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 };
 
 /**
@@ -290,6 +305,16 @@ struct rte_bus *rte_bus_find_by_device(const struct rte_device *dev);
  */
 struct rte_bus *rte_bus_find_by_name(const char *busname);
 
+
+/**
+ * Get the common iommu class of devices bound on to buses available in the
+ * system. The default mode is PA.
+ *
+ * @return
+ *     enum rte_iova_mode value.
+ */
+enum rte_iova_mode rte_bus_get_iommu_class(void);
+
 /**
  * Helper for Bus registration.
  * The constructor has higher priority than PMD constructors.
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index a8c8ea4f4..9115aa3e9 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -249,5 +249,6 @@ DPDK_17.11 {
 
 	rte_pci_match;
 	rte_pci_get_iommu_class;
+	rte_bus_get_iommu_class;
 
 } DPDK_17.08;
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v9 5/9] eal: introduce helper API for iova mode
  2017-09-20 11:23               ` [PATCH v9 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
                                   ` (3 preceding siblings ...)
  2017-09-20 11:23                 ` [PATCH v9 4/9] bus: " Santosh Shukla
@ 2017-09-20 11:23                 ` Santosh Shukla
  2017-09-20 11:23                 ` [PATCH v9 6/9] eal: auto detect " Santosh Shukla
                                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-09-20 11:23 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal, aconole,
	stephen, anatoly.burakov, gaetan.rivet, shreyansh.jain,
	bruce.richardson, sergio.gonzalez.monroy, maxime.coquelin,
	Santosh Shukla

Introduce the rte_eal_iova_mode() helper API. This API is
used by non-eal libraries to detect the iova mode.
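
One plausible use, assumed from the virt2phy patches later in this series:
an address translation helper that returns the virtual address unchanged in
VA mode. Everything below is a self-contained stand-in; the mode variable,
the accessor and the fixed-offset physical lookup are hypothetical and only
model the control flow.

```c
#include <stdint.h>

/* Mirrors enum rte_iova_mode from the series; names are illustrative. */
enum sketch_iova_mode {
	SKETCH_IOVA_DC = 0,
	SKETCH_IOVA_PA = 1 << 0,
	SKETCH_IOVA_VA = 1 << 1
};

/* Hypothetical stand-in for the mode stored in the EAL configuration. */
static enum sketch_iova_mode sketch_configured_mode = SKETCH_IOVA_PA;

/* Stand-in for rte_eal_iova_mode(). */
static enum sketch_iova_mode
sketch_eal_iova_mode(void)
{
	return sketch_configured_mode;
}

/* Hypothetical physical lookup; a fixed offset keeps the sketch testable. */
static uint64_t
sketch_lookup_phys(const void *virt)
{
	return (uint64_t)(uintptr_t)virt + 0x1000;
}

/* In VA mode the IOVA is the virtual address; otherwise look up the PA. */
static uint64_t
sketch_virt2iova(const void *virt)
{
	if (sketch_eal_iova_mode() == SKETCH_IOVA_VA)
		return (uint64_t)(uintptr_t)virt;

	return sketch_lookup_phys(virt);
}
```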

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
 lib/librte_eal/bsdapp/eal/eal.c                 |  6 ++++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
 lib/librte_eal/common/include/rte_eal.h         | 12 ++++++++++++
 lib/librte_eal/linuxapp/eal/eal.c               |  6 ++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 5 files changed, 26 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 5fa598842..07e72203f 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -119,6 +119,12 @@ rte_eal_get_configuration(void)
 	return &rte_config;
 }
 
+enum rte_iova_mode
+rte_eal_iova_mode(void)
+{
+	return rte_eal_get_configuration()->iova_mode;
+}
+
 /* parse a sysfs (or other) file containing one integer value */
 int
 eal_parse_sysfs_value(const char *filename, unsigned long *val)
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 3466eaf20..6bed74dff 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -245,5 +245,6 @@ DPDK_17.11 {
 	rte_pci_match;
 	rte_pci_get_iommu_class;
 	rte_bus_get_iommu_class;
+	rte_eal_iova_mode;
 
 } DPDK_17.08;
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 0e7363d77..932dc1a96 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -45,6 +45,7 @@
 
 #include <rte_per_lcore.h>
 #include <rte_config.h>
+#include <rte_bus.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -87,6 +88,9 @@ struct rte_config {
 	/** Primary or secondary configuration */
 	enum rte_proc_type_t process_type;
 
+	/** PA or VA mapping mode */
+	enum rte_iova_mode iova_mode;
+
 	/**
 	 * Pointer to memory configuration, which may be shared across multiple
 	 * DPDK instances
@@ -287,6 +291,14 @@ static inline int rte_gettid(void)
 	return RTE_PER_LCORE(_thread_id);
 }
 
+/**
+ * Get the iova mode
+ *
+ * @return
+ *   enum rte_iova_mode value.
+ */
+enum rte_iova_mode rte_eal_iova_mode(void);
+
 #define RTE_INIT(func) \
 static void __attribute__((constructor, used)) func(void)
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 48f12f44c..febbafdb3 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -128,6 +128,12 @@ rte_eal_get_configuration(void)
 	return &rte_config;
 }
 
+enum rte_iova_mode
+rte_eal_iova_mode(void)
+{
+	return rte_eal_get_configuration()->iova_mode;
+}
+
 /* parse a sysfs (or other) file containing one integer value */
 int
 eal_parse_sysfs_value(const char *filename, unsigned long *val)
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 9115aa3e9..8e49bf5fa 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -250,5 +250,6 @@ DPDK_17.11 {
 	rte_pci_match;
 	rte_pci_get_iommu_class;
 	rte_bus_get_iommu_class;
+	rte_eal_iova_mode;
 
 } DPDK_17.08;
-- 
2.14.1
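
To illustrate how the helper above is meant to be consumed, here is a
minimal sketch, not the DPDK implementation. The names iova_mode,
eal_config, global_config, eal_iova_mode() and lib_addr_for_device() are
hypothetical stand-ins; only the enum values follow what was posted in
this series.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the enum discussed in this series (values as posted in v9). */
enum iova_mode {
	IOVA_DC = 0,        /* don't care */
	IOVA_PA = (1 << 0), /* IOVA is the physical address */
	IOVA_VA = (1 << 1)  /* IOVA is the virtual address */
};

/* Hypothetical stand-in for the global configuration that owns the mode. */
struct eal_config {
	enum iova_mode iova_mode;
};

static struct eal_config global_config = { .iova_mode = IOVA_PA };

/* Accessor in the spirit of rte_eal_iova_mode(): a read-only view for libraries. */
static enum iova_mode
eal_iova_mode(void)
{
	return global_config.iova_mode;
}

/* Hypothetical library-side helper: pick the address to hand to a device. */
static uint64_t
lib_addr_for_device(const void *va, uint64_t pa)
{
	if (eal_iova_mode() == IOVA_VA)
		return (uint64_t)(uintptr_t)va;
	return pa;
}
```

The point of the accessor is that a library outside the EAL never reads
the configuration structure directly; it only asks which mode is active.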

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v9 6/9] eal: auto detect iova mode
  2017-09-20 11:23               ` [PATCH v9 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
                                   ` (4 preceding siblings ...)
  2017-09-20 11:23                 ` [PATCH v9 5/9] eal: introduce helper API for iova mode Santosh Shukla
@ 2017-09-20 11:23                 ` Santosh Shukla
  2017-10-06  0:19                   ` Thomas Monjalon
  2017-09-20 11:23                 ` [PATCH v9 7/9] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
                                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 248+ messages in thread
From: Santosh Shukla @ 2017-09-20 11:23 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal, aconole,
	stephen, anatoly.burakov, gaetan.rivet, shreyansh.jain,
	bruce.richardson, sergio.gonzalez.monroy, maxime.coquelin,
	Santosh Shukla

For auto-detection purposes:
* The calls below are moved up in the eal initialization order:
	- eal_option_device_parse
	- rte_bus_scan

Select the iova mapping mode based on the result of
rte_bus_get_iommu_class.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
v6 --> v7:
- Moved eal_option_device_parse() up in the order of eal init.
- Added run_once. (Aaron's suggestion).
- Squashed v6 series patches [08/12] & [09/12] into one patch (Aaron's
      comment)

 lib/librte_eal/bsdapp/eal/eal.c   | 27 ++++++++++++++++-----------
 lib/librte_eal/linuxapp/eal/eal.c | 27 ++++++++++++++++-----------
 2 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 07e72203f..f003f4c04 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -541,6 +541,22 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	if (eal_option_device_parse()) {
+		rte_errno = ENODEV;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
+
+	if (rte_bus_scan()) {
+		rte_eal_init_alert("Cannot scan the buses for devices\n");
+		rte_errno = ENODEV;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
+
+	/* autodetect the iova mapping mode (default is iova_pa) */
+	rte_eal_get_configuration()->iova_mode = rte_bus_get_iommu_class();
+
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&
 			eal_hugepage_info_init() < 0) {
@@ -620,17 +636,6 @@ rte_eal_init(int argc, char **argv)
 		rte_config.master_lcore, thread_id, cpuset,
 		ret == 0 ? "" : "...");
 
-	if (eal_option_device_parse()) {
-		rte_errno = ENODEV;
-		return -1;
-	}
-
-	if (rte_bus_scan()) {
-		rte_eal_init_alert("Cannot scan the buses for devices\n");
-		rte_errno = ENODEV;
-		return -1;
-	}
-
 	RTE_LCORE_FOREACH_SLAVE(i) {
 
 		/*
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index febbafdb3..f4901ffb6 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -798,6 +798,22 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	if (eal_option_device_parse()) {
+		rte_errno = ENODEV;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
+
+	if (rte_bus_scan()) {
+		rte_eal_init_alert("Cannot scan the buses for devices\n");
+		rte_errno = ENODEV;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
+
+	/* autodetect the iova mapping mode (default is iova_pa) */
+	rte_eal_get_configuration()->iova_mode = rte_bus_get_iommu_class();
+
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&
 			internal_config.xen_dom0_support == 0 &&
@@ -895,17 +911,6 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
-	if (eal_option_device_parse()) {
-		rte_errno = ENODEV;
-		return -1;
-	}
-
-	if (rte_bus_scan()) {
-		rte_eal_init_alert("Cannot scan the buses for devices\n");
-		rte_errno = ENODEV;
-		return -1;
-	}
-
 	RTE_LCORE_FOREACH_SLAVE(i) {
 
 		/*
-- 
2.14.1
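
The selection policy that this reordering enables can be sketched as
follows. This is a minimal sketch based on the policy summary in the cover
letter, not the code of rte_bus_get_iommu_class(). The scanned_dev record
and both function names are hypothetical, and treating "don't care" as PA
is an assumption taken from the "default is iova_pa" comment in the diff.

```c
#include <assert.h>
#include <stddef.h>

enum iova_mode { IOVA_DC = 0, IOVA_PA = (1 << 0), IOVA_VA = (1 << 1) };

/* Hypothetical per-device record, filled in by a bus scan. */
struct scanned_dev {
	int iommu_capable; /* non-zero if the device and its driver can use iova as va */
};

/*
 * Policy from the cover letter:
 *   no device found            -> don't care
 *   any non-iommu device       -> iova as pa
 *   all devices iommu capable  -> iova as va
 */
static enum iova_mode
detect_iova_mode(const struct scanned_dev *devs, size_t n)
{
	size_t i;

	if (n == 0)
		return IOVA_DC;
	for (i = 0; i < n; i++)
		if (!devs[i].iommu_capable)
			return IOVA_PA;
	return IOVA_VA;
}

/* Assumption: "don't care" falls back to PA, the historical default. */
static enum iova_mode
effective_iova_mode(enum iova_mode detected)
{
	return detected == IOVA_DC ? IOVA_PA : detected;
}
```

Detection needs the device list as input, which is why the scan has to
run before the mode is stored in the configuration.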

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v9 7/9] linuxapp/eal_vfio: honor iova mode before mapping
  2017-09-20 11:23               ` [PATCH v9 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
                                   ` (5 preceding siblings ...)
  2017-09-20 11:23                 ` [PATCH v9 6/9] eal: auto detect " Santosh Shukla
@ 2017-09-20 11:23                 ` Santosh Shukla
  2017-09-20 11:23                 ` [PATCH v9 8/9] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
                                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-09-20 11:23 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal, aconole,
	stephen, anatoly.burakov, gaetan.rivet, shreyansh.jain,
	bruce.richardson, sergio.gonzalez.monroy, maxime.coquelin,
	Santosh Shukla

Check the iova mode and map iova to pa or va accordingly.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
 lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c8a97b7e7..b32cd09a2 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd)
 		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
 		dma_map.vaddr = ms[i].addr_64;
 		dma_map.size = ms[i].len;
-		dma_map.iova = ms[i].phys_addr;
+		if (rte_eal_iova_mode() == RTE_IOVA_VA)
+			dma_map.iova = dma_map.vaddr;
+		else
+			dma_map.iova = ms[i].phys_addr;
 		dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
 
 		ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
@@ -792,7 +795,10 @@ vfio_spapr_dma_map(int vfio_container_fd)
 		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
 		dma_map.vaddr = ms[i].addr_64;
 		dma_map.size = ms[i].len;
-		dma_map.iova = ms[i].phys_addr;
+		if (rte_eal_iova_mode() == RTE_IOVA_VA)
+			dma_map.iova = dma_map.vaddr;
+		else
+			dma_map.iova = ms[i].phys_addr;
 		dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
 				 VFIO_DMA_MAP_FLAG_WRITE;
 
-- 
2.14.1
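
The mapping decision in this patch can be shown in isolation. The sketch
below uses simplified, hypothetical stand-ins (memseg, dma_map_req,
build_dma_map) instead of the real VFIO structures and issues no ioctl; it
only shows which address ends up in the iova field for each mode.

```c
#include <assert.h>
#include <stdint.h>

enum iova_mode { IOVA_DC = 0, IOVA_PA = (1 << 0), IOVA_VA = (1 << 1) };

/* Simplified stand-in for one memory segment. */
struct memseg {
	uint64_t vaddr;     /* process virtual address of the segment */
	uint64_t phys_addr; /* physical address of the segment */
	uint64_t len;       /* segment length in bytes */
};

/* Simplified stand-in for a DMA map request. */
struct dma_map_req {
	uint64_t vaddr;
	uint64_t iova;
	uint64_t size;
};

/* Build the request: the iova is the va in VA mode, the pa otherwise. */
static struct dma_map_req
build_dma_map(const struct memseg *ms, enum iova_mode mode)
{
	struct dma_map_req req;

	req.vaddr = ms->vaddr;
	req.size = ms->len;
	req.iova = (mode == IOVA_VA) ? ms->vaddr : ms->phys_addr;
	return req;
}
```

In VA mode the device sees the same addresses as the process, so no
per-packet translation is needed on the fast path.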

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v9 8/9] linuxapp/eal_memory: honor iova mode in virt2phy
  2017-09-20 11:23               ` [PATCH v9 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
                                   ` (6 preceding siblings ...)
  2017-09-20 11:23                 ` [PATCH v9 7/9] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
@ 2017-09-20 11:23                 ` Santosh Shukla
  2017-09-20 11:23                 ` [PATCH v9 9/9] eal/rte_malloc: " Santosh Shukla
                                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-09-20 11:23 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal, aconole,
	stephen, anatoly.burakov, gaetan.rivet, shreyansh.jain,
	bruce.richardson, sergio.gonzalez.monroy, maxime.coquelin,
	Santosh Shukla

Check the iova mode: return the va as the iova when the mode is
iova_va, otherwise return the phy addr.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 52791282f..2d9d7c2dc 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -139,6 +139,9 @@ rte_mem_virt2phy(const void *virtaddr)
 	int page_size;
 	off_t offset;
 
+	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+		return (uintptr_t)virtaddr;
+
 	/* when using dom0, /proc/self/pagemap always returns 0, check in
 	 * dpdk memory by browsing the memsegs */
 	if (rte_xen_dom0_supported()) {
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v9 9/9] eal/rte_malloc: honor iova mode in virt2phy
  2017-09-20 11:23               ` [PATCH v9 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
                                   ` (7 preceding siblings ...)
  2017-09-20 11:23                 ` [PATCH v9 8/9] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
@ 2017-09-20 11:23                 ` Santosh Shukla
  2017-09-26  4:02                 ` [PATCH v9 0/9] Infrastructure to detect iova mapping on the bus santosh
  2017-10-06 11:03                 ` [PATCH v10 " Santosh Shukla
  10 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-09-20 11:23 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal, aconole,
	stephen, anatoly.burakov, gaetan.rivet, shreyansh.jain,
	bruce.richardson, sergio.gonzalez.monroy, maxime.coquelin,
	Santosh Shukla

Check the iova mode: return the va as the iova when the mode is
iova_va, otherwise return the phy addr.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
 lib/librte_eal/common/rte_malloc.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 5c0627bf4..d65c05a4d 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -251,10 +251,17 @@ rte_malloc_set_limit(__rte_unused const char *type,
 phys_addr_t
 rte_malloc_virt2phy(const void *addr)
 {
+	phys_addr_t paddr;
 	const struct malloc_elem *elem = malloc_elem_from_data(addr);
 	if (elem == NULL)
 		return RTE_BAD_PHYS_ADDR;
 	if (elem->ms->phys_addr == RTE_BAD_PHYS_ADDR)
 		return RTE_BAD_PHYS_ADDR;
-	return elem->ms->phys_addr + ((uintptr_t)addr - (uintptr_t)elem->ms->addr);
+
+	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+		paddr = (uintptr_t)addr;
+	else
+		paddr = elem->ms->phys_addr +
+			((uintptr_t)addr - (uintptr_t)elem->ms->addr);
+	return paddr;
 }
-- 
2.14.1
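
The address arithmetic in the two virt2phy patches can be sketched in a
self-contained form. This is a minimal sketch under stated assumptions:
memseg, BAD_ADDR and virt2iova() are hypothetical stand-ins, and the
segment is passed in directly instead of being looked up from the
allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum iova_mode { IOVA_DC = 0, IOVA_PA = (1 << 0), IOVA_VA = (1 << 1) };

#define BAD_ADDR UINT64_MAX /* stand-in for RTE_BAD_PHYS_ADDR */

/* Simplified stand-in for the memory segment that backs an allocation. */
struct memseg {
	void *addr;         /* start virtual address of the segment */
	uint64_t phys_addr; /* start physical address, or BAD_ADDR */
	uint64_t len;       /* segment length in bytes */
};

/*
 * Translate a virtual address inside the segment to the address a device
 * should use. As in the patch, the bad-address checks come first.
 */
static uint64_t
virt2iova(const struct memseg *ms, const void *addr, enum iova_mode mode)
{
	uintptr_t off;

	if (ms == NULL || ms->phys_addr == BAD_ADDR)
		return BAD_ADDR;
	if (mode == IOVA_VA)
		return (uint64_t)(uintptr_t)addr;
	/* PA mode: segment physical base plus the offset into the segment. */
	off = (uintptr_t)addr - (uintptr_t)ms->addr;
	return ms->phys_addr + off;
}
```

In VA mode the lookup collapses to a cast, which is the saving the
series is after.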

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* Re: [PATCH v9 2/9] eal/pci: get iommu class
  2017-09-20 11:23                 ` [PATCH v9 2/9] eal/pci: get iommu class Santosh Shukla
@ 2017-09-20 11:39                   ` Burakov, Anatoly
  2017-10-05 23:58                   ` Thomas Monjalon
  1 sibling, 0 replies; 248+ messages in thread
From: Burakov, Anatoly @ 2017-09-20 11:39 UTC (permalink / raw)
  To: Santosh Shukla, dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal, aconole,
	stephen, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy, maxime.coquelin

On 20-Sep-17 12:23 PM, Santosh Shukla wrote:
> Introducing rte_pci_get_iommu_class API which helps to get iommu class
> of PCI device on the bus and returns preferred iova mapping mode for
> PCI bus.
> 
> Patch also adds rte_pci_get_iommu_class definition for:
> - bsdapp: api returns default iova mode.
> - linuxapp: Has stub implementation, Followup patch has complete
>    implementation.
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
> ---

Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v9 0/9] Infrastructure to detect iova mapping on the bus
  2017-09-20 11:23               ` [PATCH v9 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
                                   ` (8 preceding siblings ...)
  2017-09-20 11:23                 ` [PATCH v9 9/9] eal/rte_malloc: " Santosh Shukla
@ 2017-09-26  4:02                 ` santosh
  2017-10-06 11:03                 ` [PATCH v10 " Santosh Shukla
  10 siblings, 0 replies; 248+ messages in thread
From: santosh @ 2017-09-26  4:02 UTC (permalink / raw)
  To: dev
  Cc: olivier.matz, thomas, jerin.jacob, hemant.agrawal, aconole,
	stephen, anatoly.burakov, gaetan.rivet, shreyansh.jain,
	bruce.richardson, sergio.gonzalez.monroy, maxime.coquelin

Hi Thomas,


On Wednesday 20 September 2017 12:23 PM, Santosh Shukla wrote:
> v9:
> - Added Tested-By: to series.
> - Includes minor changes related to linuxapp api stub in [02/09]
>   (Suggested by Anatoly)
> - Series rebased on tip commit : aee62e90

IMO, the series is ready to merge. Note that the octeontx pmd needs this and the other
mempool series, and we need them in the -rc1 release. Could you please plan to merge
this series in -rc1?

Thanks.

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v9 2/9] eal/pci: get iommu class
  2017-09-20 11:23                 ` [PATCH v9 2/9] eal/pci: get iommu class Santosh Shukla
  2017-09-20 11:39                   ` Burakov, Anatoly
@ 2017-10-05 23:58                   ` Thomas Monjalon
  2017-10-06  3:04                     ` santosh
  1 sibling, 1 reply; 248+ messages in thread
From: Thomas Monjalon @ 2017-10-05 23:58 UTC (permalink / raw)
  To: Santosh Shukla
  Cc: dev, olivier.matz, jerin.jacob, hemant.agrawal, aconole, stephen,
	anatoly.burakov, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy, maxime.coquelin

This patch is introducing a new abstraction.
It is important to explain it for future readers of this code.

20/09/2017 13:23, Santosh Shukla:
> +/**
> + * IOVA mapping mode.
> + */

Please explain what IOVA means and what the purpose of
distinguishing the different modes is.

> +enum rte_iova_mode {
> +	RTE_IOVA_DC = 0,	/* Don't care mode */
> +	RTE_IOVA_PA = (1 << 0),
> +	RTE_IOVA_VA = (1 << 1)
> +};

You should explain each value of the enum.

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v9 3/9] linuxapp/eal_pci: get iommu class
  2017-09-20 11:23                 ` [PATCH v9 3/9] linuxapp/eal_pci: " Santosh Shukla
@ 2017-10-06  0:17                   ` Thomas Monjalon
  2017-10-06  3:22                     ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Thomas Monjalon @ 2017-10-06  0:17 UTC (permalink / raw)
  To: Santosh Shukla
  Cc: dev, olivier.matz, jerin.jacob, hemant.agrawal, aconole, stephen,
	anatoly.burakov, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy, maxime.coquelin

20/09/2017 13:23, Santosh Shukla:
> +/** Device driver supports iova as va */
> +#define RTE_PCI_DRV_IOVA_AS_VA 0X0040

This flag name is surprising and the comment does not help.
For the comment:
	"Device driver supports I/O virtual addressing" ?
For the flag:
	RTE_PCI_DRV_IOVA ?

[...]
>  /*
> - * Get iommu class of pci devices on the bus.

This line has been added in previous patch.
Please fix it earlier.

[...]
> +/*
> + * Any one of the device has iova as va
> + */
> +static inline int
> +pci_device_has_iova_va(void)

The name of this function does not suggest that it scans
every device.

> +{
> +	struct rte_pci_device *dev = NULL;
> +	struct rte_pci_driver *drv = NULL;
> +
> +	FOREACH_DRIVER_ON_PCIBUS(drv) {
> +		if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
> +			FOREACH_DEVICE_ON_PCIBUS(dev) {
> +				if (dev->kdrv == RTE_KDRV_VFIO &&
> +				    rte_pci_match(drv, dev))
> +					return 1;
> +			}

Is this the reason for exporting the match function?
(note: match() is a bus driver function, so it should not be exported)
Is it just because you get every device without driver filtering?
There should be a better solution.
Please try to compare drv with dev->driver.
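
One possible shape of that alternative is sketched below. This is only a
sketch: the structures are hypothetical, reduced stand-ins for the PCI
device and driver types, and any_vfio_dev_wants_iova_va() is an invented
name. It also assumes that the device's driver pointer is already set when
the check runs, which this thread does not establish.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DRV_IOVA_AS_VA 0x0040 /* flag value as posted in the patch */

enum kdrv { KDRV_UNKNOWN = 0, KDRV_IGB_UIO, KDRV_VFIO };

/* Reduced stand-in for a PCI driver. */
struct pci_driver {
	uint32_t drv_flags;
};

/* Reduced stand-in for a PCI device. */
struct pci_device {
	enum kdrv kdrv;                  /* kernel driver bound to the device */
	const struct pci_driver *driver; /* NULL when no driver is attached */
};

/*
 * Walk the devices once and look at the driver attached to each one,
 * instead of walking every driver and calling a match function.
 * Returns 1 if any VFIO-bound device has a driver flagged iova-as-va.
 */
static int
any_vfio_dev_wants_iova_va(const struct pci_device *devs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		const struct pci_driver *drv = devs[i].driver;

		if (drv != NULL && devs[i].kdrv == KDRV_VFIO &&
		    (drv->drv_flags & DRV_IOVA_AS_VA))
			return 1;
	}
	return 0;
}
```

If the driver pointer is only assigned at probe time, a check that runs
right after the bus scan would see NULL everywhere and this approach
would not work as written.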

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v9 6/9] eal: auto detect iova mode
  2017-09-20 11:23                 ` [PATCH v9 6/9] eal: auto detect " Santosh Shukla
@ 2017-10-06  0:19                   ` Thomas Monjalon
  2017-10-06  3:25                     ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Thomas Monjalon @ 2017-10-06  0:19 UTC (permalink / raw)
  To: Santosh Shukla
  Cc: dev, olivier.matz, jerin.jacob, hemant.agrawal, aconole, stephen,
	anatoly.burakov, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy, maxime.coquelin

20/09/2017 13:23, Santosh Shukla:
> For auto detection purpose:
> * Below calls moved up in the eal initialization order:
> 	- eal_option_device_parse
> 	- rte_bus_scan
> 
> Based on the result of rte_bus_scan_iommu_class - select iova
> mapping mode.

It does not explain why you need to move things up.

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v9 2/9] eal/pci: get iommu class
  2017-10-05 23:58                   ` Thomas Monjalon
@ 2017-10-06  3:04                     ` santosh
  2017-10-06  7:24                       ` Thomas Monjalon
  0 siblings, 1 reply; 248+ messages in thread
From: santosh @ 2017-10-06  3:04 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, olivier.matz, jerin.jacob, hemant.agrawal, aconole, stephen,
	anatoly.burakov, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy, maxime.coquelin

Thomas,

Your comment is both annoying and infuriating.
The patch has been there for more than 4 months; you had enough time to comment
and understand the topic. Both thorough review and testing have happened.

NOTE: You have already delayed this series by one release, and
I'm guessing that you intend to push it back by one more. If you had such
mundane questions, then why not ask before? It makes me think that you are
wasting both my time and effort.

On Friday 06 October 2017 05:28 AM, Thomas Monjalon wrote:

> This patch is introducing a new abstraction.
> It is important to explain it for future readers of this code.

If you don't know what iova is, how to program iova, or the
purpose of iova, then you should read up and build that know-how first.

Yes, it is introducing a new abstraction, because dpdk has from its
earliest days supported only one programming mode, aka iova=pa.

Note: you were still using iova mode as _pa (and didn't care to ask yourself about IOVA!),
which is one of the iova modes too!

However, the IOMMU can also generate _va addresses, called iova=_va mode,
which is also a correct/viable/applicable/okay-ish programming mode
for iommu-capable HW such as dma, for example (note again the AGNOSTIC behavior of the iommu).

Now, why dpdk needs to understand the IOVA programming philosophy:

DPDK was _silently_ using iova as pa mode, but a need has now
arisen to make the mapping mode explicit, and for that we need an
abstraction since none existed.

Reason:
Over the last few years, ONA participants like Cavium and NXP
added ARM arch support to dpdk and included drivers for their HW,
and their HW has use cases (for example, external mempool) where
programming the HW in iova as va mode saves cycles in the fast path
(we explained this part many times in the series, and the reviewers understood it),
thus it is vital to introduce iova infra in dpdk.

The same applies to Intel HW blocks. It works for Intel too!

> 20/09/2017 13:23, Santosh Shukla:
>> +/**
>> + * IOVA mapping mode.
>> + */
> Please explain what IOVA means and what is the purpose of
> distinguish the different modes.
>
The IOVA mapping mode is the device (aka iommu) programming mode by which
the HW (iommu) will generate _pa or _va addresses accordingly.

>> +enum rte_iova_mode {
>> +	RTE_IOVA_DC = 0,	/* Don't care mode */
>> +	RTE_IOVA_PA = (1 << 0),
>> +	RTE_IOVA_VA = (1 << 1)
>> +};
> You should explain each value of the enum.

Isn't the naming choice for each member of the enum self-explanatory?
I no longer see the logic in your question. Are you asking for side comments?
If not, then IIUC your question is basically about what _pa and _va are. If so, then
the reader should have a little know-how before they intend to do fast-path programming.
The author can't write the whole IOMMU spec for the reader's sake. Those are minute and mundane
details in case any user wants to program a device in _pa or _va. I'm at a loss with your question;
I don't see the logic and it is frustrating to me. You had enough time for all this
if you had really cared. We have series for external PMD and drivers waiting
for the iova infra; I see your move as nothing but blocking the ONA series' progress.
Don't you trust the reviewers if you have a hard time understanding the topic? That
makes me ask: are you willing to accept this feature or not? If not, then
I'm wasting my energy on it.

Thanks.

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v9 3/9] linuxapp/eal_pci: get iommu class
  2017-10-06  0:17                   ` Thomas Monjalon
@ 2017-10-06  3:22                     ` santosh
  2017-10-06  7:56                       ` Thomas Monjalon
  0 siblings, 1 reply; 248+ messages in thread
From: santosh @ 2017-10-06  3:22 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, olivier.matz, jerin.jacob, hemant.agrawal, aconole, stephen,
	anatoly.burakov, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy, maxime.coquelin


On Friday 06 October 2017 05:47 AM, Thomas Monjalon wrote:
> 20/09/2017 13:23, Santosh Shukla:
>> +/** Device driver supports iova as va */
>> +#define RTE_PCI_DRV_IOVA_AS_VA 0X0040
> This flag name is surprizing and the comment does not help.
> For the comment:
> 	"Device driver supports I/O virtual addressing" ?
> For the flag:
> 	RTE_PCI_DRV_IOVA ?

Read [1].

The v9 series evolved as a result of a thorough review process.
The name was kept as above not for fun but for a reason: its purpose is
to be explicit by saying that the "driver needs iova as va" mode. The comment
on top says the same.

Aaron suggested removing [1] and squashing it into this patch, and I did that.

Your proposition is incorrect; it should say IOVA_AS_VA explicitly!
I request that you follow the work history. Sorry, again I can't find your comment
logical.
[1] http://dpdk.org/dev/patchwork/patch/27000/

> [...]
>>  /*
>> - * Get iommu class of pci devices on the bus.
> This line has been added in previous patch.
> Please fix it earlier.

What is there to fix? Please be more explicit; I can't understand your comment.

> [...]
>> +/*
>> + * Any one of the device has iova as va
>> + */
>> +static inline int
>> +pci_device_has_iova_va(void)
> The name of this function does not suggest that it scans
> every devices.

It's not scanning; it searches for a kdrv match. You misunderstood.
I disagree.

>> +{
>> +	struct rte_pci_device *dev = NULL;
>> +	struct rte_pci_driver *drv = NULL;
>> +
>> +	FOREACH_DRIVER_ON_PCIBUS(drv) {
>> +		if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
>> +			FOREACH_DEVICE_ON_PCIBUS(dev) {
>> +				if (dev->kdrv == RTE_KDRV_VFIO &&
>> +				    rte_pci_match(drv, dev))
>> +					return 1;
>> +			}
> This is the reason of exporting the match function?
> (note: match() is bus driver function, so it should not be exported)
> Just because you get every devices without driver filtering?

I disagree. It is bus function abstraction code w.r.t. the iommu class of a device,
in case you missed it when reading the source code, and the implementation is correct.
That requires exporting rte_pci_match(). Otherwise,
write the code and show your snippet as an illustration; I doubt that you really
understood this whole topic and its design theme.

Thanks.

> There should be a better solution.
> Please try to compare drv with dev->driver.
>

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v9 6/9] eal: auto detect iova mode
  2017-10-06  0:19                   ` Thomas Monjalon
@ 2017-10-06  3:25                     ` santosh
  2017-10-06  8:11                       ` Thomas Monjalon
  0 siblings, 1 reply; 248+ messages in thread
From: santosh @ 2017-10-06  3:25 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, olivier.matz, jerin.jacob, hemant.agrawal, aconole, stephen,
	anatoly.burakov, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy, maxime.coquelin


On Friday 06 October 2017 05:49 AM, Thomas Monjalon wrote:
> 20/09/2017 13:23, Santosh Shukla:
>> For auto detection purpose:
>> * Below calls moved up in the eal initialization order:
>> 	- eal_option_device_parse
>> 	- rte_bus_scan
>>
>> Based on the result of rte_bus_scan_iommu_class - select iova
>> mapping mode.
> It does not explain why you need to move things up.

For that, one should first understand the eal_init sequence and
know about the dependency between _option_device_parse and rte_bus_scan().

Beyond that, the _get_iommu_class() API needs bus_scan to have run so that it knows
whether kdrv is igb/uio/vfio etc. That's why. Refer to the work history.
Again, the v9 series did not happen for fun. I disagree with your comment.

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v9 2/9] eal/pci: get iommu class
  2017-10-06  3:04                     ` santosh
@ 2017-10-06  7:24                       ` Thomas Monjalon
  2017-10-06  9:13                         ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Thomas Monjalon @ 2017-10-06  7:24 UTC (permalink / raw)
  To: santosh
  Cc: dev, olivier.matz, jerin.jacob, hemant.agrawal, aconole, stephen,
	anatoly.burakov, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy, maxime.coquelin

06/10/2017 05:04, santosh:
> Thomas,
> 
> You comment is annoying and infuriating both.
> Patch is their for more than 4month, had enough time for you to comment
> and understand the topic. Thorough review and testing has happened both.
> 
> NOTE: You have already delayed this series by one release and
> I'm guessing that you intent to push by one more, if you had such
> mundane question then why not ask before? Make me think that you are
> wasting my time and effort both.

You misunderstand me.
My intent is to push this patch.
A lot of people have reviewed it during this cycle.
I was just looking for wording details to make things easier for people
when they see this abstraction in the code base.

> On Friday 06 October 2017 05:28 AM, Thomas Monjalon wrote:
> 
> > This patch is introducing a new abstraction.
> > It is important to explain it for future readers of this code.
> 
> If you don't know - What is iova? How to program iova?
> purpose of iova then should read and educate your know - how first.
> 
> Yes, its is introducing new abstraction, because dpdk from
> ancient days does only one programming mode aka iova=pa.
> 
> note:You were still using iova mode as _pa (and didn't care to ask yourself about IOVA!)
> which is one of iova mode too!.
> 
> However, IOMMU can also generate _va address too called iova=_va mode..
> which is also correct/viable/applicable/Okiesh programming mode
> for iommu capable HW like dma for example(Note again,.. AGNOSTIC behavior of iommu).
> 
> Now Why dpdk needs to understand IOVA programming philosophy:
> 
> Though DPDK was _silenty_ using iova as pa mode but then there
> is a need arise to make mapping mode explicit and for that we need
> abstraction since there wasn't one existed.
> 
> Reason:
> Because From last few years,.ONA participants like Cavium, nxp
> added ARM arch support in dpdk and included drivers for their HW.. 
> and their hw has use-case (example external mempool), such a way that
> programming those HW in iova as va mode would save cycle in fast path
> (this part, we explained so many-1000 time in series and same understood by reviewer)
> thus its vital to introduce iova infra in dpdk.
> 
> Same applicable for intel HW blocks too. Its works for intel too!

I know all of that!
I was just thinking that you could add more explanations somewhere
in the code or the doc.

> > 20/09/2017 13:23, Santosh Shukla:
> >> +/**
> >> + * IOVA mapping mode.
> >> + */
> > Please explain what IOVA means and what is the purpose of
> > distinguish the different modes.
> >
> IOVA mapping mode is device aka iommu programming mode by which
> HW(iommu) will generate _pa or _va address accordingly.

In this doxygen block, it would be the right place to explain how the
IOVA mode will impact the rest of DPDK.

> >> +enum rte_iova_mode {
> >> +	RTE_IOVA_DC = 0,	/* Don't care mode */
> >> +	RTE_IOVA_PA = (1 << 0),
> >> +	RTE_IOVA_VA = (1 << 1)
> >> +};
> > You should explain each value of the enum.
> 
> Aren't the naming choices for each member of the enum self-explanatory?
> I don't find the logic in your question. Are you asking about side comments?
> If not then, AFAIU, your question is basically about what _pa and _va are. If so, then
> the reader should have a little know-how before they intend to do fast-path programming.
> The author can't write the whole IOMMU spec for the reader's sake. Those are minute and mundane details
> in case any user wants to program a device in _pa or _va. I'm at a loss with your question,
> I don't see the logic and it is frustrating to me. You had enough time for all this
> in case you had really cared. We have series for external PMDs and drivers waiting
> for iova infra; I see your move as nothing but blocking the ONA series' progress.
> Don't you trust the reviewers in case you have a hard time understanding the topic? That
> makes me ask: are you willing to accept this feature or not? If not, then
> I'm wasting my energy on it.

Santosh, I'm sorry if you don't understand that I was just asking for
a bit more doc.
You could just add something like
	/* DMA using physical address */
	/* DMA using virtual address */

Anyway, if you don't want to add any explanation, it won't prevent
pushing this patch.

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v9 3/9] linuxapp/eal_pci: get iommu class
  2017-10-06  3:22                     ` santosh
@ 2017-10-06  7:56                       ` Thomas Monjalon
  0 siblings, 0 replies; 248+ messages in thread
From: Thomas Monjalon @ 2017-10-06  7:56 UTC (permalink / raw)
  To: santosh
  Cc: dev, olivier.matz, jerin.jacob, hemant.agrawal, aconole, stephen,
	anatoly.burakov, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy, maxime.coquelin

06/10/2017 05:22, santosh:
> 
> On Friday 06 October 2017 05:47 AM, Thomas Monjalon wrote:
> > 20/09/2017 13:23, Santosh Shukla:
> >> +/** Device driver supports iova as va */
> >> +#define RTE_PCI_DRV_IOVA_AS_VA 0X0040
> > This flag name is surprising and the comment does not help.
> > For the comment:
> > 	"Device driver supports I/O virtual addressing" ?
> > For the flag:
> > 	RTE_PCI_DRV_IOVA ?
> 
> Read [1].
> 
> The V9 series evolved as a result of a thorough review process.
> The name was kept as above not for fun but for a reason: its purpose is
> to be explicit by saying that the "driver needs iova as va" mode. The comment
> on top says the same.
> 
> Aaron suggested removing [1] and squashing it into this patch, and that I did.
> 
> Your proposition is incorrect; it should say IOVA_AS_VA explicitly!
> I request that you follow the work history. Sorry, again I can't find your comment
> logical.

Yes my proposal is not good.

> [1] http://dpdk.org/dev/patchwork/patch/27000/
> 
> > [...]
> >>  /*
> >> - * Get iommu class of pci devices on the bus.
> > This line has been added in previous patch.
> > Please fix it earlier.
> 
> What to fix? Be more explicit, I can't understand your comment.

You make this change:
- * Get iommu class of pci devices on the bus.
+ * Get iommu class of PCI devices on the bus.

It is better to squash this uppercase change into the
previous commit, where you introduce this comment.

> > [...]
> >> +/*
> >> + * Any one of the device has iova as va
> >> + */
> >> +static inline int
> >> +pci_device_has_iova_va(void)
> > The name of this function does not suggest that it scans
> > every device.
> 
> It's not scanning, it searches for a kdrv match. You misunderstood.
> Disagree.

Yes, my wording was not clear.
By "scanning", I mean iterating over lists.

About the function name, it could be:
	pci_one_device_has_iova_va
It better shows that the function checks every device.

> >> +{
> >> +	struct rte_pci_device *dev = NULL;
> >> +	struct rte_pci_driver *drv = NULL;
> >> +
> >> +	FOREACH_DRIVER_ON_PCIBUS(drv) {
> >> +		if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
> >> +			FOREACH_DEVICE_ON_PCIBUS(dev) {
> >> +				if (dev->kdrv == RTE_KDRV_VFIO &&
> >> +				    rte_pci_match(drv, dev))
> >> +					return 1;
> >> +			}
> > This is the reason of exporting the match function?
> > (note: match() is bus driver function, so it should not be exported)
> > Just because you get every devices without driver filtering?
> 
> I disagree. It is bus function abstraction code w.r.t. the iommu class of a device,
> in case you missed reading the source code, and the implementation is correct.
> That needs exporting rte_pci_match(). Or else
> write code and show your code snippet as an illustration. I doubt that you really
> understood this whole topic and its design theme.

OK, let's imagine I don't understand the whole topic.

> > There should be a better solution.
> > Please try to compare drv with dev->driver.

You could have answered that dev->driver is filled on probing
and you are doing the check before probing.
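
A minimal sketch of that point, using hypothetical stand-in types
(demo_pci_id, demo_driver, demo_device) rather than the real rte_pci
structures, and arbitrary IDs: before probing, the device's driver pointer
is still NULL, so a pointer comparison finds nothing, while an ID-table
match can already pair the device with its driver. The real
rte_pci_match() also handles wildcard IDs, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed stand-ins for the rte_pci structures. */
struct demo_pci_id {
	uint16_t vendor_id;
	uint16_t device_id;
};

struct demo_driver {
	const struct demo_pci_id *id_table; /* terminated by vendor_id == 0 */
};

struct demo_device {
	struct demo_pci_id id;
	const struct demo_driver *driver; /* filled only at probe time */
};

/* Pointer comparison: only meaningful once the device has been probed. */
int
demo_bound_by_pointer(const struct demo_driver *drv,
		      const struct demo_device *dev)
{
	assert(drv != NULL && dev != NULL);
	return dev->driver == drv;
}

/* ID-table match: usable right after the bus scan, before any probe. */
int
demo_match(const struct demo_driver *drv, const struct demo_device *dev)
{
	const struct demo_pci_id *id;

	assert(drv != NULL && dev != NULL);
	for (id = drv->id_table; id->vendor_id != 0; id++) {
		if (id->vendor_id == dev->id.vendor_id &&
		    id->device_id == dev->id.device_id)
			return 1;
	}
	return 0;
}
```

With a freshly scanned device (driver == NULL), demo_bound_by_pointer()
returns 0 while demo_match() can already return 1.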

I don't want to continue this discussion.
We will rework which functions are exported when moving the PCI driver
out of EAL.

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v9 6/9] eal: auto detect iova mode
  2017-10-06  3:25                     ` santosh
@ 2017-10-06  8:11                       ` Thomas Monjalon
  2017-10-06  9:11                         ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Thomas Monjalon @ 2017-10-06  8:11 UTC (permalink / raw)
  To: santosh
  Cc: dev, olivier.matz, jerin.jacob, hemant.agrawal, aconole, stephen,
	anatoly.burakov, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy, maxime.coquelin

06/10/2017 05:25, santosh:
> 
> On Friday 06 October 2017 05:49 AM, Thomas Monjalon wrote:
> > 20/09/2017 13:23, Santosh Shukla:
> >> For auto detection purpose:
> >> * Below calls moved up in the eal initialization order:
> >> 	- eal_option_device_parse
> >> 	- rte_bus_scan
> >>
> >> Based on the result of rte_bus_scan_iommu_class - select iova
> >> mapping mode.
> > It does not explain why you need to move things up.
> 
> For that one should understand the eal_init sequence first,
> and should know about the _option_device_parse and rte_bus_scan() dependency.
> 
> After that, bus_scan is needed for the _get_iommu_class() API to know whether
> kdrv is igb/uio/vfio etc. That's why. Refer to the work history.
> Again, the V9 series did not happen for fun. I disagree with your comment.

This is the basics of writing a commit log.
You have to explain why things are done.
You move things because of dependencies without explaining them.

And I'm pretty sure this move will cause big trouble.
For instance, have you tried shared library mode?

One more comment, you are considering only devices scanned at initialization.
What happens when a new device is plugged in?

I can push it as is, given there are some Reviewed-by and Tested-by.
I am trying to spare you a revert of this patch when someone discovers
some major bugs.
But I wonder whether it's worth it, given how you welcome it.

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v9 6/9] eal: auto detect iova mode
  2017-10-06  8:11                       ` Thomas Monjalon
@ 2017-10-06  9:11                         ` santosh
  0 siblings, 0 replies; 248+ messages in thread
From: santosh @ 2017-10-06  9:11 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, olivier.matz, jerin.jacob, hemant.agrawal, aconole, stephen,
	anatoly.burakov, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy, maxime.coquelin


On Friday 06 October 2017 01:41 PM, Thomas Monjalon wrote:
> 06/10/2017 05:25, santosh:
>> On Friday 06 October 2017 05:49 AM, Thomas Monjalon wrote:
>>> 20/09/2017 13:23, Santosh Shukla:
>>>> For auto detection purpose:
>>>> * Below calls moved up in the eal initialization order:
>>>> 	- eal_option_device_parse
>>>> 	- rte_bus_scan
>>>>
>>>> Based on the result of rte_bus_scan_iommu_class - select iova
>>>> mapping mode.
>>> It does not explain why you need to move things up.
>> For that one should understand the eal_init sequence first,
>> and should know about the _option_device_parse and rte_bus_scan() dependency.
>>
>> After that, bus_scan is needed for the _get_iommu_class() API to know whether
>> kdrv is igb/uio/vfio etc. That's why. Refer to the work history.
>> Again, the V9 series did not happen for fun. I disagree with your comment.
> This is the basics of writing a commit log.
> You have to explain why things are done.
> You move things because of dependencies without explaining them.

Agreed. But a reader who goes through 0..5 in order would by then
understand the "auto detection purpose" reasoning.

Anyway, I'll add more context to the patch summary in v10, which I am sending now.

> And I'm pretty sure this move will cause big trouble.
> For instance, have you tried shared library mode?

It builds, and testpmd works too.

> One more comment, you are considering only devices scanned at initialization.
> What happens when a new device is plugged in?

Should work.
In vfio mode: if the PMD for that device has the IOVA_AS_VA flag set, then
the newly bound device will use the iova=_va mapping mode.
Otherwise iova=_pa.

Thanks.

> I can push it as is, given there are some Reviewed-by and Tested-by.
> I am trying to spare you a revert of this patch when someone discovers
> some major bugs.
> But I wonder whether it's worth it, given how you welcome it.
>

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v9 2/9] eal/pci: get iommu class
  2017-10-06  7:24                       ` Thomas Monjalon
@ 2017-10-06  9:13                         ` santosh
  0 siblings, 0 replies; 248+ messages in thread
From: santosh @ 2017-10-06  9:13 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, olivier.matz, jerin.jacob, hemant.agrawal, aconole, stephen,
	anatoly.burakov, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy, maxime.coquelin


On Friday 06 October 2017 12:54 PM, Thomas Monjalon wrote:
> 06/10/2017 05:04, santosh:
>> Thomas,
>>
>> Your comment is both annoying and infuriating.
>> The patch has been there for more than 4 months; you had enough time to comment
>> and understand the topic. Thorough review and testing have both happened.
>>
>> NOTE: You have already delayed this series by one release and
>> I'm guessing that you intend to push it back by one more. If you had such
>> mundane questions then why not ask before? It makes me think that you are
>> wasting both my time and effort.
> You misunderstand me.
> My intent is to push this patch.
> A lot of people have reviewed it during this cycle.
> I was just looking for wording details in order to ease people
> when they will see this abstraction in the code base.
>
>> On Friday 06 October 2017 05:28 AM, Thomas Monjalon wrote:
>>
>>> This patch is introducing a new abstraction.
>>> It is important to explain it for future readers of this code.
>> If you don't know what iova is, how to program iova, or the
>> purpose of iova, then you should read up and build that know-how first.
>>
>> Yes, it is introducing a new abstraction, because dpdk from
>> ancient days supports only one programming mode, aka iova=pa.
>>
>> Note: you were still using iova mode as _pa (and didn't care to ask yourself about IOVA!),
>> which is one of the iova modes too!
>>
>> However, the IOMMU can also generate _va addresses, called iova=_va mode,
>> which is also a correct/viable/applicable programming mode
>> for iommu capable HW like dma for example (note again: AGNOSTIC behavior of the iommu).
>>
>> Now, why DPDK needs to understand the IOVA programming philosophy:
>>
>> Though DPDK was _silently_ using iova as pa mode, a need arose to
>> make the mapping mode explicit, and for that we need an abstraction
>> since none existed.
>>
>> Reason:
>> Because over the last few years, ONA participants like Cavium and NXP
>> added ARM arch support in dpdk and included drivers for their HW,
>> and their HW has use-cases (for example, external mempool) such that
>> programming those HW in iova as va mode would save cycles in the fast path
>> (this part we explained many times in the series and it was understood by the reviewers),
>> thus it's vital to introduce iova infra in dpdk.
>>
>> The same applies to Intel HW blocks too. It works for Intel too!
> I know all of that!
> I was just thinking that you could add more explanations somewhere
> in the code or the doc.
>
>>> 20/09/2017 13:23, Santosh Shukla:
>>>> +/**
>>>> + * IOVA mapping mode.
>>>> + */
>>> Please explain what IOVA means and what is the purpose of
>>> distinguishing the different modes.
>>>
>> IOVA mapping mode is device aka iommu programming mode by which
>> HW(iommu) will generate _pa or _va address accordingly.

sending v10 with doc changes.

> In this doxygen block, it would be the right place to explain how the
> IOVA mode will impact the rest of DPDK.
>
>>>> +enum rte_iova_mode {
>>>> +	RTE_IOVA_DC = 0,	/* Don't care mode */
>>>> +	RTE_IOVA_PA = (1 << 0),
>>>> +	RTE_IOVA_VA = (1 << 1)
>>>> +};
>>> You should explain each value of the enum.
>> Aren't the naming choices for each member of the enum self-explanatory?
>> I don't find the logic in your question. Are you asking about side comments?
>> If not then, AFAIU, your question is basically about what _pa and _va are. If so, then
>> the reader should have a little know-how before they intend to do fast-path programming.
>> The author can't write the whole IOMMU spec for the reader's sake. Those are minute and mundane details
>> in case any user wants to program a device in _pa or _va. I'm at a loss with your question,
>> I don't see the logic and it is frustrating to me. You had enough time for all this
>> in case you had really cared. We have series for external PMDs and drivers waiting
>> for iova infra; I see your move as nothing but blocking the ONA series' progress.
>> Don't you trust the reviewers in case you have a hard time understanding the topic? That
>> makes me ask: are you willing to accept this feature or not? If not, then
>> I'm wasting my energy on it.
> Santosh, I'm sorry if you don't understand that I was just asking for
> a bit more doc.
> You could just add something like
> 	/* DMA using physical address */
> 	/* DMA using virtual address */

in v10.

Thanks.

> Anyway, if you don't want to add any explanation, it won't prevent
> pushing this patch.

^ permalink raw reply	[flat|nested] 248+ messages in thread

* [PATCH v10 0/9] Infrastructure to detect iova mapping on the bus
  2017-09-20 11:23               ` [PATCH v9 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
                                   ` (9 preceding siblings ...)
  2017-09-26  4:02                 ` [PATCH v9 0/9] Infrastructure to detect iova mapping on the bus santosh
@ 2017-10-06 11:03                 ` Santosh Shukla
  2017-10-06 11:03                   ` [PATCH v10 1/9] eal/pci: export match function Santosh Shukla
                                     ` (9 more replies)
  10 siblings, 10 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-10-06 11:03 UTC (permalink / raw)
  To: olivier.matz, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, aconole, stephen,
	anatoly.burakov, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy, maxime.coquelin, Santosh Shukla

v10:
- Added doxygen specific comment for iova mapping mode in patch [2/09]
  (Suggested by Olivier)
- Added pci_one_ prefix to pci_device_has_iova_va and the other helper
  APIs in patch [3/9]
  (Suggested by Olivier)
- Added a more verbose description to the patch summary of patch [6/09]
  (Suggested by Olivier)


v9:
- Added Tested-By: to series.
- Includes minor changes related to linuxapp api stub in [02/09]
  (Suggested by Anatoly)
- Series rebased on tip commit : aee62e90

v8:
Includes minor review changes per v7 review comment from Anatoly.
Patches rebased on Tip commit:3d2e0448eb.

v7:
Includes no major change, minor change detailing:
- patch squashing (Aaron suggestion)
- added run_once for device_parse() and bus_scan() in eal init
    (Aaron suggestion)
- Moved rte_eal_device_parse() up in eal initialization order.
- Patches rebased on top of version: 17.11-rc0
For v6 info refer [11].

v6:
Sending v5 series rebased on top of version: 17.11-rc0.

v5:
Introducing RTE_PCI_DRV_IOVA_AS_VA flag for autodetection of iova va
mapping.
If a PCI driver requires the IOVA as VA scheme, then the driver can set
the flag in its PCI driver registration.
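
As a rough illustration only (not code from this series), a PMD could
advertise the flag along the following lines. The flag values are the ones
defined in rte_pci.h by this series; demo_pci_driver, demo_pmd and
demo_driver_wants_iova_va() are hypothetical stand-ins so the sketch
compiles on its own, and the real struct rte_pci_driver has more fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as they appear in rte_pci.h with this series applied. */
#define RTE_PCI_DRV_INTR_RMV   0x0010
#define RTE_PCI_DRV_IOVA_AS_VA 0x0040

/* Hypothetical, trimmed stand-in for struct rte_pci_driver. */
struct demo_pci_driver {
	const char *name;
	uint32_t drv_flags;
};

/* Hypothetical PMD that asks the bus for the iova=va mapping scheme. */
struct demo_pci_driver demo_pmd = {
	.name = "net_demo",
	.drv_flags = RTE_PCI_DRV_INTR_RMV | RTE_PCI_DRV_IOVA_AS_VA,
};

/* The autodetection only needs to test the bit in drv_flags. */
int
demo_driver_wants_iova_va(const struct demo_pci_driver *drv)
{
	assert(drv != NULL);
	return (drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) != 0;
}
```

A driver that leaves the bit clear keeps the default behaviour.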

Algorithm to select IOVA as VA for PCI bus case:
     0. If no device bound then return with RTE_IOVA_DC mapping mode,
     else goto 1).
     1. Look for device attached to vfio kdrv and has .drv_flag set
     to RTE_PCI_DRV_IOVA_AS_VA.
     2. Look for any device attached to UIO class of driver.
     3. Check for vfio-noiommu mode enabled.

     If 2) & 3) are false and 1) is true then select
     mapping scheme as RTE_IOVA_VA. Otherwise use default
     mapping scheme (RTE_IOVA_PA).

That way, the bus can truly autodetect the iova mapping mode for
a device or a set of devices.
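
The decision described in steps 0 to 3 can be sketched as a pure function.
This is a simplified illustration, not the implementation in patch [3/9]:
struct pci_scan_summary and select_iova_mode() are hypothetical names, and
the four booleans are assumed to have been computed by the bus scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the enum introduced by this series. */
enum rte_iova_mode {
	RTE_IOVA_DC = 0,        /* Don't care mode */
	RTE_IOVA_PA = (1 << 0), /* DMA using physical address */
	RTE_IOVA_VA = (1 << 1)  /* DMA using virtual address */
};

/* Hypothetical summary of what the PCI bus scan found. */
struct pci_scan_summary {
	bool any_bound;       /* step 0: a device is bound to some kdrv */
	bool vfio_iova_as_va; /* step 1: vfio device whose driver sets the flag */
	bool any_uio;         /* step 2: a device is bound to a UIO driver */
	bool vfio_noiommu;    /* step 3: vfio-noiommu mode is enabled */
};

enum rte_iova_mode
select_iova_mode(const struct pci_scan_summary *s)
{
	assert(s != NULL);

	/* Step 0: nothing bound, so the bus has no preference. */
	if (!s->any_bound)
		return RTE_IOVA_DC;

	/* VA only if 1) holds and neither 2) nor 3) does. */
	if (s->vfio_iova_as_va && !s->any_uio && !s->vfio_noiommu)
		return RTE_IOVA_VA;

	/* Default mapping scheme. */
	return RTE_IOVA_PA;
}
```

A single UIO-bound device or an enabled vfio-noiommu mode is enough to
fall back to RTE_IOVA_PA for the whole bus.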

Change History:
v9 --> v10:
- Refer top description.

v8 --> v9:
- Added Tested-by: signature of Hemant.
- Added linuxapp stub api definition in [02/09] (Suggested by Anatoly)

v7 --> v8:
- Replace 0 / 1 with true/false boolean values (Suggested by Anatoly).

v6 --> v7:
- Patches squashed per v6.
- Added run_once in eal per v6.
- Moved rte_eal_device_parse() up in eal init order.

v5 --> v6:
- Added api info in eal's version.map (release DPDK_v17.11).

v4 --> v5:
- Change DPDK_17.08 to DPDK_17.11 in _version.map.
- Reworded bus api description (suggested by Hemant).
- Added reviewed-by from Maxime in v5.
- Added acked-by from Hemant for pci and bus patches.

v3 --> v4:
- Re-introduced RTE_IOVA_DEC mode (Suggested by Hemant [5]).
- Renamed flag to RTE_PCI_DRV_IOVA_AS_VA (Suggested by Maxime).
- Reworded WARNING message(suggested by Maxime[7]).
- Created a separate patch for rte_pci_get_iommu_class (suggested by
Maxime[]).
- Added VFIO_PRESENT ifdef build fix.

v2 --> v3:
- Removed rte_mempool_virt2phy (suggested by Olivier [4])

v1 --> v2:
- Removed the override eal option (--iova-mode=<>) because we have the
   means to truly autodetect the iova mode.
- Introduced RTE_PCI_DRV_NEED_IOVA_VA drv_flag (Suggested by Maxime [3]).
- Using NEED_IOVA_VA drv_flag in autodetection logic.
- Removed Linux version check macro in vfio code, As per Maxime feedback.
- Moved rte_pci_match API from local to global.

Patch Summary:
1) 1st: declare rte_pci_match api in pci header. Required for
autodetection in follow up patches.
2) 2nd - 3rd - 4th : autodetection mapping infrastructure for
Linux/bsdapp.
3) 5th: iova mode helper API.
4) 6th: Infra to detect iova mode.
5) 7th: make vfio mapping iova aware.
6) 8th - 9th : Check for IOVA_VA mode in below APIs
         - rte_mem_virt2phy
         - rte_malloc_virt2phy
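
A minimal sketch of what such a check could look like, under the
assumption that in IOVA_VA mode the virtual address is handed back
unchanged as the bus address. All demo_* names are hypothetical, and the
fixed-offset lookup is only a stand-in for the real physical address
translation; this is not the code from patches 8 and 9.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the enum introduced by this series. */
enum rte_iova_mode {
	RTE_IOVA_DC = 0,        /* Don't care mode */
	RTE_IOVA_PA = (1 << 0), /* DMA using physical address */
	RTE_IOVA_VA = (1 << 1)  /* DMA using virtual address */
};

typedef uint64_t demo_phys_addr_t; /* hypothetical stand-in for phys_addr_t */

/* Hypothetical translation callback used when the mode is not IOVA_VA. */
typedef demo_phys_addr_t (*demo_lookup_fn)(const void *virt);

/* Fake lookup for illustration: pretends PA is VA plus a fixed offset. */
demo_phys_addr_t
demo_fake_lookup(const void *virt)
{
	return (demo_phys_addr_t)(uintptr_t)virt + 0x1000;
}

demo_phys_addr_t
demo_virt2phy(const void *virt, enum rte_iova_mode mode,
	      demo_lookup_fn lookup)
{
	assert(lookup != NULL);

	/* Assumption: in VA mode the device is programmed with the VA itself. */
	if (mode == RTE_IOVA_VA)
		return (demo_phys_addr_t)(uintptr_t)virt;

	/* Otherwise fall back to the usual virtual-to-physical translation. */
	return lookup(virt);
}
```

The early return is what would save the per-packet translation in the
fast path when the bus selected RTE_IOVA_VA.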

Test History:
- Tested for x86/XL710 40G NIC card for both modes (iova_va/pa).
- Tested for arm64/thunderx vNIC Integrated NIC for both modes
- Tested for arm64/Octeontx integrated NICs for only
   Iova_va mode(It supports only one mode.)
- Ran standalone tests like mempool_autotest, mbuf_autotest.
- Verified for Doxygen.

Work History:
For v1, Refer [1].
For v2, Refer [2].
For v3, Refer [9].
For v4, refer [10].
for v6, refer [11].

Checkpatch result:
* None 

Thanks.,
[1] https://www.mail-archive.com/dev@dpdk.org/msg67438.html
[2] https://www.mail-archive.com/dev@dpdk.org/msg70674.html
[3] https://www.mail-archive.com/dev@dpdk.org/msg70279.html
[4] https://www.mail-archive.com/dev@dpdk.org/msg70692.html
[5] http://dpdk.org/ml/archives/dev/2017-July/071282.html
[6] http://dpdk.org/ml/archives/dev/2017-July/070951.html
[7] http://dpdk.org/ml/archives/dev/2017-July/070941.html
[8] http://dpdk.org/ml/archives/dev/2017-July/070952.html
[9] http://dpdk.org/ml/archives/dev/2017-July/070918.html
[10] http://dpdk.org/ml/archives/dev/2017-July/071754.html
[11] http://dpdk.org/ml/archives/dev/2017-August/072871.html



Santosh Shukla (9):
  eal/pci: export match function
  eal/pci: get iommu class
  linuxapp/eal_pci: get iommu class
  bus: get iommu class
  eal: introduce helper API for iova mode
  eal: auto detect iova mode
  linuxapp/eal_vfio: honor iova mode before mapping
  linuxapp/eal_memory: honor iova mode in virt2phy
  eal/rte_malloc: honor iova mode in virt2phy

 lib/librte_eal/bsdapp/eal/eal.c                 | 33 ++++++---
 lib/librte_eal/bsdapp/eal/eal_pci.c             | 10 +++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   | 10 +++
 lib/librte_eal/common/eal_common_bus.c          | 23 ++++++
 lib/librte_eal/common/eal_common_pci.c          | 11 +--
 lib/librte_eal/common/include/rte_bus.h         | 40 +++++++++++
 lib/librte_eal/common/include/rte_eal.h         | 12 ++++
 lib/librte_eal/common/include/rte_pci.h         | 28 ++++++++
 lib/librte_eal/common/rte_malloc.c              |  9 ++-
 lib/librte_eal/linuxapp/eal/eal.c               | 33 ++++++---
 lib/librte_eal/linuxapp/eal/eal_memory.c        |  3 +
 lib/librte_eal/linuxapp/eal/eal_pci.c           | 96 +++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c          | 29 +++++++-
 lib/librte_eal/linuxapp/eal/eal_vfio.h          |  4 ++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map | 10 +++
 15 files changed, 317 insertions(+), 34 deletions(-)

-- 
2.14.1

^ permalink raw reply	[flat|nested] 248+ messages in thread

* [PATCH v10 1/9] eal/pci: export match function
  2017-10-06 11:03                 ` [PATCH v10 " Santosh Shukla
@ 2017-10-06 11:03                   ` Santosh Shukla
  2017-10-06 11:03                   ` [PATCH v10 2/9] eal/pci: get iommu class Santosh Shukla
                                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-10-06 11:03 UTC (permalink / raw)
  To: olivier.matz, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, aconole, stephen,
	anatoly.burakov, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy, maxime.coquelin, Santosh Shukla

Export the rte_pci_match() function as it is needed in a follow-up patch.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  7 +++++++
 lib/librte_eal/common/eal_common_pci.c          | 10 +---------
 lib/librte_eal/common/include/rte_pci.h         | 15 +++++++++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  7 +++++++
 4 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 47a09ea7f..cfbf8fbd0 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -238,3 +238,10 @@ EXPERIMENTAL {
 	rte_service_start_with_defaults;
 
 } DPDK_17.08;
+
+DPDK_17.11 {
+	global:
+
+	rte_pci_match;
+
+} DPDK_17.08;
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 52fd38cdd..3b7d0a0ee 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -150,16 +150,8 @@ pci_unmap_resource(void *requested_addr, size_t size)
 
 /*
  * Match the PCI Driver and Device using the ID Table
- *
- * @param pci_drv
- *	PCI driver from which ID table would be extracted
- * @param pci_dev
- *	PCI device to match against the driver
- * @return
- *	1 for successful match
- *	0 for unsuccessful match
  */
-static int
+int
 rte_pci_match(const struct rte_pci_driver *pci_drv,
 	      const struct rte_pci_device *pci_dev)
 {
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 8b123391c..eab84c7a4 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -366,6 +366,21 @@ int rte_pci_scan(void);
 int
 rte_pci_probe(void);
 
+/*
+ * Match the PCI Driver and Device using the ID Table
+ *
+ * @param pci_drv
+ *      PCI driver from which ID table would be extracted
+ * @param pci_dev
+ *      PCI device to match against the driver
+ * @return
+ *      1 for successful match
+ *      0 for unsuccessful match
+ */
+int
+rte_pci_match(const struct rte_pci_driver *pci_drv,
+	      const struct rte_pci_device *pci_dev);
+
 /**
  * Map the PCI device resources in user space virtual memory address
  *
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 8c08b8d1e..287cc75cd 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -243,3 +243,10 @@ EXPERIMENTAL {
 	rte_service_start_with_defaults;
 
 } DPDK_17.08;
+
+DPDK_17.11 {
+	global:
+
+	rte_pci_match;
+
+} DPDK_17.08;
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v10 2/9] eal/pci: get iommu class
  2017-10-06 11:03                 ` [PATCH v10 " Santosh Shukla
  2017-10-06 11:03                   ` [PATCH v10 1/9] eal/pci: export match function Santosh Shukla
@ 2017-10-06 11:03                   ` Santosh Shukla
  2017-10-06 11:03                   ` [PATCH v10 3/9] linuxapp/eal_pci: " Santosh Shukla
                                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-10-06 11:03 UTC (permalink / raw)
  To: olivier.matz, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, aconole, stephen,
	anatoly.burakov, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy, maxime.coquelin, Santosh Shukla

Introduce the rte_pci_get_iommu_class API, which gets the iommu class
of the PCI devices on the bus and returns the preferred iova mapping mode
for the PCI bus.

The patch also adds the rte_pci_get_iommu_class definition for:
- bsdapp: the API returns the default iova mode.
- linuxapp: a stub implementation; a follow-up patch has the complete
  implementation.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/bsdapp/eal/eal_pci.c             | 10 ++++++++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
 lib/librte_eal/common/include/rte_bus.h         | 15 +++++++++++++++
 lib/librte_eal/common/include/rte_pci.h         | 11 +++++++++++
 lib/librte_eal/linuxapp/eal/eal_pci.c           |  9 +++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 6 files changed, 47 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 04eacdcc7..e2c252320 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -403,6 +403,16 @@ rte_pci_scan(void)
 	return -1;
 }
 
+/*
+ * Get iommu class of pci devices on the bus.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void)
+{
+	/* Supports only RTE_KDRV_NIC_UIO */
+	return RTE_IOVA_PA;
+}
+
 int
 pci_update_device(const struct rte_pci_addr *addr)
 {
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index cfbf8fbd0..c6ffd9399 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -243,5 +243,6 @@ DPDK_17.11 {
 	global:
 
 	rte_pci_match;
+	rte_pci_get_iommu_class;
 
 } DPDK_17.08;
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 8f8b09954..e59c21659 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -55,6 +55,21 @@ extern "C" {
 /** Double linked list of buses */
 TAILQ_HEAD(rte_bus_list, rte_bus);
 
+
+/**
+ * IOVA mapping mode.
+ *
+ * IOVA mapping mode is iommu programming mode of a device.
+ * That device(for example: iommu backed dma device) based
+ * on rte_iova_mode will generate physical or virtual address.
+ *
+ */
+enum rte_iova_mode {
+	RTE_IOVA_DC = 0,	/* Don't care mode */
+	RTE_IOVA_PA = (1 << 0), /* DMA using physical address */
+	RTE_IOVA_VA = (1 << 1)  /* DMA using virtual address */
+};
+
 /**
  * Bus specific scan for devices attached on the bus.
  * For each bus object, the scan would be responsible for finding devices and
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index eab84c7a4..0e36de093 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -381,6 +381,17 @@ int
 rte_pci_match(const struct rte_pci_driver *pci_drv,
 	      const struct rte_pci_device *pci_dev);
 
+
+/**
+ * Get iommu class of PCI devices on the bus.
+ * And return their preferred iova mapping mode.
+ *
+ * @return
+ *   - enum rte_iova_mode.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void);
+
 /**
  * Map the PCI device resources in user space virtual memory address
  *
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 8951ce742..26f2be822 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -487,6 +487,15 @@ rte_pci_scan(void)
 	return -1;
 }
 
+/*
+ * Get iommu class of pci devices on the bus.
+ */
+enum rte_iova_mode
+rte_pci_get_iommu_class(void)
+{
+	return RTE_IOVA_PA;
+}
+
 /* Read PCI config space. */
 int rte_pci_read_config(const struct rte_pci_device *device,
 		void *buf, size_t len, off_t offset)
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 287cc75cd..a8c8ea4f4 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -248,5 +248,6 @@ DPDK_17.11 {
 	global:
 
 	rte_pci_match;
+	rte_pci_get_iommu_class;
 
 } DPDK_17.08;
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 248+ messages in thread

* [PATCH v10 3/9] linuxapp/eal_pci: get iommu class
  2017-10-06 11:03                 ` [PATCH v10 " Santosh Shukla
  2017-10-06 11:03                   ` [PATCH v10 1/9] eal/pci: export match function Santosh Shukla
  2017-10-06 11:03                   ` [PATCH v10 2/9] eal/pci: get iommu class Santosh Shukla
@ 2017-10-06 11:03                   ` Santosh Shukla
  2017-10-11  1:47                     ` Tan, Jianfeng
  2017-10-06 11:03                   ` [PATCH v10 4/9] bus: " Santosh Shukla
                                     ` (6 subsequent siblings)
  9 siblings, 1 reply; 248+ messages in thread
From: Santosh Shukla @ 2017-10-06 11:03 UTC (permalink / raw)
  To: olivier.matz, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, aconole, stephen,
	anatoly.burakov, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy, maxime.coquelin, Santosh Shukla

Get the iommu class of the PCI devices on the bus and return the preferred
iova mapping mode for that bus.

The patch also introduces the RTE_PCI_DRV_IOVA_AS_VA drv flag.
The flag is used when a driver needs to operate in iova=va mode.

Algorithm for iova scheme selection for PCI bus:
0. If no device bound then return with RTE_IOVA_DC mapping mode,
else goto 1).
1. Look for device attached to vfio kdrv and has .drv_flag set
to RTE_PCI_DRV_IOVA_AS_VA.
2. Look for any device attached to UIO class of driver.
3. Check for vfio-noiommu mode enabled.

If 2) & 3) are false and 1) is true then select
mapping scheme as RTE_IOVA_VA. Otherwise use default
mapping scheme (RTE_IOVA_PA).

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
 lib/librte_eal/common/include/rte_pci.h |  2 +
 lib/librte_eal/linuxapp/eal/eal_pci.c   | 89 ++++++++++++++++++++++++++++++++-
 lib/librte_eal/linuxapp/eal/eal_vfio.c  | 19 +++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.h  |  4 ++
 4 files changed, 113 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 0e36de093..a67d77f22 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -202,6 +202,8 @@ struct rte_pci_bus {
 #define RTE_PCI_DRV_INTR_RMV 0x0010
 /** Device driver needs to keep mapped resources if unsupported dev detected */
 #define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020
+/** Device driver supports iova as va */
+#define RTE_PCI_DRV_IOVA_AS_VA 0X0040
 
 /**
  * A structure describing a PCI mapping.
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 26f2be822..b4dbf953a 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -45,6 +45,7 @@
 #include "eal_filesystem.h"
 #include "eal_private.h"
 #include "eal_pci_init.h"
+#include "eal_vfio.h"
 
 /**
  * @file
@@ -488,11 +489,97 @@ rte_pci_scan(void)
 }
 
 /*
- * Get iommu class of pci devices on the bus.
+ * Check whether any PCI device is bound to a kernel driver
+ */
+static inline int
+pci_one_device_is_bound(void)
+{
+	struct rte_pci_device *dev = NULL;
+	int ret = 0;
+
+	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		if (dev->kdrv == RTE_KDRV_UNKNOWN ||
+		    dev->kdrv == RTE_KDRV_NONE) {
+			continue;
+		} else {
+			ret = 1;
+			break;
+		}
+	}
+	return ret;
+}
+
+/*
+ * Check whether any device is bound to a UIO driver
+ */
+static inline int
+pci_one_device_bound_uio(void)
+{
+	struct rte_pci_device *dev = NULL;
+
+	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		if (dev->kdrv == RTE_KDRV_IGB_UIO ||
+		   dev->kdrv == RTE_KDRV_UIO_GENERIC) {
+			return 1;
+		}
+	}
+	return 0;
+}
+
+/*
+ * Check whether any VFIO-bound device has a driver flagged iova as va
+ */
+static inline int
+pci_one_device_has_iova_va(void)
+{
+	struct rte_pci_device *dev = NULL;
+	struct rte_pci_driver *drv = NULL;
+
+	FOREACH_DRIVER_ON_PCIBUS(drv) {
+		if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
+			FOREACH_DEVICE_ON_PCIBUS(dev) {
+				if (dev->kdrv == RTE_KDRV_VFIO &&
+				    rte_pci_match(drv, dev))
+					return 1;
+			}
+		}
+	}
+	return 0;
+}
+
+/*
+ * Get iommu class of PCI devices on the bus.
  */
 enum rte_iova_mode
 rte_pci_get_iommu_class(void)
 {
+	bool is_bound;
+	bool is_vfio_noiommu_enabled = true;
+	bool has_iova_va;
+	bool is_bound_uio;
+
+	is_bound = pci_one_device_is_bound();
+	if (!is_bound)
+		return RTE_IOVA_DC;
+
+	has_iova_va = pci_one_device_has_iova_va();
+	is_bound_uio = pci_one_device_bound_uio();
+#ifdef VFIO_PRESENT
+	is_vfio_noiommu_enabled = vfio_noiommu_is_enabled() == true ?
+					true : false;
+#endif
+
+	if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled)
+		return RTE_IOVA_VA;
+
+	if (has_iova_va) {
+		RTE_LOG(WARNING, EAL, "Some devices want iova as va but pa will be used because.. ");
+		if (is_vfio_noiommu_enabled)
+			RTE_LOG(WARNING, EAL, "vfio-noiommu mode configured\n");
+		if (is_bound_uio)
+			RTE_LOG(WARNING, EAL, "some devices bound to UIO\n");
+	}
+
 	return RTE_IOVA_PA;
 }
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 946df7e31..c8a97b7e7 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -816,4 +816,23 @@ vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
 	return 0;
 }
 
+int
+vfio_noiommu_is_enabled(void)
+{
+	int fd, ret, cnt;
+	char c;
+
+	ret = -1;
+	fd = open(VFIO_NOIOMMU_MODE, O_RDONLY);
+	if (fd < 0)
+		return -1;
+
+	cnt = read(fd, &c, 1);
+	if (cnt == 1 && c == 'Y')
+		ret = 1;
+
+	close(fd);
+	return ret;
+}
+
 #endif
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h b/lib/librte_eal/linuxapp/eal/eal_vfio.h
index 5ff63e5d7..26ea8e119 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
@@ -150,6 +150,8 @@ struct vfio_config {
 #define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
 #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
 #define VFIO_GET_REGION_IDX(x) (x >> 40)
+#define VFIO_NOIOMMU_MODE      \
+	"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
 
 /* DMA mapping function prototype.
  * Takes VFIO container fd as a parameter.
@@ -210,6 +212,8 @@ int pci_vfio_is_enabled(void);
 
 int vfio_mp_sync_setup(void);
 
+int vfio_noiommu_is_enabled(void);
+
 #define SOCKET_REQ_CONTAINER 0x100
 #define SOCKET_REQ_GROUP 0x200
 #define SOCKET_CLR_GROUP 0x300
-- 
2.14.1

* [PATCH v10 4/9] bus: get iommu class
  2017-10-06 11:03                 ` [PATCH v10 " Santosh Shukla
                                     ` (2 preceding siblings ...)
  2017-10-06 11:03                   ` [PATCH v10 3/9] linuxapp/eal_pci: " Santosh Shukla
@ 2017-10-06 11:03                   ` Santosh Shukla
  2017-10-06 11:03                   ` [PATCH v10 5/9] eal: introduce helper API for iova mode Santosh Shukla
                                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-10-06 11:03 UTC (permalink / raw)
  To: olivier.matz, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, aconole, stephen,
	anatoly.burakov, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy, maxime.coquelin, Santosh Shukla

The rte_bus_get_iommu_class() API automatically detects and selects the
appropriate iova mapping scheme for the iommu capable devices on each bus.

Algorithm for iova scheme selection for bus:
0. Iterate through bus_list.
1. Collect each bus's iova mode value and OR it into the 'mode' var.
2. Mode selection scheme is:
if mode == 0 then iova mode is _pa,
if mode == 1 then iova mode is _pa,
if mode == 2 then iova mode is _va,
if mode == 3 then iova mode is _pa.

So mode != 2 selects the default iova mode (_pa).
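
As an illustration only (not part of the patch), the OR-based
combination can be sketched as below. The enum values 0, 1 and 2 follow
the mode numbering used in this commit message; the toy_ function and
the array of per-bus votes are assumptions made for the sketch.

```c
#include <stddef.h>

/* Numbering as in the commit message: 0 = don't care, 1 = PA, 2 = VA. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1, IOVA_VA = 2 };

/* Combine the per-bus votes into one process-wide iova mode. */
static enum iova_mode
toy_bus_get_iommu_class(const enum iova_mode *bus_modes, size_t n)
{
	int mode = IOVA_DC;
	size_t i;

	for (i = 0; i < n; i++)
		mode |= bus_modes[i];

	/*
	 * 0 (no vote), 1 (PA only) and 3 (PA and VA mixed) all fall back
	 * to PA; only a pure VA vote (2) selects VA.
	 */
	return mode == IOVA_VA ? IOVA_VA : IOVA_PA;
}
```

A bus that returns don't care does not change the result, so
{ IOVA_DC, IOVA_VA } yields IOVA_VA while { IOVA_PA, IOVA_VA } ORs to 3
and yields IOVA_PA.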

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
 lib/librte_eal/common/eal_common_bus.c          | 23 +++++++++++++++++++++++
 lib/librte_eal/common/eal_common_pci.c          |  1 +
 lib/librte_eal/common/include/rte_bus.h         | 25 +++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 5 files changed, 51 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index c6ffd9399..3466eaf20 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -244,5 +244,6 @@ DPDK_17.11 {
 
 	rte_pci_match;
 	rte_pci_get_iommu_class;
+	rte_bus_get_iommu_class;
 
 } DPDK_17.08;
diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 08bec2d93..a30a8982e 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -222,3 +222,26 @@ rte_bus_find_by_device_name(const char *str)
 		c[0] = '\0';
 	return rte_bus_find(NULL, bus_can_parse, name);
 }
+
+
+/*
+ * Get iommu class of devices on the bus.
+ */
+enum rte_iova_mode
+rte_bus_get_iommu_class(void)
+{
+	int mode = RTE_IOVA_DC;
+	struct rte_bus *bus;
+
+	TAILQ_FOREACH(bus, &rte_bus_list, next) {
+
+		if (bus->get_iommu_class)
+			mode |= bus->get_iommu_class();
+	}
+
+	if (mode != RTE_IOVA_VA) {
+		/* Use default IOVA mode */
+		mode = RTE_IOVA_PA;
+	}
+	return mode;
+}
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 3b7d0a0ee..0f0e4b93b 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -564,6 +564,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.plug = pci_plug,
 		.unplug = pci_unplug,
 		.parse = pci_parse,
+		.get_iommu_class = rte_pci_get_iommu_class,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index e59c21659..3a5891595 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -183,6 +183,20 @@ struct rte_bus_conf {
 	enum rte_bus_scan_mode scan_mode; /**< Scan policy. */
 };
 
+
+/**
+ * Get the common iommu class of all the devices on the bus. The bus may
+ * check that those devices are attached to an iommu driver.
+ * If no devices are attached to the bus, the bus may return the don't care
+ * (_DC) value.
+ * Otherwise, the bus will return the appropriate _pa or _va iova mode.
+ *
+ * @return
+ *      enum rte_iova_mode value.
+ */
+typedef enum rte_iova_mode (*rte_bus_get_iommu_class_t)(void);
+
+
 /**
  * A structure describing a generic bus.
  */
@@ -196,6 +210,7 @@ struct rte_bus {
 	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
 	rte_bus_parse_t parse;       /**< Parse a device name */
 	struct rte_bus_conf conf;    /**< Bus configuration */
+	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 };
 
 /**
@@ -295,6 +310,16 @@ struct rte_bus *rte_bus_find_by_device(const struct rte_device *dev);
  */
 struct rte_bus *rte_bus_find_by_name(const char *busname);
 
+
+/**
+ * Get the common iommu class of devices bound on to buses available in the
+ * system. The default mode is PA.
+ *
+ * @return
+ *     enum rte_iova_mode value.
+ */
+enum rte_iova_mode rte_bus_get_iommu_class(void);
+
 /**
  * Helper for Bus registration.
  * The constructor has higher priority than PMD constructors.
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index a8c8ea4f4..9115aa3e9 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -249,5 +249,6 @@ DPDK_17.11 {
 
 	rte_pci_match;
 	rte_pci_get_iommu_class;
+	rte_bus_get_iommu_class;
 
 } DPDK_17.08;
-- 
2.14.1

* [PATCH v10 5/9] eal: introduce helper API for iova mode
  2017-10-06 11:03                 ` [PATCH v10 " Santosh Shukla
                                     ` (3 preceding siblings ...)
  2017-10-06 11:03                   ` [PATCH v10 4/9] bus: " Santosh Shukla
@ 2017-10-06 11:03                   ` Santosh Shukla
  2017-10-06 11:03                   ` [PATCH v10 6/9] eal: auto detect " Santosh Shukla
                                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-10-06 11:03 UTC (permalink / raw)
  To: olivier.matz, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, aconole, stephen,
	anatoly.burakov, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy, maxime.coquelin, Santosh Shukla

Introduce the rte_eal_iova_mode() helper API. This API is used by
non-EAL libraries to detect the iova mode.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
 lib/librte_eal/bsdapp/eal/eal.c                 |  6 ++++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
 lib/librte_eal/common/include/rte_eal.h         | 12 ++++++++++++
 lib/librte_eal/linuxapp/eal/eal.c               |  6 ++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 5 files changed, 26 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 5fa598842..07e72203f 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -119,6 +119,12 @@ rte_eal_get_configuration(void)
 	return &rte_config;
 }
 
+enum rte_iova_mode
+rte_eal_iova_mode(void)
+{
+	return rte_eal_get_configuration()->iova_mode;
+}
+
 /* parse a sysfs (or other) file containing one integer value */
 int
 eal_parse_sysfs_value(const char *filename, unsigned long *val)
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 3466eaf20..6bed74dff 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -245,5 +245,6 @@ DPDK_17.11 {
 	rte_pci_match;
 	rte_pci_get_iommu_class;
 	rte_bus_get_iommu_class;
+	rte_eal_iova_mode;
 
 } DPDK_17.08;
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 559d2308e..436094d24 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -45,6 +45,7 @@
 
 #include <rte_per_lcore.h>
 #include <rte_config.h>
+#include <rte_bus.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -87,6 +88,9 @@ struct rte_config {
 	/** Primary or secondary configuration */
 	enum rte_proc_type_t process_type;
 
+	/** PA or VA mapping mode */
+	enum rte_iova_mode iova_mode;
+
 	/**
 	 * Pointer to memory configuration, which may be shared across multiple
 	 * DPDK instances
@@ -287,6 +291,14 @@ static inline int rte_gettid(void)
 	return RTE_PER_LCORE(_thread_id);
 }
 
+/**
+ * Get the iova mode
+ *
+ * @return
+ *   enum rte_iova_mode value.
+ */
+enum rte_iova_mode rte_eal_iova_mode(void);
+
 /**
  * Run function before main() with low priority.
  *
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 48f12f44c..febbafdb3 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -128,6 +128,12 @@ rte_eal_get_configuration(void)
 	return &rte_config;
 }
 
+enum rte_iova_mode
+rte_eal_iova_mode(void)
+{
+	return rte_eal_get_configuration()->iova_mode;
+}
+
 /* parse a sysfs (or other) file containing one integer value */
 int
 eal_parse_sysfs_value(const char *filename, unsigned long *val)
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 9115aa3e9..8e49bf5fa 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -250,5 +250,6 @@ DPDK_17.11 {
 	rte_pci_match;
 	rte_pci_get_iommu_class;
 	rte_bus_get_iommu_class;
+	rte_eal_iova_mode;
 
 } DPDK_17.08;
-- 
2.14.1

* [PATCH v10 6/9] eal: auto detect iova mode
  2017-10-06 11:03                 ` [PATCH v10 " Santosh Shukla
                                     ` (4 preceding siblings ...)
  2017-10-06 11:03                   ` [PATCH v10 5/9] eal: introduce helper API for iova mode Santosh Shukla
@ 2017-10-06 11:03                   ` Santosh Shukla
  2017-10-13  8:48                     ` Maxime Coquelin
  2017-10-06 11:03                   ` [PATCH v10 7/9] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
                                     ` (3 subsequent siblings)
  9 siblings, 1 reply; 248+ messages in thread
From: Santosh Shukla @ 2017-10-06 11:03 UTC (permalink / raw)
  To: olivier.matz, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, aconole, stephen,
	anatoly.burakov, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy, maxime.coquelin, Santosh Shukla

iova autodetection depends on the rte_bus_scan result. The bus scan
updates device_list, and each device in that list has its '.kdrv' state
updated. That kdrv state is used to detect the iova mapping mode for
that device.

_device_parse() has a dependency on rte_bus_scan, so the calls below
are moved up in the eal initialization order:
	- eal_option_device_parse
	- rte_bus_scan

Based on the result of rte_bus_get_iommu_class, select the iova
mapping mode.
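
As an illustration only (not part of the patch), the ordering
dependency can be sketched with a toy bus. All toy_* names and the
fixed single VFIO device are assumptions for the sketch; the point is
that detection reads the kdrv state that only the scan fills in.

```c
#include <stddef.h>

enum iova_mode { IOVA_DC = 0, IOVA_PA = 1, IOVA_VA = 2 };
enum kdrv { KDRV_UNKNOWN, KDRV_VFIO };

struct toy_dev { enum kdrv kdrv; };
struct toy_bus { struct toy_dev devs[4]; size_t ndevs; };

/* Hypothetical scan: fills the device list and each device's kdrv. */
static void
toy_bus_scan(struct toy_bus *bus)
{
	bus->devs[0].kdrv = KDRV_VFIO;
	bus->ndevs = 1;
}

/* Detection only sees what the scan has recorded so far. */
static enum iova_mode
toy_detect(const struct toy_bus *bus)
{
	size_t i;

	for (i = 0; i < bus->ndevs; i++)
		if (bus->devs[i].kdrv == KDRV_VFIO)
			return IOVA_VA;
	return IOVA_DC;
}
```

Calling toy_detect() on a bus that has not been scanned returns
IOVA_DC because the device list is still empty, which is why the scan
has to run before the mode is selected.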

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
 lib/librte_eal/bsdapp/eal/eal.c   | 27 ++++++++++++++++-----------
 lib/librte_eal/linuxapp/eal/eal.c | 27 ++++++++++++++++-----------
 2 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 07e72203f..f003f4c04 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -541,6 +541,22 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	if (eal_option_device_parse()) {
+		rte_errno = ENODEV;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
+
+	if (rte_bus_scan()) {
+		rte_eal_init_alert("Cannot scan the buses for devices\n");
+		rte_errno = ENODEV;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
+
+	/* autodetect the iova mapping mode (default is iova_pa) */
+	rte_eal_get_configuration()->iova_mode = rte_bus_get_iommu_class();
+
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&
 			eal_hugepage_info_init() < 0) {
@@ -620,17 +636,6 @@ rte_eal_init(int argc, char **argv)
 		rte_config.master_lcore, thread_id, cpuset,
 		ret == 0 ? "" : "...");
 
-	if (eal_option_device_parse()) {
-		rte_errno = ENODEV;
-		return -1;
-	}
-
-	if (rte_bus_scan()) {
-		rte_eal_init_alert("Cannot scan the buses for devices\n");
-		rte_errno = ENODEV;
-		return -1;
-	}
-
 	RTE_LCORE_FOREACH_SLAVE(i) {
 
 		/*
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index febbafdb3..f4901ffb6 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -798,6 +798,22 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	if (eal_option_device_parse()) {
+		rte_errno = ENODEV;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
+
+	if (rte_bus_scan()) {
+		rte_eal_init_alert("Cannot scan the buses for devices\n");
+		rte_errno = ENODEV;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
+
+	/* autodetect the iova mapping mode (default is iova_pa) */
+	rte_eal_get_configuration()->iova_mode = rte_bus_get_iommu_class();
+
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&
 			internal_config.xen_dom0_support == 0 &&
@@ -895,17 +911,6 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
-	if (eal_option_device_parse()) {
-		rte_errno = ENODEV;
-		return -1;
-	}
-
-	if (rte_bus_scan()) {
-		rte_eal_init_alert("Cannot scan the buses for devices\n");
-		rte_errno = ENODEV;
-		return -1;
-	}
-
 	RTE_LCORE_FOREACH_SLAVE(i) {
 
 		/*
-- 
2.14.1

* [PATCH v10 7/9] linuxapp/eal_vfio: honor iova mode before mapping
  2017-10-06 11:03                 ` [PATCH v10 " Santosh Shukla
                                     ` (5 preceding siblings ...)
  2017-10-06 11:03                   ` [PATCH v10 6/9] eal: auto detect " Santosh Shukla
@ 2017-10-06 11:03                   ` Santosh Shukla
  2017-10-06 11:03                   ` [PATCH v10 8/9] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
                                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-10-06 11:03 UTC (permalink / raw)
  To: olivier.matz, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, aconole, stephen,
	anatoly.burakov, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy, maxime.coquelin, Santosh Shukla

Check the iova mode and map iova to pa or va accordingly.
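
As an illustration only (not part of the patch), the per-segment choice
can be sketched as below. The toy_memseg and toy_dma_map structs are
cut-down stand-ins assumed for the sketch; they are not the DPDK memseg
or the kernel's vfio_iommu_type1_dma_map, and no ioctl is issued.

```c
#include <stdint.h>

enum iova_mode { IOVA_DC = 0, IOVA_PA = 1, IOVA_VA = 2 };

/* Cut-down stand-ins for the memseg and the VFIO DMA map request. */
struct toy_memseg { uint64_t addr_64; uint64_t phys_addr; uint64_t len; };
struct toy_dma_map { uint64_t vaddr; uint64_t iova; uint64_t size; };

/* Fill one DMA map request; the iova depends on the selected mode. */
static struct toy_dma_map
toy_fill_dma_map(const struct toy_memseg *ms, enum iova_mode mode)
{
	struct toy_dma_map m;

	m.vaddr = ms->addr_64;
	m.size = ms->len;
	/* iova=va: the device sees the same address as the process */
	m.iova = (mode == IOVA_VA) ? m.vaddr : ms->phys_addr;
	return m;
}
```

In iova=va mode the request maps the segment at its own virtual
address; in every other mode it keeps the physical address, as before
the patch.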

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
 lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c8a97b7e7..b32cd09a2 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd)
 		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
 		dma_map.vaddr = ms[i].addr_64;
 		dma_map.size = ms[i].len;
-		dma_map.iova = ms[i].phys_addr;
+		if (rte_eal_iova_mode() == RTE_IOVA_VA)
+			dma_map.iova = dma_map.vaddr;
+		else
+			dma_map.iova = ms[i].phys_addr;
 		dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
 
 		ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
@@ -792,7 +795,10 @@ vfio_spapr_dma_map(int vfio_container_fd)
 		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
 		dma_map.vaddr = ms[i].addr_64;
 		dma_map.size = ms[i].len;
-		dma_map.iova = ms[i].phys_addr;
+		if (rte_eal_iova_mode() == RTE_IOVA_VA)
+			dma_map.iova = dma_map.vaddr;
+		else
+			dma_map.iova = ms[i].phys_addr;
 		dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
 				 VFIO_DMA_MAP_FLAG_WRITE;
 
-- 
2.14.1

* [PATCH v10 8/9] linuxapp/eal_memory: honor iova mode in virt2phy
  2017-10-06 11:03                 ` [PATCH v10 " Santosh Shukla
                                     ` (6 preceding siblings ...)
  2017-10-06 11:03                   ` [PATCH v10 7/9] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
@ 2017-10-06 11:03                   ` Santosh Shukla
  2017-10-06 11:03                   ` [PATCH v10 9/9] eal/rte_malloc: " Santosh Shukla
  2017-10-06 18:40                   ` [PATCH v10 0/9] Infrastructure to detect iova mapping on the bus Thomas Monjalon
  9 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-10-06 11:03 UTC (permalink / raw)
  To: olivier.matz, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, aconole, stephen,
	anatoly.burakov, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy, maxime.coquelin, Santosh Shukla

Check the iova mode: in RTE_IOVA_VA mode return the virtual address
itself, otherwise return the phy addr.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 52791282f..2d9d7c2dc 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -139,6 +139,9 @@ rte_mem_virt2phy(const void *virtaddr)
 	int page_size;
 	off_t offset;
 
+	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+		return (uintptr_t)virtaddr;
+
 	/* when using dom0, /proc/self/pagemap always returns 0, check in
 	 * dpdk memory by browsing the memsegs */
 	if (rte_xen_dom0_supported()) {
-- 
2.14.1

* [PATCH v10 9/9] eal/rte_malloc: honor iova mode in virt2phy
  2017-10-06 11:03                 ` [PATCH v10 " Santosh Shukla
                                     ` (7 preceding siblings ...)
  2017-10-06 11:03                   ` [PATCH v10 8/9] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
@ 2017-10-06 11:03                   ` Santosh Shukla
  2017-10-06 18:40                   ` [PATCH v10 0/9] Infrastructure to detect iova mapping on the bus Thomas Monjalon
  9 siblings, 0 replies; 248+ messages in thread
From: Santosh Shukla @ 2017-10-06 11:03 UTC (permalink / raw)
  To: olivier.matz, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, aconole, stephen,
	anatoly.burakov, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy, maxime.coquelin, Santosh Shukla

Check the iova mode: in RTE_IOVA_VA mode return the virtual address
itself, otherwise return the phy addr.
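
As an illustration only (not part of the patch), the translation can be
sketched as below. The toy_memseg struct, TOY_BAD_ADDR and the explicit
mode parameter are assumptions for the sketch; the real function looks
the segment up from the malloc element and reads the mode through
rte_eal_iova_mode().

```c
#include <stddef.h>
#include <stdint.h>

enum iova_mode { IOVA_DC = 0, IOVA_PA = 1, IOVA_VA = 2 };

#define TOY_BAD_ADDR UINT64_MAX /* stand-in for RTE_BAD_PHYS_ADDR */

/* Cut-down stand-in for the memseg that backs an allocation. */
struct toy_memseg { void *addr; uint64_t phys_addr; };

static uint64_t
toy_virt2iova(const struct toy_memseg *ms, const void *addr,
	      enum iova_mode mode)
{
	if (ms == NULL || ms->phys_addr == TOY_BAD_ADDR)
		return TOY_BAD_ADDR;
	if (mode == IOVA_VA)
		return (uintptr_t)addr; /* iova is the virtual address */
	/* iova=pa: segment base plus the offset inside the segment */
	return ms->phys_addr + ((uintptr_t)addr - (uintptr_t)ms->addr);
}
```

For a segment whose base maps to 0x1000, an address 16 bytes into the
segment translates to 0x1010 in iova=pa mode and to the pointer value
itself in iova=va mode.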

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
 lib/librte_eal/common/rte_malloc.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 5c0627bf4..d65c05a4d 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -251,10 +251,17 @@ rte_malloc_set_limit(__rte_unused const char *type,
 phys_addr_t
 rte_malloc_virt2phy(const void *addr)
 {
+	phys_addr_t paddr;
 	const struct malloc_elem *elem = malloc_elem_from_data(addr);
 	if (elem == NULL)
 		return RTE_BAD_PHYS_ADDR;
 	if (elem->ms->phys_addr == RTE_BAD_PHYS_ADDR)
 		return RTE_BAD_PHYS_ADDR;
-	return elem->ms->phys_addr + ((uintptr_t)addr - (uintptr_t)elem->ms->addr);
+
+	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+		paddr = (uintptr_t)addr;
+	else
+		paddr = elem->ms->phys_addr +
+			((uintptr_t)addr - (uintptr_t)elem->ms->addr);
+	return paddr;
 }
-- 
2.14.1

* Re: [PATCH v10 0/9] Infrastructure to detect iova mapping on the bus
  2017-10-06 11:03                 ` [PATCH v10 " Santosh Shukla
                                     ` (8 preceding siblings ...)
  2017-10-06 11:03                   ` [PATCH v10 9/9] eal/rte_malloc: " Santosh Shukla
@ 2017-10-06 18:40                   ` Thomas Monjalon
  9 siblings, 0 replies; 248+ messages in thread
From: Thomas Monjalon @ 2017-10-06 18:40 UTC (permalink / raw)
  To: Santosh Shukla
  Cc: dev, olivier.matz, jerin.jacob, hemant.agrawal, aconole, stephen,
	anatoly.burakov, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy, maxime.coquelin

> Santosh Shukla (9):
>   eal/pci: export match function
>   eal/pci: get iommu class
>   linuxapp/eal_pci: get iommu class
>   bus: get iommu class
>   eal: introduce helper API for iova mode
>   eal: auto detect iova mode
>   linuxapp/eal_vfio: honor iova mode before mapping
>   linuxapp/eal_memory: honor iova mode in virt2phy
>   eal/rte_malloc: honor iova mode in virt2phy

Applied with few uppercase changes in comments, thanks

* Re: [PATCH v10 3/9] linuxapp/eal_pci: get iommu class
  2017-10-06 11:03                   ` [PATCH v10 3/9] linuxapp/eal_pci: " Santosh Shukla
@ 2017-10-11  1:47                     ` Tan, Jianfeng
  2017-10-11  4:43                       ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Tan, Jianfeng @ 2017-10-11  1:47 UTC (permalink / raw)
  To: Santosh Shukla, olivier.matz, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, aconole, stephen,
	anatoly.burakov, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy, maxime.coquelin

Hi,

Nice patch series. But I still have a small question about below flag.


On 10/6/2017 7:03 PM, Santosh Shukla wrote:
> Get iommu class of PCI device on the bus and returns preferred iova
> mapping mode for that bus.
>
> Patch also introduces RTE_PCI_DRV_IOVA_AS_VA drv flag.
> Flag used when driver needs to operate in iova=va mode.
>
Does this flag indicate a must to use VA as IOVA, or a nice-to-have one? 
In detail, above commit log says, "needs to operate in iova=va mode", 
but the comment in the patch indicates this flag means "driver supports 
IOVA as VA".

If it's the latter case, I would suppose all drivers support to use VA 
as IOVA, if the NICs are binded to vfio-pci (iommu mode). Please correct 
me if I'm wrong.

Thanks,
Jianfeng

* Re: [PATCH v10 3/9] linuxapp/eal_pci: get iommu class
  2017-10-11  1:47                     ` Tan, Jianfeng
@ 2017-10-11  4:43                       ` santosh
  2017-10-11  5:31                         ` Tan, Jianfeng
  0 siblings, 1 reply; 248+ messages in thread
From: santosh @ 2017-10-11  4:43 UTC (permalink / raw)
  To: Tan, Jianfeng, olivier.matz, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, aconole, stephen,
	anatoly.burakov, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy, maxime.coquelin


On Wednesday 11 October 2017 07:17 AM, Tan, Jianfeng wrote:
> Hi,
>
> Nice patch series. But I still have a small question about below flag.
>
>
> On 10/6/2017 7:03 PM, Santosh Shukla wrote:
>> Get iommu class of PCI device on the bus and returns preferred iova
>> mapping mode for that bus.
>>
>> Patch also introduces RTE_PCI_DRV_IOVA_AS_VA drv flag.
>> Flag used when driver needs to operate in iova=va mode.
>>
> Does this flag indicate a must to use VA as IOVA, or a nice-to-have one? In detail, above commit log says, "needs to operate in iova=va mode", but the comment in the patch indicates this flag means "driver supports IOVA as VA".
>
> If it's the latter case, I would suppose all drivers support to use VA as IOVA, if the NICs are binded to vfio-pci (iommu mode). Please correct me if I'm wrong.
>
- Any iommu backed pmd could choose to use this flag.
- Reasoning for need was performance for our external mempool pmd: avoid phy2virt translation on
mbuf thus save cycles.

> Thanks,
> Jianfeng

* Re: [PATCH v10 3/9] linuxapp/eal_pci: get iommu class
  2017-10-11  4:43                       ` santosh
@ 2017-10-11  5:31                         ` Tan, Jianfeng
  2017-10-11  5:37                           ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Tan, Jianfeng @ 2017-10-11  5:31 UTC (permalink / raw)
  To: santosh, olivier.matz, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, aconole, stephen,
	anatoly.burakov, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy, maxime.coquelin



On 10/11/2017 12:43 PM, santosh wrote:
> On Wednesday 11 October 2017 07:17 AM, Tan, Jianfeng wrote:
>> Hi,
>>
>> Nice patch series. But I still have a small question about below flag.
>>
>>
>> On 10/6/2017 7:03 PM, Santosh Shukla wrote:
>>> Get iommu class of PCI device on the bus and returns preferred iova
>>> mapping mode for that bus.
>>>
>>> Patch also introduces RTE_PCI_DRV_IOVA_AS_VA drv flag.
>>> Flag used when driver needs to operate in iova=va mode.
>>>
>> Does this flag indicate a must to use VA as IOVA, or a nice-to-have one? In detail, above commit log says, "needs to operate in iova=va mode", but the comment in the patch indicates this flag means "driver supports IOVA as VA".
>>
>> If it's the latter case, I would suppose all drivers support to use VA as IOVA, if the NICs are binded to vfio-pci (iommu mode). Please correct me if I'm wrong.
>>
> - Any iommu backed pmd could choose to use this flag.

But if this is characterized by assumption for all PMDs, why do we 
trouble to introduce this flag.

> - Reasoning for need was performance for our external mempool pmd: avoid phy2virt translation on
> mbuf thus save cycles.
>

Agreed, and it's also for running DPDK without root privilege.

Thanks,
Jianfeng

* Re: [PATCH v10 3/9] linuxapp/eal_pci: get iommu class
  2017-10-11  5:31                         ` Tan, Jianfeng
@ 2017-10-11  5:37                           ` santosh
  2017-10-11  7:04                             ` Tan, Jianfeng
  0 siblings, 1 reply; 248+ messages in thread
From: santosh @ 2017-10-11  5:37 UTC (permalink / raw)
  To: Tan, Jianfeng, olivier.matz, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, aconole, stephen,
	anatoly.burakov, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy, maxime.coquelin


On Wednesday 11 October 2017 11:01 AM, Tan, Jianfeng wrote:
>
>
> On 10/11/2017 12:43 PM, santosh wrote:
>> On Wednesday 11 October 2017 07:17 AM, Tan, Jianfeng wrote:
>>> Hi,
>>>
>>> Nice patch series. But I still have a small question about below flag.
>>>
>>>
>>> On 10/6/2017 7:03 PM, Santosh Shukla wrote:
>>>> Get iommu class of PCI device on the bus and returns preferred iova
>>>> mapping mode for that bus.
>>>>
>>>> Patch also introduces RTE_PCI_DRV_IOVA_AS_VA drv flag.
>>>> Flag used when driver needs to operate in iova=va mode.
>>>>
>>> Does this flag indicate a must to use VA as IOVA, or a nice-to-have one? In detail, above commit log says, "needs to operate in iova=va mode", but the comment in the patch indicates this flag means "driver supports IOVA as VA".
>>>
>>> If it's the latter case, I would suppose all drivers support to use VA as IOVA, if the NICs are binded to vfio-pci (iommu mode). Please correct me if I'm wrong.
>>>
>> - Any iommu backed pmd could choose to use this flag.
>
> But if this is characterized by assumption for all PMDs, why do we trouble to introduce this flag.
>
To hint to the bus layer that _this_ driver prefers iova=va mapping; the default is iova=pa.
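
A minimal sketch of that hint, not the actual patch code. Only the name RTE_PCI_DRV_IOVA_AS_VA comes from the series; the bit value, the trimmed-down driver struct and the helper below are hypothetical stand-ins. The point is just that a driver which does not set the flag is treated as iova=pa.

```c
#include <stdint.h>

/* IOVA modes as summarised in the cover letter (names are local to this sketch). */
enum hint_iova_mode {
	HINT_IOVA_DC, /* don't care */
	HINT_IOVA_PA,
	HINT_IOVA_VA
};

/* Hypothetical bit value standing in for RTE_PCI_DRV_IOVA_AS_VA. */
#define HINT_DRV_IOVA_AS_VA 0x0040u

/* Hypothetical, trimmed-down driver descriptor. */
struct hint_pci_driver {
	const char *name;
	uint32_t drv_flags;
};

/* Mode that one driver hints to the bus layer: VA only if it opted in. */
static enum hint_iova_mode
hint_driver_iova_mode(const struct hint_pci_driver *drv)
{
	if (drv->drv_flags & HINT_DRV_IOVA_AS_VA)
		return HINT_IOVA_VA;
	return HINT_IOVA_PA; /* default when the flag is absent */
}
```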

Thanks.

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v10 3/9] linuxapp/eal_pci: get iommu class
  2017-10-11  5:37                           ` santosh
@ 2017-10-11  7:04                             ` Tan, Jianfeng
  2017-10-11  7:10                               ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Tan, Jianfeng @ 2017-10-11  7:04 UTC (permalink / raw)
  To: santosh, olivier.matz, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, aconole, stephen, Burakov,
	Anatoly, gaetan.rivet, shreyansh.jain, Richardson, Bruce,
	Gonzalez Monroy, Sergio, maxime.coquelin



> -----Original Message-----
> From: santosh [mailto:santosh.shukla@caviumnetworks.com]
> Sent: Wednesday, October 11, 2017 1:38 PM
> To: Tan, Jianfeng; olivier.matz@6wind.com; dev@dpdk.org
> Cc: thomas@monjalon.net; jerin.jacob@caviumnetworks.com;
> hemant.agrawal@nxp.com; aconole@redhat.com;
> stephen@networkplumber.org; Burakov, Anatoly; gaetan.rivet@6wind.com;
> shreyansh.jain@nxp.com; Richardson, Bruce; Gonzalez Monroy, Sergio;
> maxime.coquelin@redhat.com
> Subject: Re: [dpdk-dev] [PATCH v10 3/9] linuxapp/eal_pci: get iommu class
> 
> 
> On Wednesday 11 October 2017 11:01 AM, Tan, Jianfeng wrote:
> >
> >
> > On 10/11/2017 12:43 PM, santosh wrote:
> >> On Wednesday 11 October 2017 07:17 AM, Tan, Jianfeng wrote:
> >>> Hi,
> >>>
> >>> Nice patch series. But I still have a small question about below flag.
> >>>
> >>>
> >>> On 10/6/2017 7:03 PM, Santosh Shukla wrote:
> >>>> Get iommu class of PCI device on the bus and returns preferred iova
> >>>> mapping mode for that bus.
> >>>>
> >>>> Patch also introduces RTE_PCI_DRV_IOVA_AS_VA drv flag.
> >>>> Flag used when driver needs to operate in iova=va mode.
> >>>>
> >>> Does this flag indicate a must to use VA as IOVA, or a nice-to-have one?
> In detail, above commit log says, "needs to operate in iova=va mode", but
> the comment in the patch indicates this flag means "driver supports IOVA as
> VA".
> >>>
> >>> If it's the latter case, I would suppose all drivers support to use VA as
> IOVA, if the NICs are binded to vfio-pci (iommu mode). Please correct me if
> I'm wrong.
> >>>
> >> - Any iommu backed pmd could choose to use this flag.
> >
> > But if this is characterized by assumption for all PMDs, why do we trouble
> to introduce this flag.
> >
> to hint bus layer about iova=va mapping choice for _this_ driver and default
> is iova=pa.
> 

So it sounds like, if this flag is set by some PMD, we must use iova=va.

Then how about we enable iova=va only if all PCI devices are bound to vfio-pci (iommu mode)?

Thanks,
Jianfeng

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v10 3/9] linuxapp/eal_pci: get iommu class
  2017-10-11  7:04                             ` Tan, Jianfeng
@ 2017-10-11  7:10                               ` santosh
  2017-10-11  8:31                                 ` Tan, Jianfeng
  0 siblings, 1 reply; 248+ messages in thread
From: santosh @ 2017-10-11  7:10 UTC (permalink / raw)
  To: Tan, Jianfeng, olivier.matz, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, aconole, stephen, Burakov,
	Anatoly, gaetan.rivet, shreyansh.jain, Richardson, Bruce,
	Gonzalez Monroy, Sergio, maxime.coquelin


On Wednesday 11 October 2017 12:34 PM, Tan, Jianfeng wrote:
>
>> -----Original Message-----
>> From: santosh [mailto:santosh.shukla@caviumnetworks.com]
>> Sent: Wednesday, October 11, 2017 1:38 PM
>> To: Tan, Jianfeng; olivier.matz@6wind.com; dev@dpdk.org
>> Cc: thomas@monjalon.net; jerin.jacob@caviumnetworks.com;
>> hemant.agrawal@nxp.com; aconole@redhat.com;
>> stephen@networkplumber.org; Burakov, Anatoly; gaetan.rivet@6wind.com;
>> shreyansh.jain@nxp.com; Richardson, Bruce; Gonzalez Monroy, Sergio;
>> maxime.coquelin@redhat.com
>> Subject: Re: [dpdk-dev] [PATCH v10 3/9] linuxapp/eal_pci: get iommu class
>>
>>
>> On Wednesday 11 October 2017 11:01 AM, Tan, Jianfeng wrote:
>>>
>>> On 10/11/2017 12:43 PM, santosh wrote:
>>>> On Wednesday 11 October 2017 07:17 AM, Tan, Jianfeng wrote:
>>>>> Hi,
>>>>>
>>>>> Nice patch series. But I still have a small question about below flag.
>>>>>
>>>>>
>>>>> On 10/6/2017 7:03 PM, Santosh Shukla wrote:
>>>>>> Get iommu class of PCI device on the bus and returns preferred iova
>>>>>> mapping mode for that bus.
>>>>>>
>>>>>> Patch also introduces RTE_PCI_DRV_IOVA_AS_VA drv flag.
>>>>>> Flag used when driver needs to operate in iova=va mode.
>>>>>>
>>>>> Does this flag indicate a must to use VA as IOVA, or a nice-to-have one?
>> In detail, above commit log says, "needs to operate in iova=va mode", but
>> the comment in the patch indicates this flag means "driver supports IOVA as
>> VA".
>>>>> If it's the latter case, I would suppose all drivers support to use VA as
>> IOVA, if the NICs are binded to vfio-pci (iommu mode). Please correct me if
>> I'm wrong.
>>>> - Any iommu backed pmd could choose to use this flag.
>>> But if this is characterized by assumption for all PMDs, why do we trouble
>> to introduce this flag.
>> to hint bus layer about iova=va mapping choice for _this_ driver and default
>> is iova=pa.
>>
> So that sounds if this flag is set by some PMD, we must use iova=va.
>
> Then how about we enable this, iova=va, if only all PCI devices are binded to vfio-pci (iommu mode)?

Right, I proposed the same (I think) in v2: the iova bus autodetection would autoselect iova=va
when it sees all devices bound to vfio-pci. In the v3 series discussion (I think) it was concluded
that it is better to send a hint from the driver. Refer to the work history; the iova bus code
still does said auto-detection.
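
As a rough illustration of how the bus-side auto-detection and the driver hint can combine, see the sketch below. It is a simplification written for this reply, under the assumption that the bus picks iova=va only when every bound device uses vfio-pci and at least one of their drivers hints iova=va; all types and names are hypothetical, not the ones in the series.

```c
#include <stdbool.h>
#include <stddef.h>

enum bus_iova_mode { BUS_IOVA_DC, BUS_IOVA_PA, BUS_IOVA_VA };

/* Hypothetical kernel-driver binding of a scanned device. */
enum bus_kdrv { BUS_KDRV_NONE, BUS_KDRV_UIO, BUS_KDRV_VFIO };

struct bus_dev {
	enum bus_kdrv kdrv; /* filled in by the bus scan */
	bool drv_hints_va;  /* matching driver set the IOVA-as-VA flag */
};

/* Walk the scanned devices and derive one preferred mode for the bus. */
static enum bus_iova_mode
bus_detect_iova_mode(const struct bus_dev *devs, size_t n)
{
	bool any_bound = false, any_non_vfio = false, any_va_hint = false;

	for (size_t i = 0; i < n; i++) {
		if (devs[i].kdrv == BUS_KDRV_NONE)
			continue; /* unbound devices do not vote */
		any_bound = true;
		if (devs[i].kdrv != BUS_KDRV_VFIO)
			any_non_vfio = true;
		if (devs[i].drv_hints_va)
			any_va_hint = true;
	}

	if (!any_bound)
		return BUS_IOVA_DC; /* nothing to decide on */
	if (any_va_hint && !any_non_vfio)
		return BUS_IOVA_VA;
	return BUS_IOVA_PA; /* default scheme */
}
```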

Thanks.

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v10 3/9] linuxapp/eal_pci: get iommu class
  2017-10-11  7:10                               ` santosh
@ 2017-10-11  8:31                                 ` Tan, Jianfeng
  2017-10-11  8:51                                   ` santosh
  0 siblings, 1 reply; 248+ messages in thread
From: Tan, Jianfeng @ 2017-10-11  8:31 UTC (permalink / raw)
  To: santosh, olivier.matz, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, aconole, stephen, Burakov,
	Anatoly, gaetan.rivet, shreyansh.jain, Richardson, Bruce,
	Gonzalez Monroy, Sergio, maxime.coquelin


> > Then how about we enable this, iova=va, if only all PCI devices are binded
> to vfio-pci (iommu mode)?
> 
> Right, same I proposed (I guess) in v2 such that iova bus autodetecting in
> case see all device bound
> to vfio-pci then autoselect iova=va, in v3 series (I guess) discussion: it was
> concluded that
> better to send hint from driver. Refer work history, though iova bus still does
> said
> auto-detection.

Sorry, I missed that. I tend to think that almost all PMDs for physical devices should add this flag, then.

Thanks,
Jianfeng

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v10 3/9] linuxapp/eal_pci: get iommu class
  2017-10-11  8:31                                 ` Tan, Jianfeng
@ 2017-10-11  8:51                                   ` santosh
  0 siblings, 0 replies; 248+ messages in thread
From: santosh @ 2017-10-11  8:51 UTC (permalink / raw)
  To: Tan, Jianfeng, olivier.matz, dev
  Cc: thomas, jerin.jacob, hemant.agrawal, aconole, stephen, Burakov,
	Anatoly, gaetan.rivet, shreyansh.jain, Richardson, Bruce,
	Gonzalez Monroy, Sergio, maxime.coquelin


On Wednesday 11 October 2017 02:01 PM, Tan, Jianfeng wrote:
>>> Then how about we enable this, iova=va, if only all PCI devices are binded
>> to vfio-pci (iommu mode)?
>>
>> Right, same I proposed (I guess) in v2 such that iova bus autodetecting in
>> case see all device bound
>> to vfio-pci then autoselect iova=va, in v3 series (I guess) discussion: it was
>> concluded that
>> better to send hint from driver. Refer work history, though iova bus still does
>> said
>> auto-detection.
> Sorry I missed that. I tend to think that almost all PMDs for physical devices shall add this flag then.

IMO +1, but the decision is up to the PMD owner.

Thanks.

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v10 6/9] eal: auto detect iova mode
  2017-10-06 11:03                   ` [PATCH v10 6/9] eal: auto detect " Santosh Shukla
@ 2017-10-13  8:48                     ` Maxime Coquelin
  2017-10-13  9:58                       ` Thomas Monjalon
  0 siblings, 1 reply; 248+ messages in thread
From: Maxime Coquelin @ 2017-10-13  8:48 UTC (permalink / raw)
  To: Santosh Shukla, olivier.matz, dev, thomas
  Cc: jerin.jacob, hemant.agrawal, aconole, stephen, anatoly.burakov,
	gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy

Hi Santosh,

On 10/06/2017 01:03 PM, Santosh Shukla wrote:
> iova autodetection depends on rte_bus_scan result. Result of bus scan will
> have updated device_list and each device in that list has its '.kdrv' state
> updated. That kdrv state used to detect iova mapping mode for that device.
>
> _device_parse() has dependency on rte_bus_scan so,
> Below calls moved up in the eal initialization order:
> 	- eal_option_device_parse
> 	- rte_bus_scan
>
> And based on the result of rte_bus_get_iommu_class - select iova
> mapping mode.
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
> ---
>   lib/librte_eal/bsdapp/eal/eal.c   | 27 ++++++++++++++++-----------
>   lib/librte_eal/linuxapp/eal/eal.c | 27 ++++++++++++++++-----------
>   2 files changed, 32 insertions(+), 22 deletions(-)

We noticed a regression on current master, which prevents using the Vhost
PMD with CONFIG_RTE_BUILD_SHARED_LIB=y:

# ./install/bin/testpmd --file-prefix=src -l 0,2 -n 4 \
    --vdev 'net_vhost0,iface=/tmp/vhost-user2' -d ./install/lib/librte_pmd_vhost.so \
    -- --portmask=1 --disable-hw-vlan -i --rxq=1 --txq=1 --nb-cores=1 \
    --eth-peer=0,52:54:00:11:22:12
EAL: Detected 4 lcore(s)
ERROR: failed to parse device "net_vhost0"
EAL: Unable to parse device 'net_vhost0,iface=/tmp/vhost-user2'
PANIC in main():
Cannot init EAL
5: [./install/bin/testpmd(_start+0x2a) [0x41e91a]]
4: [/lib64/libc.so.6(__libc_start_main+0xea) [0x7f551882550a]]
3: [./install/bin/testpmd(main+0x68e) [0x41e77e]]
2: [/home/max/projects/src/mainline/dpdk/x86_64-native-linuxapp-gcc/lib/librte_eal.so.5.1(__rte_panic+0xba) [0x7f551982c05a]]
1: [/home/max/projects/src/mainline/dpdk/x86_64-native-linuxapp-gcc/lib/librte_eal.so.5.1(rte_dump_stack+0x1b) [0x7f551983645b]]
Aborted (core dumped)

Git bisect seems to point to this patch:

$ git bisect log
git bisect start
# bad: [5518fc95427891e8bcf72f461cdaa38604226442] mempool/dpaa2: improve error handling
git bisect bad 5518fc95427891e8bcf72f461cdaa38604226442
# good: [02657b4adcb8af773e26ec061b01cd7abdd3f0b6] version: 17.08.0
git bisect good 02657b4adcb8af773e26ec061b01cd7abdd3f0b6
# good: [4fa5e0bbc5730887a4a15b915bb15deb5ef1f607] net/dpaa: support hashed RSS
git bisect good 4fa5e0bbc5730887a4a15b915bb15deb5ef1f607
# bad: [381acec2b1bd838c4a494b82c692db35573554da] eventdev: ease single-link queue config requirements
git bisect bad 381acec2b1bd838c4a494b82c692db35573554da
# bad: [f1810113590373b157ebba555d6b51f38c8ca10f] config: enable igb_uio on arm64
git bisect bad f1810113590373b157ebba555d6b51f38c8ca10f
# good: [69293c7762a0dbb3c28f5e93be00aaa49b52cb48] bus/fslmc: remove unused funcs and align names in QBMAN
git bisect good 69293c7762a0dbb3c28f5e93be00aaa49b52cb48
# good: [f8244c6399d9fae6afab6770ae367aef38742ea5] ethdev: increase port id range
git bisect good f8244c6399d9fae6afab6770ae367aef38742ea5
# bad: [680f6c12600f5d341c5968a1daeef7c5a055451b] mem: honor IOVA mode in virt2phy
git bisect bad 680f6c12600f5d341c5968a1daeef7c5a055451b
# good: [a4f0a2dbe5abc2cadf0300fb4d5767b66254035d] pci: get IOMMU class
git bisect good a4f0a2dbe5abc2cadf0300fb4d5767b66254035d
# good: [93878cf0255e9dc21322ed99ad535adc048fa44f] eal: introduce helper API for IOVA mode
git bisect good 93878cf0255e9dc21322ed99ad535adc048fa44f
# bad: [e85a919286d2543500bc384df206740845e85362] vfio: honor IOVA mode before mapping
git bisect bad e85a919286d2543500bc384df206740845e85362
# bad: [cf408c22476c9f866deacac634dd17591e07a5c5] eal: auto detect IOVA mode
git bisect bad cf408c22476c9f866deacac634dd17591e07a5c5
# first bad commit: [cf408c22476c9f866deacac634dd17591e07a5c5] eal: auto detect IOVA mode

These are the build commands I used to run the bisection:
sed -i 's/CONFIG_RTE_BUILD_SHARED_LIB=n/CONFIG_RTE_BUILD_SHARED_LIB=y/g' \
    config/common_base
make -j4 install T=x86_64-native-linuxapp-gcc DESTDIR=install \
    EXTRA_CFLAGS='-g'
sed -i 's/CONFIG_RTE_BUILD_SHARED_LIB=y/CONFIG_RTE_BUILD_SHARED_LIB=n/g' \
    config/common_base

Regards,
Maxime

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v10 6/9] eal: auto detect iova mode
  2017-10-13  8:48                     ` Maxime Coquelin
@ 2017-10-13  9:58                       ` Thomas Monjalon
  0 siblings, 0 replies; 248+ messages in thread
From: Thomas Monjalon @ 2017-10-13  9:58 UTC (permalink / raw)
  To: Maxime Coquelin, Santosh Shukla
  Cc: dev, olivier.matz, jerin.jacob, hemant.agrawal, aconole, stephen,
	anatoly.burakov, gaetan.rivet, shreyansh.jain, bruce.richardson,
	sergio.gonzalez.monroy

13/10/2017 10:48, Maxime Coquelin:
> Hi Santosh,
> 
> On 10/06/2017 01:03 PM, Santosh Shukla wrote:
> > iova autodetection depends on rte_bus_scan result. Result of bus scan will
> > have updated device_list and each device in that list has its '.kdrv' state
> > updated. That kdrv state used to detect iova mapping mode for that device.
> >
> > _device_parse() has dependency on rte_bus_scan so,
> > Below calls moved up in the eal initialization order:
> > 	- eal_option_device_parse
> > 	- rte_bus_scan
> >
> > And based on the result of rte_bus_get_iommu_class - select iova
> > mapping mode.
> > 
> > Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> > Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
> > ---
> >   lib/librte_eal/bsdapp/eal/eal.c   | 27 ++++++++++++++++-----------
> >   lib/librte_eal/linuxapp/eal/eal.c | 27 ++++++++++++++++-----------
> >   2 files changed, 32 insertions(+), 22 deletions(-)
> 
> We noticed a regression on current master, which prevents to use Vhost
> PMD with CONFIG_RTE_BUILD_SHARED_LIB=y:

It was my guess during review:
	http://dpdk.org/ml/archives/dev/2017-October/077863.html

I really don't understand how it can work,
because the bus scan is moved before shared libraries (plugins)
are loaded.
It will be even worse when the PCI and vdev buses are themselves
shared libraries.

Is it a chicken/egg issue?

If we cannot find a good solution, we may have to revert for RC1.

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v7 7/9] linuxapp/eal_vfio: honor iova mode before mapping
  2017-08-31  3:26             ` [PATCH v7 7/9] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
  2017-09-04 15:40               ` Burakov, Anatoly
@ 2017-10-26 12:57               ` Jonas Pfefferle1
  2017-11-02 10:17                 ` Thomas Monjalon
  1 sibling, 1 reply; 248+ messages in thread
From: Jonas Pfefferle1 @ 2017-10-26 12:57 UTC (permalink / raw)
  To: Santosh Shukla
  Cc: dev, thomas, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen, aconole


Hi @all

I just stumbled upon this patch while testing on POWER. RTE_IOVA_VA will
not work for the sPAPR code since the dma window size is currently
determined by the physical address only. I'm preparing a patch to address
this.

Thanks,
Jonas

"dev" <dev-bounces@dpdk.org> wrote on 08/31/2017 05:26:16 AM:

> From: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> To: dev@dpdk.org
> Cc: thomas@monjalon.net, jerin.jacob@caviumnetworks.com,
> hemant.agrawal@nxp.com, olivier.matz@6wind.com,
> maxime.coquelin@redhat.com, sergio.gonzalez.monroy@intel.com,
> bruce.richardson@intel.com, shreyansh.jain@nxp.com,
> gaetan.rivet@6wind.com, anatoly.burakov@intel.com,
> stephen@networkplumber.org, aconole@redhat.com, Santosh Shukla
> <santosh.shukla@caviumnetworks.com>
> Date: 08/31/2017 05:28 AM
> Subject: [dpdk-dev] [PATCH v7 7/9] linuxapp/eal_vfio: honor iova
> mode before mapping
> Sent by: "dev" <dev-bounces@dpdk.org>
>
> Check iova mode and accordingly map iova to pa or va.
>
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/
> librte_eal/linuxapp/eal/eal_vfio.c
> index c8a97b7e7..b32cd09a2 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> @@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd)
>        dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>        dma_map.vaddr = ms[i].addr_64;
>        dma_map.size = ms[i].len;
> -      dma_map.iova = ms[i].phys_addr;
> +      if (rte_eal_iova_mode() == RTE_IOVA_VA)
> +         dma_map.iova = dma_map.vaddr;
> +      else
> +         dma_map.iova = ms[i].phys_addr;
>        dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
>
>        ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
> @@ -792,7 +795,10 @@ vfio_spapr_dma_map(int vfio_container_fd)
>        dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>        dma_map.vaddr = ms[i].addr_64;
>        dma_map.size = ms[i].len;
> -      dma_map.iova = ms[i].phys_addr;
> +      if (rte_eal_iova_mode() == RTE_IOVA_VA)
> +         dma_map.iova = dma_map.vaddr;
> +      else
> +         dma_map.iova = ms[i].phys_addr;
>        dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
>               VFIO_DMA_MAP_FLAG_WRITE;
>
> --
> 2.13.0
>

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v7 7/9] linuxapp/eal_vfio: honor iova mode before mapping
  2017-10-26 12:57               ` Jonas Pfefferle1
@ 2017-11-02 10:17                 ` Thomas Monjalon
  2017-11-02 10:26                   ` Jonas Pfefferle1
  0 siblings, 1 reply; 248+ messages in thread
From: Thomas Monjalon @ 2017-11-02 10:17 UTC (permalink / raw)
  To: Jonas Pfefferle1
  Cc: dev, Santosh Shukla, jerin.jacob, hemant.agrawal, olivier.matz,
	maxime.coquelin, sergio.gonzalez.monroy, bruce.richardson,
	shreyansh.jain, gaetan.rivet, anatoly.burakov, stephen, aconole

Hi

26/10/2017 14:57, Jonas Pfefferle1:
> 
> Hi @all
> 
> I just stumbled upon this patch while testing on POWER. RTE_IOVA_VA will
> not work for the sPAPR code since the dma window size is currently
> determined by the physical address only.

Is it affecting POWER8?

> I'm preparing a patch to address this.

Any news?
Can you use virtual addresses?

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v7 7/9] linuxapp/eal_vfio: honor iova mode before mapping
  2017-11-02 10:17                 ` Thomas Monjalon
@ 2017-11-02 10:26                   ` Jonas Pfefferle1
  2017-11-03  9:56                     ` Jonas Pfefferle1
  0 siblings, 1 reply; 248+ messages in thread
From: Jonas Pfefferle1 @ 2017-11-02 10:26 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: aconole, anatoly.burakov, bruce.richardson, dev, gaetan.rivet,
	hemant.agrawal, jerin.jacob, maxime.coquelin, olivier.matz,
	Santosh Shukla, sergio.gonzalez.monroy, shreyansh.jain, stephen,
	Alexey Kardashevskiy


Thomas Monjalon <thomas@monjalon.net> wrote on 11/02/2017 11:17:10 AM:

> From: Thomas Monjalon <thomas@monjalon.net>
> To: Jonas Pfefferle1 <JPF@zurich.ibm.com>
> Cc: dev@dpdk.org, Santosh Shukla
> <santosh.shukla@caviumnetworks.com>, jerin.jacob@caviumnetworks.com,
> hemant.agrawal@nxp.com, olivier.matz@6wind.com,
> maxime.coquelin@redhat.com, sergio.gonzalez.monroy@intel.com,
> bruce.richardson@intel.com, shreyansh.jain@nxp.com,
> gaetan.rivet@6wind.com, anatoly.burakov@intel.com,
> stephen@networkplumber.org, aconole@redhat.com
> Date: 11/02/2017 11:17 AM
> Subject: Re: [dpdk-dev] [PATCH v7 7/9] linuxapp/eal_vfio: honor iova
> mode before mapping
>
> Hi
>
> 26/10/2017 14:57, Jonas Pfefferle1:
> >
> > Hi @all
> >
> > I just stumbled upon this patch while testing on POWER. RTE_IOVA_VA will
> > not work for the sPAPR code since the dma window size is currently
> > determined by the physical address only.
>
> Is it affecting POWER8?

It is.

>
> > I'm preparing a patch to address this.
>
> Any news?
> Can you use virtual addresses?

After a long discussion with Alexey (CC) we came to the conclusion that
with the current sPAPR iommu driver we cannot use virtual addresses, since
the iova is restricted to lie in the DMA window, which itself is restricted
to physical RAM addresses (with the current code, 0 to the hotplug memory
max). However, Alexey is working on a patch to lift this restriction on the
DMA window size, which should allow us to do VA:VA mappings in the future.
For now we should fall back to PA in the dynamic iova mode check. I will
send a corresponding patch later today.

>
>

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v7 7/9] linuxapp/eal_vfio: honor iova mode before mapping
  2017-11-02 10:26                   ` Jonas Pfefferle1
@ 2017-11-03  9:56                     ` Jonas Pfefferle1
  2017-11-03 10:28                       ` Thomas Monjalon
  0 siblings, 1 reply; 248+ messages in thread
From: Jonas Pfefferle1 @ 2017-11-03  9:56 UTC (permalink / raw)
  To: Jonas Pfefferle1
  Cc: Thomas Monjalon, aconole, anatoly.burakov, bruce.richardson, dev,
	gaetan.rivet, hemant.agrawal, jerin.jacob, maxime.coquelin,
	olivier.matz, Santosh Shukla, sergio.gonzalez.monroy,
	shreyansh.jain, stephen, Alexey Kardashevskiy

"dev" <dev-bounces@dpdk.org> wrote on 11/02/2017 11:26:57 AM:

> From: "Jonas Pfefferle1" <JPF@zurich.ibm.com>
> To: Thomas Monjalon <thomas@monjalon.net>
> Cc: aconole@redhat.com, anatoly.burakov@intel.com,
> bruce.richardson@intel.com, dev@dpdk.org, gaetan.rivet@6wind.com,
> hemant.agrawal@nxp.com, jerin.jacob@caviumnetworks.com,
> maxime.coquelin@redhat.com, olivier.matz@6wind.com, Santosh Shukla
> <santosh.shukla@caviumnetworks.com>,
> sergio.gonzalez.monroy@intel.com, shreyansh.jain@nxp.com,
> stephen@networkplumber.org, "Alexey Kardashevskiy" <aik@ozlabs.ru>
> Date: 11/02/2017 11:27 AM
> Subject: Re: [dpdk-dev] [PATCH v7 7/9] linuxapp/eal_vfio: honor iova
> mode before mapping
> Sent by: "dev" <dev-bounces@dpdk.org>
>
>
> Thomas Monjalon <thomas@monjalon.net> wrote on 11/02/2017 11:17:10 AM:
>
> > From: Thomas Monjalon <thomas@monjalon.net>
> > To: Jonas Pfefferle1 <JPF@zurich.ibm.com>
> > Cc: dev@dpdk.org, Santosh Shukla
> > <santosh.shukla@caviumnetworks.com>, jerin.jacob@caviumnetworks.com,
> > hemant.agrawal@nxp.com, olivier.matz@6wind.com,
> > maxime.coquelin@redhat.com, sergio.gonzalez.monroy@intel.com,
> > bruce.richardson@intel.com, shreyansh.jain@nxp.com,
> > gaetan.rivet@6wind.com, anatoly.burakov@intel.com,
> > stephen@networkplumber.org, aconole@redhat.com
> > Date: 11/02/2017 11:17 AM
> > Subject: Re: [dpdk-dev] [PATCH v7 7/9] linuxapp/eal_vfio: honor iova
> > mode before mapping
> >
> > Hi
> >
> > 26/10/2017 14:57, Jonas Pfefferle1:
> > >
> > > Hi @all
> > >
> > > > I just stumbled upon this patch while testing on POWER. RTE_IOVA_VA will
> > > > not work for the sPAPR code since the dma window size is currently
> > > > determined by the physical address only.
> >
> > Is it    affecting POWER8?
>
> It is.
>
> >
> > > I'm preparing a patch to address this.
> >
> > Any news?
> > Can you use virtual addresses?
>
> After a long discussion with Alexey (CC) we came to the conclusion that
> with the current sPAPR iommu driver we cannot use virtual addresses since
> the iova is restricted to lay in the DMA window which itself is restricted
> to physical RAM addresses resp. with the current code 0 to hotplug memory
> max. However, Alexey is working on a patch to lift this restriction on the
> DMA window size which should allow us to do VA:VA mappings in the future.
> For now we should fall back to PA in the dynamic iova mode check. I will
> send an according patch later today.

I looked into this yesterday, but I'm not sure what the right solution is
here.
At the time rte_pci_get_iommu_class is called, we already know which IOMMU
types are supported, because vfio_get_container_fd (more precisely,
vfio_has_supported_extensions) has been called. However, we do not know which
one is going to be used; that is decided later in vfio_setup_device (more
precisely, vfio_set_iommu_type). We can choose an iova mode which is supported
by all types, but if the modes are exclusive to the types we have to guess
which one is going to be used. Or let the user decide?
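
To make the "supported by all types" option concrete, here is a small sketch. The capability table is an assumption for illustration only (type1 handling both modes, sPAPR handling PA only, as discussed in this thread); none of these names exist in DPDK.

```c
#include <stddef.h>

/* One bit per IOVA mode so that sets of modes can be intersected. */
#define MODE_PA 0x1u
#define MODE_VA 0x2u

/* Hypothetical description of one IOMMU type the container supports. */
struct iommu_type_caps {
	const char *name;
	unsigned int modes; /* bitmask of MODE_PA / MODE_VA */
};

/*
 * Return the modes that every supported IOMMU type can handle.
 * An empty result means the modes are exclusive to the types, so the
 * caller would have to guess or ask the user.
 */
static unsigned int
common_iova_modes(const struct iommu_type_caps *types, size_t n)
{
	unsigned int common = MODE_PA | MODE_VA;

	for (size_t i = 0; i < n; i++)
		common &= types[i].modes;
	return common;
}
```

With a table containing both type1 (PA and VA) and sPAPR (PA only), the intersection is PA, which matches the fallback proposed earlier.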

Thanks,
Jonas

>
> >
> >
>

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v7 7/9] linuxapp/eal_vfio: honor iova mode before mapping
  2017-11-03  9:56                     ` Jonas Pfefferle1
@ 2017-11-03 10:28                       ` Thomas Monjalon
  2017-11-03 10:44                         ` Jonas Pfefferle1
  0 siblings, 1 reply; 248+ messages in thread
From: Thomas Monjalon @ 2017-11-03 10:28 UTC (permalink / raw)
  To: Jonas Pfefferle1
  Cc: aconole, anatoly.burakov, bruce.richardson, dev, gaetan.rivet,
	hemant.agrawal, jerin.jacob, maxime.coquelin, olivier.matz,
	Santosh Shukla, sergio.gonzalez.monroy, shreyansh.jain, stephen,
	Alexey Kardashevskiy

03/11/2017 10:56, Jonas Pfefferle1:
> Thomas Monjalon <thomas@monjalon.net> wrote on 11/02/2017 11:17:10 AM:
> > > 26/10/2017 14:57, Jonas Pfefferle1:
> > > >
> > > > Hi @all
> > > >
> > > > > I just stumbled upon this patch while testing on POWER. RTE_IOVA_VA will
> > > > > not work for the sPAPR code since the dma window size is currently
> > > > > determined by the physical address only.
> > >
> > > Is it    affecting POWER8?
> >
> > It is.
> >
> > >
> > > > I'm preparing a patch to address this.
> > >
> > > Any news?
> > > Can you use virtual addresses?
> >
> > After a long discussion with Alexey (CC) we came to the conclusion that
> > with the current sPAPR iommu driver we cannot use virtual addresses since
> > > the iova is restricted to lay in the DMA window which itself is restricted
> > > to physical RAM addresses resp. with the current code 0 to hotplug memory
> > > max. However, Alexey is working on a patch to lift this restriction on the
> > > DMA window size which should allow us to do VA:VA mappings in the future.
> > For now we should fall back to PA in the dynamic iova mode check. I will
> > send an according patch later today.
> 
> I looked into this yesterday but I'm not sure what the right solution is
> here.
> At the time rte_pci_get_iommu_class is called we already know which IOMMU
> types are supported because vfio_get_container_fd resp.
> vfio_has_supported_extensions  has been called however we do not know which
> one is going to be used (Decided later in vfio_setup_device resp.
> vfio_set_iommu_type). We can choose a iova mode which is supported by all
> types but if the modes are exclusive to the types we have to guess which
> one is going to be used. Or let the user decide?

You can keep the old behaviour, restricting to physical memory,
until you support virtual addressing.
It can be just a #ifdef RTE_ARCH_PPC_64.
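
A sketch of what such a guard might look like. The enum and function names are hypothetical; only the RTE_ARCH_PPC_64 macro is taken from the suggestion above, and where exactly the check belongs in the detection path is left open.

```c
enum arch_iova_mode { ARCH_IOVA_DC, ARCH_IOVA_PA, ARCH_IOVA_VA };

/*
 * Post-process the auto-detected mode. On POWER builds the sPAPR DMA
 * window currently covers physical addresses only, so VA is downgraded
 * to PA; other architectures keep whatever was detected.
 */
static enum arch_iova_mode
clamp_iova_mode_for_arch(enum arch_iova_mode detected)
{
#ifdef RTE_ARCH_PPC_64
	if (detected == ARCH_IOVA_VA)
		return ARCH_IOVA_PA;
#endif
	return detected;
}
```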

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v7 7/9] linuxapp/eal_vfio: honor iova mode before mapping
  2017-11-03 10:28                       ` Thomas Monjalon
@ 2017-11-03 10:44                         ` Jonas Pfefferle1
  2017-11-03 10:54                           ` Thomas Monjalon
  0 siblings, 1 reply; 248+ messages in thread
From: Jonas Pfefferle1 @ 2017-11-03 10:44 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: aconole, Alexey Kardashevskiy, anatoly.burakov, bruce.richardson,
	dev, gaetan.rivet, hemant.agrawal, jerin.jacob, maxime.coquelin,
	olivier.matz, Santosh Shukla, sergio.gonzalez.monroy,
	shreyansh.jain, stephen

Thomas Monjalon <thomas@monjalon.net> wrote on 11/03/2017 11:28:10 AM:

> From: Thomas Monjalon <thomas@monjalon.net>
> To: Jonas Pfefferle1 <JPF@zurich.ibm.com>
> Cc: aconole@redhat.com, anatoly.burakov@intel.com,
> bruce.richardson@intel.com, dev@dpdk.org, gaetan.rivet@6wind.com,
> hemant.agrawal@nxp.com, jerin.jacob@caviumnetworks.com,
> maxime.coquelin@redhat.com, olivier.matz@6wind.com, Santosh Shukla
> <santosh.shukla@caviumnetworks.com>,
> sergio.gonzalez.monroy@intel.com, shreyansh.jain@nxp.com,
> stephen@networkplumber.org, Alexey Kardashevskiy <aik@ozlabs.ru>
> Date: 11/03/2017 11:28 AM
> Subject: Re: [dpdk-dev] [PATCH v7 7/9] linuxapp/eal_vfio: honor iova
> mode before mapping
>
> 03/11/2017 10:56, Jonas Pfefferle1:
> > Thomas Monjalon <thomas@monjalon.net> wrote on 11/02/2017 11:17:10 AM:
> > > > 26/10/2017 14:57, Jonas Pfefferle1:
> > > > >
> > > > > Hi @all
> > > > >
> > > > > > I just stumbled upon this patch while testing on POWER. RTE_IOVA_VA will
> > > > > > not work for the sPAPR code since the dma window size is currently
> > > > > > determined by the physical address only.
> > > >
> > > > Is it    affecting POWER8?
> > >
> > > It is.
> > >
> > > >
> > > > > I'm preparing a patch to address this.
> > > >
> > > > Any news?
> > > > Can you use virtual addresses?
> > >
> > > > After a long discussion with Alexey (CC) we came to the conclusion that
> > > > with the current sPAPR iommu driver we cannot use virtual addresses since
> > > > the iova is restricted to lay in the DMA window which itself is restricted
> > > > to physical RAM addresses resp. with the current code 0 to hotplug memory
> > > > max. However, Alexey is working on a patch to lift this restriction on the
> > > > DMA window size which should allow us to do VA:VA mappings in the future.
> > > > For now we should fall back to PA in the dynamic iova mode check. I will
> > > > send an according patch later today.
> >
> > > I looked into this yesterday but I'm not sure what the right solution is
> > > here.
> > > At the time rte_pci_get_iommu_class is called we already know which IOMMU
> > > types are supported because vfio_get_container_fd resp.
> > > vfio_has_supported_extensions  has been called however we do not know which
> > > one is going to be used (Decided later in vfio_setup_device resp.
> > > vfio_set_iommu_type). We can choose a iova mode which is supported by all
> > > types but if the modes are exclusive to the types we have to guess which
> > > one is going to be used. Or let the user decide?
>
> You can keep the old behaviour, restricting to physical memory,
> until you support virtual addressing.
> It can be just a #ifdef RTE_ARCH_PPC_64.
>

OK, but we might want to refine this in the future. IMO it looks much cleaner
to decide this based on the iommu type; this would also cover the noiommu
case without the extra check that reads the sysfs variable.

^ permalink raw reply	[flat|nested] 248+ messages in thread

* Re: [PATCH v7 7/9] linuxapp/eal_vfio: honor iova mode before mapping
  2017-11-03 10:44                         ` Jonas Pfefferle1
@ 2017-11-03 10:54                           ` Thomas Monjalon
  2017-11-03 11:28                             ` Jonas Pfefferle1
  0 siblings, 1 reply; 248+ messages in thread
From: Thomas Monjalon @ 2017-11-03 10:54 UTC (permalink / raw)
  To: Jonas Pfefferle1
  Cc: aconole, Alexey Kardashevskiy, anatoly.burakov, bruce.richardson,
	dev, gaetan.rivet, hemant.agrawal, jerin.jacob, maxime.coquelin,
	olivier.matz, Santosh Shukla, sergio.gonzalez.monroy,
	shreyansh.jain, stephen

03/11/2017 11:44, Jonas Pfefferle1:
> Thomas Monjalon <thomas@monjalon.net> wrote on 11/03/2017 11:28:10 AM:
> > 03/11/2017 10:56, Jonas Pfefferle1:
> > > Thomas Monjalon <thomas@monjalon.net> wrote on 11/02/2017 11:17:10 AM:
> > > > > 26/10/2017 14:57, Jonas Pfefferle1:
> > > > > >
> > > > > > Hi @all
> > > > > >
> > > > > > I just stumbled upon this patch while testing on POWER.
> > > > > > RTE_IOVA_VA will not work for the sPAPR code since the dma window
> > > > > > size is currently determined by the physical address only.
> > > > >
> > > > > Is it affecting POWER8?
> > > >
> > > > It is.
> > > >
> > > > >
> > > > > > I'm preparing a patch to address this.
> > > > >
> > > > > Any news?
> > > > > Can you use virtual addresses?
> > > >
> > > > After a long discussion with Alexey (CC) we came to the conclusion
> > > > that with the current sPAPR iommu driver we cannot use virtual
> > > > addresses since the iova is restricted to lay in the DMA window which
> > > > itself is restricted to physical RAM addresses resp. with the current
> > > > code 0 to hotplug memory max. However, Alexey is working on a patch to
> > > > lift this restriction on the DMA window size which should allow us to
> > > > do VA:VA mappings in the future. For now we should fall back to PA in
> > > > the dynamic iova mode check. I will send an according patch later
> > > > today.
> > >
> > > I looked into this yesterday but I'm not sure what the right solution
> > > is here.
> > > At the time rte_pci_get_iommu_class is called we already know which
> > > IOMMU types are supported because vfio_get_container_fd resp.
> > > vfio_has_supported_extensions has been called however we do not know
> > > which one is going to be used (Decided later in vfio_setup_device resp.
> > > vfio_set_iommu_type). We can choose a iova mode which is supported by
> > > all types but if the modes are exclusive to the types we have to guess
> > > which one is going to be used. Or let the user decide?
> >
> > You can keep the old behaviour, restricting to physical memory,
> > until you support virtual addressing.
> > It can be just a #ifdef RTE_ARCH_PPC_64.
> >
> 
> Ok but we might want to refine this in the future. IMO It looks much cleaner
> to decide this on the iommu type plus this would also cover the noiommu
> case without having this extra check reading the sysfs variable.

You are using the word "this" too many times to help me understand :)

Anyway, please send a quick fix today for 17.11.
RC3 will probably be closed before Monday.


* Re: [PATCH v7 7/9] linuxapp/eal_vfio: honor iova mode before mapping
  2017-11-03 10:54                           ` Thomas Monjalon
@ 2017-11-03 11:28                             ` Jonas Pfefferle1
  0 siblings, 0 replies; 248+ messages in thread
From: Jonas Pfefferle1 @ 2017-11-03 11:28 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: aconole, Alexey Kardashevskiy, anatoly.burakov, bruce.richardson,
	dev, gaetan.rivet, hemant.agrawal, jerin.jacob, maxime.coquelin,
	olivier.matz, Santosh Shukla, sergio.gonzalez.monroy,
	shreyansh.jain, stephen

Thomas Monjalon <thomas@monjalon.net> wrote on 11/03/2017 11:54:45 AM:

> From: Thomas Monjalon <thomas@monjalon.net>
> To: Jonas Pfefferle1 <JPF@zurich.ibm.com>
> Cc: aconole@redhat.com, Alexey Kardashevskiy <aik@ozlabs.ru>,
> anatoly.burakov@intel.com, bruce.richardson@intel.com, dev@dpdk.org,
> gaetan.rivet@6wind.com, hemant.agrawal@nxp.com,
> jerin.jacob@caviumnetworks.com, maxime.coquelin@redhat.com,
> olivier.matz@6wind.com, Santosh Shukla
> <santosh.shukla@caviumnetworks.com>,
> sergio.gonzalez.monroy@intel.com, shreyansh.jain@nxp.com,
> stephen@networkplumber.org
> Date: 11/03/2017 11:55 AM
> Subject: Re: [dpdk-dev] [PATCH v7 7/9] linuxapp/eal_vfio: honor iova
> mode before mapping
>
> 03/11/2017 11:44, Jonas Pfefferle1:
> > Thomas Monjalon <thomas@monjalon.net> wrote on 11/03/2017 11:28:10 AM:
> > > 03/11/2017 10:56, Jonas Pfefferle1:
> > > > Thomas Monjalon <thomas@monjalon.net> wrote on 11/02/2017 11:17:10 AM:
> > > > > > 26/10/2017 14:57, Jonas Pfefferle1:
> > > > > > >
> > > > > > > Hi @all
> > > > > > >
> > > > > > > I just stumbled upon this patch while testing on POWER.
> > > > > > > RTE_IOVA_VA will not work for the sPAPR code since the dma
> > > > > > > window size is currently determined by the physical address
> > > > > > > only.
> > > > > >
> > > > > > Is it affecting POWER8?
> > > > >
> > > > > It is.
> > > > >
> > > > > >
> > > > > > > I'm preparing a patch to address this.
> > > > > >
> > > > > > Any news?
> > > > > > Can you use virtual addresses?
> > > > >
> > > > > After a long discussion with Alexey (CC) we came to the conclusion
> > > > > that with the current sPAPR iommu driver we cannot use virtual
> > > > > addresses since the iova is restricted to lay in the DMA window
> > > > > which itself is restricted to physical RAM addresses resp. with
> > > > > the current code 0 to hotplug memory max. However, Alexey is
> > > > > working on a patch to lift this restriction on the DMA window size
> > > > > which should allow us to do VA:VA mappings in the future.
> > > > > For now we should fall back to PA in the dynamic iova mode check.
> > > > > I will send an according patch later today.
> > > >
> > > > I looked into this yesterday but I'm not sure what the right solution
> > > > is here.
> > > > At the time rte_pci_get_iommu_class is called we already know which
> > > > IOMMU types are supported because vfio_get_container_fd resp.
> > > > vfio_has_supported_extensions has been called however we do not know
> > > > which one is going to be used (Decided later in vfio_setup_device
> > > > resp. vfio_set_iommu_type). We can choose a iova mode which is
> > > > supported by all types but if the modes are exclusive to the types we
> > > > have to guess which one is going to be used. Or let the user decide?
> > >
> > > You can keep the old behaviour, restricting to physical memory,
> > > until you support virtual addressing.
> > > It can be just a #ifdef RTE_ARCH_PPC_64.
> > >
> >
> > Ok but we might want to refine this in the future. IMO It looks much
> > cleaner to decide this on the iommu type plus this would also cover the
> > noiommu case without having this extra check reading the sysfs variable.
>
> You are using the word "this" too many times to help me understand :)

What I meant is a fix that decides which iova mode to use based on the
iommu types supported (determined by vfio_get_container_fd), instead of
adding another special case for PPC, much like the existing noiommu check.
Both cases should be covered by a single check based on the supported types.
IMO that is much cleaner and makes it simpler to support new iommu types.
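
A minimal sketch of such a supported-types based check, only to illustrate
the idea. All names here (the enum, the IOMMU_* flags and
iova_mode_from_iommu_types) are hypothetical stand-ins and not the DPDK
definitions, and which types count as PA-only is an assumption taken from
the discussion above (sPAPR with the current DMA window restriction, and
noiommu):

```c
#include <stdint.h>

/* Hypothetical stand-ins for the IOVA modes; not the DPDK enum. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/* Hypothetical bit flags, one per IOMMU type the container supports. */
#define IOMMU_TYPE1   (1u << 0) /* assumed able to map IOVA as VA */
#define IOMMU_SPAPR   (1u << 1) /* DMA window currently tied to physical RAM */
#define IOMMU_NOIOMMU (1u << 2) /* no translation at all, so PA only */

/* Assumption for this sketch: these types only work with IOVA as PA. */
#define IOMMU_PA_ONLY (IOMMU_SPAPR | IOMMU_NOIOMMU)

/*
 * Pick an IOVA mode that every supported IOMMU type can handle.
 * The type actually used is only decided later, so the mode must be
 * valid for all candidates.
 */
static enum iova_mode
iova_mode_from_iommu_types(uint32_t supported)
{
	if (supported == 0)
		return IOVA_DC;  /* nothing detected: don't care */
	if (supported & IOMMU_PA_ONLY)
		return IOVA_PA;  /* at least one candidate needs PA */
	return IOVA_VA;          /* every candidate can do VA */
}
```

With this shape, a new iommu type only needs a flag and, if it cannot do
VA, an entry in the PA-only mask; no per-architecture #ifdef is needed.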

>
> Anyway, please send a quick fix today for 17.11.
> RC3 will probably be closed before Monday.
>

Will do.


end of thread, other threads:[~2017-11-03 11:29 UTC | newest]

Thread overview: 248+ messages
2017-06-08 11:05 [PATCH 00/10] Infrastructure to detect iova mapping on the bus Santosh Shukla
2017-06-08 11:05 ` [PATCH 01/10] bsdapp/eal_pci: get iommu class Santosh Shukla
2017-06-08 11:05 ` [PATCH 02/10] linuxapp/eal_pci: " Santosh Shukla
2017-07-05  8:17   ` Maxime Coquelin
2017-07-05 10:05     ` santosh
2017-06-08 11:05 ` [PATCH 03/10] bus: " Santosh Shukla
2017-06-08 11:05 ` [PATCH 04/10] eal: add eal option to configure iova mode Santosh Shukla
2017-06-08 11:05 ` [PATCH 05/10] linuxapp/eal: detect " Santosh Shukla
2017-07-05 13:17   ` Hemant Agrawal
2017-07-05 13:49     ` santosh
2017-06-08 11:05 ` [PATCH 06/10] bsdapp/eal: detect iova mapping mode Santosh Shukla
2017-06-08 11:05 ` [PATCH 07/10] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
2017-07-05  9:14   ` Maxime Coquelin
2017-07-05 15:43     ` Jerin Jacob
2017-07-06  7:58       ` Maxime Coquelin
2017-07-06  9:49         ` Jerin Jacob
2017-07-06 10:59           ` Maxime Coquelin
2017-07-06 11:12             ` Jerin Jacob
2017-07-06 11:19             ` santosh
2017-07-06 13:08               ` Maxime Coquelin
2017-07-06 13:11                 ` Maxime Coquelin
2017-07-06 14:13                   ` santosh
2017-07-06 14:39                     ` Maxime Coquelin
2017-06-08 11:05 ` [PATCH 08/10] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
2017-06-08 11:05 ` [PATCH 09/10] mempool: " Santosh Shukla
2017-06-08 11:05 ` [PATCH 10/10] eal/rte_malloc: " Santosh Shukla
2017-07-04  4:41 ` [PATCH 00/10] Infrastructure to detect iova mapping on the bus santosh
2017-07-04  7:19   ` Thomas Monjalon
2017-07-04  7:57     ` santosh
2017-07-04  9:03       ` Thomas Monjalon
2017-07-04  9:21         ` santosh
2017-07-04  9:42           ` Thomas Monjalon
2017-07-04 10:10 ` Thomas Monjalon
2017-07-04 11:20   ` santosh
2017-07-05  9:30 ` Maxime Coquelin
2017-07-05  9:47   ` santosh
2017-07-10 11:42 ` [PATCH v2 00/12] " Santosh Shukla
2017-07-10 11:42   ` [PATCH v2 01/12] eal/pci: introduce PCI driver iova as va flag Santosh Shukla
2017-07-10 11:42   ` [PATCH v2 02/12] eal/pci: export match function Santosh Shukla
2017-07-10 11:42   ` [PATCH v2 03/12] bsdapp/eal_pci: get iommu class Santosh Shukla
2017-07-10 11:42   ` [PATCH v2 04/12] linuxapp/eal_pci: " Santosh Shukla
2017-07-10 11:42   ` [PATCH v2 05/12] bus: " Santosh Shukla
2017-07-10 11:42   ` [PATCH v2 06/12] eal: introduce iova mode helper api Santosh Shukla
2017-07-10 11:42   ` [PATCH v2 07/12] linuxapp/eal: auto detect iova mode Santosh Shukla
2017-07-10 11:42   ` [PATCH v2 08/12] bsdapp/eal: auto detect iova mapping mode Santosh Shukla
2017-07-10 11:42   ` [PATCH v2 09/12] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
2017-07-10 11:42   ` [PATCH v2 10/12] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
2017-07-10 11:42   ` [PATCH v2 11/12] mempool: " Santosh Shukla
2017-07-10 12:27     ` Olivier Matz
2017-07-10 13:30       ` santosh
2017-07-10 13:51         ` Thomas Monjalon
2017-07-10 13:56           ` santosh
2017-07-10 14:09             ` Thomas Monjalon
2017-07-10 14:22               ` santosh
2017-07-10 14:37                 ` Thomas Monjalon
2017-08-04  4:00                   ` santosh
2017-07-10 11:42   ` [PATCH v2 12/12] eal/rte_malloc: " Santosh Shukla
2017-07-11  6:16   ` [PATCH v3 00/11] Infrastructure to detect iova mapping on the bus Santosh Shukla
2017-07-11  6:16     ` [PATCH v3 01/11] eal/pci: introduce PCI driver iova as va flag Santosh Shukla
2017-07-11  9:09       ` Maxime Coquelin
2017-07-11 10:35         ` santosh
2017-07-11 12:07           ` Maxime Coquelin
2017-07-11  6:16     ` [PATCH v3 02/11] eal/pci: export match function Santosh Shukla
2017-07-11  9:11       ` Maxime Coquelin
2017-07-11  9:12         ` Maxime Coquelin
2017-07-11  6:16     ` [PATCH v3 03/11] bsdapp/eal_pci: get iommu class Santosh Shukla
2017-07-11  9:15       ` Maxime Coquelin
2017-07-11 10:41         ` santosh
2017-07-11 12:09           ` Maxime Coquelin
2017-07-11  6:16     ` [PATCH v3 04/11] linuxapp/eal_pci: " Santosh Shukla
2017-07-11  9:23       ` Maxime Coquelin
2017-07-11 10:43         ` santosh
2017-07-12  8:20       ` Sergio Gonzalez Monroy
2017-07-13  8:23         ` santosh
2017-07-14  7:43           ` Sergio Gonzalez Monroy
2017-07-14  8:11             ` santosh
2017-07-14  7:39       ` Hemant Agrawal
2017-07-14  7:55         ` santosh
2017-07-14  8:06           ` Hemant Agrawal
2017-07-14  8:46             ` santosh
2017-07-14  9:13               ` santosh
2017-07-11  6:16     ` [PATCH v3 05/11] bus: " Santosh Shukla
2017-07-14  8:07       ` Hemant Agrawal
2017-07-14  8:30         ` santosh
2017-07-14  9:39           ` Hemant Agrawal
2017-07-14 10:22             ` santosh
2017-07-14 10:29               ` santosh
2017-07-14 10:51                 ` Hemant Agrawal
2017-07-14 11:03                   ` santosh
2017-07-14 11:15                     ` Hemant Agrawal
2017-07-11  6:16     ` [PATCH v3 06/11] eal: introduce iova mode helper api Santosh Shukla
2017-07-11  6:16     ` [PATCH v3 07/11] linuxapp/eal: auto detect iova mode Santosh Shukla
2017-07-13 11:29       ` Hemant Agrawal
2017-07-13 11:45         ` Hemant Agrawal
2017-07-13 18:25         ` santosh
2017-07-14  8:49           ` Hemant Agrawal
2017-07-14  9:21             ` santosh
2017-07-11  6:16     ` [PATCH v3 08/11] bsdapp/eal: auto detect iova mapping mode Santosh Shukla
2017-07-11  6:16     ` [PATCH v3 09/11] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
2017-07-11  6:16     ` [PATCH v3 10/11] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
2017-07-11  6:16     ` [PATCH v3 11/11] eal/rte_malloc: " Santosh Shukla
2017-07-18  5:59     ` [PATCH v4 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
2017-07-18  5:59       ` [PATCH v4 01/12] eal/pci: introduce PCI driver iova as va flag Santosh Shukla
2017-07-18  5:59       ` [PATCH v4 02/12] eal/pci: export match function Santosh Shukla
2017-07-18  5:59       ` [PATCH v4 03/12] eal/pci: get iommu class Santosh Shukla
2017-07-18  5:59       ` [PATCH v4 04/12] bsdapp/eal_pci: " Santosh Shukla
2017-07-18  5:59       ` [PATCH v4 05/12] linuxapp/eal_pci: " Santosh Shukla
2017-07-18 10:55         ` Hemant Agrawal
2017-07-18  5:59       ` [PATCH v4 06/12] bus: " Santosh Shukla
2017-07-18 11:05         ` Hemant Agrawal
2017-07-18 11:16           ` santosh
2017-07-18  5:59       ` [PATCH v4 07/12] eal: introduce iova mode helper api Santosh Shukla
2017-07-18  5:59       ` [PATCH v4 08/12] linuxapp/eal: auto detect iova mode Santosh Shukla
2017-07-18 11:34         ` Hemant Agrawal
2017-07-18 11:56           ` santosh
2017-07-18  5:59       ` [PATCH v4 09/12] bsdapp/eal: auto detect iova mapping mode Santosh Shukla
2017-07-18  5:59       ` [PATCH v4 10/12] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
2017-07-18  5:59       ` [PATCH v4 11/12] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
2017-07-18  5:59       ` [PATCH v4 12/12] eal/rte_malloc: " Santosh Shukla
2017-07-21  8:07       ` [PATCH v4 00/12] Infrastructure to detect iova mapping on the bus Maxime Coquelin
2017-07-24  8:39       ` [PATCH v5 " Santosh Shukla
2017-07-24  8:39         ` [PATCH v5 01/12] eal/pci: introduce PCI driver iova as va flag Santosh Shukla
2017-07-24  8:39         ` [PATCH v5 02/12] eal/pci: export match function Santosh Shukla
2017-07-24  8:39         ` [PATCH v5 03/12] eal/pci: get iommu class Santosh Shukla
2017-07-24  8:39         ` [PATCH v5 04/12] bsdapp/eal_pci: " Santosh Shukla
2017-07-24  8:39         ` [PATCH v5 05/12] linuxapp/eal_pci: " Santosh Shukla
2017-07-24  8:39         ` [PATCH v5 06/12] bus: " Santosh Shukla
2017-07-24  8:39         ` [PATCH v5 07/12] eal: introduce iova mode helper api Santosh Shukla
2017-07-24  8:40         ` [PATCH v5 08/12] linuxapp/eal: auto detect iova mode Santosh Shukla
2017-07-24  8:40         ` [PATCH v5 09/12] bsdapp/eal: auto detect iova mapping mode Santosh Shukla
2017-07-24  8:40         ` [PATCH v5 10/12] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
2017-07-24  8:40         ` [PATCH v5 11/12] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
2017-07-24  8:40         ` [PATCH v5 12/12] eal/rte_malloc: " Santosh Shukla
2017-08-14 16:10         ` [PATCH v6 00/12] Infrastructure to detect iova mapping on the bus Santosh Shukla
2017-08-14 16:10           ` [PATCH v6 01/12] eal/pci: introduce PCI driver iova as va flag Santosh Shukla
2017-08-17 12:35             ` Aaron Conole
2017-08-14 16:10           ` [PATCH v6 02/12] eal/pci: export match function Santosh Shukla
2017-08-14 16:10           ` [PATCH v6 03/12] eal/pci: get iommu class Santosh Shukla
2017-08-17 12:38             ` Aaron Conole
2017-08-14 16:10           ` [PATCH v6 04/12] bsdapp/eal_pci: " Santosh Shukla
2017-08-14 16:10           ` [PATCH v6 05/12] linuxapp/eal_pci: " Santosh Shukla
2017-08-14 16:10           ` [PATCH v6 06/12] bus: " Santosh Shukla
2017-08-14 16:10           ` [PATCH v6 07/12] eal: introduce iova mode helper api Santosh Shukla
2017-08-14 16:10           ` [PATCH v6 08/12] linuxapp/eal: auto detect iova mode Santosh Shukla
2017-08-16 17:38             ` Aaron Conole
2017-08-17 14:43               ` santosh
2017-08-14 16:10           ` [PATCH v6 09/12] bsdapp/eal: auto detect iova mapping mode Santosh Shukla
2017-08-17 12:41             ` Aaron Conole
2017-08-14 16:10           ` [PATCH v6 10/12] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
2017-08-14 16:10           ` [PATCH v6 11/12] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
2017-08-14 16:10           ` [PATCH v6 12/12] eal/rte_malloc: " Santosh Shukla
2017-08-31  3:26           ` [PATCH v7 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
2017-08-31  3:26             ` [PATCH v7 1/9] eal/pci: export match function Santosh Shukla
2017-09-04 14:49               ` Burakov, Anatoly
2017-09-06 15:39               ` Ferruh Yigit
2017-09-18 10:07                 ` santosh
2017-08-31  3:26             ` [PATCH v7 2/9] eal/pci: get iommu class Santosh Shukla
2017-09-04 14:53               ` Burakov, Anatoly
2017-09-04 15:13                 ` santosh
2017-09-04 15:16                   ` Burakov, Anatoly
2017-09-04 15:31                     ` santosh
2017-09-04 15:35                       ` Burakov, Anatoly
2017-09-04 15:30               ` Burakov, Anatoly
2017-08-31  3:26             ` [PATCH v7 3/9] linuxapp/eal_pci: " Santosh Shukla
2017-09-04 15:08               ` Burakov, Anatoly
2017-09-05  8:47                 ` santosh
2017-09-05  8:55                   ` Burakov, Anatoly
2017-09-05  8:59                     ` santosh
2017-09-05  9:01               ` Burakov, Anatoly
2017-08-31  3:26             ` [PATCH v7 4/9] bus: " Santosh Shukla
2017-09-04 15:25               ` Burakov, Anatoly
2017-08-31  3:26             ` [PATCH v7 5/9] eal: introduce iova mode helper api Santosh Shukla
2017-08-31  3:26             ` [PATCH v7 6/9] eal: auto detect iova mode Santosh Shukla
2017-09-04 15:32               ` Burakov, Anatoly
2017-08-31  3:26             ` [PATCH v7 7/9] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
2017-09-04 15:40               ` Burakov, Anatoly
2017-10-26 12:57               ` Jonas Pfefferle1
2017-11-02 10:17                 ` Thomas Monjalon
2017-11-02 10:26                   ` Jonas Pfefferle1
2017-11-03  9:56                     ` Jonas Pfefferle1
2017-11-03 10:28                       ` Thomas Monjalon
2017-11-03 10:44                         ` Jonas Pfefferle1
2017-11-03 10:54                           ` Thomas Monjalon
2017-11-03 11:28                             ` Jonas Pfefferle1
2017-08-31  3:26             ` [PATCH v7 8/9] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
2017-09-04 15:42               ` Burakov, Anatoly
2017-08-31  3:26             ` [PATCH v7 9/9] eal/rte_malloc: " Santosh Shukla
2017-09-04 15:44               ` Burakov, Anatoly
2017-09-05 12:28             ` [PATCH v7 0/9] Infrastructure to detect iova mapping on the bus Hemant Agrawal
2017-09-05 12:30               ` Hemant Agrawal
2017-09-18 10:42             ` [PATCH v8 " Santosh Shukla
2017-09-18 10:42               ` [PATCH v8 1/9] eal/pci: export match function Santosh Shukla
2017-09-18 10:42               ` [PATCH v8 2/9] eal/pci: get iommu class Santosh Shukla
2017-09-19 16:37                 ` Burakov, Anatoly
2017-09-19 17:29                   ` santosh
2017-09-20  9:09                     ` Burakov, Anatoly
2017-09-20 10:24                       ` santosh
2017-09-18 10:42               ` [PATCH v8 3/9] linuxapp/eal_pci: " Santosh Shukla
2017-09-18 10:42               ` [PATCH v8 4/9] bus: " Santosh Shukla
2017-09-18 10:42               ` [PATCH v8 5/9] eal: introduce iova mode helper api Santosh Shukla
2017-09-18 10:42               ` [PATCH v8 6/9] eal: auto detect iova mode Santosh Shukla
2017-09-18 10:42               ` [PATCH v8 7/9] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
2017-09-18 10:42               ` [PATCH v8 8/9] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
2017-09-18 10:42               ` [PATCH v8 9/9] eal/rte_malloc: " Santosh Shukla
2017-09-20 11:23               ` [PATCH v9 0/9] Infrastructure to detect iova mapping on the bus Santosh Shukla
2017-09-20 11:23                 ` [PATCH v9 1/9] eal/pci: export match function Santosh Shukla
2017-09-20 11:23                 ` [PATCH v9 2/9] eal/pci: get iommu class Santosh Shukla
2017-09-20 11:39                   ` Burakov, Anatoly
2017-10-05 23:58                   ` Thomas Monjalon
2017-10-06  3:04                     ` santosh
2017-10-06  7:24                       ` Thomas Monjalon
2017-10-06  9:13                         ` santosh
2017-09-20 11:23                 ` [PATCH v9 3/9] linuxapp/eal_pci: " Santosh Shukla
2017-10-06  0:17                   ` Thomas Monjalon
2017-10-06  3:22                     ` santosh
2017-10-06  7:56                       ` Thomas Monjalon
2017-09-20 11:23                 ` [PATCH v9 4/9] bus: " Santosh Shukla
2017-09-20 11:23                 ` [PATCH v9 5/9] eal: introduce helper API for iova mode Santosh Shukla
2017-09-20 11:23                 ` [PATCH v9 6/9] eal: auto detect " Santosh Shukla
2017-10-06  0:19                   ` Thomas Monjalon
2017-10-06  3:25                     ` santosh
2017-10-06  8:11                       ` Thomas Monjalon
2017-10-06  9:11                         ` santosh
2017-09-20 11:23                 ` [PATCH v9 7/9] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
2017-09-20 11:23                 ` [PATCH v9 8/9] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
2017-09-20 11:23                 ` [PATCH v9 9/9] eal/rte_malloc: " Santosh Shukla
2017-09-26  4:02                 ` [PATCH v9 0/9] Infrastructure to detect iova mapping on the bus santosh
2017-10-06 11:03                 ` [PATCH v10 " Santosh Shukla
2017-10-06 11:03                   ` [PATCH v10 1/9] eal/pci: export match function Santosh Shukla
2017-10-06 11:03                   ` [PATCH v10 2/9] eal/pci: get iommu class Santosh Shukla
2017-10-06 11:03                   ` [PATCH v10 3/9] linuxapp/eal_pci: " Santosh Shukla
2017-10-11  1:47                     ` Tan, Jianfeng
2017-10-11  4:43                       ` santosh
2017-10-11  5:31                         ` Tan, Jianfeng
2017-10-11  5:37                           ` santosh
2017-10-11  7:04                             ` Tan, Jianfeng
2017-10-11  7:10                               ` santosh
2017-10-11  8:31                                 ` Tan, Jianfeng
2017-10-11  8:51                                   ` santosh
2017-10-06 11:03                   ` [PATCH v10 4/9] bus: " Santosh Shukla
2017-10-06 11:03                   ` [PATCH v10 5/9] eal: introduce helper API for iova mode Santosh Shukla
2017-10-06 11:03                   ` [PATCH v10 6/9] eal: auto detect " Santosh Shukla
2017-10-13  8:48                     ` Maxime Coquelin
2017-10-13  9:58                       ` Thomas Monjalon
2017-10-06 11:03                   ` [PATCH v10 7/9] linuxapp/eal_vfio: honor iova mode before mapping Santosh Shukla
2017-10-06 11:03                   ` [PATCH v10 8/9] linuxapp/eal_memory: honor iova mode in virt2phy Santosh Shukla
2017-10-06 11:03                   ` [PATCH v10 9/9] eal/rte_malloc: " Santosh Shukla
2017-10-06 18:40                   ` [PATCH v10 0/9] Infrastructure to detect iova mapping on the bus Thomas Monjalon
