DPDK-dev Archive on lore.kernel.org
* Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
@ 2019-07-12  9:17 Jerin Jacob Kollanukkaran
  2019-07-12  9:58 ` Burakov, Anatoly
  0 siblings, 1 reply; 16+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-07-12  9:17 UTC (permalink / raw)
  To: Ferruh Yigit, Vamsi Krishna Attunuru, dev
  Cc: olivier.matz, arybchenko, Burakov, Anatoly


> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Thursday, July 11, 2019 9:52 PM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Vamsi Krishna Attunuru
> <vattunuru@marvell.com>; dev@dpdk.org
> Cc: olivier.matz@6wind.com; arybchenko@solarflare.com; Burakov, Anatoly
> <anatoly.burakov@intel.com>
> Subject: [EXT] Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
> 
> External Email
> 
> ----------------------------------------------------------------------
> On 7/4/2019 10:48 AM, Jerin Jacob Kollanukkaran wrote:
> >> From: Vamsi Krishna Attunuru
> >> Sent: Thursday, July 4, 2019 12:13 PM
> >> To: dev@dpdk.org
> >> Cc: ferruh.yigit@intel.com; olivier.matz@6wind.com;
> >> arybchenko@solarflare.com; Jerin Jacob Kollanukkaran
> >> <jerinj@marvell.com>; Burakov, Anatoly <anatoly.burakov@intel.com>
> >> Subject: Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
> >>
> >> Hi All,
> >>
> >> Just to summarize, below items have arisen from the initial review.
> >> 1) Can the new mempool flag be made default to all the pools and will
> there be case that new flag functionality would fail  for some page sizes.?
> >
> > If the minimum huge page size is 2MB and normal huge page size is
> > 512MB or 1G. So I think, new flags can be default as skipping the page
> boundaries for Mempool objects has nearly zero overhead. But I leave
> decision to maintainers.
> >
> >> 2) Adding HW device info(pci dev info) to KNI device structure, will it
> break KNI on virtual devices in VA or PA mode.?
> >
> > Iommu_domain will be created only for PCI devices and the system runs
> > in IOVA_VA mode. Virtual devices(IOVA_DC(don't care) or IOVA_PA
> > devices still it works without PCI device structure)
> >
> > It is  a useful feature where KNI can run without root privilege and
> > it is pending for long time. Request to review and close this
> 
> I support the idea to remove 'kni' forcing to the IOVA=PA mode, but also not
> sure about forcing all KNI users to update their code to allocate mempool in a
> very specific way.
> 
> What about giving more control to the user on this?
> 
> Any user want to use IOVA=VA and KNI together can update application to
> justify memory allocation of the KNI and give an explicit "kni iova_mode=1"
> config.

Where would this config come from: EAL, the KNI sample app, or the KNI public API?


> Who want to use existing KNI implementation can continue to use it with
> IOVA=PA mode which is current case, or for this case user may need to force
> the DPDK application to IOVA=PA but at least there is a workaround.
> 
> And kni sample application should have sample for both case, although this
> increases the testing and maintenance cost, I hope we can get support from
> you on the iova_mode=1 usecase.
> 
> What do you think?

IMO, if possible, we can avoid the extra indirection of a new config. In the worst
case we can add it. How about the following, to avoid a new config?

1) Make MEMPOOL_F_NO_PAGE_BOUND the default
http://patches.dpdk.org/patch/55277/
There is absolutely zero overhead from this flag, considering that huge pages are at
least 2MB, and typically 512MB or 1GB.
Does anyone have any objection?

2) Introduce an rte_kni_mempool_create() API in the KNI library to abstract the
mempool requirement for KNI. This will enable portable KNI applications.

Thoughts?

> 
> 
> 
> >
> >>
> >> Can someone suggest if any changes required to address above issues.
> > ________________________________________
> > From: dev <mailto:dev-bounces@dpdk.org> on behalf of Vamsi Krishna
> > Attunuru <mailto:vattunuru@marvell.com>
> > Sent: Monday, July 1, 2019 7:21:22 PM
> > To: Jerin Jacob Kollanukkaran; Burakov, Anatoly; mailto:dev@dpdk.org
> > Cc: mailto:ferruh.yigit@intel.com; mailto:olivier.matz@6wind.com;
> > mailto:arybchenko@solarflare.com
> > Subject: [EXT] Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in
> > KNI
> >
> > External Email
> >
> > ----------------------------------------------------------------------
> > ping..
> >
> > ________________________________
> > From: Jerin Jacob Kollanukkaran
> > Sent: Thursday, June 27, 2019 3:04:58 PM
> > To: Burakov, Anatoly; Vamsi Krishna Attunuru; mailto:dev@dpdk.org
> > Cc: mailto:ferruh.yigit@intel.com; mailto:olivier.matz@6wind.com;
> > mailto:arybchenko@solarflare.com
> > Subject: RE: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
> >
> >> -----Original Message-----
> >> From: Burakov, Anatoly <mailto:anatoly.burakov@intel.com>
> >> Sent: Tuesday, June 25, 2019 7:09 PM
> >> To: Jerin Jacob Kollanukkaran <mailto:jerinj@marvell.com>; Vamsi
> >> Krishna Attunuru <mailto:vattunuru@marvell.com>;
> mailto:dev@dpdk.org
> >> Cc: mailto:ferruh.yigit@intel.com; mailto:olivier.matz@6wind.com;
> >> mailto:arybchenko@solarflare.com
> >> Subject: Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
> >>
> >> On 25-Jun-19 12:30 PM, Burakov, Anatoly wrote:
> >>> On 25-Jun-19 12:15 PM, Jerin Jacob Kollanukkaran wrote:
> >>>>> -----Original Message-----
> >>>>> From: dev <mailto:dev-bounces@dpdk.org> On Behalf Of Burakov,
> >>>>> Anatoly
> >>>>> Sent: Tuesday, June 25, 2019 3:30 PM
> >>>>> To: Vamsi Krishna Attunuru <mailto:vattunuru@marvell.com>;
> >>>>> mailto:dev@dpdk.org
> >>>>> Cc: mailto:ferruh.yigit@intel.com; mailto:olivier.matz@6wind.com;
> >>>>> mailto:arybchenko@solarflare.com
> >>>>> Subject: Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in
> >>>>> KNI
> >>>>>
> >>>>> On 25-Jun-19 4:56 AM, mailto:vattunuru@marvell.com wrote:
> >>>>>> From: Vamsi Attunuru <mailto:vattunuru@marvell.com>
> >>>>>>
> >>>>>> ----
> >>>>>> V6 Changes:
> >>>>>> * Added new mempool flag to ensure mbuf memory is not scattered
> >>>>>> across page boundaries.
> >>>>>> * Added KNI kernel module required PCI device information.
> >>>>>> * Modified KNI example application to create mempool with new
> >>>>>> mempool flag.
> >>>>>>
> >>>>> Others can chime in, but my 2 cents: this reduces the usefulness
> >>>>> of KNI because it limits the kinds of mempools one can use them
> >>>>> with, and makes it so that the code that works with every other
> >>>>> PMD requires changes to work with KNI.
> >>>>
> >>>> # One option to make this flag as default only for packet
> >>>> mempool(not allow allocate on page boundary).
> >>>> In real world the overhead will be very minimal considering Huge
> >>>> page size is 1G or 512M # Enable this flag explicitly only IOVA =
> >>>> VA mode in library. Not need to expose to application # I don't
> >>>> think, there needs to be any PMD specific change to make KNI with
> >>>> IOVA = VA mode # No preference on flags to be passed by application
> vs in library.
> >>>> But IMO this change would be
> >>>> needed in mempool support KNI in IOVA = VA mode.
> >>>>
> >>>
> >>> I would be OK to just make it default behavior to not cross page
> >>> boundaries when allocating buffers. This would solve the problem for
> >>> KNI and for any other use case that would rely on PA-contiguous
> >>> buffers in face of IOVA as VA mode.
> >>>
> >>> We could also add a flag to explicitly allow page crossing without
> >>> also making mbufs IOVA-non-contiguous, but i'm not sure if there are
> >>> use cases that would benefit from this.
> >>
> >> On another thought, such a default would break 4K pages in case for
> >> packets bigger than page size (i.e. jumbo frames). Should we care?
> >
> > The hugepage size will not be 4K. Right?
> >
> > Olivier,
> >
> > As a maintainer any thoughts of exposing/not exposing the new mepool
> > flag to Skip the page boundaries?
> >
> > All,
> > Either option is fine, Asking for feedback to processed further?
> >


* Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
@ 2019-07-12 11:37 Jerin Jacob Kollanukkaran
  2019-07-12 12:09 ` Burakov, Anatoly
  0 siblings, 1 reply; 16+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-07-12 11:37 UTC (permalink / raw)
  To: Burakov, Anatoly, Ferruh Yigit, Vamsi Krishna Attunuru, dev
  Cc: olivier.matz, arybchenko

> -----Original Message-----
> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> Sent: Friday, July 12, 2019 4:19 PM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Ferruh Yigit
> <ferruh.yigit@intel.com>; Vamsi Krishna Attunuru
> <vattunuru@marvell.com>; dev@dpdk.org
> Cc: olivier.matz@6wind.com; arybchenko@solarflare.com
> Subject: [EXT] Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
> On 12-Jul-19 11:26 AM, Jerin Jacob Kollanukkaran wrote:
> >>>> What do you think?
> >>>
> >>> IMO, If possible we can avoid extra indirection of new config. In
> >>> worst case We can add it. How about following to not have new config
> >>>
> >>> 1) Make MEMPOOL_F_NO_PAGE_BOUND  as default
> >>> http://patches.dpdk.org/patch/55277/
> >>> There is absolutely zero overhead of this flag considering the huge
> >>> page size are minimum 2MB. Typically 512MB or 1GB.
> >>> Any one has any objection?
> >>
> >> Pretty much zero overhead in hugepage case, not so in non-hugepage
> case.
> >> It's rare, but since we support it, we have to account for it.
> >
> > That is a fair concern.
> > How about enable the flag in mempool ONLY when
> rte_eal_has_hugepages()
> > In the common layer?
> 
> Perhaps it's better to check page size of the underlying memory, because 4K
> pages are not necessarily no-huge mode - they could also be external
> memory. That's going to be a bit hard because there may not be a way to
> know which memory we're allocating from in advance, aside from simple
> checks like `(rte_eal_has_hugepages() ||
> rte_malloc_heap_socket_is_external(socket_id))` - but maybe those would
> be sufficient.

Yes.


> 
> >
> >> (also, i don't really like the name NO_PAGE_BOUND since in memzone
> >> API there's a "bounded memzone" allocation API, and this flag's name
> >> reads like objects would not be bounded by page size, not that they
> >> won't cross page
> >> boundary)
> >
> > No strong opinion for the name. What name you suggest?
> 
> How about something like MEMPOOL_F_NO_PAGE_SPLIT?

Looks good to me.

In summary, the changes with respect to the existing patch are:
- Rename NO_PAGE_BOUND to MEMPOOL_F_NO_PAGE_SPLIT
- Set this flag in rte_pktmbuf_pool_create() when (rte_eal_has_hugepages() ||
  rte_malloc_heap_socket_is_external(socket_id))

Olivier, any objection?
Ref: http://patches.dpdk.org/patch/55277/

> 
> >
> >>
> >>>
> >>> 2) Introduce rte_kni_mempool_create() API in kni lib to abstract the
> >>> Mempool requirement for KNI. This will enable portable KNI
> applications.
> >>
> >> This means that using KNI is not a drop-in replacement for any other
> >> PMD. If maintainers of KNI are OK with this then sure :)
> >
> > The PMD  don’t have any dependency on NO_PAGE_BOUND flag. Right?
> > If KNI app is using rte_kni_mempool_create() to create the mempool, In
> > what case do you see problem with specific PMD?
> 
> I'm not saying the PMD's have a dependency on the flag, i'm saying that the
> same code cannot be used with and without KNI because you need to call a
> separate API for mempool creation if you want to use it with KNI.

Yes. Applications would need to call the newly introduced API from 19.08 onwards if
we do not choose the first approach above. This can be documented under "API changes"
in the release notes. I prefer the first solution if there is no downside.


> For KNI, the underlying memory must abide by certain constraints that are
> not there for other PMD's, so either you fix all memory to these constraints,
> or you lose the ability to reuse the code with other PMD's as is.
> 
> That is, unless i'm grossly misunderstanding what you're suggesting here :)
> 
> >
> >>
> >> --
> >> Thanks,
> >> Anatoly
> 
> 
> --
> Thanks,
> Anatoly

* Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
@ 2019-07-12 10:26 Jerin Jacob Kollanukkaran
  2019-07-12 10:48 ` Burakov, Anatoly
  0 siblings, 1 reply; 16+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-07-12 10:26 UTC (permalink / raw)
  To: Burakov, Anatoly, Ferruh Yigit, Vamsi Krishna Attunuru, dev
  Cc: olivier.matz, arybchenko

> -----Original Message-----
> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> Sent: Friday, July 12, 2019 3:28 PM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Ferruh Yigit
> <ferruh.yigit@intel.com>; Vamsi Krishna Attunuru
> <vattunuru@marvell.com>; dev@dpdk.org
> Cc: olivier.matz@6wind.com; arybchenko@solarflare.com
> Subject: [EXT] Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
> 
> External Email
> 
> ----------------------------------------------------------------------
> On 12-Jul-19 10:17 AM, Jerin Jacob Kollanukkaran wrote:
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@intel.com>
> >> Sent: Thursday, July 11, 2019 9:52 PM
> >> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Vamsi Krishna
> >> Attunuru <vattunuru@marvell.com>; dev@dpdk.org
> >> Cc: olivier.matz@6wind.com; arybchenko@solarflare.com; Burakov,
> >> Anatoly <anatoly.burakov@intel.com>
> >> Subject: [EXT] Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in
> >> KNI
> >>
> >> External Email
> >>
> >> ---------------------------------------------------------------------
> >> - On 7/4/2019 10:48 AM, Jerin Jacob Kollanukkaran wrote:
> >>>> From: Vamsi Krishna Attunuru
> >>>> Sent: Thursday, July 4, 2019 12:13 PM
> >>>> To: dev@dpdk.org
> >>>> Cc: ferruh.yigit@intel.com; olivier.matz@6wind.com;
> >>>> arybchenko@solarflare.com; Jerin Jacob Kollanukkaran
> >>>> <jerinj@marvell.com>; Burakov, Anatoly <anatoly.burakov@intel.com>
> >>>> Subject: Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
> >>>>
> >>>> Hi All,
> >>>>
> >>>> Just to summarize, below items have arisen from the initial review.
> >>>> 1) Can the new mempool flag be made default to all the pools and
> >>>> will
> >> there be case that new flag functionality would fail  for some page sizes.?
> >>>
> >>> If the minimum huge page size is 2MB and normal huge page size is
> >>> 512MB or 1G. So I think, new flags can be default as skipping the
> >>> page
> >> boundaries for Mempool objects has nearly zero overhead. But I leave
> >> decision to maintainers.
> >>>
> >>>> 2) Adding HW device info(pci dev info) to KNI device structure,
> >>>> will it
> >> break KNI on virtual devices in VA or PA mode.?
> >>>
> >>> Iommu_domain will be created only for PCI devices and the system
> >>> runs in IOVA_VA mode. Virtual devices(IOVA_DC(don't care) or
> IOVA_PA
> >>> devices still it works without PCI device structure)
> >>>
> >>> It is  a useful feature where KNI can run without root privilege and
> >>> it is pending for long time. Request to review and close this
> >>
> >> I support the idea to remove 'kni' forcing to the IOVA=PA mode, but
> >> also not sure about forcing all KNI users to update their code to
> >> allocate mempool in a very specific way.
> >>
> >> What about giving more control to the user on this?
> >>
> >> Any user want to use IOVA=VA and KNI together can update application
> >> to justify memory allocation of the KNI and give an explicit "kni
> iova_mode=1"
> >> config.
> >
> > Where this config comes, eal or kni sample app or KNI public API?
> >
> >
> >> Who want to use existing KNI implementation can continue to use it
> >> with IOVA=PA mode which is current case, or for this case user may
> >> need to force the DPDK application to IOVA=PA but at least there is a
> workaround.
> >>
> >> And kni sample application should have sample for both case, although
> >> this increases the testing and maintenance cost, I hope we can get
> >> support from you on the iova_mode=1 usecase.
> >>
> >> What do you think?
> >
> > IMO, If possible we can avoid extra indirection of new config. In
> > worst case We can add it. How about following to not have new config
> >
> > 1) Make MEMPOOL_F_NO_PAGE_BOUND  as default
> > http://patches.dpdk.org/patch/55277/
> > There is absolutely zero overhead of this flag considering the huge
> > page size are minimum 2MB. Typically 512MB or 1GB.
> > Any one has any objection?
> 
> Pretty much zero overhead in hugepage case, not so in non-hugepage case.
> It's rare, but since we support it, we have to account for it.

That is a fair concern.
How about enabling the flag in mempool ONLY when rte_eal_has_hugepages()
is true, in the common layer?

> (also, i don't really like the name NO_PAGE_BOUND since in memzone API
> there's a "bounded memzone" allocation API, and this flag's name reads like
> objects would not be bounded by page size, not that they won't cross page
> boundary)

No strong opinion on the name. What name do you suggest?

> 
> >
> > 2) Introduce rte_kni_mempool_create() API in kni lib to abstract the
> > Mempool requirement for KNI. This will enable portable KNI applications.
> 
> This means that using KNI is not a drop-in replacement for any other
> PMD. If maintainers of KNI are OK with this then sure :)

The PMDs don't have any dependency on the NO_PAGE_BOUND flag, right?
If the KNI app uses rte_kni_mempool_create() to create the mempool,
in what case do you see a problem with a specific PMD?

> 
> --
> Thanks,
> Anatoly

* [dpdk-dev] [PATCH v5] kni: add IOVA va support for kni
@ 2019-04-22  6:15 kirankumark
  2019-06-25  3:56 ` [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI vattunuru
  0 siblings, 1 reply; 16+ messages in thread
From: kirankumark @ 2019-04-22  6:15 UTC (permalink / raw)
  To: ferruh.yigit; +Cc: dev, Kiran Kumar K

From: Kiran Kumar K <kirankumark@marvell.com>

With the current KNI implementation, the kernel module works only in
IOVA=PA mode. This patch adds support for the kernel module to work
in IOVA=VA mode.

The idea is to obtain the physical address from the IOVA using the
iommu_iova_to_phys API, and then use the phys_to_virt API to
convert the physical address to a kernel virtual address.

We compared the performance of this approach with IOVA=PA
and observed no difference. The kernel itself seems to be the
dominant overhead.

This approach will not work with kernel versions older than 4.4.0
because of API compatibility issues.

Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
---
V5 changes:
* Fixed build issue with 32b build

V4 changes:
* Fixed build issues with older kernel versions
* This approach will only work with kernel above 4.4.0

V3 Changes:
* Add new approach to work kni with IOVA=VA mode using
iommu_iova_to_phys API.

 kernel/linux/kni/kni_dev.h                    |  4 +
 kernel/linux/kni/kni_misc.c                   | 63 ++++++++++++---
 kernel/linux/kni/kni_net.c                    | 76 +++++++++++++++----
 lib/librte_eal/linux/eal/eal.c                |  9 ---
 .../linux/eal/include/rte_kni_common.h        |  1 +
 lib/librte_kni/rte_kni.c                      |  2 +
 6 files changed, 122 insertions(+), 33 deletions(-)

diff --git a/kernel/linux/kni/kni_dev.h b/kernel/linux/kni/kni_dev.h
index df46aa70e..9c4944921 100644
--- a/kernel/linux/kni/kni_dev.h
+++ b/kernel/linux/kni/kni_dev.h
@@ -23,6 +23,7 @@
 #include <linux/netdevice.h>
 #include <linux/spinlock.h>
 #include <linux/list.h>
+#include <linux/iommu.h>

 #include <rte_kni_common.h>
 #define KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
@@ -39,6 +40,9 @@ struct kni_dev {
 	/* kni list */
 	struct list_head list;

+	uint8_t iova_mode;
+	struct iommu_domain *domain;
+
 	struct net_device_stats stats;
 	int status;
 	uint16_t group_id;           /* Group ID of a group of KNI devices */
diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c
index 31845e10f..9e90af31b 100644
--- a/kernel/linux/kni/kni_misc.c
+++ b/kernel/linux/kni/kni_misc.c
@@ -306,10 +306,12 @@ kni_ioctl_create(struct net *net, uint32_t ioctl_num,
 	struct rte_kni_device_info dev_info;
 	struct net_device *net_dev = NULL;
 	struct kni_dev *kni, *dev, *n;
+	struct pci_dev *pci = NULL;
+	struct iommu_domain *domain = NULL;
+	phys_addr_t phys_addr;
 #ifdef RTE_KNI_KMOD_ETHTOOL
 	struct pci_dev *found_pci = NULL;
 	struct net_device *lad_dev = NULL;
-	struct pci_dev *pci = NULL;
 #endif

 	pr_info("Creating kni...\n");
@@ -368,15 +370,56 @@ kni_ioctl_create(struct net *net, uint32_t ioctl_num,
 	strncpy(kni->name, dev_info.name, RTE_KNI_NAMESIZE);

 	/* Translate user space info into kernel space info */
-	kni->tx_q = phys_to_virt(dev_info.tx_phys);
-	kni->rx_q = phys_to_virt(dev_info.rx_phys);
-	kni->alloc_q = phys_to_virt(dev_info.alloc_phys);
-	kni->free_q = phys_to_virt(dev_info.free_phys);
-
-	kni->req_q = phys_to_virt(dev_info.req_phys);
-	kni->resp_q = phys_to_virt(dev_info.resp_phys);
-	kni->sync_va = dev_info.sync_va;
-	kni->sync_kva = phys_to_virt(dev_info.sync_phys);
+
+	if (dev_info.iova_mode) {
+#if KERNEL_VERSION(4, 4, 0) > LINUX_VERSION_CODE
+		(void)pci;
+		pr_err("Kernel version is not supported\n");
+		return -EINVAL;
+#else
+		pci = pci_get_device(dev_info.vendor_id,
+				     dev_info.device_id, NULL);
+		while (pci) {
+			if ((pci->bus->number == dev_info.bus) &&
+			    (PCI_SLOT(pci->devfn) == dev_info.devid) &&
+			    (PCI_FUNC(pci->devfn) == dev_info.function)) {
+				domain = iommu_get_domain_for_dev(&pci->dev);
+				break;
+			}
+			pci = pci_get_device(dev_info.vendor_id,
+					     dev_info.device_id, pci);
+		}
+#endif
+		kni->domain = domain;
+		phys_addr = iommu_iova_to_phys(domain, dev_info.tx_phys);
+		kni->tx_q = phys_to_virt(phys_addr);
+		phys_addr = iommu_iova_to_phys(domain, dev_info.rx_phys);
+		kni->rx_q = phys_to_virt(phys_addr);
+		phys_addr = iommu_iova_to_phys(domain, dev_info.alloc_phys);
+		kni->alloc_q = phys_to_virt(phys_addr);
+		phys_addr = iommu_iova_to_phys(domain, dev_info.free_phys);
+		kni->free_q = phys_to_virt(phys_addr);
+		phys_addr = iommu_iova_to_phys(domain, dev_info.req_phys);
+		kni->req_q = phys_to_virt(phys_addr);
+		phys_addr = iommu_iova_to_phys(domain, dev_info.resp_phys);
+		kni->resp_q = phys_to_virt(phys_addr);
+		kni->sync_va = dev_info.sync_va;
+		phys_addr = iommu_iova_to_phys(domain, dev_info.sync_phys);
+		kni->sync_kva = phys_to_virt(phys_addr);
+		kni->iova_mode = 1;
+
+	} else {
+		kni->tx_q = phys_to_virt(dev_info.tx_phys);
+		kni->rx_q = phys_to_virt(dev_info.rx_phys);
+		kni->alloc_q = phys_to_virt(dev_info.alloc_phys);
+		kni->free_q = phys_to_virt(dev_info.free_phys);
+
+		kni->req_q = phys_to_virt(dev_info.req_phys);
+		kni->resp_q = phys_to_virt(dev_info.resp_phys);
+		kni->sync_va = dev_info.sync_va;
+		kni->sync_kva = phys_to_virt(dev_info.sync_phys);
+		kni->iova_mode = 0;
+	}

 	kni->mbuf_size = dev_info.mbuf_size;

diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c
index be9e6b0b9..e77a28066 100644
--- a/kernel/linux/kni/kni_net.c
+++ b/kernel/linux/kni/kni_net.c
@@ -35,6 +35,22 @@ static void kni_net_rx_normal(struct kni_dev *kni);
 /* kni rx function pointer, with default to normal rx */
 static kni_net_rx_t kni_net_rx_func = kni_net_rx_normal;

+/* iova to kernel virtual address */
+static void *
+iova2kva(struct kni_dev *kni, void *pa)
+{
+	return phys_to_virt(iommu_iova_to_phys(kni->domain,
+				(uintptr_t)pa));
+}
+
+static void *
+iova2data_kva(struct kni_dev *kni, struct rte_kni_mbuf *m)
+{
+	return phys_to_virt((iommu_iova_to_phys(kni->domain,
+					(uintptr_t)m->buf_physaddr) +
+			     m->data_off));
+}
+
 /* physical address to kernel virtual address */
 static void *
 pa2kva(void *pa)
@@ -186,7 +202,10 @@ kni_fifo_trans_pa2va(struct kni_dev *kni,
 			return;

 		for (i = 0; i < num_rx; i++) {
-			kva = pa2kva(kni->pa[i]);
+			if (likely(kni->iova_mode == 1))
+				kva = iova2kva(kni, kni->pa[i]);
+			else
+				kva = pa2kva(kni->pa[i]);
 			kni->va[i] = pa2va(kni->pa[i], kva);
 		}

@@ -263,8 +282,13 @@ kni_net_tx(struct sk_buff *skb, struct net_device *dev)
 	if (likely(ret == 1)) {
 		void *data_kva;

-		pkt_kva = pa2kva(pkt_pa);
-		data_kva = kva2data_kva(pkt_kva);
+		if (likely(kni->iova_mode == 1)) {
+			pkt_kva = iova2kva(kni, pkt_pa);
+			data_kva = iova2data_kva(kni, pkt_kva);
+		} else {
+			pkt_kva = pa2kva(pkt_pa);
+			data_kva = kva2data_kva(pkt_kva);
+		}
 		pkt_va = pa2va(pkt_pa, pkt_kva);

 		len = skb->len;
@@ -335,9 +359,14 @@ kni_net_rx_normal(struct kni_dev *kni)

 	/* Transfer received packets to netif */
 	for (i = 0; i < num_rx; i++) {
-		kva = pa2kva(kni->pa[i]);
+		if (likely(kni->iova_mode == 1)) {
+			kva = iova2kva(kni, kni->pa[i]);
+			data_kva = iova2data_kva(kni, kva);
+		} else {
+			kva = pa2kva(kni->pa[i]);
+			data_kva = kva2data_kva(kva);
+		}
 		len = kva->pkt_len;
-		data_kva = kva2data_kva(kva);
 		kni->va[i] = pa2va(kni->pa[i], kva);

 		skb = dev_alloc_skb(len + 2);
@@ -434,13 +463,20 @@ kni_net_rx_lo_fifo(struct kni_dev *kni)
 		num = ret;
 		/* Copy mbufs */
 		for (i = 0; i < num; i++) {
-			kva = pa2kva(kni->pa[i]);
+
+			if (likely(kni->iova_mode == 1)) {
+				kva = iova2kva(kni, kni->pa[i]);
+				data_kva = iova2data_kva(kni, kva);
+				alloc_kva = iova2kva(kni, kni->alloc_pa[i]);
+				alloc_data_kva = iova2data_kva(kni, alloc_kva);
+			} else {
+				kva = pa2kva(kni->pa[i]);
+				data_kva = kva2data_kva(kva);
+				alloc_kva = pa2kva(kni->alloc_pa[i]);
+				alloc_data_kva = kva2data_kva(alloc_kva);
+			}
 			len = kva->pkt_len;
-			data_kva = kva2data_kva(kva);
 			kni->va[i] = pa2va(kni->pa[i], kva);
-
-			alloc_kva = pa2kva(kni->alloc_pa[i]);
-			alloc_data_kva = kva2data_kva(alloc_kva);
 			kni->alloc_va[i] = pa2va(kni->alloc_pa[i], alloc_kva);

 			memcpy(alloc_data_kva, data_kva, len);
@@ -507,9 +543,15 @@ kni_net_rx_lo_fifo_skb(struct kni_dev *kni)

 	/* Copy mbufs to sk buffer and then call tx interface */
 	for (i = 0; i < num; i++) {
-		kva = pa2kva(kni->pa[i]);
+
+		if (likely(kni->iova_mode == 1)) {
+			kva = iova2kva(kni, kni->pa[i]);
+			data_kva = iova2data_kva(kni, kva);
+		} else {
+			kva = pa2kva(kni->pa[i]);
+			data_kva = kva2data_kva(kva);
+		}
 		len = kva->pkt_len;
-		data_kva = kva2data_kva(kva);
 		kni->va[i] = pa2va(kni->pa[i], kva);

 		skb = dev_alloc_skb(len + 2);
@@ -545,8 +587,14 @@ kni_net_rx_lo_fifo_skb(struct kni_dev *kni)
 				if (!kva->next)
 					break;

-				kva = pa2kva(va2pa(kva->next, kva));
-				data_kva = kva2data_kva(kva);
+				if (likely(kni->iova_mode == 1)) {
+					kva = iova2kva(kni,
+						       va2pa(kva->next, kva));
+					data_kva = iova2data_kva(kni, kva);
+				} else {
+					kva = pa2kva(va2pa(kva->next, kva));
+					data_kva = kva2data_kva(kva);
+				}
 			}
 		}

diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c
index f7ae62d7b..8fac6707d 100644
--- a/lib/librte_eal/linux/eal/eal.c
+++ b/lib/librte_eal/linux/eal/eal.c
@@ -1040,15 +1040,6 @@ rte_eal_init(int argc, char **argv)
 		/* autodetect the IOVA mapping mode (default is RTE_IOVA_PA) */
 		rte_eal_get_configuration()->iova_mode =
 			rte_bus_get_iommu_class();
-
-		/* Workaround for KNI which requires physical address to work */
-		if (rte_eal_get_configuration()->iova_mode == RTE_IOVA_VA &&
-				rte_eal_check_module("rte_kni") == 1) {
-			rte_eal_get_configuration()->iova_mode = RTE_IOVA_PA;
-			RTE_LOG(WARNING, EAL,
-				"Some devices want IOVA as VA but PA will be used because.. "
-				"KNI module inserted\n");
-		}
 	} else {
 		rte_eal_get_configuration()->iova_mode =
 			internal_config.iova_mode;
diff --git a/lib/librte_eal/linux/eal/include/rte_kni_common.h b/lib/librte_eal/linux/eal/include/rte_kni_common.h
index 5afa08713..79ee4bc5a 100644
--- a/lib/librte_eal/linux/eal/include/rte_kni_common.h
+++ b/lib/librte_eal/linux/eal/include/rte_kni_common.h
@@ -128,6 +128,7 @@ struct rte_kni_device_info {
 	unsigned mbuf_size;
 	unsigned int mtu;
 	char mac_addr[6];
+	uint8_t iova_mode;
 };

 #define KNI_DEVICE "kni"
diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
index 946459c79..ec8f23694 100644
--- a/lib/librte_kni/rte_kni.c
+++ b/lib/librte_kni/rte_kni.c
@@ -304,6 +304,8 @@ rte_kni_alloc(struct rte_mempool *pktmbuf_pool,
 	kni->group_id = conf->group_id;
 	kni->mbuf_size = conf->mbuf_size;

+	dev_info.iova_mode = (rte_eal_iova_mode() == RTE_IOVA_VA) ? 1 : 0;
+
 	ret = ioctl(kni_fd, RTE_KNI_IOCTL_CREATE, &dev_info);
 	if (ret < 0)
 		goto ioctl_fail;
--
2.17.1



Thread overview: 16+ messages
2019-07-12  9:17 [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI Jerin Jacob Kollanukkaran
2019-07-12  9:58 ` Burakov, Anatoly
  -- strict thread matches above, loose matches on Subject: below --
2019-07-12 11:37 Jerin Jacob Kollanukkaran
2019-07-12 12:09 ` Burakov, Anatoly
2019-07-12 10:26 Jerin Jacob Kollanukkaran
2019-07-12 10:48 ` Burakov, Anatoly
2019-04-22  6:15 [dpdk-dev] [PATCH v5] kni: add IOVA va support for kni kirankumark
2019-06-25  3:56 ` [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI vattunuru
2019-06-25 10:00   ` Burakov, Anatoly
2019-06-25 11:15     ` Jerin Jacob Kollanukkaran
2019-06-25 11:30       ` Burakov, Anatoly
2019-06-25 13:38         ` Burakov, Anatoly
2019-06-27  9:34           ` Jerin Jacob Kollanukkaran
2019-07-01 13:51             ` Vamsi Krishna Attunuru
2019-07-04  6:42               ` Vamsi Krishna Attunuru
2019-07-04  9:48                 ` Jerin Jacob Kollanukkaran
2019-07-11 16:21                   ` Ferruh Yigit
