DPDK-dev Archive on lore.kernel.org
From: Ferruh Yigit <ferruh.yigit@intel.com>
To: Kiran Kumar Kokkilagadda <kirankumark@marvell.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	Jerin Jacob <jerin.jacob@caviumnetworks.com>
Subject: Re: [PATCH v2] kni: add IOVA va support for kni
Date: Wed, 3 Apr 2019 17:29:25 +0100
Message-ID: <ff476309-7384-314b-ccf5-92d52a209eeb@intel.com> (raw)
In-Reply-To: <20190401095118.4176-1-kirankumark@marvell.com>

On 4/1/2019 10:51 AM, Kiran Kumar Kokkilagadda wrote:
> From: Kiran Kumar K <kirankumark@marvell.com>
> 
> With current KNI implementation kernel module will work only in
> IOVA=PA mode. This patch will add support for kernel module to work
> with IOVA=VA mode.

Thanks Kiran for removing the limitation. I have a few questions; can you
please help me understand?

Also, when this patch is ready, the restriction in 'linux/eal/eal.c', in
'rte_eal_init', should be removed, perhaps within this patch. I assume you are
already doing this to be able to test the patch.

> 
> The idea is to maintain a mapping in KNI module between user pages and
> kernel pages and in fast path perform a lookup in this table and get
> the kernel virtual address for corresponding user virtual address.
> 
> In IOVA=VA mode, the memory allocated to the pool is physically
> and virtually contiguous. We will take advantage of this and create a
> mapping in the kernel. In the kernel we need a mapping for the queues
> (tx_q, rx_q, ... slow path) and for the mbuf memory (fast path).

Is it?
As far as I know a mempool can have multiple chunks, and they can be both
virtually and physically separated.

And even a single chunk will be virtually contiguous, but will it be
physically contiguous?

> 
> At KNI init time, in the slow path, we will create a mapping for the
> queues and mbufs using get_user_pages, similar to af_xdp. Using the pool
> memory base address, we will create a page map table for the mbufs,
> which we will use in the fast path for kernel page translation.
> 
> At KNI init time, we will pass the base address of the pool and the size
> of the pool to the kernel. In the kernel, using the get_user_pages API,
> we will get the pages of size PAGE_SIZE and store the mapping and the
> start address of user space in a table.
> 
> In the fast path, for any user address, perform a PAGE_SHIFT
> (user_addr >> PAGE_SHIFT) and subtract the start page from this value to
> get the index of the kernel page within the page map table.
> Adding the offset to this kernel page address gives the kernel address
> for this user virtual address.
> 
> For example, the user pool base address is X and the size is S, which we
> pass to the kernel. In the kernel we will create a mapping for this using
> get_user_pages. Our page map table will look like
> [Y, Y+PAGE_SIZE, Y+(PAGE_SIZE*2) ....], and the user start page will be
> U (we will get it from X >> PAGE_SHIFT).
> 
> For any user address Z we will get the index into the page map table
> using ((Z >> PAGE_SHIFT) - U). Adding the offset (Z & (PAGE_SIZE - 1))
> to this address will give the kernel virtual address.
> 
> Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>

<...>

> +int
> +kni_pin_pages(void *address, size_t size, struct page_info *mem)
> +{
> +	unsigned int gup_flags = FOLL_WRITE;
> +	long npgs;
> +	int err;
> +
> +	/* Get at least one page */
> +	if (size < PAGE_SIZE)
> +		size = PAGE_SIZE;
> +
> +	/* Compute number of user pages based on page size */
> +	mem->npgs = (size + PAGE_SIZE - 1) / PAGE_SIZE;
> +
> +	/* Allocate memory for the pages */
> +	mem->pgs = kcalloc(mem->npgs, sizeof(*mem->pgs),
> +		      GFP_KERNEL | __GFP_NOWARN);
> +	if (!mem->pgs) {
> +		pr_err("%s: -ENOMEM\n", __func__);
> +		return -ENOMEM;
> +	}
> +
> +	down_write(&current->mm->mmap_sem);
> +
> +	/* Get the user pages from the user address*/
> +#if LINUX_VERSION_CODE >= KERNEL_VERSION(4,9,0)
> +	npgs = get_user_pages((u64)address, mem->npgs,
> +				gup_flags, &mem->pgs[0], NULL);
> +#else
> +	npgs = get_user_pages(current, current->mm, (u64)address, mem->npgs,
> +				gup_flags, 0, &mem->pgs[0], NULL);
> +#endif
> +	up_write(&current->mm->mmap_sem);

This should work even when the memory is not physically contiguous, right?
Where exactly does the physically contiguous requirement come from?

<...>

> +
> +/* Get the kernel address from the user address using
> + * page map table. Will be used only in IOVA=VA mode
> + */
> +static inline void*
> +get_kva(uint64_t usr_addr, struct kni_dev *kni)
> +{
> +	uint32_t index;
> +	/* User page - start user page will give the index
> +	 * with in the page map table
> +	 */
> +	index = (usr_addr >> PAGE_SHIFT) - kni->va_info.start_page;
> +
> +	/* Add the offset to the page address */
> +	return (kni->va_info.page_map[index].addr +
> +		(usr_addr & kni->va_info.page_mask));
> +
> +}
> +
>  /* physical address to kernel virtual address */
>  static void *
>  pa2kva(void *pa)
> @@ -186,7 +205,10 @@ kni_fifo_trans_pa2va(struct kni_dev *kni,
>  			return;
> 
>  		for (i = 0; i < num_rx; i++) {
> -			kva = pa2kva(kni->pa[i]);
> +			if (likely(kni->iova_mode == 1))
> +				kva = get_kva((u64)(kni->pa[i]), kni);

kni->pa[] now holds IOVA addresses; for 'get_kva()' to work, shouldn't
'va_info.start_page' be calculated from 'mempool_memhdr->iova' instead of
'mempool_memhdr->addr'?

If this is working, I must be missing something, but I am not able to find
what it is.

<...>

> @@ -304,6 +304,27 @@ rte_kni_alloc(struct rte_mempool *pktmbuf_pool,
>  	kni->group_id = conf->group_id;
>  	kni->mbuf_size = conf->mbuf_size;
> 
> +	dev_info.iova_mode = (rte_eal_iova_mode() == RTE_IOVA_VA) ? 1 : 0;
> +	if (dev_info.iova_mode) {
> +		struct rte_mempool_memhdr *hdr;
> +		uint64_t pool_size = 0;
> +
> +		/* In each pool header chunk, we will maintain the
> +		 * base address of the pool. This chunk is physically and
> +		 * virtually contiguous.
> +		 * This approach will work, only if the allocated pool
> +		 * memory is contiguous, else it won't work
> +		 */
> +		hdr = STAILQ_FIRST(&pktmbuf_pool->mem_list);
> +		dev_info.mbuf_va = (void *)(hdr->addr);
> +
> +		/* Traverse the list and get the total size of the pool */
> +		STAILQ_FOREACH(hdr, &pktmbuf_pool->mem_list, next) {
> +			pool_size += hdr->len;
> +		}

This code is aware that there may be multiple chunks, but assumes they are all
contiguous; I don't know if this assumption is correct.

Also, I guess there is another assumption: that there is a single
pktmbuf_pool in the application, which is passed into kni?
What if there are multiple pktmbuf_pools, like one for each PMD, will this
work? Then some mbufs in the kni Rx fifo will come from a different
pktmbuf_pool whose pages we don't know, so we won't be able to get their
kernel virtual addresses.

Thread overview: 103+ messages
2018-09-27 10:49 [PATCH] " Kiran Kumar
2018-09-27 10:58 ` Burakov, Anatoly
2018-10-02 17:05 ` Ferruh Yigit
2019-04-01 17:30   ` Jerin Jacob Kollanukkaran
2019-04-01 18:20     ` Ferruh Yigit
2019-04-01  9:51 ` [PATCH v2] " Kiran Kumar Kokkilagadda
2019-04-03 16:29   ` Ferruh Yigit [this message]
2019-04-04  5:03     ` [dpdk-dev] [EXT] " Kiran Kumar Kokkilagadda
2019-04-04 11:20       ` Ferruh Yigit
2019-04-04 13:29         ` Burakov, Anatoly
2019-04-04  9:57     ` Burakov, Anatoly
2019-04-04 11:21       ` Ferruh Yigit
2019-04-16  4:55   ` [dpdk-dev] [PATCH v3] " kirankumark
2019-04-19 10:38     ` Thomas Monjalon
2019-04-22  4:39     ` [dpdk-dev] [PATCH v4] " kirankumark
2019-04-22  6:15       ` [dpdk-dev] [PATCH v5] " kirankumark
2019-04-26  9:11         ` Burakov, Anatoly
2019-06-25  3:56         ` [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI vattunuru
2019-06-25  3:56           ` [dpdk-dev] [PATCH v6 1/4] lib/mempool: skip populating mempool objs that falls on page boundaries vattunuru
2019-06-25  3:56           ` [dpdk-dev] [PATCH v6 2/4] lib/kni: add PCI related information vattunuru
2019-06-25 17:41             ` Stephen Hemminger
2019-06-26  3:48               ` [dpdk-dev] [EXT] " Vamsi Krishna Attunuru
2019-06-26 14:58                 ` Stephen Hemminger
2019-06-27  9:43                   ` Vamsi Krishna Attunuru
2019-07-11 16:22             ` [dpdk-dev] " Ferruh Yigit
2019-07-12 11:02               ` [dpdk-dev] [EXT] " Vamsi Krishna Attunuru
2019-07-12 11:11                 ` Ferruh Yigit
2019-06-25  3:56           ` [dpdk-dev] [PATCH v6 3/4] example/kni: add IOVA support for kni application vattunuru
2019-07-11 16:23             ` Ferruh Yigit
2019-06-25  3:57           ` [dpdk-dev] [PATCH v6 4/4] kernel/linux/kni: add IOVA support in kni module vattunuru
2019-07-11 16:30             ` Ferruh Yigit
2019-07-12 10:38               ` [dpdk-dev] [EXT] " Vamsi Krishna Attunuru
2019-07-12 11:10                 ` Ferruh Yigit
2019-07-12 12:27                   ` Vamsi Krishna Attunuru
2019-07-12 16:29                   ` Vamsi Krishna Attunuru
2019-07-15 11:26                     ` Ferruh Yigit
2019-07-15 13:06                       ` Vamsi Krishna Attunuru
2019-07-11 16:43             ` [dpdk-dev] " Stephen Hemminger
2019-06-25 10:00           ` [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI Burakov, Anatoly
2019-06-25 11:15             ` Jerin Jacob Kollanukkaran
2019-06-25 11:30               ` Burakov, Anatoly
2019-06-25 13:38                 ` Burakov, Anatoly
2019-06-27  9:34                   ` Jerin Jacob Kollanukkaran
2019-07-01 13:51                     ` Vamsi Krishna Attunuru
2019-07-04  6:42                       ` Vamsi Krishna Attunuru
2019-07-04  9:48                         ` Jerin Jacob Kollanukkaran
2019-07-11 16:21                           ` Ferruh Yigit
2019-07-17  9:04           ` [dpdk-dev] [PATCH v7 0/4] kni: add IOVA=VA support vattunuru
2019-07-17  9:04             ` [dpdk-dev] [PATCH v7 1/4] mempool: modify mempool populate() to skip objects from page boundaries vattunuru
2019-07-17 13:36               ` Andrew Rybchenko
2019-07-17 13:47                 ` Olivier Matz
2019-07-17 17:31                 ` Vamsi Krishna Attunuru
2019-07-18  9:28                   ` Andrew Rybchenko
2019-07-18 14:16                     ` Vamsi Krishna Attunuru
2019-07-19 13:38                       ` [dpdk-dev] [RFC 0/4] mempool: avoid objects allocations across pages Olivier Matz
2019-07-19 13:38                         ` [dpdk-dev] [RFC 1/4] mempool: clarify default populate function Olivier Matz
2019-07-19 15:42                           ` Andrew Rybchenko
2019-07-19 13:38                         ` [dpdk-dev] [RFC 2/4] mempool: unalign size when calculating required mem amount Olivier Matz
2019-08-07 15:21                           ` [dpdk-dev] ***Spam*** " Andrew Rybchenko
2019-07-19 13:38                         ` [dpdk-dev] [RFC 3/4] mempool: introduce function to get mempool page size Olivier Matz
2019-08-07 15:21                           ` Andrew Rybchenko
2019-07-19 13:38                         ` [dpdk-dev] [RFC 4/4] mempool: prevent objects from being across pages Olivier Matz
2019-07-19 14:03                           ` Burakov, Anatoly
2019-07-19 14:11                           ` Burakov, Anatoly
2019-08-07 15:21                           ` Andrew Rybchenko
2019-07-23  5:37                         ` [dpdk-dev] [RFC 0/4] mempool: avoid objects allocations " Vamsi Krishna Attunuru
2019-08-07 15:21                         ` [dpdk-dev] ***Spam*** " Andrew Rybchenko
2019-07-17  9:04             ` [dpdk-dev] [PATCH v7 2/4] kni: add IOVA = VA support in KNI lib vattunuru
2019-07-17  9:04             ` [dpdk-dev] [PATCH v7 3/4] kni: add IOVA=VA support in KNI module vattunuru
2019-07-17  9:04             ` [dpdk-dev] [PATCH v7 4/4] kni: modify IOVA mode checks to support VA vattunuru
2019-07-23  5:38             ` [dpdk-dev] [PATCH v8 0/5] kni: add IOVA=VA support vattunuru
2019-07-23  5:38               ` [dpdk-dev] [PATCH v8 1/5] mempool: populate mempool with page sized chunks of memory vattunuru
2019-07-23 11:08                 ` Andrew Rybchenko
2019-07-23 12:28                   ` Vamsi Krishna Attunuru
2019-07-23 19:33                     ` Andrew Rybchenko
2019-07-24  7:09                       ` Vamsi Krishna Attunuru
2019-07-24  7:27                         ` Andrew Rybchenko
2019-07-29  6:25                           ` Vamsi Krishna Attunuru
2019-07-23  5:38               ` [dpdk-dev] [PATCH v8 2/5] add IOVA -VA support in KNI lib vattunuru
2019-07-23 10:54                 ` Andrew Rybchenko
2019-07-23  5:38               ` [dpdk-dev] [PATCH v8 3/5] kni: add app specific mempool create & free routine vattunuru
2019-07-23 10:50                 ` Andrew Rybchenko
2019-07-23 11:01                   ` Vamsi Krishna Attunuru
2019-07-23  5:38               ` [dpdk-dev] [PATCH v8 4/5] kni: add IOVA=VA support in KNI module vattunuru
2019-07-23  5:38               ` [dpdk-dev] [PATCH v8 5/5] kni: modify IOVA mode checks to support VA vattunuru
2019-07-24  7:14               ` [dpdk-dev] [PATCH v8 0/5] kni: add IOVA=VA support Vamsi Krishna Attunuru
2019-07-29 12:13               ` [dpdk-dev] [PATCH v9 " vattunuru
2019-07-29 12:13                 ` [dpdk-dev] [PATCH v9 1/5] mempool: populate mempool with the page sized chunks of memory vattunuru
2019-07-29 12:41                   ` Andrew Rybchenko
2019-07-29 13:33                     ` [dpdk-dev] [EXT] " Vamsi Krishna Attunuru
2019-08-16  6:12                   ` [dpdk-dev] [PATCH v10 0/5] kni: add IOVA=VA support vattunuru
2019-08-16  6:12                     ` [dpdk-dev] [PATCH v10 1/5] mempool: populate mempool with the page sized chunks vattunuru
2019-08-16  6:12                     ` [dpdk-dev] [PATCH v10 2/5] kni: add IOVA=VA support in KNI lib vattunuru
2019-08-16  6:12                     ` [dpdk-dev] [PATCH v10 3/5] kni: add app specific mempool create and free routines vattunuru
2019-08-16  6:12                     ` [dpdk-dev] [PATCH v10 4/5] kni: add IOVA=VA support in KNI module vattunuru
2019-08-16  6:12                     ` [dpdk-dev] [PATCH v10 5/5] kni: modify IOVA mode checks to support VA vattunuru
2019-07-29 12:13                 ` [dpdk-dev] [PATCH v9 2/5] kni: add IOVA=VA support in KNI lib vattunuru
2019-07-29 12:24                   ` Igor Ryzhov
2019-07-29 13:22                     ` [dpdk-dev] [EXT] " Vamsi Krishna Attunuru
2019-07-29 12:13                 ` [dpdk-dev] [PATCH v9 3/5] kni: add app specific mempool create & free routine vattunuru
2019-07-29 12:13                 ` [dpdk-dev] [PATCH v9 4/5] kni: add IOVA=VA support in KNI module vattunuru
2019-07-29 12:13                 ` [dpdk-dev] [PATCH v9 5/5] kni: modify IOVA mode checks to support VA vattunuru
2019-04-23  8:56       ` [dpdk-dev] [PATCH v4] kni: add IOVA va support for kni Burakov, Anatoly
