All of lore.kernel.org
 help / color / mirror / Atom feed
From: Rajesh Ravi <rajesh.ravi@broadcom.com>
To: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: dev <dev@dpdk.org>, Bruce Richardson <bruce.richardson@intel.com>,
	 Ajit Kumar Khaparde <ajit.khaparde@broadcom.com>,
	 Jonathan Richardson <jonathan.richardson@broadcom.com>,
	 Scott Branden <scott.branden@broadcom.com>,
	Vikram Prakash <vikram.prakash@broadcom.com>,
	 Srinath Mannam <srinath.mannam@broadcom.com>,
	David Marchand <david.marchand@redhat.com>,
	 Thomas Monjalon <thomas@monjalon.net>
Subject: Re: [dpdk-dev] [PATCH 19.11] vfio: fix DMA mapping of externally allocated heaps
Date: Thu, 7 Nov 2019 12:05:16 +0530	[thread overview]
Message-ID: <CAAr5zNfNQ+_3=ruo3DEkhHULPiBFp+e0c3Nr0XS=rgxE1=vcEw@mail.gmail.com> (raw)
In-Reply-To: <5dd669557fc499df5e345a14c9252c095eff6c07.1572966906.git.anatoly.burakov@intel.com>

Tested-by:  Rajesh Ravi <rajesh.ravi@broadcom.com>

Tested  the patch modified for DPDK 19.02 along with SPDK 19.07

Regards,
Rajesh

On Tue, Nov 5, 2019 at 8:45 PM Anatoly Burakov <anatoly.burakov@intel.com>
wrote:

> Currently, externally created heaps are supposed to be automatically
> mapped for VFIO DMA by EAL, however they only do so if, at the time of
> heap creation, VFIO is initialized and has at least one device
> available. If no devices are available at the time of heap creation (or
> if devices were available, but were since hot-unplugged, thus dropping
> all VFIO container mappings), then VFIO mapping code would have skipped
> over externally allocated heaps.
>
> The fix is two-fold. First, we allow externally allocated memory
> segments to be marked as "heap" segments. This allows us to distinguish
> between external memory segments that were created via heap API, from
> those that were created via rte_extmem_register() API.
>
> Then, we fix the VFIO code to only skip non-heap external segments.
> Also, since external heaps are not guaranteed to have valid IOVA
> addresses, we will skip those which have invalid IOVA addresses as well.
>
> Fixes: 0f526d674f8e ("malloc: separate creating memseg list and malloc
> heap")
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>
> Notes:
>     This cannot be backported to older releases as it breaks the
>     API and ABI. A separate fix is in the works for stable.
>
>  lib/librte_eal/common/include/rte_memory.h |  1 +
>  lib/librte_eal/common/rte_malloc.c         |  1 +
>  lib/librte_eal/freebsd/eal/eal_memory.c    |  1 +
>  lib/librte_eal/linux/eal/eal_memory.c      |  3 ++
>  lib/librte_eal/linux/eal/eal_vfio.c        | 46 +++++++++++++++++++---
>  5 files changed, 47 insertions(+), 5 deletions(-)
>
> diff --git a/lib/librte_eal/common/include/rte_memory.h
> b/lib/librte_eal/common/include/rte_memory.h
> index 38e00e382c..bf81a2faa8 100644
> --- a/lib/librte_eal/common/include/rte_memory.h
> +++ b/lib/librte_eal/common/include/rte_memory.h
> @@ -81,6 +81,7 @@ struct rte_memseg_list {
>         volatile uint32_t version; /**< version number for multiprocess
> sync. */
>         size_t len; /**< Length of memory area covered by this memseg
> list. */
>         unsigned int external; /**< 1 if this list points to external
> memory */
> +       unsigned int heap; /**< 1 if this list points to a heap */
>         struct rte_fbarray memseg_arr;
>  };
>
> diff --git a/lib/librte_eal/common/rte_malloc.c
> b/lib/librte_eal/common/rte_malloc.c
> index 044d3a9078..413e4aa004 100644
> --- a/lib/librte_eal/common/rte_malloc.c
> +++ b/lib/librte_eal/common/rte_malloc.c
> @@ -396,6 +396,7 @@ rte_malloc_heap_memory_add(const char *heap_name, void
> *va_addr, size_t len,
>
>         rte_spinlock_lock(&heap->lock);
>         ret = malloc_heap_add_external_memory(heap, msl);
> +       msl->heap = 1; /* mark it as heap segment */
>         rte_spinlock_unlock(&heap->lock);
>
>  unlock:
> diff --git a/lib/librte_eal/freebsd/eal/eal_memory.c
> b/lib/librte_eal/freebsd/eal/eal_memory.c
> index 7fe3178898..a97d8f0f0c 100644
> --- a/lib/librte_eal/freebsd/eal/eal_memory.c
> +++ b/lib/librte_eal/freebsd/eal/eal_memory.c
> @@ -93,6 +93,7 @@ rte_eal_hugepage_init(void)
>                 msl->page_sz = page_sz;
>                 msl->len = internal_config.memory;
>                 msl->socket_id = 0;
> +               msl->heap = 1;
>
>                 /* populate memsegs. each memseg is 1 page long */
>                 for (cur_seg = 0; cur_seg < n_segs; cur_seg++) {
> diff --git a/lib/librte_eal/linux/eal/eal_memory.c
> b/lib/librte_eal/linux/eal/eal_memory.c
> index accfd2e232..43e4ffc757 100644
> --- a/lib/librte_eal/linux/eal/eal_memory.c
> +++ b/lib/librte_eal/linux/eal/eal_memory.c
> @@ -831,6 +831,7 @@ alloc_memseg_list(struct rte_memseg_list *msl,
> uint64_t page_sz,
>         msl->page_sz = page_sz;
>         msl->socket_id = socket_id;
>         msl->base_va = NULL;
> +       msl->heap = 1; /* mark it as a heap segment */
>
>         RTE_LOG(DEBUG, EAL, "Memseg list allocated: 0x%zxkB at socket
> %i\n",
>                         (size_t)page_sz >> 10, socket_id);
> @@ -1405,6 +1406,7 @@ eal_legacy_hugepage_init(void)
>                 msl->page_sz = page_sz;
>                 msl->socket_id = 0;
>                 msl->len = internal_config.memory;
> +               msl->heap = 1;
>
>                 /* we're in single-file segments mode, so only the segment
> list
>                  * fd needs to be set up.
> @@ -1677,6 +1679,7 @@ eal_legacy_hugepage_init(void)
>                 mem_sz = msl->len;
>                 munmap(msl->base_va, mem_sz);
>                 msl->base_va = NULL;
> +               msl->heap = 0;
>
>                 /* destroy backing fbarray */
>                 rte_fbarray_destroy(&msl->memseg_arr);
> diff --git a/lib/librte_eal/linux/eal/eal_vfio.c
> b/lib/librte_eal/linux/eal/eal_vfio.c
> index d9541b1220..d5a2bbea0d 100644
> --- a/lib/librte_eal/linux/eal/eal_vfio.c
> +++ b/lib/librte_eal/linux/eal/eal_vfio.c
> @@ -1250,7 +1250,16 @@ type1_map(const struct rte_memseg_list *msl, const
> struct rte_memseg *ms,
>  {
>         int *vfio_container_fd = arg;
>
> -       if (msl->external)
> +       /* skip external memory that isn't a heap */
> +       if (msl->external && !msl->heap)
> +               return 0;
> +
> +       /* skip any segments with invalid IOVA addresses */
> +       if (ms->iova == RTE_BAD_IOVA)
> +               return 0;
> +
> +       /* if IOVA mode is VA, we've already mapped the internal segments
> */
> +       if (!msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
>                 return 0;
>
>         return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64,
> ms->iova,
> @@ -1313,12 +1322,18 @@ vfio_type1_dma_mem_map(int vfio_container_fd,
> uint64_t vaddr, uint64_t iova,
>  static int
>  vfio_type1_dma_map(int vfio_container_fd)
>  {
> +       int ret;
>         if (rte_eal_iova_mode() == RTE_IOVA_VA) {
>                 /* with IOVA as VA mode, we can get away with mapping
> contiguous
>                  * chunks rather than going page-by-page.
>                  */
> -               return rte_memseg_contig_walk(type1_map_contig,
> +               ret = rte_memseg_contig_walk(type1_map_contig,
>                                 &vfio_container_fd);
> +               if (ret)
> +                       return ret;
> +               /* we have to continue the walk because we've skipped the
> +                * external segments during the config walk.
> +                */
>         }
>         return rte_memseg_walk(type1_map, &vfio_container_fd);
>  }
> @@ -1410,7 +1425,15 @@ vfio_spapr_map_walk(const struct rte_memseg_list
> *msl,
>  {
>         struct spapr_remap_walk_param *param = arg;
>
> -       if (msl->external || ms->addr_64 == param->addr_64)
> +       /* skip external memory that isn't a heap */
> +       if (msl->external && !msl->heap)
> +               return 0;
> +
> +       /* skip any segments with invalid IOVA addresses */
> +       if (ms->iova == RTE_BAD_IOVA)
> +               return 0;
> +
> +       if (ms->addr_64 == param->addr_64)
>                 return 0;
>
>         return vfio_spapr_dma_do_map(param->vfio_container_fd,
> ms->addr_64, ms->iova,
> @@ -1423,7 +1446,15 @@ vfio_spapr_unmap_walk(const struct rte_memseg_list
> *msl,
>  {
>         struct spapr_remap_walk_param *param = arg;
>
> -       if (msl->external || ms->addr_64 == param->addr_64)
> +       /* skip external memory that isn't a heap */
> +       if (msl->external && !msl->heap)
> +               return 0;
> +
> +       /* skip any segments with invalid IOVA addresses */
> +       if (ms->iova == RTE_BAD_IOVA)
> +               return 0;
> +
> +       if (ms->addr_64 == param->addr_64)
>                 return 0;
>
>         return vfio_spapr_dma_do_map(param->vfio_container_fd,
> ms->addr_64, ms->iova,
> @@ -1443,7 +1474,12 @@ vfio_spapr_window_size_walk(const struct
> rte_memseg_list *msl,
>         struct spapr_walk_param *param = arg;
>         uint64_t max = ms->iova + ms->len;
>
> -       if (msl->external)
> +       /* skip external memory that isn't a heap */
> +       if (msl->external && !msl->heap)
> +               return 0;
> +
> +       /* skip any segments with invalid IOVA addresses */
> +       if (ms->iova == RTE_BAD_IOVA)
>                 return 0;
>
>         /* do not iterate ms we haven't mapped yet  */
> --
> 2.17.1
>


-- 
Regards,
Rajesh

  parent reply	other threads:[~2019-11-08 16:17 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-05 15:15 [dpdk-dev] [PATCH 19.11] vfio: fix DMA mapping of externally allocated heaps Anatoly Burakov
2019-11-05 17:15 ` Burakov, Anatoly
2019-11-06 13:58   ` David Marchand
2019-11-06 15:27     ` Rajesh Ravi
2019-11-06 16:03       ` Burakov, Anatoly
2019-11-06 21:53 ` David Marchand
2019-11-06 22:12   ` Thomas Monjalon
2019-11-07  6:35 ` Rajesh Ravi [this message]
2019-11-07 15:35 ` David Marchand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAAr5zNfNQ+_3=ruo3DEkhHULPiBFp+e0c3Nr0XS=rgxE1=vcEw@mail.gmail.com' \
    --to=rajesh.ravi@broadcom.com \
    --cc=ajit.khaparde@broadcom.com \
    --cc=anatoly.burakov@intel.com \
    --cc=bruce.richardson@intel.com \
    --cc=david.marchand@redhat.com \
    --cc=dev@dpdk.org \
    --cc=jonathan.richardson@broadcom.com \
    --cc=scott.branden@broadcom.com \
    --cc=srinath.mannam@broadcom.com \
    --cc=thomas@monjalon.net \
    --cc=vikram.prakash@broadcom.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.