All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
To: David Christensen <drc@linux.vnet.ibm.com>, dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH 2/2] vfio: modify spapr iommu support to use static window sizing
Date: Tue, 5 May 2020 15:57:01 +0100	[thread overview]
Message-ID: <8f83778f-fe74-6c69-e2d0-73b9a6e7ca79@intel.com> (raw)
In-Reply-To: <782d6f04-f476-93d6-1a8f-2ed0b39dde10@linux.vnet.ibm.com>

On 01-May-20 5:48 PM, David Christensen wrote:
>>>> I'm not sure of the algorithm for "memory size" here.
>>>>
>>>> Technically, DPDK can reserve memory segments anywhere in the VA 
>>>> space allocated by memseg lists. That space may be far bigger than 
>>>> system memory (on a typical Intel server board you'd see 128GB of VA 
>>>> space preallocated even though the machine itself might only have, 
>>>> say, 16GB of RAM installed). The same applies to any other arch 
>>>> running on Linux, so the window needs to cover at least 
>>>> RTE_MIN(base_virtaddr, lowest memseglist VA address) and up to 
>>>> highest memseglist VA address. That's not even mentioning the fact 
>>>> that the user may register external memory for DMA which may cause 
>>>> the window to be of insufficient size to cover said external memory.
>>>>
>>>> I also think that in general, "system memory" metric is ill suited 
>>>> for measuring VA space, because unlike system memory, the VA space 
>>>> is sparse and can therefore span *a lot* of address space even 
>>>> though in reality it may actually use very little physical memory.
>>>
>>> I'm open to suggestions here.  Perhaps an alternative in /proc/meminfo:
>>>
>>> VmallocTotal:   549755813888 kB
>>>
>>> I tested it with 1GB hugepages and it works, need to check with 2M as 
>>> well.  If there's no alternative for sizing the window based on 
>>> available system parameters then I have another option which creates 
>>> a new RTE_IOVA_TA mode that forces IOVA addresses into the range 0 to 
>>> X where X is configured on the EAL command-line (--iova-base, 
>>> --iova-len).   I use these command-line values to create a static 
>>> window.
>>>
>>
>> A whole new IOVA mode, while being a cleaner solution, would require a 
>> lot of testing, and it doesn't really solve the external memory 
>> problem, because we're still reliant on the user to provide IOVA 
>> addresses. Perhaps something akin to VA/IOVA address reservation would 
>> solve the problem, but again, lots of changes and testing, all for a 
>> comparatively narrow use case.
>>
>> The vmalloc area seems big enough (512 terabytes on your machine, 32 
>> terabytes on mine), so it'll probably be OK. I'd settle for:
>>
>> 1) start at base_virtaddr OR lowest memseg list address, whichever is 
>> lowest
> 
> The IOMMU only supports two starting addresses, 0 or 1<<59, so 
> implementation will need to start at 0.  (I've been bit by this before, 
> my understanding is that the processor only supports 54 bits of the 
> address and that the PCI host bridge uses bit 59 of the IOVA as a signal 
> to do the address translation for the second DMA window.)

Fair enough, 0 it is then.

> 
>> 2) end at lowest addr + VmallocTotal OR highest memseglist addr, 
>> whichever is higher
> 
> So, instead of rte_memseg_walk() execute rte_memseg_list_walk() to find 
> the lowest/highest msl addresses?

Yep. rte_memseg_walk() will only cover allocated pages, while 
rte_memseg_list_walk() will cover even empty page tables.

> 
>> 3) a check in user DMA map function that would warn/throw an error 
>> whenever there is an attempt to map an address for DMA that doesn't 
>> fit into the DMA window
> 
> Isn't this mostly prevented by the use of  rte_mem_set_dma_mask() and 
> rte_mem_check_dma_mask()?  I'd expect an error would be thrown by the 
> kernel IOMMU API for an out-of-range mapping that I would simply return 
> to the caller (drivers/vfio/vfio_iommu_spapr_tce.c includes the comment 
> /* iova is checked by the IOMMU API */).  Why do you think double 
> checking this would help?

I don't think we check rte_mem_check_dma_mask() anywhere in the call 
path of external memory code. Also, i just checked, and you're right, 
rte_vfio_container_dma_map() will fail if the kernel fails to map the 
memory, however nothing will fail in external memory because the IOVA 
addresses aren't checked for being within DMA mask.

See malloc_heap.c:1097 onwards, we simply add user-specified IOVA 
addresses into the page table without checking if the fit into the DMA 
mask. The DMA mapping will then happen through a mem event callback, but 
we don't check return value of that callback either, so even if DMA 
mapping fails, we'll only get a log message.

So, perhaps the real solution here is to add a DMA mask check into 
rte_malloc_heap_memory_add(), so that we check the IOVA addresses before 
we ever try to do anything with them. I'll submit a patch for this.

> 
>>
>> I think that would be best approach. Thoughts?
> 
> Dave


-- 
Thanks,
Anatoly

  reply	other threads:[~2020-05-05 14:57 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-29 23:29 [dpdk-dev] [PATCH 0/2] vfio: change spapr DMA window sizing operation David Christensen
2020-04-29 23:29 ` [dpdk-dev] [PATCH 1/2] vfio: use ifdef's for ppc64 spapr code David Christensen
2020-04-30 11:14   ` Burakov, Anatoly
2020-04-30 16:22     ` David Christensen
2020-04-30 16:24       ` Burakov, Anatoly
2020-04-30 17:38         ` David Christensen
2020-05-01  8:49           ` Burakov, Anatoly
2020-04-29 23:29 ` [dpdk-dev] [PATCH 2/2] vfio: modify spapr iommu support to use static window sizing David Christensen
2020-04-30 11:34   ` Burakov, Anatoly
2020-04-30 17:36     ` David Christensen
2020-05-01  9:06       ` Burakov, Anatoly
2020-05-01 16:48         ` David Christensen
2020-05-05 14:57           ` Burakov, Anatoly [this message]
2020-05-05 16:26             ` David Christensen
2020-05-06 10:18               ` Burakov, Anatoly
2020-06-30 21:38 ` [dpdk-dev] [PATCH v2 0/1] vfio: change spapr DMA window sizing operation David Christensen
2020-06-30 21:38   ` [dpdk-dev] [PATCH v2 1/1] vfio: modify spapr iommu support to use static window sizing David Christensen
2020-08-10 21:07   ` [dpdk-dev] [PATCH v3 0/1] vfio: change spapr DMA window sizing operation David Christensen
2020-08-10 21:07     ` [dpdk-dev] [PATCH v3 1/1] vfio: modify spapr iommu support to use static window sizing David Christensen
2020-09-03 18:55       ` David Christensen
2020-09-17 11:13       ` Burakov, Anatoly
2020-10-07 12:49         ` Thomas Monjalon
2020-10-07 17:44         ` David Christensen
2020-10-08  9:39           ` Burakov, Anatoly
2020-10-12 19:19             ` David Christensen
2020-10-14  9:27               ` Burakov, Anatoly
2020-10-15 17:23     ` [dpdk-dev] [PATCH v4 0/1] vfio: change spapr DMA window sizing operation David Christensen
2020-10-15 17:23       ` [dpdk-dev] [PATCH v4 1/1] vfio: modify spapr iommu support to use static window sizing David Christensen
2020-10-20 12:05         ` Thomas Monjalon
2020-10-29 21:30           ` Thomas Monjalon
2020-11-02 11:04         ` Burakov, Anatoly
2020-11-03 22:05       ` [dpdk-dev] [PATCH v5 0/1] " David Christensen
2020-11-03 22:05         ` [dpdk-dev] [PATCH v5 1/1] " David Christensen
2020-11-04 19:43           ` Thomas Monjalon
2020-11-04 21:00             ` David Christensen
2020-11-04 21:02               ` Thomas Monjalon
2020-11-04 22:25                 ` David Christensen
2020-11-05  7:12                   ` Thomas Monjalon
2020-11-06 22:16                     ` David Christensen
2020-11-07  9:58                       ` Thomas Monjalon
2020-11-09 20:35         ` [dpdk-dev] [PATCH v5 0/1] " David Christensen
2020-11-09 20:35           ` [dpdk-dev] [PATCH v6 1/1] " David Christensen
2020-11-09 21:10             ` Thomas Monjalon
2020-11-10 17:41           ` [dpdk-dev] [PATCH v7 0/1] " David Christensen
2020-11-10 17:41             ` [dpdk-dev] [PATCH v7 1/1] " David Christensen
2020-11-10 17:43           ` [dpdk-dev] [PATCH v7 0/1] " David Christensen
2020-11-10 17:43             ` [dpdk-dev] [PATCH v7 1/1] " David Christensen
2020-11-13  8:39               ` Thomas Monjalon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8f83778f-fe74-6c69-e2d0-73b9a6e7ca79@intel.com \
    --to=anatoly.burakov@intel.com \
    --cc=dev@dpdk.org \
    --cc=drc@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.