All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Christensen <drc@linux.vnet.ibm.com>
To: "Burakov, Anatoly" <anatoly.burakov@intel.com>, dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH 2/2] vfio: modify spapr iommu support to use static window sizing
Date: Thu, 30 Apr 2020 10:36:17 -0700	[thread overview]
Message-ID: <6763793c-265b-c5cf-228a-b2c177574c84@linux.vnet.ibm.com> (raw)
In-Reply-To: <6cbb170a-3f13-47ba-e0ad-4a86cd6cb352@intel.com>



On 4/30/20 4:34 AM, Burakov, Anatoly wrote:
> On 30-Apr-20 12:29 AM, David Christensen wrote:
>> Current SPAPR IOMMU support code dynamically modifies the DMA window
>> size in response to every new memory allocation. This is potentially
>> dangerous because all existing mappings need to be unmapped/remapped in
>> order to resize the DMA window, leaving hardware holding IOVA addresses
>> that are not properly prepared for DMA.  The new SPAPR code statically
>> assigns the DMA window size on first use, using the largest physical
>> memory address when IOVA=PA and the base_virtaddr + physical memory size
>> when IOVA=VA.  As a result, memory will only be unmapped when
>> specifically requested.
>>
>> Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
>> ---
> 
> Hi David,
> 
> I haven't yet looked at the code in detail (will do so later), but some 
> general comments and questions below.
> 
>> +        /*
>> +         * Read "System RAM" in /proc/iomem:
>> +         * 00000000-1fffffffff : System RAM
>> +         * 200000000000-201fffffffff : System RAM
>> +         */
>> +        FILE *fd = fopen(proc_iomem, "r");
>> +        if (fd == NULL) {
>> +            RTE_LOG(ERR, EAL, "Cannot open %s\n", proc_iomem);
>> +            return -1;
>> +        }
> 
> A quick check on my machines shows that when cat'ing /proc/iomem as 
> non-root, you get zeroes everywhere, which leads me to believe that you 
> have to be root to get anything useful out of /proc/iomem. Since one of 
> the major selling points of VFIO is the ability to run as non-root, 
> depending on iomem kind of defeats the purpose a bit.

I observed the same thing on my system during development.  I didn't see 
anything that precluded support for RTE_IOVA_PA in the VFIO code.  Are 
you suggesting that I should explicitly not support that configuration? 
If you're attempting to use RTE_IOVA_PA then you're already required to 
run as root, so there shouldn't be an issue accessing this

>> +        return 0;
>> +
>> +    } else if (rte_eal_iova_mode() == RTE_IOVA_VA) {
>> +        /* Set the DMA window to base_virtaddr + system memory size */
>> +        const char proc_meminfo[] = "/proc/meminfo";
>> +        const char str_memtotal[] = "MemTotal:";
>> +        int memtotal_len = sizeof(str_memtotal) - 1;
>> +        char buffer[256];
>> +        uint64_t size = 0;
>> +
>> +        FILE *fd = fopen(proc_meminfo, "r");
>> +        if (fd == NULL) {
>> +            RTE_LOG(ERR, EAL, "Cannot open %s\n", proc_meminfo);
>> +            return -1;
>> +        }
>> +        while (fgets(buffer, sizeof(buffer), fd)) {
>> +            if (strncmp(buffer, str_memtotal, memtotal_len) == 0) {
>> +                size = rte_str_to_size(&buffer[memtotal_len]);
>> +                break;
>> +            }
>> +        }
>> +        fclose(fd);
>> +
>> +        if (size == 0) {
>> +            RTE_LOG(ERR, EAL, "Failed to find valid \"MemTotal\" entry "
>> +                "in file %s\n", proc_meminfo);
>> +            return -1;
>> +        }
>> +
>> +        RTE_LOG(DEBUG, EAL, "MemTotal is 0x%" PRIx64 "\n", size);
>> +        /* if no base virtual address is configured use 4GB */
>> +        spapr_dma_win_len = rte_align64pow2(size +
>> +            (internal_config.base_virtaddr > 0 ?
>> +            (uint64_t)internal_config.base_virtaddr : 1ULL << 32));
>> +        rte_mem_set_dma_mask(__builtin_ctzll(spapr_dma_win_len));
> 
> I'm not sure of the algorithm for "memory size" here.
> 
> Technically, DPDK can reserve memory segments anywhere in the VA space 
> allocated by memseg lists. That space may be far bigger than system 
> memory (on a typical Intel server board you'd see 128GB of VA space 
> preallocated even though the machine itself might only have, say, 16GB 
> of RAM installed). The same applies to any other arch running on Linux, 
> so the window needs to cover at least RTE_MIN(base_virtaddr, lowest 
> memseglist VA address) and up to highest memseglist VA address. That's 
> not even mentioning the fact that the user may register external memory 
> for DMA which may cause the window to be of insufficient size to cover 
> said external memory.
> 
> I also think that in general, "system memory" metric is ill suited for 
> measuring VA space, because unlike system memory, the VA space is sparse 
> and can therefore span *a lot* of address space even though in reality 
> it may actually use very little physical memory.

I'm open to suggestions here.  Perhaps an alternative in /proc/meminfo:

VmallocTotal:   549755813888 kB

I tested it with 1GB hugepages and it works, need to check with 2M as 
well.  If there's no alternative for sizing the window based on 
available system parameters then I have another option which creates a 
new RTE_IOVA_TA mode that forces IOVA addresses into the range 0 to X 
where X is configured on the EAL command-line (--iova-base, --iova-len). 
  I use these command-line values to create a static window.

Dave

Dave



  reply	other threads:[~2020-04-30 17:36 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-29 23:29 [dpdk-dev] [PATCH 0/2] vfio: change spapr DMA window sizing operation David Christensen
2020-04-29 23:29 ` [dpdk-dev] [PATCH 1/2] vfio: use ifdef's for ppc64 spapr code David Christensen
2020-04-30 11:14   ` Burakov, Anatoly
2020-04-30 16:22     ` David Christensen
2020-04-30 16:24       ` Burakov, Anatoly
2020-04-30 17:38         ` David Christensen
2020-05-01  8:49           ` Burakov, Anatoly
2020-04-29 23:29 ` [dpdk-dev] [PATCH 2/2] vfio: modify spapr iommu support to use static window sizing David Christensen
2020-04-30 11:34   ` Burakov, Anatoly
2020-04-30 17:36     ` David Christensen [this message]
2020-05-01  9:06       ` Burakov, Anatoly
2020-05-01 16:48         ` David Christensen
2020-05-05 14:57           ` Burakov, Anatoly
2020-05-05 16:26             ` David Christensen
2020-05-06 10:18               ` Burakov, Anatoly
2020-06-30 21:38 ` [dpdk-dev] [PATCH v2 0/1] vfio: change spapr DMA window sizing operation David Christensen
2020-06-30 21:38   ` [dpdk-dev] [PATCH v2 1/1] vfio: modify spapr iommu support to use static window sizing David Christensen
2020-08-10 21:07   ` [dpdk-dev] [PATCH v3 0/1] vfio: change spapr DMA window sizing operation David Christensen
2020-08-10 21:07     ` [dpdk-dev] [PATCH v3 1/1] vfio: modify spapr iommu support to use static window sizing David Christensen
2020-09-03 18:55       ` David Christensen
2020-09-17 11:13       ` Burakov, Anatoly
2020-10-07 12:49         ` Thomas Monjalon
2020-10-07 17:44         ` David Christensen
2020-10-08  9:39           ` Burakov, Anatoly
2020-10-12 19:19             ` David Christensen
2020-10-14  9:27               ` Burakov, Anatoly
2020-10-15 17:23     ` [dpdk-dev] [PATCH v4 0/1] vfio: change spapr DMA window sizing operation David Christensen
2020-10-15 17:23       ` [dpdk-dev] [PATCH v4 1/1] vfio: modify spapr iommu support to use static window sizing David Christensen
2020-10-20 12:05         ` Thomas Monjalon
2020-10-29 21:30           ` Thomas Monjalon
2020-11-02 11:04         ` Burakov, Anatoly
2020-11-03 22:05       ` [dpdk-dev] [PATCH v5 0/1] " David Christensen
2020-11-03 22:05         ` [dpdk-dev] [PATCH v5 1/1] " David Christensen
2020-11-04 19:43           ` Thomas Monjalon
2020-11-04 21:00             ` David Christensen
2020-11-04 21:02               ` Thomas Monjalon
2020-11-04 22:25                 ` David Christensen
2020-11-05  7:12                   ` Thomas Monjalon
2020-11-06 22:16                     ` David Christensen
2020-11-07  9:58                       ` Thomas Monjalon
2020-11-09 20:35         ` [dpdk-dev] [PATCH v5 0/1] " David Christensen
2020-11-09 20:35           ` [dpdk-dev] [PATCH v6 1/1] " David Christensen
2020-11-09 21:10             ` Thomas Monjalon
2020-11-10 17:41           ` [dpdk-dev] [PATCH v7 0/1] " David Christensen
2020-11-10 17:41             ` [dpdk-dev] [PATCH v7 1/1] " David Christensen
2020-11-10 17:43           ` [dpdk-dev] [PATCH v7 0/1] " David Christensen
2020-11-10 17:43             ` [dpdk-dev] [PATCH v7 1/1] " David Christensen
2020-11-13  8:39               ` Thomas Monjalon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6763793c-265b-c5cf-228a-b2c177574c84@linux.vnet.ibm.com \
    --to=drc@linux.vnet.ibm.com \
    --cc=anatoly.burakov@intel.com \
    --cc=dev@dpdk.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.