From: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com>
To: joro@8bytes.org, robin.murphy@arm.com, will.deacon@arm.com
Cc: lorenzo.pieralisi@arm.com, Jayachandran.Nair@cavium.com,
	Ganapatrao.Kulkarni@cavium.com, ard.biesheuvel@linaro.org,
	linux-kernel@vger.kernel.org, iommu@lists.linux-foundation.org,
	linux-arm-kernel@lists.infradead.org,
	Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com>
Subject: [PATCH 0/1] Optimise IOVA allocations for PCI devices
Date: Mon, 18 Sep 2017 12:56:53 +0200
Message-ID: <1505732214-9052-1-git-send-email-tomasz.nowicki@caviumnetworks.com>

Here is the test setup on which I started my performance measurements.

 ------------  PCIe  -------------   TX   -------------  PCIe  -----
| ThunderX2  |------| Intel XL710 | ---> | Intel XL710 |------| X86 |
| (128 cpus) |      |   40GbE     |      |    40GbE    |       -----
 ------------        -------------        -------------

As the reference, let's take a v4.13 host with SMMUv3 off and a single-thread
iperf pinned with taskset to one CPU. The performance results I got:

SMMU off -> 100%
SMMU on -> 0.02%

I followed the DMA mapping path down and found that the 32-bit IOVA space
was full, so the kernel was flushing the rcaches for all CPUs in (1).
With 128 CPUs, this kills performance. Furthermore, in my case the rcaches
mostly contained PFNs above 32 bits, so the retried 32-bit allocation failed
as well. As a consequence, the IOVA had to be allocated outside the 32-bit
space in (2) from scratch, since all the rcaches had already been flushed in (1).

    if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
(1)-->  iova = alloc_iova_fast(iovad, iova_len, DMA_BIT_MASK(32) >> shift);

    if (!iova)
(2)-->  iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift);

My fix simply introduces a parameter for alloc_iova_fast() that decides
whether the rcache flush has to be done or not. All users follow the
scenario above, so they should allow the flush only as a last resort,
to avoid the time-costly iteration over all CPUs.
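
For illustration, the call site quoted above could end up looking roughly
like this (a sketch of the intent only; the extra boolean argument and its
placement are my illustration, not necessarily the final interface):

    /* Speculative 32-bit attempt: not allowed to flush every CPU's rcache. */
    if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
        iova = alloc_iova_fast(iovad, iova_len,
                               DMA_BIT_MASK(32) >> shift, false);

    /* Last-chance attempt at the full dma_limit: flushing is allowed here. */
    if (!iova)
        iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift, true);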

This brings my iperf performance back to 100% with the SMMU on.

My concern with this solution is that machines with a relatively small
number of CPUs may get DAC addresses more frequently for PCI devices.
Please let me know your thoughts.

Tomasz Nowicki (1):
  iommu/iova: Make rcache flush optional on IOVA allocation failure

 drivers/iommu/amd_iommu.c   | 5 +++--
 drivers/iommu/dma-iommu.c   | 6 ++++--
 drivers/iommu/intel-iommu.c | 5 +++--
 drivers/iommu/iova.c        | 7 +++----
 include/linux/iova.h        | 5 +++--
 5 files changed, 16 insertions(+), 12 deletions(-)

-- 
2.7.4
