From: Robin Murphy <robin.murphy@arm.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: "Isaac J. Manjarres" <isaacm@codeaurora.org>,
	will@kernel.org, joro@8bytes.org, pdaly@codeaurora.org,
	pratikp@codeaurora.org, linux-arm-kernel@lists.infradead.org,
	iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 0/5] Optimize iommu_map_sg() performance
Date: Wed, 13 Jan 2021 02:54:11 +0000	[thread overview]
Message-ID: <ef9390bc-57de-07fa-30fd-863672685788@arm.com> (raw)
In-Reply-To: <20210112163307.GA1199965@infradead.org>

On 2021-01-12 16:33, Christoph Hellwig wrote:
> On Tue, Jan 12, 2021 at 04:00:59PM +0000, Robin Murphy wrote:
>> Out of curiosity, how much of the difference is attributable to actual
>> indirect call overhead vs. the additional massive reduction in visits to
>> arm_smmu_rpm_{get,put} that you fail to mention? There are ways to optimise
>> indirect calling that would benefit *all* cases, rather than just one
>> operation for one particular driver.
> 
> Do we have systems that use different iommu_ops at the same time?
> If not this would be a prime candidate for static call optimizations.

They're not at all common, but such systems do technically exist. It's 
hard to make them work in the current "one set of ops per bus" model, 
but I still have a long-term dream of sorting that out so such setups 
*can* be supported properly. I certainly wouldn't want to make any 
changes that completely close the door on that idea, but any static call 
optimisation that can be done in a config-gated manner should be viable 
for x86 at least. Even better would be a dynamic branch-patching 
solution that keeps the indirect case as a fallback; AFAICS it should 
be feasible to apply that eagerly somewhere around 
iommu_device_register(), then just undo it again if another driver 
ever does show up registering a new set of ops that don't match. I'm pretty 
confident that the systems where performance matters most are going to 
be sensible homogeneous ones - on the Arm side the SBSA should see to 
that. The weird mix-and-match cases are typically going to be FPGA 
prototyping systems and esoteric embedded stuff that are worlds away 
from worrying about keeping up with line rate on a 40GbE NIC...
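
For illustration, here is a minimal sketch of that "patch eagerly, fall 
back on mismatch" idea built on the kernel's existing static_call() 
machinery. All of the iommu-side names and hooks (iommu_map_dispatch, 
iommu_update_map_dispatch, iommu_map_fast) are hypothetical, the ->map 
signature is the current simplified one, and locking around 
registration is ignored:

/*
 * Hypothetical sketch only: none of the iommu-side symbols below exist
 * in mainline. It just illustrates patching the static call eagerly at
 * registration time and reverting to an indirect-call fallback if a
 * second driver registers a different set of ops.
 */
#include <linux/static_call.h>
#include <linux/iommu.h>

/* Fallback: the plain indirect call through the per-domain ops. */
static int iommu_map_indirect(struct iommu_domain *domain, unsigned long iova,
			      phys_addr_t paddr, size_t size, int prot,
			      gfp_t gfp)
{
	return domain->ops->map(domain, iova, paddr, size, prot, gfp);
}

DEFINE_STATIC_CALL(iommu_map_dispatch, iommu_map_indirect);

static const struct iommu_ops *iommu_patched_ops;

/* Hypothetical hook, called from iommu_device_register(). */
static void iommu_update_map_dispatch(const struct iommu_ops *ops)
{
	if (!iommu_patched_ops) {
		/* First driver seen: patch its ->map in as the direct target. */
		iommu_patched_ops = ops;
		static_call_update(iommu_map_dispatch, ops->map);
	} else if (iommu_patched_ops != ops) {
		/* Heterogeneous system after all: drop back to the indirect call. */
		iommu_patched_ops = NULL;
		static_call_update(iommu_map_dispatch, iommu_map_indirect);
	}
}

/* Fast-path callers then dispatch through the patchable call site. */
static int iommu_map_fast(struct iommu_domain *domain, unsigned long iova,
			  phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
{
	return static_call(iommu_map_dispatch)(domain, iova, paddr, size,
					       prot, gfp);
}

The same pattern would have to be repeated for the other hot ops 
(unmap, iotlb_sync and friends), but the patch/revert logic itself 
stays this small.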

> Also I've been pondering adding direct calls to the iommu dma ops like
> we do for DMA direct.  This would allow us to stop using dma_ops
> entirely for arm64.

Yes, now that we're starting to get things sufficiently consolidated 
behind iommu-dma, that might be a reasonable thing to try. That said, 
the inherent work further down in the IOVA and IOMMU layers dwarfs 
that of the direct case, so I doubt that reducing the initial dispatch 
overhead would make any noticeable difference in practice.
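
For reference, the dma-direct bypass mentioned above is just an inline 
check ahead of the indirect ops call in kernel/dma/mapping.c, so an 
iommu-dma equivalent would presumably amount to one more such branch. 
A rough sketch, where use_iommu_dma() and a directly callable 
iommu_dma_map_page() are hypothetical stand-ins (the real function is 
currently internal to drivers/iommu/dma-iommu.c), and the error/debug 
plumbing is trimmed:

/*
 * Rough sketch of how dma_map_page_attrs() could grow a direct
 * iommu-dma branch next to the existing dma-direct one. Both
 * use_iommu_dma() and a callable iommu_dma_map_page() are hypothetical
 * stand-ins here; error handling and debug hooks are trimmed.
 */
dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
		size_t offset, size_t size, enum dma_data_direction dir,
		unsigned long attrs)
{
	const struct dma_map_ops *ops = get_dma_ops(dev);
	dma_addr_t addr;

	if (dma_map_direct(dev, ops))
		/* Existing fast path: no IOMMU, call dma-direct inline. */
		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
	else if (use_iommu_dma(dev))
		/* Hypothetical: call straight into iommu-dma, no dma_ops. */
		addr = iommu_dma_map_page(dev, page, offset, size, dir, attrs);
	else
		/* Anything else still goes through the indirect ops. */
		addr = ops->map_page(dev, page, offset, size, dir, attrs);

	return addr;
}

If arm64 always hit one of the first two branches, the per-device 
dma_ops pointer - and that final indirect call - could indeed go away 
entirely.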

Robin.

Thread overview: 26+ messages

2021-01-11 14:54 [PATCH v2 0/5] Optimize iommu_map_sg() performance Isaac J. Manjarres
2021-01-11 14:54 ` [PATCH v2 1/5] iommu/io-pgtable: Introduce map_sg() as a page table op Isaac J. Manjarres
2021-01-11 14:54 ` [PATCH v2 2/5] iommu/io-pgtable-arm: Hook up map_sg() Isaac J. Manjarres
2021-01-11 14:54 ` [PATCH v2 3/5] iommu/io-pgtable-arm-v7s: " Isaac J. Manjarres
2021-01-11 14:54 ` [PATCH v2 4/5] iommu: Introduce map_sg() as an IOMMU op for IOMMU drivers Isaac J. Manjarres
2021-01-11 14:54 ` [PATCH v2 5/5] iommu/arm-smmu: Hook up map_sg() Isaac J. Manjarres
2021-01-12 16:00 ` [PATCH v2 0/5] Optimize iommu_map_sg() performance Robin Murphy
2021-01-12 16:33   ` Christoph Hellwig
2021-01-13  2:54     ` Robin Murphy [this message]
2021-01-21 21:30   ` isaacm
2021-01-22 13:44     ` Robin Murphy
