From: Rob Herring <robh@kernel.org>
To: Robin Murphy <robin.murphy@arm.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>,
Linux IOMMU <iommu@lists.linux-foundation.org>,
Will Deacon <will@kernel.org>,
"moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE"
<linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH v2] iommu/arm-smmu-v3: Add SMMUv3.2 range invalidation support
Date: Fri, 21 Feb 2020 10:47:23 -0600 [thread overview]
Message-ID: <CAL_JsqJam8qO-XAfQTXNZ0y_gLdOx0LAX28XnKjXd2yh7y3_Uw@mail.gmail.com> (raw)
In-Reply-To: <49847ca9-3ee8-4c08-a772-a93f10aa817f@arm.com>
On Fri, Feb 21, 2020 at 10:19 AM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 20/02/2020 5:54 pm, Rob Herring wrote:
> > On Mon, Feb 17, 2020 at 1:17 PM Robin Murphy <robin.murphy@arm.com> wrote:
> >>
> >> On 13/02/2020 9:49 pm, Rob Herring wrote:
> >>> On Thu, Jan 30, 2020 at 11:34 AM Robin Murphy <robin.murphy@arm.com> wrote:
> >>>>
> >>>> On 30/01/2020 3:06 pm, Auger Eric wrote:
> >>>>> Hi Rob,
> >>>>> On 1/17/20 10:16 PM, Rob Herring wrote:
> >>>>>> Arm SMMUv3.2 adds support for TLB range invalidate operations.
> >>>>>> Support for range invalidate is determined by the RIL bit in the IDR3
> >>>>>> register.
> >>>>>>
> >>>>>> The range invalidate is in units of the leaf page size and operates on
> >>>>>> 1-32 chunks of a power of 2 multiple pages. First, we determine from the
> >>>>>> size what power of 2 multiple we can use. Then we calculate how many
> >>>>>> chunks (1-31) of the power of 2 size for the range on the iteration. On
> >>>>>> each iteration, we move up in size by at least 5 bits.
> >>>>>>
> >>>>>> Cc: Eric Auger <eric.auger@redhat.com>
> >>>>>> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> >>>>>> Cc: Will Deacon <will@kernel.org>
> >>>>>> Cc: Robin Murphy <robin.murphy@arm.com>
> >>>>>> Cc: Joerg Roedel <joro@8bytes.org>
> >>>>>> Signed-off-by: Rob Herring <robh@kernel.org>
> >>>
> >>>
> >>>>>> @@ -2003,7 +2024,7 @@ static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
> >>>>>> {
> >>>>>> u64 cmds[CMDQ_BATCH_ENTRIES * CMDQ_ENT_DWORDS];
> >>>>>> struct arm_smmu_device *smmu = smmu_domain->smmu;
> >>>>>> - unsigned long start = iova, end = iova + size;
> >>>>>> + unsigned long start = iova, end = iova + size, num_pages = 0, tg = 0;
> >>>>>> int i = 0;
> >>>>>> struct arm_smmu_cmdq_ent cmd = {
> >>>>>> .tlbi = {
> >>>>>> @@ -2022,12 +2043,50 @@ static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
> >>>>>> cmd.tlbi.vmid = smmu_domain->s2_cfg.vmid;
> >>>>>> }
> >>>>>>
> >>>>>> + if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
> >>>>>> + /* Get the leaf page size */
> >>>>>> + tg = __ffs(smmu_domain->domain.pgsize_bitmap);
> >>>>>> +
> >>>>>> + /* Convert page size of 12,14,16 (log2) to 1,2,3 */
> >>>>>> + cmd.tlbi.tg = ((tg - ilog2(SZ_4K)) / 2) + 1;
> >>>>
> >>>> Given the comment, I think "(tg - 10) / 2" would suffice ;)
> >>>
> >>> Well, duh...
> >>>
> >>>>
> >>>>>> +
> >>>>>> + /* Determine what level the granule is at */
> >>>>>> + cmd.tlbi.ttl = 4 - ((ilog2(granule) - 3) / (tg - 3));
> >>>>
> >>>> Is it possible to rephrase that with logs and shifts to avoid the division?
> >>>
> >>> Well, with a loop is the only other way I came up with:
> >>>
> >>> bpl = tg - 3;
> >>> ttl = 3;
> >>> mask = BIT(bpl) - 1;
> >>> while ((granule & (mask << ((4 - ttl) * bpl + 3))) == 0)
> >>> ttl--;
> >>>
> >>> Doesn't seem like much improvement to me given we have h/w divide...
> >>
> >> Sure, it doesn't take too many extra instructions to start matching
> >> typical IDIV latency, so there's no point being silly if there really
> >> isn't a clean alternative.
> >>
> >> Some quick scribbling suggests "4 - ilog2(granule) / 10" might actually
> >> be close enough, but perhaps that's a bit too cheeky.
> >
> > How does divide by 10 save a divide?
>
> Well, by that point I was more just thinking about the smallest
> expression, since *some* division seems unavoidable. Although GCC does
> apparently still think that transforming constant division is a win ;)
Okay. Still, divide by 10 happens to work, but it is very much not
obvious. It also doesn't work for level 1 and 16 or 64KB pages (though
we'll never see that granule). Subtracting 3 is not that obvious
either, but is at least in the spec for calculating the bits per
level.
I think we've spent enough time micro-optimizing this and have better
things to worry about.
Rob
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
next prev parent reply other threads:[~2020-02-21 16:47 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-01-17 21:16 [PATCH v2] iommu/arm-smmu-v3: Add SMMUv3.2 range invalidation support Rob Herring
2020-01-30 15:06 ` Auger Eric
2020-01-30 17:34 ` Robin Murphy
2020-02-13 21:49 ` Rob Herring
2020-02-17 19:17 ` Robin Murphy
2020-02-20 17:54 ` Rob Herring
2020-02-21 16:18 ` Robin Murphy
2020-02-21 16:47 ` Rob Herring [this message]
2020-02-21 17:19 ` Robin Murphy
2020-02-13 19:54 ` Rob Herring
2020-02-24 11:41 ` Auger Eric
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAL_JsqJam8qO-XAfQTXNZ0y_gLdOx0LAX28XnKjXd2yh7y3_Uw@mail.gmail.com \
--to=robh@kernel.org \
--cc=iommu@lists.linux-foundation.org \
--cc=jean-philippe@linaro.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=robin.murphy@arm.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).