iommu.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
From: Robin Murphy <robin.murphy@arm.com>
To: Rob Herring <robh@kernel.org>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>,
	Linux IOMMU <iommu@lists.linux-foundation.org>,
	Will Deacon <will@kernel.org>,
	"moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE"
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH v2] iommu/arm-smmu-v3: Add SMMUv3.2 range invalidation support
Date: Fri, 21 Feb 2020 17:19:31 +0000	[thread overview]
Message-ID: <67635b64-5db4-8425-e513-62d96a359906@arm.com> (raw)
In-Reply-To: <CAL_JsqJam8qO-XAfQTXNZ0y_gLdOx0LAX28XnKjXd2yh7y3_Uw@mail.gmail.com>

On 21/02/2020 4:47 pm, Rob Herring wrote:
> On Fri, Feb 21, 2020 at 10:19 AM Robin Murphy <robin.murphy@arm.com> wrote:
>>
>> On 20/02/2020 5:54 pm, Rob Herring wrote:
>>> On Mon, Feb 17, 2020 at 1:17 PM Robin Murphy <robin.murphy@arm.com> wrote:
>>>>
>>>> On 13/02/2020 9:49 pm, Rob Herring wrote:
>>>>> On Thu, Jan 30, 2020 at 11:34 AM Robin Murphy <robin.murphy@arm.com> wrote:
>>>>>>
>>>>>> On 30/01/2020 3:06 pm, Auger Eric wrote:
>>>>>>> Hi Rob,
>>>>>>> On 1/17/20 10:16 PM, Rob Herring wrote:
>>>>>>>> Arm SMMUv3.2 adds support for TLB range invalidate operations.
>>>>>>>> Support for range invalidate is determined by the RIL bit in the IDR3
>>>>>>>> register.
>>>>>>>>
>>>>>>>> The range invalidate is in units of the leaf page size and operates on
>>>>>>>> 1-32 chunks of a power of 2 multiple pages. First, we determine from the
>>>>>>>> size what power of 2 multiple we can use. Then we calculate how many
>>>>>>>> chunks (1-31) of the power of 2 size for the range on the iteration. On
>>>>>>>> each iteration, we move up in size by at least 5 bits.
>>>>>>>>
>>>>>>>> Cc: Eric Auger <eric.auger@redhat.com>
>>>>>>>> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
>>>>>>>> Cc: Will Deacon <will@kernel.org>
>>>>>>>> Cc: Robin Murphy <robin.murphy@arm.com>
>>>>>>>> Cc: Joerg Roedel <joro@8bytes.org>
>>>>>>>> Signed-off-by: Rob Herring <robh@kernel.org>
>>>>>
>>>>>
>>>>>>>> @@ -2003,7 +2024,7 @@ static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
>>>>>>>>      {
>>>>>>>>         u64 cmds[CMDQ_BATCH_ENTRIES * CMDQ_ENT_DWORDS];
>>>>>>>>         struct arm_smmu_device *smmu = smmu_domain->smmu;
>>>>>>>> -    unsigned long start = iova, end = iova + size;
>>>>>>>> +    unsigned long start = iova, end = iova + size, num_pages = 0, tg = 0;
>>>>>>>>         int i = 0;
>>>>>>>>         struct arm_smmu_cmdq_ent cmd = {
>>>>>>>>                 .tlbi = {
>>>>>>>> @@ -2022,12 +2043,50 @@ static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
>>>>>>>>                 cmd.tlbi.vmid   = smmu_domain->s2_cfg.vmid;
>>>>>>>>         }
>>>>>>>>
>>>>>>>> +    if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
>>>>>>>> +            /* Get the leaf page size */
>>>>>>>> +            tg = __ffs(smmu_domain->domain.pgsize_bitmap);
>>>>>>>> +
>>>>>>>> +            /* Convert page size of 12,14,16 (log2) to 1,2,3 */
>>>>>>>> +            cmd.tlbi.tg = ((tg - ilog2(SZ_4K)) / 2) + 1;
>>>>>>
>>>>>> Given the comment, I think "(tg - 10) / 2" would suffice ;)
>>>>>
>>>>> Well, duh...
>>>>>
>>>>>>
>>>>>>>> +
>>>>>>>> +            /* Determine what level the granule is at */
>>>>>>>> +            cmd.tlbi.ttl = 4 - ((ilog2(granule) - 3) / (tg - 3));
>>>>>>
>>>>>> Is it possible to rephrase that with logs and shifts to avoid the division?
>>>>>
>>>>> Well, with a loop is the only other way I came up with:
>>>>>
>>>>> bpl = tg - 3;
>>>>> ttl = 3;
>>>>> mask = BIT(bpl) - 1;
>>>>> while ((granule & (mask << ((4 - ttl) * bpl + 3))) == 0)
>>>>>        ttl--;
>>>>>
>>>>> Doesn't seem like much improvement to me given we have h/w divide...
>>>>
>>>> Sure, it doesn't take too many extra instructions to start matching
>>>> typical IDIV latency, so there's no point being silly if there really
>>>> isn't a clean alternative.
>>>>
>>>> Some quick scribbling suggests "4 - ilog2(granule) / 10" might actually
>>>> be close enough, but perhaps that's a bit too cheeky.
>>>
>>> How does divide by 10 save a divide?
>>
>> Well, by that point I was more just thinking about the smallest
>> expression, since *some* division seems unavoidable. Although GCC does
>> apparently still think that transforming constant division is a win ;)
> 
> Okay. Still, divide by 10 happens to work, but it is very much not
> obvious. It also doesn't work for level 1 and 16 or 64KB pages (though

(FWIW the more subtle part of the trick is that 16K doesn't allow level 
1 blocks, and even in the vanishingly unlikely 64K level 1 case TTL=0 
isn't "wrong")

> we'll never see that granule). Subtracting 3 is not that obvious
> either, but is at least in the spec for calculating the bits per
> level.
> 
> I think we've spent enough time micro-optimizing this and have better
> things to worry about.

Agreed! Invalidation performance has just been the bogeyman for so long 
that it's an area to which I've become overly sensitive.

Cheers,
Robin.
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

  reply	other threads:[~2020-02-21 17:19 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-17 21:16 [PATCH v2] iommu/arm-smmu-v3: Add SMMUv3.2 range invalidation support Rob Herring
2020-01-30 15:06 ` Auger Eric
2020-01-30 17:34   ` Robin Murphy
2020-02-13 21:49     ` Rob Herring
2020-02-17 19:17       ` Robin Murphy
2020-02-20 17:54         ` Rob Herring
2020-02-21 16:18           ` Robin Murphy
2020-02-21 16:47             ` Rob Herring
2020-02-21 17:19               ` Robin Murphy [this message]
2020-02-13 19:54   ` Rob Herring
2020-02-24 11:41     ` Auger Eric

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=67635b64-5db4-8425-e513-62d96a359906@arm.com \
    --to=robin.murphy@arm.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jean-philippe@linaro.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=robh@kernel.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).