From: "Leizhen (ThunderTown)" <thunder.leizhen@huawei.com>
To: Robin Murphy <robin.murphy@arm.com>, Will Deacon <will.deacon@arm.com>
Cc: Joerg Roedel <joro@8bytes.org>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
	iommu <iommu@lists.linux-foundation.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	LinuxArm <linuxarm@huawei.com>, Hanjun Guo <guohanjun@huawei.com>,
	Libin <huawei.libin@huawei.com>,
	John Garry <john.garry@huawei.com>
Subject: Re: [PATCH v3 1/2] iommu/arm-smmu-v3: fix unexpected CMD_SYNC timeout
Date: Wed, 5 Sep 2018 09:46:29 +0800
Message-ID: <5B8F3575.8000408@huawei.com>
In-Reply-To: <5B791622.4040509@huawei.com>



On 2018/8/19 15:02, Leizhen (ThunderTown) wrote:
> 
> 
> On 2018/8/16 17:27, Robin Murphy wrote:
>> On 2018-08-16 10:18 AM, Will Deacon wrote:
>>> On Thu, Aug 16, 2018 at 04:21:17PM +0800, Leizhen (ThunderTown) wrote:
>>>> On 2018/8/15 20:26, Robin Murphy wrote:
>>>>> On 15/08/18 11:23, Zhen Lei wrote:
>>>>>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>>>>>> index 1d64710..3f5c236 100644
>>>>>> --- a/drivers/iommu/arm-smmu-v3.c
>>>>>> +++ b/drivers/iommu/arm-smmu-v3.c
>>>>>> @@ -566,7 +566,7 @@ struct arm_smmu_device {
>>>>>>
>>>>>>        int                gerr_irq;
>>>>>>        int                combined_irq;
>>>>>> -    atomic_t            sync_nr;
>>>>>> +    u32                sync_nr;
>>>>>>
>>>>>>        unsigned long            ias; /* IPA */
>>>>>>        unsigned long            oas; /* PA */
>>>>>> @@ -775,6 +775,11 @@ static int queue_remove_raw(struct arm_smmu_queue *q, u64 *ent)
>>>>>>        return 0;
>>>>>>    }
>>>>>>
>>>>>> +static inline void arm_smmu_cmdq_sync_set_msidata(u64 *cmd, u32 msidata)
>>>>>
>>>>> If we *are* going to go down this route then I think it would make sense
>>>>> to move the msiaddr and CMDQ_SYNC_0_CS_MSI logic here as well; i.e.
>>>>> arm_smmu_cmdq_build_cmd() always generates a "normal" SEV-based sync
>>>>> command, then calling this guy would convert it to an MSI-based one.
>>>>> As-is, having bits of mutually-dependent data handled across two
>>>>> separate places just seems too messy and error-prone.
>>>>
>>>> Yes. How about creating a new function, "arm_smmu_cmdq_build_sync_msi_cmd"?
>>>>
>>>> static inline
>>>> void arm_smmu_cmdq_build_sync_msi_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
>>>> {
>>>>     cmd[0]  = FIELD_PREP(CMDQ_0_OP, ent->opcode);
>>>>     cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, CMDQ_SYNC_0_CS_IRQ);
>>>>     cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSH, ARM_SMMU_SH_ISH);
>>>>     cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSIATTR, ARM_SMMU_MEMATTR_OIWB);
> 
> Missing line: cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSIDATA, ent->sync.msidata);
> 
>>>>     cmd[1]  = ent->sync.msiaddr & CMDQ_SYNC_1_MSIADDR_MASK;
>>>> }
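
For reference, here is a minimal user-space sketch of the proposed builder with
the missing MSIDATA line folded in. It is not the driver code: the DEMO_* names
are hypothetical, the shifts and values for cmd[0] are inferred from the
constant 0x0fc01046 and the "lsl #32" visible in the disassembly later in this
mail, and the MSIADDR mask (bits 51:2) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field layout for cmd[0], inferred from the disassembly. */
#define DEMO_CS_SHIFT		12
#define DEMO_MSH_SHIFT		22
#define DEMO_MSIATTR_SHIFT	24
#define DEMO_MSIDATA_SHIFT	32

#define DEMO_OP_CMD_SYNC	0x46ULL	/* opcode, bits 7:0 */
#define DEMO_CS_IRQ		0x1ULL	/* completion signalled by MSI write */
#define DEMO_SH_ISH		0x3ULL	/* inner shareable */
#define DEMO_MEMATTR_OIWB	0xfULL	/* outer/inner write-back */

/* Assumption: the MSI address occupies bits 51:2 of cmd[1]. */
#define DEMO_MSIADDR_MASK	0x000ffffffffffffcULL

/* Build an MSI-based CMD_SYNC in one place, including MSIDATA. */
static void demo_build_sync_msi_cmd(uint64_t cmd[2], uint32_t msidata,
				    uint64_t msiaddr)
{
	cmd[0]  = DEMO_OP_CMD_SYNC;
	cmd[0] |= DEMO_CS_IRQ << DEMO_CS_SHIFT;
	cmd[0] |= DEMO_SH_ISH << DEMO_MSH_SHIFT;
	cmd[0] |= DEMO_MEMATTR_OIWB << DEMO_MSIATTR_SHIFT;
	cmd[0] |= (uint64_t)msidata << DEMO_MSIDATA_SHIFT;
	cmd[1]  = msiaddr & DEMO_MSIADDR_MASK;
}

/* With msidata == 0, cmd[0] must equal the mov/movk constant. */
static void demo_check_against_disassembly(void)
{
	uint64_t cmd[2];

	demo_build_sync_msi_cmd(cmd, 0, 0);
	assert(cmd[0] == 0x0fc01046ULL);
}
```

Everything in cmd[0] except MSIDATA is a compile-time constant, so the
per-command work reduces to one OR with the shifted sequence number.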
>>>
>>> None of this seems justified given the numbers from John, so please just do
>>> the simple thing and build the command with the lock held.
> 
> To measure the effect of the optimization, I ran five tests for each case.
> Although the results are volatile, they still show which case is better or
> worse, and they agree with our theoretical analysis.
> 
> Test command: fio -numjobs=8 -rw=randread -runtime=30 ... -bs=4k
> Test Result: IOPS, for example: read : io=86790MB, bw=2892.1MB/s, iops=740586, runt= 30001msec
> 
> Case 1: (without these patches)
> 675480
> 672055
> 665275
> 648610
> 661146
> 
> Case 2: (move arm_smmu_cmdq_build_cmd inside the lock)

https://lore.kernel.org/patchwork/patch/973121/
[v2,1/2] iommu/arm-smmu-v3: fix unexpected CMD_SYNC timeout

> 688714
> 697355
> 632951
> 700540
> 678459
> 
> Case 3: (based on case 2, replace arm_smmu_cmdq_build_cmd with arm_smmu_cmdq_build_sync_msi_cmd)

https://patchwork.kernel.org/patch/10569675/
[v4,1/2] iommu/arm-smmu-v3: fix unexpected CMD_SYNC timeout

> 721582
> 729226
> 689574
> 679710
> 727770
> 
> Case 4: (based on case 3, plus patch 2)
> 734077
> 742868
> 738194
> 682544
> 740586
> 
> Case 2 is better than case 1. I think the main reason is that the
> atomic_inc_return_relaxed(&smmu->sync_nr) has been removed. Case 3 is better than case 2
> because less assembly code is generated; see below.
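
To compare the four cases on more than individual runs, the IOPS figures
quoted above can be averaged. The following is only a small helper sketch:
the array and function names are made up for illustration, and the numbers
are copied from the lists above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RUNS 5

/* IOPS per run, copied from the four cases listed in this mail. */
static const uint32_t case1[RUNS] = {675480, 672055, 665275, 648610, 661146};
static const uint32_t case2[RUNS] = {688714, 697355, 632951, 700540, 678459};
static const uint32_t case3[RUNS] = {721582, 729226, 689574, 679710, 727770};
static const uint32_t case4[RUNS] = {734077, 742868, 738194, 682544, 740586};

/* Sum of all runs, in 64 bits so the total cannot overflow. */
static uint64_t iops_sum(const uint32_t *runs, size_t n)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += runs[i];
	return sum;
}

/* Arithmetic mean of the runs. */
static double iops_mean(const uint32_t *runs, size_t n)
{
	assert(n > 0);
	return (double)iops_sum(runs, n) / (double)n;
}

/* Worst run, useful for judging the run-to-run spread. */
static uint32_t iops_min(const uint32_t *runs, size_t n)
{
	uint32_t min;

	assert(n > 0);
	min = runs[0];
	for (size_t i = 1; i < n; i++)
		if (runs[i] < min)
			min = runs[i];
	return min;
}
```

The means come out at about 664513, 679604, 709572 and 727654 IOPS for cases
1 to 4, i.e. roughly +2.3%, +6.8% and +9.5% over case 1. The worst run of
case 2 (632951) is below every run of case 1, so the gap between cases 1 and
2 is within the run-to-run noise.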

Hi Will,
  Have you received this email? Which case do you prefer? Leaving patch 2 aside, the test
results suggest that we should choose case 3.
  John Garry wants patch 2 to cover the non-MSI branch as well, which may take some time. Could
you decide on patch 1 and apply it first?


> 
> 
>>
>> Agreed - sorry if my wording was unclear, but that suggestion was only for the possibility of it proving genuinely worthwhile to build the command outside the lock. Since that isn't the case, I definitely prefer the simpler approach too.
> 
> Yes, I mean replacing arm_smmu_cmdq_build_cmd with arm_smmu_cmdq_build_sync_msi_cmd, so that the command is built inside the lock:
>          spin_lock_irqsave(&smmu->cmdq.lock, flags);
> +        ent.sync.msidata = ++smmu->sync_nr;
> +        arm_smmu_cmdq_build_sync_msi_cmd(cmd, &ent);
>          arm_smmu_cmdq_insert_cmd(smmu, cmd);
>          spin_unlock_irqrestore(&smmu->cmdq.lock, flags);
> 
> The resulting assembly code is very compact:
> ffff0000085e6928:       94123207        bl      ffff000008a73144 <_raw_spin_lock_irqsave>
> ffff0000085e692c:       b9410ad5        ldr     w21, [x22,#264]
> ffff0000085e6930:       d28208c2        mov     x2, #0x1046                     // #4166
> ffff0000085e6934:       aa0003fa        mov     x26, x0
> ffff0000085e6938:       110006b5        add     w21, w21, #0x1
> ffff0000085e693c:       f2a1f802        movk    x2, #0xfc0, lsl #16
> ffff0000085e6940:       aa1603e0        mov     x0, x22
> ffff0000085e6944:       910163a1        add     x1, x29, #0x58
> ffff0000085e6948:       aa158042        orr     x2, x2, x21, lsl #32
> ffff0000085e694c:       b9010ad5        str     w21, [x22,#264]
> ffff0000085e6950:       f9002fa2        str     x2, [x29,#88]
> ffff0000085e6954:       d2994016        mov     x22, #0xca00                    // #51712
> ffff0000085e6958:       f90033b3        str     x19, [x29,#96]
> ffff0000085e695c:       97fffd5b        bl      ffff0000085e5ec8 <arm_smmu_cmdq_insert_cmd>
> ffff0000085e6960:       aa1903e0        mov     x0, x25
> ffff0000085e6964:       aa1a03e1        mov     x1, x26
> ffff0000085e6968:       f2a77356        movk    x22, #0x3b9a, lsl #16
> ffff0000085e696c:       94123145        bl      ffff000008a72e80 <_raw_spin_unlock_irqrestore>
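
The ldr/add/str on offset #264 in the listing above is a plain
read-modify-write of sync_nr, which is only safe because it runs with the
queue lock held. Below is a user-space sketch of that pattern. It is an
illustration, not the driver: struct demo_cmdq and the demo_* functions are
hypothetical, and a C11 atomic_flag spinlock stands in for
spin_lock_irqsave()/spin_unlock_irqrestore().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the command queue state. */
struct demo_cmdq {
	atomic_flag lock;	/* stands in for smmu->cmdq.lock */
	uint32_t sync_nr;	/* plain integer, like the u32 in the patch */
};

static void demo_cmdq_init(struct demo_cmdq *q)
{
	atomic_flag_clear(&q->lock);
	q->sync_nr = 0;
}

static void demo_lock(struct demo_cmdq *q)
{
	while (atomic_flag_test_and_set_explicit(&q->lock,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void demo_unlock(struct demo_cmdq *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/*
 * Allocate the next sequence number and build the command in the same
 * critical section, so sequence numbers reach the queue in order.
 */
static uint32_t demo_issue_sync(struct demo_cmdq *q, uint64_t cmd[2])
{
	uint32_t id;

	assert(q != NULL && cmd != NULL);

	demo_lock(q);
	id = ++q->sync_nr;		/* no atomic needed under the lock */
	cmd[0] = (uint64_t)id << 32;	/* MSIDATA only, for brevity */
	cmd[1] = 0;
	/* The real driver would insert cmd into the queue here. */
	demo_unlock(q);

	return id;
}
```

Because the increment and the insertion share one critical section, a
command carrying a lower sequence number can never be queued after one
carrying a higher number, and the counter simply wraps at 2^32.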
> 
> 
>>
>> Robin.
>>
>> .
>>
> 

-- 
Thanks!
Best regards

