From: John Garry <john.garry@huawei.com>
To: "Leizhen (ThunderTown)" <thunder.leizhen@huawei.com>,
	Robin Murphy <robin.murphy@arm.com>,
	Will Deacon <will@kernel.org>, Joerg Roedel <joro@8bytes.org>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
	iommu <iommu@lists.linux-foundation.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/4] iommu/arm-smmu-v3: Use command queue batching helpers to improve performance
Date: Mon, 16 Aug 2021 08:24:28 +0100
Message-ID: <a3cdd5df-c028-5484-ce99-928a689d341a@huawei.com>
In-Reply-To: <52204403-f69a-d2b9-9365-7553e87d1298@huawei.com>

> In addition, I find that arm_smmu_cmdq_build_cmd() can also be optimized
> slightly: three useless instructions can be eliminated.

I think you could optimise further by pre-building commonly used
commands.

For example, CMD_SYNC without MSI polling is always the same, and the
MSI-polling variant differs from it in only one field.

But you would need to check whether the performance gain is worth the
change.
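
A minimal userspace sketch of the pre-building idea, not the driver's code. All sketch_* names are hypothetical, and the field positions and values are assumptions written from memory of arm-smmu-v3.h, so they should be checked against the header before anything like this is tried. The non-MSI CMD_SYNC is a constant template that is simply copied. For the MSI case, the sketch assumes the first dword is constant too and that only the MSI address changes per call.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CMDQ_ENT_DWORDS         2

/* Assumed layout, for illustration only (verify against arm-smmu-v3.h). */
#define SKETCH_OP_CMD_SYNC      0x46ULL                 /* opcode, bits 7:0 */
#define SKETCH_CS_IRQ           (1ULL << 12)            /* completion signal: MSI */
#define SKETCH_CS_SEV           (2ULL << 12)            /* completion signal: SEV */
#define SKETCH_MSH_ISH          (3ULL << 22)            /* inner shareable */
#define SKETCH_MSIATTR_OIWB     (0xfULL << 24)          /* write-back attributes */
#define SKETCH_MSIADDR_MASK     0x000ffffffffffffcULL   /* bits 51:2 */

#define SKETCH_SYNC_COMMON \
	(SKETCH_OP_CMD_SYNC | SKETCH_MSH_ISH | SKETCH_MSIATTR_OIWB)

/* Pre-built CMD_SYNC for the non-MSI case: identical on every call. */
static const uint64_t sketch_sync_sev[CMDQ_ENT_DWORDS] = {
	SKETCH_SYNC_COMMON | SKETCH_CS_SEV,
	0,
};

/* Pre-built first dword for the MSI case; only the address varies. */
static const uint64_t sketch_sync_msi_dword0 =
	SKETCH_SYNC_COMMON | SKETCH_CS_IRQ;

/* Hypothetical helper: msiaddr == 0 selects the non-MSI template. */
static void sketch_build_sync(uint64_t *cmd, uint64_t msiaddr)
{
	assert(cmd != NULL);

	if (!msiaddr) {
		memcpy(cmd, sketch_sync_sev, sizeof(sketch_sync_sev));
		return;
	}

	cmd[0] = sketch_sync_msi_dword0;
	cmd[1] = msiaddr & SKETCH_MSIADDR_MASK;
}
```

Whether this beats building the command field by field would still have to be measured, since the existing build path already compiles down to a handful of instructions.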

> 
> Case 1):
> void arm_smmu_cmdq_build_cmd_tst1(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
> {
>          memset(cmd, 0, 1 << CMDQ_ENT_SZ_SHIFT);
>          cmd[0] |= FIELD_PREP(CMDQ_0_OP, ent->opcode);
> }
> 0000000000004608 <arm_smmu_cmdq_build_cmd_tst1>:
>      4608:       a9007c1f        stp     xzr, xzr, [x0]
>      460c:       39400022        ldrb    w2, [x1]
>      4610:       f9400001        ldr     x1, [x0]
>      4614:       aa020021        orr     x1, x1, x2
>      4618:       f9000001        str     x1, [x0]
>      461c:       d65f03c0        ret
> 
> Case 2):
> void arm_smmu_cmdq_build_cmd_tst2(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
> {
>          int i;
> 
>          cmd[0] = FIELD_PREP(CMDQ_0_OP, ent->opcode);
>          for (i = 1; i < CMDQ_ENT_DWORDS; i++)
>                  cmd[i] = 0;
> }
> 0000000000004620 <arm_smmu_cmdq_build_cmd_tst2>:
>      4620:       39400021        ldrb    w1, [x1]
>      4624:       a9007c01        stp     x1, xzr, [x0]
>      4628:       d65f03c0        ret
>      462c:       d503201f        nop
> 
> Case 3):
> void arm_smmu_cmdq_build_cmd_tst3(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
> {
>          memset(cmd, 0, 1 << CMDQ_ENT_SZ_SHIFT);
>          cmd[0] = FIELD_PREP(CMDQ_0_OP, ent->opcode);
> }
> 0000000000004630 <arm_smmu_cmdq_build_cmd_tst3>:
>      4630:       a9007c1f        stp     xzr, xzr, [x0]
>      4634:       39400021        ldrb    w1, [x1]
>      4638:       f9000001        str     x1, [x0]
>      463c:       d65f03c0        ret
> 
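
The three quoted cases differ only in how the entry is zeroed and how the opcode is written. A rough userspace sketch to confirm that they leave the same bytes in the entry, assuming 16-byte entries and an opcode in the low byte of the first dword (which is what the stp/ldrb sequences above suggest). struct ent_sketch, CMDQ_0_OP_MASK, the build_tst* names and variants_agree() are stand-ins invented for this sketch, not driver definitions.

```c
#include <stdint.h>
#include <string.h>

#define CMDQ_ENT_SZ_SHIFT	4
#define CMDQ_ENT_DWORDS		((1 << CMDQ_ENT_SZ_SHIFT) >> 3)
#define CMDQ_0_OP_MASK		0xffULL	/* assumed: opcode in bits 7:0 */

/* Stand-in for struct arm_smmu_cmdq_ent; only the opcode matters here. */
struct ent_sketch {
	uint8_t opcode;
};

/* Case 1: zero the entry, then OR the opcode in (read-modify-write). */
static void build_tst1(uint64_t *cmd, const struct ent_sketch *ent)
{
	memset(cmd, 0, 1 << CMDQ_ENT_SZ_SHIFT);
	cmd[0] |= ent->opcode & CMDQ_0_OP_MASK;
}

/* Case 2: assign the first dword, zero the remaining dwords in a loop. */
static void build_tst2(uint64_t *cmd, const struct ent_sketch *ent)
{
	int i;

	cmd[0] = ent->opcode & CMDQ_0_OP_MASK;
	for (i = 1; i < CMDQ_ENT_DWORDS; i++)
		cmd[i] = 0;
}

/* Case 3: zero the entry, then plainly assign the first dword. */
static void build_tst3(uint64_t *cmd, const struct ent_sketch *ent)
{
	memset(cmd, 0, 1 << CMDQ_ENT_SZ_SHIFT);
	cmd[0] = ent->opcode & CMDQ_0_OP_MASK;
}

/* Returns 1 if all three variants produce identical entries. */
static int variants_agree(uint8_t opcode)
{
	struct ent_sketch ent = { .opcode = opcode };
	uint64_t a[CMDQ_ENT_DWORDS], b[CMDQ_ENT_DWORDS], c[CMDQ_ENT_DWORDS];

	/* Start from stale contents so missed stores would show up. */
	memset(a, 0xff, sizeof(a));
	memset(b, 0xff, sizeof(b));
	memset(c, 0xff, sizeof(c));

	build_tst1(a, &ent);
	build_tst2(b, &ent);
	build_tst3(c, &ent);

	return !memcmp(a, b, sizeof(a)) && !memcmp(b, c, sizeof(b));
}
```

This only checks functional equivalence. The instruction counts depend on the compiler and flags, so the disassembly has to be compared on the actual kernel build.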


Thread overview: 42+ messages
2021-08-11 11:48 [PATCH 0/4] Prepare for ECMDQ support Zhen Lei
2021-08-11 11:48 ` [PATCH 1/4] iommu/arm-smmu-v3: Use command queue batching helpers to improve performance Zhen Lei
2021-08-13 16:01   ` Robin Murphy
2021-08-13 16:45     ` John Garry
2021-08-16  2:15       ` Leizhen (ThunderTown)
2021-08-16  4:05         ` Leizhen (ThunderTown)
2021-08-16  7:24           ` John Garry [this message]
2021-08-16  7:47             ` Leizhen (ThunderTown)
2021-08-16  8:21               ` Will Deacon
2021-08-16  8:41                 ` Leizhen (ThunderTown)
2021-08-11 11:48 ` [PATCH 2/4] iommu/arm-smmu-v3: Add and use static helper function arm_smmu_cmdq_issue_cmd_with_sync() Zhen Lei
2021-08-11 11:48 ` [PATCH 3/4] iommu/arm-smmu-v3: Add and use static helper function arm_smmu_get_cmdq() Zhen Lei
2021-08-11 11:48 ` [PATCH 4/4] iommu/arm-smmu-v3: Extract reusable function __arm_smmu_cmdq_skip_err() Zhen Lei
2021-08-13 14:33 ` [PATCH 0/4] Prepare for ECMDQ support Will Deacon
