From: John Garry <john.garry@huawei.com>
To: "Leizhen (ThunderTown)" <thunder.leizhen@huawei.com>,
Robin Murphy <robin.murphy@arm.com>,
Will Deacon <will@kernel.org>, Joerg Roedel <joro@8bytes.org>,
linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
iommu <iommu@lists.linux-foundation.org>,
<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/4] iommu/arm-smmu-v3: Use command queue batching helpers to improve performance
Date: Mon, 16 Aug 2021 08:24:28 +0100
Message-ID: <a3cdd5df-c028-5484-ce99-928a689d341a@huawei.com>
In-Reply-To: <52204403-f69a-d2b9-9365-7553e87d1298@huawei.com>
> In addition, I find that function arm_smmu_cmdq_build_cmd() can also be optimized
> slightly, three useless instructions can be reduced.
I think you could optimise further by pre-building commonly used
commands.
For example, CMD_SYNC without MSI polling is always the same, and with
MSI polling it differs in only one field.
But you would need to check whether the performance gain is worth the
change.
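A minimal user-space sketch of the pre-building idea follows. It is not
the driver's code: the field positions and values (opcode 0x46, CS, MSH,
MSIATTR, the MSI address mask) are assumptions modelled on the driver's
CMDQ_SYNC_* macros and should be checked against arm-smmu-v3.h, and the
function names are hypothetical. In this layout the first dword is a
compile-time constant in each mode, so only the MSI address varies at
run time.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout, for illustration only; verify against arm-smmu-v3.h. */
#define ENT_DWORDS           2
#define OP_CMD_SYNC          0x46ULL
#define SYNC_0_CS_SHIFT      12
#define SYNC_0_CS_IRQ        1ULL
#define SYNC_0_CS_SEV        2ULL
#define SYNC_0_MSH_SHIFT     22
#define SYNC_0_MSIATTR_SHIFT 24
#define SH_ISH               3ULL
#define MEMATTR_OIWB         0xfULL
#define SYNC_1_MSIADDR_MASK  0x000ffffffffffffcULL	/* bits [51:2] */

/* Generic path: zero the entry, then OR in each field at run time. */
static void sync_build_generic(uint64_t *cmd, uint64_t msiaddr)
{
	memset(cmd, 0, ENT_DWORDS * sizeof(*cmd));
	cmd[0] |= OP_CMD_SYNC;
	if (msiaddr) {
		cmd[0] |= SYNC_0_CS_IRQ << SYNC_0_CS_SHIFT;
		cmd[1] |= msiaddr & SYNC_1_MSIADDR_MASK;
	} else {
		cmd[0] |= SYNC_0_CS_SEV << SYNC_0_CS_SHIFT;
	}
	cmd[0] |= SH_ISH << SYNC_0_MSH_SHIFT;
	cmd[0] |= MEMATTR_OIWB << SYNC_0_MSIATTR_SHIFT;
}

/* Pre-built path: dword 0 is folded into a constant for each mode. */
#define SYNC_0_COMMON	(OP_CMD_SYNC | (SH_ISH << SYNC_0_MSH_SHIFT) | \
			 (MEMATTR_OIWB << SYNC_0_MSIATTR_SHIFT))
#define SYNC_0_SEV	(SYNC_0_COMMON | (SYNC_0_CS_SEV << SYNC_0_CS_SHIFT))
#define SYNC_0_IRQ	(SYNC_0_COMMON | (SYNC_0_CS_IRQ << SYNC_0_CS_SHIFT))

static void sync_build_prebuilt(uint64_t *cmd, uint64_t msiaddr)
{
	if (msiaddr) {
		cmd[0] = SYNC_0_IRQ;
		cmd[1] = msiaddr & SYNC_1_MSIADDR_MASK;
	} else {
		cmd[0] = SYNC_0_SEV;
		cmd[1] = 0;
	}
}

/* Sanity check: both paths must produce the same entry. */
static void sync_check_equivalent(uint64_t msiaddr)
{
	uint64_t a[ENT_DWORDS], b[ENT_DWORDS];

	sync_build_generic(a, msiaddr);
	sync_build_prebuilt(b, msiaddr);
	assert(memcmp(a, b, sizeof(a)) == 0);
}
```

Whether this saves anything measurable depends on how well the compiler
already folds the generic path, which is why the gain needs measuring
first.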
>
> Case 1):
> void arm_smmu_cmdq_build_cmd_tst1(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
> {
> 	memset(cmd, 0, 1 << CMDQ_ENT_SZ_SHIFT);
> 	cmd[0] |= FIELD_PREP(CMDQ_0_OP, ent->opcode);
> }
> 0000000000004608 <arm_smmu_cmdq_build_cmd_tst1>:
> 4608: a9007c1f stp xzr, xzr, [x0]
> 460c: 39400022 ldrb w2, [x1]
> 4610: f9400001 ldr x1, [x0]
> 4614: aa020021 orr x1, x1, x2
> 4618: f9000001 str x1, [x0]
> 461c: d65f03c0 ret
>
> Case 2):
> void arm_smmu_cmdq_build_cmd_tst2(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
> {
> 	int i;
>
> 	cmd[0] = FIELD_PREP(CMDQ_0_OP, ent->opcode);
> 	for (i = 1; i < CMDQ_ENT_DWORDS; i++)
> 		cmd[i] = 0;
> }
> 0000000000004620 <arm_smmu_cmdq_build_cmd_tst2>:
> 4620: 39400021 ldrb w1, [x1]
> 4624: a9007c01 stp x1, xzr, [x0]
> 4628: d65f03c0 ret
> 462c: d503201f nop
>
> Case 3):
> void arm_smmu_cmdq_build_cmd_tst3(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
> {
> 	memset(cmd, 0, 1 << CMDQ_ENT_SZ_SHIFT);
> 	cmd[0] = FIELD_PREP(CMDQ_0_OP, ent->opcode);
> }
> 0000000000004630 <arm_smmu_cmdq_build_cmd_tst3>:
> 4630: a9007c1f stp xzr, xzr, [x0]
> 4634: 39400021 ldrb w1, [x1]
> 4638: f9000001 str x1, [x0]
> 463c: d65f03c0 ret
>
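The comparison above only makes sense if the three cases leave the same
bytes in the command entry. A small user-space harness can confirm that
before looking at code generation. This is a sketch under assumptions:
the struct, the 16-byte entry size and the function names are stand-ins
for the driver's definitions, and FIELD_PREP(CMDQ_0_OP, ...) is taken to
place the opcode in the low byte with no shift, as the disassembly
suggests.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the driver macros; values are assumptions. */
#define ENT_SZ_SHIFT	4			/* 16-byte command entry */
#define ENT_DWORDS	((1 << ENT_SZ_SHIFT) >> 3)

struct ent {
	uint8_t opcode;
};

/* Case 1: memset, then read-modify-write of cmd[0]. */
static void build_tst1(uint64_t *cmd, const struct ent *ent)
{
	memset(cmd, 0, 1 << ENT_SZ_SHIFT);
	cmd[0] |= ent->opcode;
}

/* Case 2: plain store to cmd[0], explicit zeroing of the remainder. */
static void build_tst2(uint64_t *cmd, const struct ent *ent)
{
	int i;

	cmd[0] = ent->opcode;
	for (i = 1; i < ENT_DWORDS; i++)
		cmd[i] = 0;
}

/* Case 3: memset, then plain store to cmd[0]. */
static void build_tst3(uint64_t *cmd, const struct ent *ent)
{
	memset(cmd, 0, 1 << ENT_SZ_SHIFT);
	cmd[0] = ent->opcode;
}

/* Returns 1 if all three builders yield identical entries for opcode. */
static int builders_agree(uint8_t opcode)
{
	struct ent e = { .opcode = opcode };
	uint64_t a[ENT_DWORDS], b[ENT_DWORDS], c[ENT_DWORDS];

	/* Start from garbage so stale contents cannot hide a difference. */
	memset(a, 0xff, sizeof(a));
	memset(b, 0xff, sizeof(b));
	memset(c, 0xff, sizeof(c));

	build_tst1(a, &e);
	build_tst2(b, &e);
	build_tst3(c, &e);

	return !memcmp(a, b, sizeof(a)) && !memcmp(b, c, sizeof(b));
}
```

With the results shown to match, the choice between the cases comes down
to the instructions the compiler emits for each.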