* [PATCH] xen/arm: p2m_set_entry duplicate calculation.
@ 2022-04-21 15:17 Paran Lee
2022-04-21 23:26 ` Stefano Stabellini
2022-04-24 16:17 ` Julien Grall
0 siblings, 2 replies; 5+ messages in thread
From: Paran Lee @ 2022-04-21 15:17 UTC (permalink / raw)
To: Stefano Stabellini, Julien Grall, Bertrand Marquis, Volodymyr Babchuk
Cc: austindh.kim, xen-devel
It doesn't seem necessary to do that calculation of order shift again.
Signed-off-by: Paran Lee <p4ranlee@gmail.com>
---
xen/arch/arm/p2m.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 1d1059f7d2..533afc830a 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1092,7 +1092,7 @@ int p2m_set_entry(struct p2m_domain *p2m,
while ( nr )
{
unsigned long mask;
- unsigned long order;
+ unsigned long order, pages;
/*
* Don't take into account the MFN when removing mapping (i.e
@@ -1118,11 +1118,12 @@ int p2m_set_entry(struct p2m_domain *p2m,
if ( rc )
break;
- sgfn = gfn_add(sgfn, (1 << order));
+ pages = 1 << order;
+ sgfn = gfn_add(sgfn, pages);
if ( !mfn_eq(smfn, INVALID_MFN) )
- smfn = mfn_add(smfn, (1 << order));
+ smfn = mfn_add(smfn, pages);
- nr -= (1 << order);
+ nr -= pages;
}
return rc;
--
2.25.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] xen/arm: p2m_set_entry duplicate calculation.
2022-04-21 15:17 [PATCH] xen/arm: p2m_set_entry duplicate calculation Paran Lee
@ 2022-04-21 23:26 ` Stefano Stabellini
2022-04-24 16:17 ` Julien Grall
1 sibling, 0 replies; 5+ messages in thread
From: Stefano Stabellini @ 2022-04-21 23:26 UTC (permalink / raw)
To: Paran Lee
Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis,
Volodymyr Babchuk, austindh.kim, xen-devel
On Fri, 22 Apr 2022, Paran Lee wrote:
> It doesn't seem necessary to do that calculation of order shift again.
>
> Signed-off-by: Paran Lee <p4ranlee@gmail.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
> ---
> xen/arch/arm/p2m.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index 1d1059f7d2..533afc830a 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -1092,7 +1092,7 @@ int p2m_set_entry(struct p2m_domain *p2m,
> while ( nr )
> {
> unsigned long mask;
> - unsigned long order;
> + unsigned long order, pages;
>
> /*
> * Don't take into account the MFN when removing mapping (i.e
> @@ -1118,11 +1118,12 @@ int p2m_set_entry(struct p2m_domain *p2m,
> if ( rc )
> break;
>
> - sgfn = gfn_add(sgfn, (1 << order));
> + pages = 1 << order;
> + sgfn = gfn_add(sgfn, pages);
> if ( !mfn_eq(smfn, INVALID_MFN) )
> - smfn = mfn_add(smfn, (1 << order));
> + smfn = mfn_add(smfn, pages);
>
> - nr -= (1 << order);
> + nr -= pages;
> }
>
> return rc;
> --
> 2.25.1
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] xen/arm: p2m_set_entry duplicate calculation.
2022-04-21 15:17 [PATCH] xen/arm: p2m_set_entry duplicate calculation Paran Lee
2022-04-21 23:26 ` Stefano Stabellini
@ 2022-04-24 16:17 ` Julien Grall
2022-04-26 15:37 ` Paran Lee
1 sibling, 1 reply; 5+ messages in thread
From: Julien Grall @ 2022-04-24 16:17 UTC (permalink / raw)
To: Paran Lee, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk
Cc: austindh.kim, xen-devel
Hi,
On 21/04/2022 16:17, Paran Lee wrote:
> It doesn't seem necessary to do that calculation of order shift again.
I think we need to weight that against increasing the number of local
variables that do pretty much the same.
This is pretty much done to a matter of taste here. IMHO, the original
version is better but I see Stefano reviewed it so I will not argue
against it.
That said, given you already sent a few patches, can you explain why you
are doing this? Is this optimization purpose? Is it clean-up?
>
> Signed-off-by: Paran Lee <p4ranlee@gmail.com>
> ---
> xen/arch/arm/p2m.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index 1d1059f7d2..533afc830a 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -1092,7 +1092,7 @@ int p2m_set_entry(struct p2m_domain *p2m,
> while ( nr )
> {
> unsigned long mask;
> - unsigned long order;
> + unsigned long order, pages;
>
> /*
> * Don't take into account the MFN when removing mapping (i.e
> @@ -1118,11 +1118,12 @@ int p2m_set_entry(struct p2m_domain *p2m,
> if ( rc )
> break;
>
> - sgfn = gfn_add(sgfn, (1 << order));
> + pages = 1 << order;
Please take the opportunity to switch to 1UL.
> + sgfn = gfn_add(sgfn, pages);
> if ( !mfn_eq(smfn, INVALID_MFN) )
> - smfn = mfn_add(smfn, (1 << order));
> + smfn = mfn_add(smfn, pages);
>
> - nr -= (1 << order);
> + nr -= pages;
> }
>
> return rc;
Cheers,
--
Julien Grall
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] xen/arm: p2m_set_entry duplicate calculation.
2022-04-24 16:17 ` Julien Grall
@ 2022-04-26 15:37 ` Paran Lee
2022-04-26 19:40 ` Julien Grall
0 siblings, 1 reply; 5+ messages in thread
From: Paran Lee @ 2022-04-26 15:37 UTC (permalink / raw)
To: Julien Grall; +Cc: Stefano Stabellini, Austin Kim, xen-devel
Hello, Julien Grall.
Thanks you, I agreed! It made me think once more about what my patch
could improve.
patches I sent have been reviewed in various ways. It was a good
opportunity to analyze my patch from various perspectives. :)
I checked objdump in -O2 optimization(default) of Xen Makefile to make
sure CSE (Common subexpression elimination) works well on the latest
arm64 cross compiler on x86_64 from Arm GNU Toolchain.
$
~/gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-gcc
-v
...
A-profile Architecture 10.3-2021.07 (arm-10.29)'
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.3.1 20210621 (GNU Toolchain for the A-profile
Architecture 10.3-2021.07 (arm-10.29)
I compared the before and after my patches. This time, without adding a
"pages" variable, I proceeded to use the local variable mask with order
operation.
I was able to confirm that it does one less operation.
(1) before clean up
0000000000001bb4 <p2m_set_entry>:
while ( nr )
1bb4: b40005e2 cbz x2, 1c70 <p2m_set_entry+0xbc>
{
...
if ( rc )
1c1c: 350002e0 cbnz w0, 1c78 <p2m_set_entry+0xc4>
sgfn = gfn_add(sgfn, (1 << order));
1c20: 1ad32373 lsl w19, w27, w19 // <<< CES works
1c24: 93407e73 sxtw x19, w19 // <<< well!
return _gfn(gfn_x(gfn) + i);
1c28: 8b1302d6 add x22, x22, x19
return _mfn(mfn_x(mfn) + i);
1c2c: 8b130281 add x1, x20, x19
1c30: b100069f cmn x20, #0x1
1c34: 9a941034 csel x20, x1, x20, ne // ne = any
while ( nr )
1c38: eb1302b5 subs x21, x21, x19
1c3c: 540001e0 b.eq 1c78 <p2m_set_entry+0xc4> // b.none
(2) Using again mask variable. mask = 1UL << order
code show me sxtw x19, w19 operation disappeared.
0000000000001bb4 <p2m_set_entry>:
while ( nr )
1bb4: b40005c2 cbz x2, 1c6c <p2m_set_entry+0xb8>
{
...
if ( rc )
1c1c: 350002c0 cbnz w0, 1c74 <p2m_set_entry+0xc0>
mask = 1UL << order;
1c20: 9ad32373 lsl x19, x27, x19 // <<< only lsl
return _gfn(gfn_x(gfn) + i);
1c24: 8b1302d6 add x22, x22, x19
return _mfn(mfn_x(mfn) + i);
1c28: 8b130281 add x1, x20, x19
1c2c: b100069f cmn x20, #0x1
1c30: 9a941034 csel x20, x1, x20, ne // ne = any
while ( nr )
1c34: eb1302b5 subs x21, x21, x19
1c38: 540001e0 b.eq 1c74 <p2m_set_entry+0xc0> // b.none
I'll send you a follow-up patch I've been working on.
BR
Paran Lee
2022-04-25 오전 1:17에 Julien Grall 이(가) 쓴 글:
> Hi,
>
> On 21/04/2022 16:17, Paran Lee wrote:
>> It doesn't seem necessary to do that calculation of order shift again.
>
> I think we need to weight that against increasing the number of local
> variables that do pretty much the same.
>
> This is pretty much done to a matter of taste here. IMHO, the original
> version is better but I see Stefano reviewed it so I will not argue
> against it.
>
> That said, given you already sent a few patches, can you explain why you
> are doing this? Is this optimization purpose? Is it clean-up?
>
>>
>> Signed-off-by: Paran Lee <p4ranlee@gmail.com>
>> ---
>> xen/arch/arm/p2m.c | 9 +++++----
>> 1 file changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
>> index 1d1059f7d2..533afc830a 100644
>> --- a/xen/arch/arm/p2m.c
>> +++ b/xen/arch/arm/p2m.c
>> @@ -1092,7 +1092,7 @@ int p2m_set_entry(struct p2m_domain *p2m,
>> while ( nr )
>> {
>> unsigned long mask;
>> - unsigned long order;
>> + unsigned long order, pages;
>> /*
>> * Don't take into account the MFN when removing mapping (i.e
>> @@ -1118,11 +1118,12 @@ int p2m_set_entry(struct p2m_domain *p2m,
>> if ( rc )
>> break;
>> - sgfn = gfn_add(sgfn, (1 << order));
>> + pages = 1 << order;
>
> Please take the opportunity to switch to 1UL.
>
>> + sgfn = gfn_add(sgfn, pages);
>> if ( !mfn_eq(smfn, INVALID_MFN) )
>> - smfn = mfn_add(smfn, (1 << order));
>> + smfn = mfn_add(smfn, pages);
>> - nr -= (1 << order);
>> + nr -= pages;
>> }
>> return rc;
>
> Cheers,
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] xen/arm: p2m_set_entry duplicate calculation.
2022-04-26 15:37 ` Paran Lee
@ 2022-04-26 19:40 ` Julien Grall
0 siblings, 0 replies; 5+ messages in thread
From: Julien Grall @ 2022-04-26 19:40 UTC (permalink / raw)
To: Paran Lee; +Cc: Stefano Stabellini, Austin Kim, xen-devel
Hi,
On 26/04/2022 16:37, Paran Lee wrote:
> Thanks you, I agreed! It made me think once more about what my patch
> could improve.
> patches I sent have been reviewed in various ways. It was a good
> opportunity to analyze my patch from various perspectives. :)
>
> I checked objdump in -O2 optimization(default) of Xen Makefile to make
> sure CSE (Common subexpression elimination) works well on the latest
> arm64 cross compiler on x86_64 from Arm GNU Toolchain.
>
> $
> ~/gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-gcc
> -v
> ...
> A-profile Architecture 10.3-2021.07 (arm-10.29)'
> Thread model: posix
> Supported LTO compression algorithms: zlib
> gcc version 10.3.1 20210621 (GNU Toolchain for the A-profile
> Architecture 10.3-2021.07 (arm-10.29)
>
> I compared the before and after my patches. This time, without adding a
> "pages" variable, I proceeded to use the local variable mask with order
> operation.
>
> I was able to confirm that it does one less operation.
Well... I don't think the one less operation is because of introduction
of the local variable (see more below).
>
> (1) before clean up
>
> 0000000000001bb4 <p2m_set_entry>:
> while ( nr )
> 1bb4: b40005e2 cbz x2, 1c70 <p2m_set_entry+0xbc>
> {
> ...
> if ( rc )
> 1c1c: 350002e0 cbnz w0, 1c78 <p2m_set_entry+0xc4>
> sgfn = gfn_add(sgfn, (1 << order));
1 << order is a 32-bit value but the second parameter is a 64-bit value
(assuming arm64). So...
> 1c20: 1ad32373 lsl w19, w27, w19 // <<< CES works
> 1c24: 93407e73 sxtw x19, w19 // <<< well!
... this instruction is extending the 32-bit value to 64-bit value.
> return _gfn(gfn_x(gfn) + i);
> 1c28: 8b1302d6 add x22, x22, x19
> return _mfn(mfn_x(mfn) + i);
> 1c2c: 8b130281 add x1, x20, x19
> 1c30: b100069f cmn x20, #0x1
> 1c34: 9a941034 csel x20, x1, x20, ne // ne = any
> while ( nr )
> 1c38: eb1302b5 subs x21, x21, x19
> 1c3c: 540001e0 b.eq 1c78 <p2m_set_entry+0xc4> // b.none
>
> (2) Using again mask variable. mask = 1UL << order
> code show me sxtw x19, w19 operation disappeared.
This code is not only using a local variable but also using "1UL". So, I
suspect that if you were using 1 << order, the instruction would re-appear.
Cheers,
--
Julien Grall
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2022-04-26 19:41 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-21 15:17 [PATCH] xen/arm: p2m_set_entry duplicate calculation Paran Lee
2022-04-21 23:26 ` Stefano Stabellini
2022-04-24 16:17 ` Julien Grall
2022-04-26 15:37 ` Paran Lee
2022-04-26 19:40 ` Julien Grall
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.