* prefetch inconsistency in copy_page()
@ 2017-07-11 9:42 Ard Biesheuvel
2017-07-11 17:52 ` Will Deacon
0 siblings, 1 reply; 5+ messages in thread
From: Ard Biesheuvel @ 2017-07-11 9:42 UTC (permalink / raw)
To: linux-arm-kernel
Hi all,
No big deal, but I spotted an inconsistency in how copy_page() does
its prefetching. With the code as-is, it prefetches 2 lines at the
start of the function, but it is off by one line in the loop, i.e., it
prefetches 3 lines ahead (and skips a line at entry). So imo, we need
either
"""
@@ -30,9 +30,10 @@
*/
ENTRY(copy_page)
alternative_if ARM64_HAS_NO_HW_PREFETCH
- # Prefetch two cache lines ahead.
+ # Prefetch three cache lines ahead.
prfm pldl1strm, [x1, #128]
prfm pldl1strm, [x1, #256]
+ prfm pldl1strm, [x1, #384]
alternative_else_nop_endif
ldp x2, x3, [x1]
"""
or
"""
@@ -50,7 +51,7 @@ alternative_else_nop_endif
subs x18, x18, #128
alternative_if ARM64_HAS_NO_HW_PREFETCH
- prfm pldl1strm, [x1, #384]
+ prfm pldl1strm, [x1, #256]
alternative_else_nop_endif
stnp x2, x3, [x0]
"""
to make things consistent again.
Thoughts?
--
Ard.
^ permalink raw reply [flat|nested] 5+ messages in thread
* prefetch inconsistency in copy_page()
2017-07-11 9:42 prefetch inconsistency in copy_page() Ard Biesheuvel
@ 2017-07-11 17:52 ` Will Deacon
2017-07-11 17:53 ` Pinski, Andrew
0 siblings, 1 reply; 5+ messages in thread
From: Will Deacon @ 2017-07-11 17:52 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Jul 11, 2017 at 10:42:19AM +0100, Ard Biesheuvel wrote:
> Hi all,
>
> No big deal, but I spotted an inconsistency in how copy_page() does
> its prefetching. With the code as-is, it prefetches 2 lines at the
> start of the function, but it is off by one line in the loop, i.e., it
> prefetches 3 lines ahead (and skips a line at entry). So imo, we need
> either
>
> """
> @@ -30,9 +30,10 @@
> */
> ENTRY(copy_page)
> alternative_if ARM64_HAS_NO_HW_PREFETCH
> - # Prefetch two cache lines ahead.
> + # Prefetch three cache lines ahead.
> prfm pldl1strm, [x1, #128]
> prfm pldl1strm, [x1, #256]
> + prfm pldl1strm, [x1, #384]
> alternative_else_nop_endif
>
> ldp x2, x3, [x1]
> """
>
> or
>
> """
> @@ -50,7 +51,7 @@ alternative_else_nop_endif
> subs x18, x18, #128
>
> alternative_if ARM64_HAS_NO_HW_PREFETCH
> - prfm pldl1strm, [x1, #384]
> + prfm pldl1strm, [x1, #256]
> alternative_else_nop_endif
>
> stnp x2, x3, [x0]
>
> """
>
> to make things consistent again.
>
> Thoughts?
Either of those look good to me. Andrew -- any preference? If not, I'll take
the second one.
Will
^ permalink raw reply [flat|nested] 5+ messages in thread
* prefetch inconsistency in copy_page()
2017-07-11 17:52 ` Will Deacon
@ 2017-07-11 17:53 ` Pinski, Andrew
2017-07-11 17:58 ` Ard Biesheuvel
0 siblings, 1 reply; 5+ messages in thread
From: Pinski, Andrew @ 2017-07-11 17:53 UTC (permalink / raw)
To: linux-arm-kernel
> -----Original Message-----
> From: Will Deacon [mailto:will.deacon at arm.com]
> Sent: Tuesday, July 11, 2017 10:53 AM
> To: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Mark Rutland <mark.rutland@arm.com>; Pinski, Andrew
> <Andrew.Pinski@cavium.com>; linux-arm-kernel at lists.infradead.org
> Subject: Re: prefetch inconsistency in copy_page()
>
> On Tue, Jul 11, 2017 at 10:42:19AM +0100, Ard Biesheuvel wrote:
> > Hi all,
> >
> > No big deal, but I spotted an inconsistency in how copy_page() does
> > its prefetching. With the code as-is, it prefetches 2 lines at the
> > start of the function, but it is off by one line in the loop, i.e., it
> > prefetches 3 lines ahead (and skips a line at entry). So imo, we need
> > either
> >
> > """
> > @@ -30,9 +30,10 @@
> > */
> > ENTRY(copy_page)
> > alternative_if ARM64_HAS_NO_HW_PREFETCH
> > - # Prefetch two cache lines ahead.
> > + # Prefetch three cache lines ahead.
> > prfm pldl1strm, [x1, #128]
> > prfm pldl1strm, [x1, #256]
> > + prfm pldl1strm, [x1, #384]
> > alternative_else_nop_endif
> >
> > ldp x2, x3, [x1]
> > """
> >
> > or
> >
> > """
> > @@ -50,7 +51,7 @@ alternative_else_nop_endif
> > subs x18, x18, #128
> >
> > alternative_if ARM64_HAS_NO_HW_PREFETCH
> > - prfm pldl1strm, [x1, #384]
> > + prfm pldl1strm, [x1, #256]
> > alternative_else_nop_endif
> >
> > stnp x2, x3, [x0]
> >
> > """
> >
> > to make things consistent again.
> >
> > Thoughts?
>
> Either of those look good to me. Andrew -- any preference? If not, I'll
> take the second one.
The first one is my preference but both are ok to me.
Thanks,
Andrew
>
> Will
^ permalink raw reply [flat|nested] 5+ messages in thread
* prefetch inconsistency in copy_page()
2017-07-11 17:53 ` Pinski, Andrew
@ 2017-07-11 17:58 ` Ard Biesheuvel
2017-07-11 18:26 ` Will Deacon
0 siblings, 1 reply; 5+ messages in thread
From: Ard Biesheuvel @ 2017-07-11 17:58 UTC (permalink / raw)
To: linux-arm-kernel
On 11 July 2017 at 18:53, Pinski, Andrew <Andrew.Pinski@cavium.com> wrote:
>> -----Original Message-----
>> From: Will Deacon [mailto:will.deacon at arm.com]
>> Sent: Tuesday, July 11, 2017 10:53 AM
>> To: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> Cc: Mark Rutland <mark.rutland@arm.com>; Pinski, Andrew
>> <Andrew.Pinski@cavium.com>; linux-arm-kernel at lists.infradead.org
>> Subject: Re: prefetch inconsistency in copy_page()
>>
>> On Tue, Jul 11, 2017 at 10:42:19AM +0100, Ard Biesheuvel wrote:
>> > Hi all,
>> >
>> > No big deal, but I spotted an inconsistency in how copy_page() does
>> > its prefetching. With the code as-is, it prefetches 2 lines at the
>> > start of the function, but it is off by one line in the loop, i.e., it
>> > prefetches 3 lines ahead (and skips a line at entry). So imo, we need
>> > either
>> >
>> > """
>> > @@ -30,9 +30,10 @@
>> > */
>> > ENTRY(copy_page)
>> > alternative_if ARM64_HAS_NO_HW_PREFETCH
>> > - # Prefetch two cache lines ahead.
>> > + # Prefetch three cache lines ahead.
>> > prfm pldl1strm, [x1, #128]
>> > prfm pldl1strm, [x1, #256]
>> > + prfm pldl1strm, [x1, #384]
>> > alternative_else_nop_endif
>> >
>> > ldp x2, x3, [x1]
>> > """
>> >
>> > or
>> >
>> > """
>> > @@ -50,7 +51,7 @@ alternative_else_nop_endif
>> > subs x18, x18, #128
>> >
>> > alternative_if ARM64_HAS_NO_HW_PREFETCH
>> > - prfm pldl1strm, [x1, #384]
>> > + prfm pldl1strm, [x1, #256]
>> > alternative_else_nop_endif
>> >
>> > stnp x2, x3, [x0]
>> >
>> > """
>> >
>> > to make things consistent again.
>> >
>> > Thoughts?
>>
>> Either of those look good to me. Andrew -- any preference? If not, I'll
>> take the second one.
>
> The first one is my preference but both are ok to me.
>
I suppose the second one is the least likely to invalidate existing
benchmark results.
Will: should I spin it as a proper patch? Mind if I clean up the
whitespace at the same time?
^ permalink raw reply [flat|nested] 5+ messages in thread
* prefetch inconsistency in copy_page()
2017-07-11 17:58 ` Ard Biesheuvel
@ 2017-07-11 18:26 ` Will Deacon
0 siblings, 0 replies; 5+ messages in thread
From: Will Deacon @ 2017-07-11 18:26 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Jul 11, 2017 at 06:58:54PM +0100, Ard Biesheuvel wrote:
> On 11 July 2017 at 18:53, Pinski, Andrew <Andrew.Pinski@cavium.com> wrote:
> >> -----Original Message-----
> >> From: Will Deacon [mailto:will.deacon at arm.com]
> >> Sent: Tuesday, July 11, 2017 10:53 AM
> >> To: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >> Cc: Mark Rutland <mark.rutland@arm.com>; Pinski, Andrew
> >> <Andrew.Pinski@cavium.com>; linux-arm-kernel at lists.infradead.org
> >> Subject: Re: prefetch inconsistency in copy_page()
> >>
> >> On Tue, Jul 11, 2017 at 10:42:19AM +0100, Ard Biesheuvel wrote:
> >> > Hi all,
> >> >
> >> > No big deal, but I spotted an inconsistency in how copy_page() does
> >> > its prefetching. With the code as-is, it prefetches 2 lines at the
> >> > start of the function, but it is off by one line in the loop, i.e., it
> >> > prefetches 3 lines ahead (and skips a line at entry). So imo, we need
> >> > either
> >> >
> >> > """
> >> > @@ -30,9 +30,10 @@
> >> > */
> >> > ENTRY(copy_page)
> >> > alternative_if ARM64_HAS_NO_HW_PREFETCH
> >> > - # Prefetch two cache lines ahead.
> >> > + # Prefetch three cache lines ahead.
> >> > prfm pldl1strm, [x1, #128]
> >> > prfm pldl1strm, [x1, #256]
> >> > + prfm pldl1strm, [x1, #384]
> >> > alternative_else_nop_endif
> >> >
> >> > ldp x2, x3, [x1]
> >> > """
> >> >
> >> > or
> >> >
> >> > """
> >> > @@ -50,7 +51,7 @@ alternative_else_nop_endif
> >> > subs x18, x18, #128
> >> >
> >> > alternative_if ARM64_HAS_NO_HW_PREFETCH
> >> > - prfm pldl1strm, [x1, #384]
> >> > + prfm pldl1strm, [x1, #256]
> >> > alternative_else_nop_endif
> >> >
> >> > stnp x2, x3, [x0]
> >> >
> >> > """
> >> >
> >> > to make things consistent again.
> >> >
> >> > Thoughts?
> >>
> >> Either of those look good to me. Andrew -- any preference? If not, I'll
> >> take the second one.
> >
> > The first one is my preference but both are ok to me.
> >
>
> I suppose the second one is the least likely to invalidate existing
> benchmark results.
>
> Will: should I spin it as a proper patch? Mind if I clean up the
> whitespace at the same time?
Fill your boots!
Will
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2017-07-11 18:26 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-11 9:42 prefetch inconsistency in copy_page() Ard Biesheuvel
2017-07-11 17:52 ` Will Deacon
2017-07-11 17:53 ` Pinski, Andrew
2017-07-11 17:58 ` Ard Biesheuvel
2017-07-11 18:26 ` Will Deacon
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.