All of lore.kernel.org
 help / color / mirror / Atom feed
* prefetch inconsistency in copy_page()
@ 2017-07-11  9:42 Ard Biesheuvel
  2017-07-11 17:52 ` Will Deacon
  0 siblings, 1 reply; 5+ messages in thread
From: Ard Biesheuvel @ 2017-07-11  9:42 UTC (permalink / raw)
  To: linux-arm-kernel

Hi all,

No big deal, but I spotted an inconsistency in how copy_page() does
its prefetching. With the code as-is, it prefetches 2 lines at the
start of the function, but it is off by one line in the loop, i.e., it
prefetches 3 lines ahead (and skips a line at entry). So imo, we need
either

"""
@@ -30,9 +30,10 @@
  */
 ENTRY(copy_page)
 alternative_if ARM64_HAS_NO_HW_PREFETCH
-       # Prefetch two cache lines ahead.
+       # Prefetch three cache lines ahead.
        prfm    pldl1strm, [x1, #128]
        prfm    pldl1strm, [x1, #256]
+       prfm    pldl1strm, [x1, #384]
 alternative_else_nop_endif

        ldp     x2, x3, [x1]
"""

or

"""
@@ -50,7 +51,7 @@ alternative_else_nop_endif
        subs    x18, x18, #128

 alternative_if ARM64_HAS_NO_HW_PREFETCH
-       prfm    pldl1strm, [x1, #384]
+       prfm    pldl1strm, [x1, #256]
 alternative_else_nop_endif

        stnp    x2, x3, [x0]

"""

to make things consistent again.

Thoughts?

-- 
Ard.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* prefetch inconsistency in copy_page()
  2017-07-11  9:42 prefetch inconsistency in copy_page() Ard Biesheuvel
@ 2017-07-11 17:52 ` Will Deacon
  2017-07-11 17:53   ` Pinski, Andrew
  0 siblings, 1 reply; 5+ messages in thread
From: Will Deacon @ 2017-07-11 17:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 11, 2017 at 10:42:19AM +0100, Ard Biesheuvel wrote:
> Hi all,
> 
> No big deal, but I spotted an inconsistency in how copy_page() does
> its prefetching. With the code as-is, it prefetches 2 lines at the
> start of the function, but it is off by one line in the loop, i.e., it
> prefetches 3 lines ahead (and skips a line at entry). So imo, we need
> either
> 
> """
> @@ -30,9 +30,10 @@
>   */
>  ENTRY(copy_page)
>  alternative_if ARM64_HAS_NO_HW_PREFETCH
> -       # Prefetch two cache lines ahead.
> +       # Prefetch three cache lines ahead.
>         prfm    pldl1strm, [x1, #128]
>         prfm    pldl1strm, [x1, #256]
> +       prfm    pldl1strm, [x1, #384]
>  alternative_else_nop_endif
> 
>         ldp     x2, x3, [x1]
> """
> 
> or
> 
> """
> @@ -50,7 +51,7 @@ alternative_else_nop_endif
>         subs    x18, x18, #128
> 
>  alternative_if ARM64_HAS_NO_HW_PREFETCH
> -       prfm    pldl1strm, [x1, #384]
> +       prfm    pldl1strm, [x1, #256]
>  alternative_else_nop_endif
> 
>         stnp    x2, x3, [x0]
> 
> """
> 
> to make things consistent again.
> 
> Thoughts?

Either of those look good to me. Andrew -- any preference? If not, I'll take
the second one.

Will

^ permalink raw reply	[flat|nested] 5+ messages in thread

* prefetch inconsistency in copy_page()
  2017-07-11 17:52 ` Will Deacon
@ 2017-07-11 17:53   ` Pinski, Andrew
  2017-07-11 17:58     ` Ard Biesheuvel
  0 siblings, 1 reply; 5+ messages in thread
From: Pinski, Andrew @ 2017-07-11 17:53 UTC (permalink / raw)
  To: linux-arm-kernel

> -----Original Message-----
> From: Will Deacon [mailto:will.deacon at arm.com]
> Sent: Tuesday, July 11, 2017 10:53 AM
> To: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Mark Rutland <mark.rutland@arm.com>; Pinski, Andrew
> <Andrew.Pinski@cavium.com>; linux-arm-kernel at lists.infradead.org
> Subject: Re: prefetch inconsistency in copy_page()
> 
> On Tue, Jul 11, 2017 at 10:42:19AM +0100, Ard Biesheuvel wrote:
> > Hi all,
> >
> > No big deal, but I spotted an inconsistency in how copy_page() does
> > its prefetching. With the code as-is, it prefetches 2 lines at the
> > start of the function, but it is off by one line in the loop, i.e., it
> > prefetches 3 lines ahead (and skips a line at entry). So imo, we need
> > either
> >
> > """
> > @@ -30,9 +30,10 @@
> >   */
> >  ENTRY(copy_page)
> >  alternative_if ARM64_HAS_NO_HW_PREFETCH
> > -       # Prefetch two cache lines ahead.
> > +       # Prefetch three cache lines ahead.
> >         prfm    pldl1strm, [x1, #128]
> >         prfm    pldl1strm, [x1, #256]
> > +       prfm    pldl1strm, [x1, #384]
> >  alternative_else_nop_endif
> >
> >         ldp     x2, x3, [x1]
> > """
> >
> > or
> >
> > """
> > @@ -50,7 +51,7 @@ alternative_else_nop_endif
> >         subs    x18, x18, #128
> >
> >  alternative_if ARM64_HAS_NO_HW_PREFETCH
> > -       prfm    pldl1strm, [x1, #384]
> > +       prfm    pldl1strm, [x1, #256]
> >  alternative_else_nop_endif
> >
> >         stnp    x2, x3, [x0]
> >
> > """
> >
> > to make things consistent again.
> >
> > Thoughts?
> 
> Either of those look good to me. Andrew -- any preference? If not, I'll
> take the second one.

The first one is my preference but both are ok to me.

Thanks,
Andrew

> 
> Will

^ permalink raw reply	[flat|nested] 5+ messages in thread

* prefetch inconsistency in copy_page()
  2017-07-11 17:53   ` Pinski, Andrew
@ 2017-07-11 17:58     ` Ard Biesheuvel
  2017-07-11 18:26       ` Will Deacon
  0 siblings, 1 reply; 5+ messages in thread
From: Ard Biesheuvel @ 2017-07-11 17:58 UTC (permalink / raw)
  To: linux-arm-kernel

On 11 July 2017 at 18:53, Pinski, Andrew <Andrew.Pinski@cavium.com> wrote:
>> -----Original Message-----
>> From: Will Deacon [mailto:will.deacon at arm.com]
>> Sent: Tuesday, July 11, 2017 10:53 AM
>> To: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> Cc: Mark Rutland <mark.rutland@arm.com>; Pinski, Andrew
>> <Andrew.Pinski@cavium.com>; linux-arm-kernel at lists.infradead.org
>> Subject: Re: prefetch inconsistency in copy_page()
>>
>> On Tue, Jul 11, 2017 at 10:42:19AM +0100, Ard Biesheuvel wrote:
>> > Hi all,
>> >
>> > No big deal, but I spotted an inconsistency in how copy_page() does
>> > its prefetching. With the code as-is, it prefetches 2 lines at the
>> > start of the function, but it is off by one line in the loop, i.e., it
>> > prefetches 3 lines ahead (and skips a line at entry). So imo, we need
>> > either
>> >
>> > """
>> > @@ -30,9 +30,10 @@
>> >   */
>> >  ENTRY(copy_page)
>> >  alternative_if ARM64_HAS_NO_HW_PREFETCH
>> > -       # Prefetch two cache lines ahead.
>> > +       # Prefetch three cache lines ahead.
>> >         prfm    pldl1strm, [x1, #128]
>> >         prfm    pldl1strm, [x1, #256]
>> > +       prfm    pldl1strm, [x1, #384]
>> >  alternative_else_nop_endif
>> >
>> >         ldp     x2, x3, [x1]
>> > """
>> >
>> > or
>> >
>> > """
>> > @@ -50,7 +51,7 @@ alternative_else_nop_endif
>> >         subs    x18, x18, #128
>> >
>> >  alternative_if ARM64_HAS_NO_HW_PREFETCH
>> > -       prfm    pldl1strm, [x1, #384]
>> > +       prfm    pldl1strm, [x1, #256]
>> >  alternative_else_nop_endif
>> >
>> >         stnp    x2, x3, [x0]
>> >
>> > """
>> >
>> > to make things consistent again.
>> >
>> > Thoughts?
>>
>> Either of those look good to me. Andrew -- any preference? If not, I'll
>> take the second one.
>
> The first one is my preference but both are ok to me.
>

I suppose the second one is the least likely to invalidate existing
benchmark results.

Will: should I spin it as a proper patch? Mind if I clean up the
whitespace at the same time?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* prefetch inconsistency in copy_page()
  2017-07-11 17:58     ` Ard Biesheuvel
@ 2017-07-11 18:26       ` Will Deacon
  0 siblings, 0 replies; 5+ messages in thread
From: Will Deacon @ 2017-07-11 18:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 11, 2017 at 06:58:54PM +0100, Ard Biesheuvel wrote:
> On 11 July 2017 at 18:53, Pinski, Andrew <Andrew.Pinski@cavium.com> wrote:
> >> -----Original Message-----
> >> From: Will Deacon [mailto:will.deacon at arm.com]
> >> Sent: Tuesday, July 11, 2017 10:53 AM
> >> To: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >> Cc: Mark Rutland <mark.rutland@arm.com>; Pinski, Andrew
> >> <Andrew.Pinski@cavium.com>; linux-arm-kernel at lists.infradead.org
> >> Subject: Re: prefetch inconsistency in copy_page()
> >>
> >> On Tue, Jul 11, 2017 at 10:42:19AM +0100, Ard Biesheuvel wrote:
> >> > Hi all,
> >> >
> >> > No big deal, but I spotted an inconsistency in how copy_page() does
> >> > its prefetching. With the code as-is, it prefetches 2 lines at the
> >> > start of the function, but it is off by one line in the loop, i.e., it
> >> > prefetches 3 lines ahead (and skips a line at entry). So imo, we need
> >> > either
> >> >
> >> > """
> >> > @@ -30,9 +30,10 @@
> >> >   */
> >> >  ENTRY(copy_page)
> >> >  alternative_if ARM64_HAS_NO_HW_PREFETCH
> >> > -       # Prefetch two cache lines ahead.
> >> > +       # Prefetch three cache lines ahead.
> >> >         prfm    pldl1strm, [x1, #128]
> >> >         prfm    pldl1strm, [x1, #256]
> >> > +       prfm    pldl1strm, [x1, #384]
> >> >  alternative_else_nop_endif
> >> >
> >> >         ldp     x2, x3, [x1]
> >> > """
> >> >
> >> > or
> >> >
> >> > """
> >> > @@ -50,7 +51,7 @@ alternative_else_nop_endif
> >> >         subs    x18, x18, #128
> >> >
> >> >  alternative_if ARM64_HAS_NO_HW_PREFETCH
> >> > -       prfm    pldl1strm, [x1, #384]
> >> > +       prfm    pldl1strm, [x1, #256]
> >> >  alternative_else_nop_endif
> >> >
> >> >         stnp    x2, x3, [x0]
> >> >
> >> > """
> >> >
> >> > to make things consistent again.
> >> >
> >> > Thoughts?
> >>
> >> Either of those look good to me. Andrew -- any preference? If not, I'll
> >> take the second one.
> >
> > The first one is my preference but both are ok to me.
> >
> 
> I suppose the second one is the least likely to invalidate existing
> benchmark results.
> 
> Will: should I spin it as a proper patch? Mind if I clean up the
> whitespace at the same time?

Fill your boots!

Will

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-07-11 18:26 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-11  9:42 prefetch inconsistency in copy_page() Ard Biesheuvel
2017-07-11 17:52 ` Will Deacon
2017-07-11 17:53   ` Pinski, Andrew
2017-07-11 17:58     ` Ard Biesheuvel
2017-07-11 18:26       ` Will Deacon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.