* [PATCH 0/4] memcpy optimized with strd/ldrd
       [not found] <03e101cd0e07$eec39f10$cc4add30$@com>
@ 2012-03-30 11:41 ` Boojin Kim
  2012-03-30 13:19   ` Nicolas Pitre
  0 siblings, 1 reply; 6+ messages in thread
From: Boojin Kim @ 2012-03-30 11:41 UTC (permalink / raw)
  To: linux-arm-kernel

Nicolas Pitre wrote:
>
>
> Here's my version.  Lightly tested.
> I have no A15 hardware to run any performance comparison though.
>
I'm reviewing and testing your patch, but other work keeps interrupting the review.
I will send you feedback within this week.
Please wait a little longer.
And thanks for your patches. :)

>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* [PATCH 0/4] memcpy optimized with strd/ldrd
  2012-03-30 11:41 ` [PATCH 0/4] memcpy optimized with strd/ldrd Boojin Kim
@ 2012-03-30 13:19   ` Nicolas Pitre
  2012-04-03  8:07     ` Boojin Kim
  0 siblings, 1 reply; 6+ messages in thread
From: Nicolas Pitre @ 2012-03-30 13:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 30 Mar 2012, Boojin Kim wrote:

> Nicolas Pitre wrote:
> >
> >
> > Here's my version.  Lightly tested.
> > I have no A15 hardware to run any performance comparison though.
> >
> I'm reviewing and testing your patch, but other work keeps interrupting the review.
> I will send you feedback within this week.
> Please wait a little longer.
> And thanks for your patches. :)

FYI, it occurred to me that some corner cases might not be quite right
with regard to alignment for the STRD instruction.  It seems that the
hardware on which I tested it (Marvell Dove CPU) copes with misaligned
STRDs as long as they are still 32-bit aligned.  So I need to run
this code through a real validation harness on different hardware.
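
To illustrate what such a validation harness might look like, here is a minimal userspace sketch in C. It is not the harness used in this thread and not kernel code: the names copy_fn, check_copy_alignments() and overrun_copy() are hypothetical, and the routine under test is passed in as a function pointer that would stand in for the patched memcpy. It exercises every combination of source and destination offset within an 8-byte window, over a range of lengths, and checks both the payload and the guard bytes around the destination.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature for the copy routine under test. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

enum {
    MAX_ALIGN = 8,    /* offsets 0..7 cover every 64-bit misalignment */
    MAX_LEN   = 128,  /* longest copy tried */
    GUARD     = 16,   /* canary bytes on each side of the destination */
    BUF       = GUARD + MAX_ALIGN + MAX_LEN + GUARD
};

/* Returns the number of failing (dst offset, src offset, length) cases. */
static unsigned check_copy_alignments(copy_fn fn)
{
    static _Alignas(8) unsigned char src[BUF];
    static _Alignas(8) unsigned char dst[BUF];
    unsigned failures = 0;
    size_t d, s, n, i;

    assert(fn != NULL);

    for (i = 0; i < BUF; i++)
        src[i] = (unsigned char)(i * 7 + 1);

    for (d = 0; d < MAX_ALIGN; d++) {
        for (s = 0; s < MAX_ALIGN; s++) {
            for (n = 0; n <= MAX_LEN; n++) {
                unsigned char *dp = dst + GUARD + d;
                const unsigned char *sp = src + GUARD + s;
                int bad = 0;

                memset(dst, 0xA5, BUF);
                fn(dp, sp, n);

                /* The copied payload must match the source. */
                for (i = 0; i < n; i++)
                    if (dp[i] != sp[i])
                        bad = 1;
                /* Bytes before and after the destination must be untouched. */
                for (i = 0; i < GUARD + d; i++)
                    if (dst[i] != 0xA5)
                        bad = 1;
                for (i = GUARD + d + n; i < BUF; i++)
                    if (dst[i] != 0xA5)
                        bad = 1;

                failures += (unsigned)bad;
            }
        }
    }
    return failures;
}

/* Deliberately faulty copy (writes one byte too many), used only to
 * show that the harness reports an error. */
static void *overrun_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i, total = n ? n + 1 : 0;

    for (i = 0; i < total; i++)
        d[i] = s[i];
    return dst;
}
```

Calling check_copy_alignments(memcpy) should report no failures with a correct C library, while check_copy_alignments(overrun_copy) reports a nonzero count. A harness of this kind only checks the result of the copy; whether a misaligned STRD faults or is silently handled still depends on the CPU and its configuration, which is why it would need to run on several kinds of hardware.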


Nicolas


* [PATCH 0/4] memcpy optimized with strd/ldrd
  2012-03-30 13:19   ` Nicolas Pitre
@ 2012-04-03  8:07     ` Boojin Kim
  2012-04-03 14:48       ` Nicolas Pitre
  0 siblings, 1 reply; 6+ messages in thread
From: Boojin Kim @ 2012-04-03  8:07 UTC (permalink / raw)
  To: linux-arm-kernel

Nicolas Pitre wrote:

> > >
> > > Here's my version.  Lightly tested.
> > > I have no A15 hardware to run any performance comparison though.
> > >
> > I'm reviewing and testing your patch, but other work keeps interrupting the review.
> > I will send you feedback within this week.
> > Please wait a little longer.
> > And thanks for your patches. :)
>
> FYI, it occurred to me that some corner cases might not be quite right
> with regard to alignment for the STRD instruction.  It seems that the
> hardware on which I tested it (Marvell Dove CPU) copes with misaligned
> STRDs as long as they are still 32-bit aligned.  So I need to run
> this code through a real validation harness on different hardware.

Unfortunately, performance did not improve after applying your patches.
I think something in patches 1 to 3 causes a performance degradation.
Thanks :)

>
>
> Nicolas
>


* [PATCH 0/4] memcpy optimized with strd/ldrd
  2012-04-03  8:07     ` Boojin Kim
@ 2012-04-03 14:48       ` Nicolas Pitre
  2012-04-26  7:35         ` Boojin Kim
  0 siblings, 1 reply; 6+ messages in thread
From: Nicolas Pitre @ 2012-04-03 14:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 3 Apr 2012, Boojin Kim wrote:

> Nicolas Pitre wrote:
> 
> > > >
> > > > Here's my version.  Lightly tested.
> > > > I have no A15 hardware to run any performance comparison though.
> > > >
> > > I'm reviewing and testing your patch, but other work keeps interrupting the review.
> > > I will send you feedback within this week.
> > > Please wait a little longer.
> > > And thanks for your patches. :)
> >
> > FYI, it occurred to me that some corner cases might not be quite right
> > with regard to alignment for the STRD instruction.  It seems that the
> > hardware on which I tested it (Marvell Dove CPU) copes with misaligned
> > STRDs as long as they are still 32-bit aligned.  So I need to run
> > this code through a real validation harness on different hardware.
>
> Unfortunately, performance did not improve after applying your patches.
> I think something in patches 1 to 3 causes a performance degradation.

If you could identify which patch is responsible that would be helpful.

Thanks.


Nicolas


* [PATCH 0/4] memcpy optimized with strd/ldrd
  2012-04-03 14:48       ` Nicolas Pitre
@ 2012-04-26  7:35         ` Boojin Kim
  0 siblings, 0 replies; 6+ messages in thread
From: Boojin Kim @ 2012-04-26  7:35 UTC (permalink / raw)
  To: linux-arm-kernel

Nicolas Pitre wrote:
> Sent: Tuesday, April 03, 2012 11:49 PM
> To: Boojin Kim
> Cc: linux-arm-kernel at lists.infradead.org
> Subject: RE: [PATCH 0/4] memcpy optimized with strd/ldrd
>
> On Tue, 3 Apr 2012, Boojin Kim wrote:
>
> > Nicolas Pitre wrote:
> >
> > > > >
> > > > > Here's my version.  Lightly tested.
> > > > > I have no A15 hardware to run any performance comparison though.
> > > > >
> > > > I'm reviewing and testing your patch, but other work keeps interrupting the review.
> > > > I will send you feedback within this week.
> > > > Please wait a little longer.
> > > > And thanks for your patches. :)
> > >
> > > FYI, it occurred to me that some corner cases might not be quite right
> > > with regard to alignment for the STRD instruction.  It seems that the
> > > hardware on which I tested it (Marvell Dove CPU) copes with misaligned
> > > STRDs as long as they are still 32-bit aligned.  So I need to run
> > > this code through a real validation harness on different hardware.
> >
> > Unfortunately, performance did not improve after applying your patches.
> > I think something in patches 1 to 3 causes a performance degradation.
>
> If you could identify which patch is responsible that would be helpful.
Sorry for the late response; I have been very busy these days. Y_Y
I checked your patches, and the 1st patch causes the performance drop.
The transfer time for a 4KB memcpy is 489ns; after applying the 1st patch, it rises to 578ns.
Performance also drops by about 10% for memcpy of other small sizes.
I hope this is helpful.
Thanks,
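
For readers who want to reproduce this kind of comparison, the following is a rough userspace sketch in C of how a per-copy time could be measured. It is an illustration only, not the method used for the numbers above: those figures were taken with the kernel memcpy on the reporter's hardware, whereas this sketch times whatever routine is passed in (for example the C library memcpy). The names copy_fn, ns_per_copy() and percent_change() are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature for the copy routine being timed. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Average nanoseconds per call for copies of `size` bytes.
 * Returns a negative value if the buffers cannot be allocated. */
static double ns_per_copy(copy_fn fn, size_t size, unsigned long iterations)
{
    /* Calling through a volatile pointer keeps the compiler from
     * inlining or eliding the copies inside the timed loop. */
    copy_fn volatile call = fn;
    struct timespec t0, t1;
    unsigned char *src, *dst;
    double elapsed_ns;
    unsigned long i;

    assert(fn != NULL && size > 0 && iterations > 0);

    src = malloc(size);
    dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0x5A, size);

    call(dst, src, size);               /* warm-up: touch both buffers */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < iterations; i++)
        call(dst, src, size);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    elapsed_ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
               + (double)(t1.tv_nsec - t0.tv_nsec);

    free(src);
    free(dst);
    return elapsed_ns / (double)iterations;
}

/* Relative change from `before` to `after`, in percent. */
static double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

With the figures reported above, percent_change(489.0, 578.0) comes to roughly 18%, so the 4KB case regresses more than the "about 10%" seen for the other small sizes. Timings from a loop like this are dominated by cache-hot copies, so they are only comparable between runs made on the same machine under the same conditions.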
>
> Thanks.
>
>
> Nicolas
>


* [PATCH 0/4] memcpy optimized with strd/ldrd
  2012-03-28  5:23 [PATCH 1/2] ARM: lib: Add optimized memcpy with 64 byte pld size Nicolas Pitre
@ 2012-03-29  4:00 ` Nicolas Pitre
  0 siblings, 0 replies; 6+ messages in thread
From: Nicolas Pitre @ 2012-03-29  4:00 UTC (permalink / raw)
  To: linux-arm-kernel


Here's my version.  Lightly tested.
I have no A15 hardware to run any performance comparison though.
