* Image scaling performance
@ 2015-02-24  9:39 Michael Zimmermann
  2015-02-24  9:51 ` Vladimir 'phcoder' Serbinenko
  0 siblings, 1 reply; 24+ messages in thread
From: Michael Zimmermann @ 2015-02-24  9:39 UTC (permalink / raw)
  To: The development of GNU GRUB

Any ideas what could slow down the image scaling algorithm?
The only reasons I can think of are slow memory or some compiler
problem. Since my RAM is mapped cacheable, I don't think the RAM is
too slow.

I have even forced the nearest-neighbor algorithm already. It speeds
things up a lot, but it's still not as fast as you'd expect.

Some technical info:
ARMv7
Linaro GCC 4.9
MMU setup is done by the previous bootloader (I disabled GRUB's (uboot)
MMU setup; that proved to be faster)


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Image scaling performance
  2015-02-24  9:39 Image scaling performance Michael Zimmermann
@ 2015-02-24  9:51 ` Vladimir 'phcoder' Serbinenko
  2015-02-24 10:00   ` Michael Zimmermann
  0 siblings, 1 reply; 24+ messages in thread
From: Vladimir 'phcoder' Serbinenko @ 2015-02-24  9:51 UTC (permalink / raw)
  To: The development of GRUB 2

[-- Attachment #1: Type: text/plain, Size: 967 bytes --]

Did you look at the assembly of the function in question? Are you compiling
to Thumb? Multiplication sometimes generates function calls in Thumb. Try
explicitly marking the scaling function as ARM.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Image scaling performance
  2015-02-24  9:51 ` Vladimir 'phcoder' Serbinenko
@ 2015-02-24 10:00   ` Michael Zimmermann
  2015-02-24 11:27     ` Vladimir 'phcoder' Serbinenko
  2015-02-25 15:45     ` Leif Lindholm
  0 siblings, 2 replies; 24+ messages in thread
From: Michael Zimmermann @ 2015-02-24 10:00 UTC (permalink / raw)
  To: The development of GNU GRUB

The function seems to use __aeabi_uidiv. I'm not sure whether this is a
software or a hardware implementation.
Full code:
ASM: http://pastebin.com/FnPRZt1H
pseudo-C: http://pastebin.com/dH3YBk46
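
For context, a minimal sketch of the kind of per-pixel mapping under discussion. The helper name map_coord is hypothetical and used only for illustration; the expression follows the sx = sw * dx / dw formula from scale_nn. When GCC targets an ARM core without a hardware divide instruction, it typically lowers the '/' below to a call to the run-time helper __aeabi_uidiv, which is what shows up in the disassembly.

```c
/* Hypothetical helper, for illustration only: maps a destination
   coordinate to a source coordinate, sx = sw * dx / dw.  */
static unsigned
map_coord (unsigned src_size, unsigned dst_index, unsigned dst_size)
{
  /* On ARM targets without UDIV, the compiler is expected to emit
     a call to __aeabi_uidiv for this division.  */
  return src_size * dst_index / dst_size;
}
```

Called once per destination pixel for both x and y, this costs two divisions per pixel, which is why the helper call dominates the inner loop.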


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Image scaling performance
  2015-02-24 10:00   ` Michael Zimmermann
@ 2015-02-24 11:27     ` Vladimir 'phcoder' Serbinenko
  2015-02-24 11:47       ` Michael Zimmermann
  2015-02-25 15:45     ` Leif Lindholm
  1 sibling, 1 reply; 24+ messages in thread
From: Vladimir 'phcoder' Serbinenko @ 2015-02-24 11:27 UTC (permalink / raw)
  To: The development of GNU GRUB


[-- Attachment #1.1: Type: text/plain, Size: 1850 bytes --]

On Tue Feb 24 2015 at 11:01:03 AM, Michael Zimmermann
<sigmaepsilon92@gmail.com> wrote:

> the function seems to use __aeabi_uidiv. I'm not sure if this is a sw
> or hw implementation.
>
It is a software implementation. Try the attached patch.

[-- Attachment #2: scale.diff --]
[-- Type: text/x-patch, Size: 1420 bytes --]

diff --git a/grub-core/video/bitmap_scale.c b/grub-core/video/bitmap_scale.c
index 0b93d02..1c7195f 100644
--- a/grub-core/video/bitmap_scale.c
+++ b/grub-core/video/bitmap_scale.c
@@ -366,22 +366,32 @@ scale_nn (struct grub_video_bitmap *dst, struct grub_video_bitmap *src)
   /* bytes_per_pixel is the same for both src and dst. */
   unsigned bytes_per_pixel = dst->mode_info.bytes_per_pixel;
 
-  unsigned dy;
-  for (dy = 0; dy < dh; dy++)
+  unsigned dy, sy, ystep, yfrac, yover;
+  unsigned dx, sx, xstep, xfrac, xover;
+  ystep = sw / dw;
+  yover = sw % dw;
+  xstep = sh / dh;
+  xover = sh % dh;
+
+  for (dy = 0, sy = 0; dy < dh; dy++, sy += ystep, yfrac += yover)
     {
       unsigned dx;
-      for (dx = 0; dx < dw; dx++)
+      if (yfrac > dw)
+	{
+	  yfrac -= dw;
+	  sy++;
+	}
+      for (dx = 0, sx = 0; dx < dw; dx++, sx += xstep, xfrac += xover)
         {
           grub_uint8_t *dptr;
           grub_uint8_t *sptr;
-          unsigned sx;
-          unsigned sy;
           unsigned comp;
 
-          /* Compute the source coordinate that the destination coordinate
-             maps to.  Note: sx/sw = dx/dw  =>  sx = sw*dx/dw. */
-          sx = sw * dx / dw;
-          sy = sh * dy / dh;
+	  if (xfrac > dh)
+	    {
+	      xfrac -= dh;
+	      sx++;
+	    }
 
           /* Get the address of the pixels in src and dst. */
           dptr = ddata + dy * dstride + dx * bytes_per_pixel;
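
The incremental stepping idea in the patch above can be sketched in isolation. This is a standalone illustration under stated assumptions, not the GRUB code: the function name build_nn_map and the lookup-table form are hypothetical. It keeps an integer step plus a fractional accumulator, so the whole axis needs one division and one modulo instead of one division per pixel. The accumulator starts at zero and the carry test uses >= so that the result matches src_size * d / dst_size exactly.

```c
/* Hypothetical sketch: fill map[0 .. dst_size - 1] so that
   map[d] == src_size * d / dst_size, without dividing inside the loop.
   Assumes dst_size is small enough that frac + over cannot overflow.  */
static void
build_nn_map (unsigned *map, unsigned src_size, unsigned dst_size)
{
  unsigned step, over, frac = 0, s = 0, d;

  if (dst_size == 0)
    return;

  step = src_size / dst_size;	/* whole source pixels per destination pixel */
  over = src_size % dst_size;	/* remainder, accumulated in frac */

  for (d = 0; d < dst_size; d++)
    {
      map[d] = s;

      /* Advance to the next destination pixel.  */
      s += step;
      frac += over;
      if (frac >= dst_size)
	{
	  /* The accumulated remainder reached one whole source pixel.  */
	  frac -= dst_size;
	  s++;
	}
    }
}
```

The same routine would be run once for the x axis (sw, dw) and once for the y axis (sh, dh); the inner pixel loop then only indexes the two tables.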

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: Image scaling performance
  2015-02-24 11:27     ` Vladimir 'phcoder' Serbinenko
@ 2015-02-24 11:47       ` Michael Zimmermann
  2015-02-24 12:39         ` Vladimir 'phcoder' Serbinenko
  0 siblings, 1 reply; 24+ messages in thread
From: Michael Zimmermann @ 2015-02-24 11:47 UTC (permalink / raw)
  To: The development of GNU GRUB

Thanks, I'll try that. But wouldn't it make more sense to implement a
hardware version of this function? (Do I need a different compiler?)

Almost all modules use this call, and I guess it could really improve
performance.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Image scaling performance
  2015-02-24 11:47       ` Michael Zimmermann
@ 2015-02-24 12:39         ` Vladimir 'phcoder' Serbinenko
  2015-02-24 18:01           ` Michael Zimmermann
  2015-02-25 16:20           ` Leif Lindholm
  0 siblings, 2 replies; 24+ messages in thread
From: Vladimir 'phcoder' Serbinenko @ 2015-02-24 12:39 UTC (permalink / raw)
  To: The development of GNU GRUB

[-- Attachment #1: Type: text/plain, Size: 3026 bytes --]

On Tue Feb 24 2015 at 12:48:10 PM, Michael Zimmermann
<sigmaepsilon92@gmail.com> wrote:

> thx I'll try that but wouldn't it make more sense to implement a hw
> version of this function?(do I need a different compiler?)
>
AFAIK there isn't a division instruction that is consistent across all ARM
CPUs and which is enabled on boot time.
You can implement a hardware version of division, but it will crash on some
machines. Division is a slow operation on any platform and should be avoided
as far as possible.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Image scaling performance
  2015-02-24 12:39         ` Vladimir 'phcoder' Serbinenko
@ 2015-02-24 18:01           ` Michael Zimmermann
  2015-02-24 18:22             ` Andrei Borzenkov
  2015-02-25 16:20           ` Leif Lindholm
  1 sibling, 1 reply; 24+ messages in thread
From: Michael Zimmermann @ 2015-02-24 18:01 UTC (permalink / raw)
  To: The development of GNU GRUB

What do you mean by "which is enabled on boot time"?
What do the Linux kernel and userspace applications use?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Image scaling performance
  2015-02-24 18:01           ` Michael Zimmermann
@ 2015-02-24 18:22             ` Andrei Borzenkov
  0 siblings, 0 replies; 24+ messages in thread
From: Andrei Borzenkov @ 2015-02-24 18:22 UTC (permalink / raw)
  To: Michael Zimmermann; +Cc: The development of GNU GRUB

On Tue, 24 Feb 2015 19:01:03 +0100,
Michael Zimmermann <sigmaepsilon92@gmail.com> wrote:

> what do u mean with "which is enabled on boot time."?
> what do linux kernel and userspace applications use?
> 

A software implementation, provided either by libgcc or defined explicitly,
as GRUB does (see e.g. arch/arm/lib/lib1funcs.S in the Linux source tree).
GCC generates calls to these helpers in both cases.
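
To give a rough idea of what such a software helper has to do, here is a minimal shift-and-subtract division in C. It is a sketch only: the name soft_udiv and the divide-by-zero convention are assumptions made for this example, and it is not the libgcc, Linux, or GRUB routine, which are written in assembly and considerably more optimized.

```c
#include <limits.h>

/* Hypothetical sketch of restoring (shift-and-subtract) unsigned division.
   Produces one quotient bit per iteration, so a 32-bit divide takes 32
   iterations.  Returns the quotient and stores the remainder in *rem.
   Division by zero returns 0 with *rem = num; this convention is an
   arbitrary choice for the sketch.  */
static unsigned
soft_udiv (unsigned num, unsigned den, unsigned *rem)
{
  unsigned quot = 0, r = 0;
  int bit;

  if (den == 0)
    {
      *rem = num;
      return 0;
    }

  for (bit = (int) (sizeof (unsigned) * CHAR_BIT) - 1; bit >= 0; bit--)
    {
      /* Bring down the next bit of the dividend.  */
      r = (r << 1) | ((num >> bit) & 1u);
      if (r >= den)
	{
	  r -= den;
	  quot |= 1u << bit;
	}
    }

  *rem = r;
  return quot;
}
```

Even an optimized variant costs many cycles per call, which is why removing the division from the inner loop tends to help more than speeding up the helper.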

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Image scaling performance
  2015-02-24 10:00   ` Michael Zimmermann
  2015-02-24 11:27     ` Vladimir 'phcoder' Serbinenko
@ 2015-02-25 15:45     ` Leif Lindholm
  2015-02-25 16:23       ` Leif Lindholm
  1 sibling, 1 reply; 24+ messages in thread
From: Leif Lindholm @ 2015-02-25 15:45 UTC (permalink / raw)
  To: The development of GNU GRUB

On Tue, Feb 24, 2015 at 11:00:41AM +0100, Michael Zimmermann wrote:
> the function seems to use __aeabi_uidiv. I'm not sure if this is a sw
> or hw implementation.
> Full code:
> ASM: http://pastebin.com/FnPRZt1H
> pseudo-C: http://pastebin.com/dH3YBk46

> >> Some technical info:
> >> ARMv7
> >> Linaro GCC 4.9

I don't see any calls to any of the __aeabi helpers generated for this
file with current head. Which specific Linaro toolchain are you using?
(Mine is "Linaro GCC 4.9-2014.09".)

Also, scale_nn gets inlined into grub_video_bitmap_scale for me.

(Just trying to understand what is causing the difference.)

/
    Leif


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Image scaling performance
  2015-02-24 12:39         ` Vladimir 'phcoder' Serbinenko
  2015-02-24 18:01           ` Michael Zimmermann
@ 2015-02-25 16:20           ` Leif Lindholm
  1 sibling, 0 replies; 24+ messages in thread
From: Leif Lindholm @ 2015-02-25 16:20 UTC (permalink / raw)
  To: The development of GNU GRUB

On Tue, Feb 24, 2015 at 12:39:31PM +0000, Vladimir 'phcoder' Serbinenko wrote:
> > thx I'll try that but wouldn't it make more sense to implement a hw
> > version of this function?(do I need a different compiler?)
> >
> AFAIK there isn't a consistent division instruction across all ARMs and
> which is enabled on boot time.
> You can implement hw version of division but it will crash on some machines.
> Division is a slow operation on any platform and should be avoided as far
> as possible.

For 32-bit ARM, only the later processors (Cortex-A7, -A12, -A15,
-A17) have SDIV/UDIV.
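
Whether a given ARMv7 core has these instructions can also be checked at run time. As I understand the ARMv7-A architecture manual, the Divide_instrs field in bits [27:24] of the ID_ISAR0 register reports this. The sketch below only decodes a register value that has already been read; the enum and function names are hypothetical, and reading the register itself (a privileged CP15 access) is left out.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only.  */
enum arm_div_support
{
  ARM_DIV_NONE,			/* no SDIV/UDIV */
  ARM_DIV_THUMB_ONLY,		/* SDIV/UDIV in the Thumb instruction set */
  ARM_DIV_ARM_AND_THUMB,	/* SDIV/UDIV in both ARM and Thumb */
  ARM_DIV_UNKNOWN		/* field value not covered by this sketch */
};

/* Decode the Divide_instrs field, bits [27:24] of ID_ISAR0.
   The caller is assumed to have read ID_ISAR0 in a privileged mode.  */
static enum arm_div_support
decode_div_support (uint32_t id_isar0)
{
  switch ((id_isar0 >> 24) & 0xfu)
    {
    case 0:
      return ARM_DIV_NONE;
    case 1:
      return ARM_DIV_THUMB_ONLY;
    case 2:
      return ARM_DIV_ARM_AND_THUMB;
    default:
      return ARM_DIV_UNKNOWN;
    }
}
```

A bootloader could use such a check to pick between a UDIV-based routine and the software helper, at the cost of carrying both code paths.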

/
    Leif


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Image scaling performance
  2015-02-25 15:45     ` Leif Lindholm
@ 2015-02-25 16:23       ` Leif Lindholm
  2015-02-25 18:38         ` Michael Zimmermann
  0 siblings, 1 reply; 24+ messages in thread
From: Leif Lindholm @ 2015-02-25 16:23 UTC (permalink / raw)
  To: The development of GNU GRUB

On Wed, Feb 25, 2015 at 03:45:40PM +0000, Leif Lindholm wrote:
> > >> Some technical info:
> > >> ARMv7
> > >> Linaro GCC 4.9
> 
> I don't see any calls to any of the __aeabi helpers generated for this
> file with current head. Which specific Linaro toolchain are you using?
> (mine is"Linaro GCC 4.9-2014.09").

Scratch that, I do see them. Just failing to drive the tools properly.

/
    Leif


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Image scaling performance
  2015-02-25 16:23       ` Leif Lindholm
@ 2015-02-25 18:38         ` Michael Zimmermann
  2015-02-25 18:41           ` Vladimir 'phcoder' Serbinenko
  0 siblings, 1 reply; 24+ messages in thread
From: Michael Zimmermann @ 2015-02-25 18:38 UTC (permalink / raw)
  To: The development of GNU GRUB

Why do you think the native div code would crash on most devices? I only
support ARMv7+ anyway.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Image scaling performance
  2015-02-25 18:38         ` Michael Zimmermann
@ 2015-02-25 18:41           ` Vladimir 'phcoder' Serbinenko
  2015-02-25 18:46             ` Michael Zimmermann
  0 siblings, 1 reply; 24+ messages in thread
From: Vladimir 'phcoder' Serbinenko @ 2015-02-25 18:41 UTC (permalink / raw)
  To: The development of GRUB 2

[-- Attachment #1: Type: text/plain, Size: 1218 bytes --]

ARMv7 doesn't mandate div instructions. It's a separate flag in features.
GRUB supports earlier CPUs as well, and we use them for testing. My only
test machine is ARMv6.
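
On the compiler side, the divide extension is likewise exposed separately from the base architecture. A minimal sketch, assuming the ACLE macro __ARM_FEATURE_IDIV and GCC's __ARM_ARCH_EXT_IDIV__ behave as I understand them; the function name is hypothetical. It reports whether the current build is allowed to emit hardware divide instructions, for example because it was compiled with something like -mcpu=cortex-a15.

```c
/* Hypothetical helper: returns 1 if this translation unit was compiled
   for a target with hardware integer divide, 0 otherwise.  On non-ARM
   hosts neither macro is expected to be defined, so it returns 0.  */
static int
compiler_targets_hw_div (void)
{
#if defined (__ARM_FEATURE_IDIV) || defined (__ARM_ARCH_EXT_IDIV__)
  return 1;
#else
  return 0;
#endif
}
```

A build for plain ARMv7-A without such an option would be expected to take the second branch and keep calling the __aeabi helpers.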

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Image scaling performance
  2015-02-25 18:41           ` Vladimir 'phcoder' Serbinenko
@ 2015-02-25 18:46             ` Michael Zimmermann
  2015-02-25 18:56               ` Vladimir 'φ-coder/phcoder' Serbinenko
                                 ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Michael Zimmermann @ 2015-02-25 18:46 UTC (permalink / raw)
  To: The development of GNU GRUB

Oh, OK, so Linux's div/mod/... assembly is as slow/fast as GRUB's code?
Linux uses ifdefs for ARMv5 and later. Maybe we could optimize things a
little :)
About scale_nn:
amarullz (https://plus.google.com/u/0/+AhmadAmarullah/about) wrote an
optimized version without divs:
loops: http://pastebin.com/MaZqWSA9
memcpy: http://pastebin.com/iNq0V5Tw

This code works a little faster. I'm still questioning the efficiency of
the math operations, because on slow devices there are other bottlenecks of
the same kind (like de/compression).

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Image scaling performance
  2015-02-25 18:46             ` Michael Zimmermann
@ 2015-02-25 18:56               ` Vladimir 'φ-coder/phcoder' Serbinenko
  2015-02-25 19:28                 ` Michael Zimmermann
  2015-02-25 20:48               ` Vladimir 'φ-coder/phcoder' Serbinenko
  2015-02-26 16:44               ` Vladimir 'φ-coder/phcoder' Serbinenko
  2 siblings, 1 reply; 24+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2015-02-25 18:56 UTC (permalink / raw)
  To: grub-devel

[-- Attachment #1: Type: text/plain, Size: 2390 bytes --]

On 25.02.2015 19:46, Michael Zimmermann wrote:
> oh ok so linux's div/mod/... assembler is as slow/fast as grub's code?
> Linux uses armv5>= ifdefs. Maybe we could optimized things a little :)
> About scale_nn,
> amarullz(https://plus.google.com/u/0/+AhmadAmarullah/about) wrote a
> optimized version without divs:
> loops: http://pastebin.com/MaZqWSA9
> memcpy: http://pastebin.com/iNq0V5Tw
>
Please try my patch (reattached here after minor fixes). A patch from an
anonymous source, sent by a third party through pastebin, isn't acceptable
from a legal perspective.
> this code works a little faster. I'm still questioning the efficiency
> math operations because on slow devices there are other bottlenecks of
> the same kind(like de/compression).
>
> On Wed, Feb 25, 2015 at 7:41 PM, Vladimir 'phcoder' Serbinenko
> <phcoder@gmail.com> wrote:
>> ARMv7 doesn't mandate div instructions. It's a separate flag in features.
>> GRUB supports earlier CPUs as well and we use them for testing. My only test
>> machine is armv6
>>
>> Le 2015-02-25 19:38, "Michael Zimmermann" <sigmaepsilon92@gmail.com> a écrit
>> :
>>
>>> Why u think the native div code would crash on most devices? I support
>>> ARMv7+ only anyway.
>>>
>>> On Wed, Feb 25, 2015 at 5:23 PM, Leif Lindholm <leif.lindholm@linaro.org>
>>> wrote:
>>>> On Wed, Feb 25, 2015 at 03:45:40PM +0000, Leif Lindholm wrote:
>>>>>>>> Some technical info:
>>>>>>>> ARMv7
>>>>>>>> Linaro GCC 4.9
>>>>>
>>>>> I don't see any calls to any of the __aeabi helpers generated for this
>>>>> file with current head. Which specific Linaro toolchain are you using?
>>>>> (mine is"Linaro GCC 4.9-2014.09").
>>>>
>>>> Scratch that, I do see them. Just failing to drive the tools properly.
>>>>
>>>> /
>>>>      Leif


[-- Attachment #2: scale.diff --]
[-- Type: text/x-diff, Size: 1442 bytes --]

diff --git a/grub-core/video/bitmap_scale.c b/grub-core/video/bitmap_scale.c
index 0b93d02..64bacbf 100644
--- a/grub-core/video/bitmap_scale.c
+++ b/grub-core/video/bitmap_scale.c
@@ -366,22 +366,31 @@ scale_nn (struct grub_video_bitmap *dst, struct grub_video_bitmap *src)
   /* bytes_per_pixel is the same for both src and dst. */
   unsigned bytes_per_pixel = dst->mode_info.bytes_per_pixel;
 
-  unsigned dy;
-  for (dy = 0; dy < dh; dy++)
+  unsigned dy, sy, ystep, yfrac, yover;
+  unsigned dx, sx, xstep, xfrac, xover;
+  ystep = sw / dw;
+  yover = sw % dw;
+  xstep = sh / dh;
+  xover = sh % dh;
+
+  for (dy = 0, sy = 0, yfrac = 0; dy < dh; dy++, sy += ystep, yfrac += yover)
     {
-      unsigned dx;
-      for (dx = 0; dx < dw; dx++)
+      if (yfrac > dw)
+	{
+	  yfrac -= dw;
+	  sy++;
+	}
+      for (dx = 0, sx = 0, xfrac = 0; dx < dw; dx++, sx += xstep, xfrac += xover)
         {
           grub_uint8_t *dptr;
           grub_uint8_t *sptr;
-          unsigned sx;
-          unsigned sy;
           unsigned comp;
 
-          /* Compute the source coordinate that the destination coordinate
-             maps to.  Note: sx/sw = dx/dw  =>  sx = sw*dx/dw. */
-          sx = sw * dx / dw;
-          sy = sh * dy / dh;
+	  if (xfrac > dh)
+	    {
+	      xfrac -= dh;
+	      sx++;
+	    }
 
           /* Get the address of the pixels in src and dst. */
           dptr = ddata + dy * dstride + dx * bytes_per_pixel;

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: Image scaling performance
  2015-02-25 18:56               ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2015-02-25 19:28                 ` Michael Zimmermann
  2015-02-25 20:39                   ` Vladimir 'φ-coder/phcoder' Serbinenko
  0 siblings, 1 reply; 24+ messages in thread
From: Michael Zimmermann @ 2015-02-25 19:28 UTC (permalink / raw)
  To: The development of GNU GRUB

Your patch still has graphical glitches: http://puu.sh/gcpco/da369f26c7.png
Btw, it should be legal, because modified GPL code is still GPL code.

On Wed, Feb 25, 2015 at 7:56 PM, Vladimir 'φ-coder/phcoder' Serbinenko
<phcoder@gmail.com> wrote:
> On 25.02.2015 19:46, Michael Zimmermann wrote:
>>
>> oh ok so linux's div/mod/... assembler is as slow/fast as grub's code?
>> Linux uses armv5>= ifdefs. Maybe we could optimized things a little :)
>> About scale_nn,
>> amarullz(https://plus.google.com/u/0/+AhmadAmarullah/about) wrote a
>> optimized version without divs:
>> loops: http://pastebin.com/MaZqWSA9
>> memcpy: http://pastebin.com/iNq0V5Tw
>>
> Please try my patch (reattached here after minor fixes). The patch by
> anonymous source, sent by third-party through pastebin isn't acceptable from
> legal perspective
>
>> this code works a little faster. I'm still questioning the efficiency
>> math operations because on slow devices there are other bottlenecks of
>> the same kind(like de/compression).
>>
>> On Wed, Feb 25, 2015 at 7:41 PM, Vladimir 'phcoder' Serbinenko
>> <phcoder@gmail.com> wrote:
>>>
>>> ARMv7 doesn't mandate div instructions. It's a separate flag in features.
>>> GRUB supports earlier CPUs as well and we use them for testing. My only
>>> test
>>> machine is armv6
>>>
>>> Le 2015-02-25 19:38, "Michael Zimmermann" <sigmaepsilon92@gmail.com> a
>>> écrit
>>> :
>>>
>>>> Why u think the native div code would crash on most devices? I support
>>>> ARMv7+ only anyway.
>>>>
>>>> On Wed, Feb 25, 2015 at 5:23 PM, Leif Lindholm
>>>> <leif.lindholm@linaro.org>
>>>> wrote:
>>>>>
>>>>> On Wed, Feb 25, 2015 at 03:45:40PM +0000, Leif Lindholm wrote:
>>>>>>>>>
>>>>>>>>> Some technical info:
>>>>>>>>> ARMv7
>>>>>>>>> Linaro GCC 4.9
>>>>>>
>>>>>>
>>>>>> I don't see any calls to any of the __aeabi helpers generated for this
>>>>>> file with current head. Which specific Linaro toolchain are you using?
>>>>>> (mine is"Linaro GCC 4.9-2014.09").
>>>>>
>>>>>
>>>>> Scratch that, I do see them. Just failing to drive the tools properly.
>>>>>
>>>>> /
>>>>>      Leif


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Image scaling performance
  2015-02-25 19:28                 ` Michael Zimmermann
@ 2015-02-25 20:39                   ` Vladimir 'φ-coder/phcoder' Serbinenko
  0 siblings, 0 replies; 24+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2015-02-25 20:39 UTC (permalink / raw)
  To: The development of GNU GRUB

[-- Attachment #1: Type: text/plain, Size: 127 bytes --]

On 25.02.2015 20:28, Michael Zimmermann wrote:
> your patch still has graphical glitches: http://puu.sh/gcpco/da369f26c7.png



[-- Attachment #2: scale.diff --]
[-- Type: text/x-diff, Size: 6115 bytes --]

diff --git a/grub-core/video/bitmap_scale.c b/grub-core/video/bitmap_scale.c
index 0b93d02..70c32f0 100644
--- a/grub-core/video/bitmap_scale.c
+++ b/grub-core/video/bitmap_scale.c
@@ -361,35 +361,46 @@ scale_nn (struct grub_video_bitmap *dst, struct grub_video_bitmap *src)
   unsigned dh = dst->mode_info.height;
   unsigned sw = src->mode_info.width;
   unsigned sh = src->mode_info.height;
-  unsigned dstride = dst->mode_info.pitch;
-  unsigned sstride = src->mode_info.pitch;
+  int dstride = dst->mode_info.pitch;
+  int sstride = src->mode_info.pitch;
   /* bytes_per_pixel is the same for both src and dst. */
-  unsigned bytes_per_pixel = dst->mode_info.bytes_per_pixel;
+  int bytes_per_pixel = dst->mode_info.bytes_per_pixel;
+  unsigned dy, sy, ystep, yfrac, yover;
+  unsigned sx, xstep, xfrac, xover;
+  grub_uint8_t *dptr, *dline_end, *sline;
 
-  unsigned dy;
-  for (dy = 0; dy < dh; dy++)
+  xstep = sw / dw;
+  xover = sw % dw;
+  ystep = sh / dh;
+  yover = sh % dh;
+
+  for (dy = 0, sy = 0, yfrac = 0; dy < dh; dy++, sy += ystep, yfrac += yover)
     {
-      unsigned dx;
-      for (dx = 0; dx < dw; dx++)
+      if (yfrac >= dh)
+	{
+	  yfrac -= dh;
+	  sy++;
+	}
+      dptr = ddata + dy * dstride;
+      dline_end = dptr + dw * bytes_per_pixel;
+      sline = sdata + sy * sstride;
+      for (sx = 0, xfrac = 0; dptr < dline_end; sx += xstep, xfrac += xover, dptr += bytes_per_pixel)
         {
-          grub_uint8_t *dptr;
           grub_uint8_t *sptr;
-          unsigned sx;
-          unsigned sy;
-          unsigned comp;
+          int comp;
 
-          /* Compute the source coordinate that the destination coordinate
-             maps to.  Note: sx/sw = dx/dw  =>  sx = sw*dx/dw. */
-          sx = sw * dx / dw;
-          sy = sh * dy / dh;
+	  if (xfrac >= dw)
+	    {
+	      xfrac -= dw;
+	      sx++;
+	    }
 
           /* Get the address of the pixels in src and dst. */
-          dptr = ddata + dy * dstride + dx * bytes_per_pixel;
-          sptr = sdata + sy * sstride + sx * bytes_per_pixel;
+	  sptr = sline + sx * bytes_per_pixel;
 
-          /* Copy the pixel color value. */
-          for (comp = 0; comp < bytes_per_pixel; comp++)
-            dptr[comp] = sptr[comp];
+	  /* Copy the pixel color value. */
+	  for (comp = 0; comp < bytes_per_pixel; comp++)
+	    dptr[comp] = sptr[comp];
         }
     }
   return GRUB_ERR_NONE;
@@ -422,27 +433,40 @@ scale_bilinear (struct grub_video_bitmap *dst, struct grub_video_bitmap *src)
   int sstride = src->mode_info.pitch;
   /* bytes_per_pixel is the same for both src and dst. */
   int bytes_per_pixel = dst->mode_info.bytes_per_pixel;
+  unsigned dy, syf, sy, ystep, yfrac, yover;
+  unsigned sxf, sx, xstep, xfrac, xover;
+  grub_uint8_t *dptr, *dline_end, *sline;
+
+  xstep = (sw << 8) / dw;
+  xover = (sw << 8) % dw;
+  ystep = (sh << 8) / dh;
+  yover = (sh << 8) % dh;
 
-  unsigned dy;
-  for (dy = 0; dy < dh; dy++)
+  for (dy = 0, syf = 0, yfrac = 0; dy < dh; dy++, syf += ystep, yfrac += yover)
     {
-      unsigned dx;
-      for (dx = 0; dx < dw; dx++)
+      if (yfrac >= dh)
+	{
+	  yfrac -= dh;
+	  syf++;
+	}
+      sy = syf >> 8;
+      dptr = ddata + dy * dstride;
+      dline_end = dptr + dw * bytes_per_pixel;
+      sline = sdata + sy * sstride;
+      for (sxf = 0, xfrac = 0; dptr < dline_end; sxf += xstep, xfrac += xover, dptr += bytes_per_pixel)
         {
-          grub_uint8_t *dptr;
           grub_uint8_t *sptr;
-          unsigned sx;
-          unsigned sy;
           int comp;
 
-          /* Compute the source coordinate that the destination coordinate
-             maps to.  Note: sx/sw = dx/dw  =>  sx = sw*dx/dw. */
-          sx = sw * dx / dw;
-          sy = sh * dy / dh;
+	  if (xfrac >= dw)
+	    {
+	      xfrac -= dw;
+	      sxf++;
+	    }
 
           /* Get the address of the pixels in src and dst. */
-          dptr = ddata + dy * dstride + dx * bytes_per_pixel;
-          sptr = sdata + sy * sstride + sx * bytes_per_pixel;
+	  sx = sxf >> 8;
+	  sptr = sline + sx * bytes_per_pixel;
 
           /* If we have enough space to do so, use bilinear interpolation.
              Otherwise, fall back to nearest neighbor for this pixel. */
@@ -453,27 +477,27 @@ scale_bilinear (struct grub_video_bitmap *dst, struct grub_video_bitmap *src)
               /* Fixed-point .8 numbers representing the fraction of the
                  distance in the x (u) and y (v) direction within the
                  box of 4 pixels in the source. */
-              int u = (256 * sw * dx / dw) - (sx * 256);
-              int v = (256 * sh * dy / dh) - (sy * 256);
+              unsigned u = sxf & 0xff;
+              unsigned v = syf & 0xff;
 
               for (comp = 0; comp < bytes_per_pixel; comp++)
                 {
                   /* Get the component's values for the
                      four source corner pixels. */
-                  int f00 = sptr[comp];
-                  int f10 = sptr[comp + bytes_per_pixel];
-                  int f01 = sptr[comp + sstride];
-                  int f11 = sptr[comp + sstride + bytes_per_pixel];
+                  unsigned f00 = sptr[comp];
+                  unsigned f10 = sptr[comp + bytes_per_pixel];
+                  unsigned f01 = sptr[comp + sstride];
+                  unsigned f11 = sptr[comp + sstride + bytes_per_pixel];
 
                   /* Count coeffecients. */
-                  int c00 = (256 - u) * (256 - v);
-                  int c10 = u * (256 - v);
-                  int c01 = (256 - u) * v;
-                  int c11 = u * v;
+                  unsigned c00 = (256 - u) * (256 - v);
+                  unsigned c10 = u * (256 - v);
+                  unsigned c01 = (256 - u) * v;
+                  unsigned c11 = u * v;
 
                   /* Interpolate. */
-                  int fxy = c00 * f00 + c01 * f01 + c10 * f10 + c11 * f11;
-                  fxy = fxy / (256 * 256);
+                  unsigned fxy = c00 * f00 + c01 * f01 + c10 * f10 + c11 * f11;
+                  fxy = fxy >> 16;
 
                   dptr[comp] = fxy;
                 }

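The scale_nn change in the patch above replaces the per-pixel
sx = sw * dx / dw with one division and one modulo before the loop and a
running remainder inside it. A minimal standalone sketch of that idea
follows. map_incremental and count_mismatches are illustrative names, not
GRUB functions, and the fixed 256-entry buffer is an assumption made only
to keep the example small.

```c
#include <assert.h>

/* Fill map[0..dw-1] with the source index for each destination index.
   Only additions and one compare run inside the loop.  */
static void
map_incremental (unsigned sw, unsigned dw, unsigned *map)
{
  unsigned step, over, dx, sx, frac;

  assert (dw > 0);
  step = sw / dw;   /* whole source pixels per destination pixel */
  over = sw % dw;   /* leftover, accumulated in units of 1/dw */

  for (dx = 0, sx = 0, frac = 0; dx < dw; dx++, sx += step, frac += over)
    {
      /* over < dw, so frac stays below 2 * dw and one carry is enough.  */
      if (frac >= dw)
        {
          frac -= dw;
          sx++;
        }
      map[dx] = sx;
    }
}

/* Count destination indices where the incremental result differs from
   the original formula sw * dx / dw.  */
static unsigned
count_mismatches (unsigned sw, unsigned dw)
{
  unsigned map[256];
  unsigned dx, bad = 0;

  assert (dw <= 256);
  map_incremental (sw, dw, map);
  for (dx = 0; dx < dw; dx++)
    if (map[dx] != sw * dx / dw)
      bad++;
  return bad;
}
```

After the carry step the loop keeps sx * dw + frac == sw * dx with
frac < dw, so sx is the same value the division would have produced.
The comparison has to be >= rather than >: with sw = 6 and dw = 4 the
remainder reaches exactly dw at dx = 2, and skipping the carry there
would give sx = 2 instead of 3.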
^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: Image scaling performance
  2015-02-25 18:46             ` Michael Zimmermann
  2015-02-25 18:56               ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2015-02-25 20:48               ` Vladimir 'φ-coder/phcoder' Serbinenko
  2015-02-25 20:54                 ` Michael Zimmermann
  2015-02-26 16:44               ` Vladimir 'φ-coder/phcoder' Serbinenko
  2 siblings, 1 reply; 24+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2015-02-25 20:48 UTC (permalink / raw)
  To: The development of GNU GRUB

On 25.02.2015 19:46, Michael Zimmermann wrote:
> oh ok so linux's div/mod/... assembler is as slow/fast as grub's code?
> Linux uses armv5>= ifdefs. Maybe we could optimized things a little :)
Maintaining optimised asm routines is a big burden. You'll get more
bugs than speedup. It's possible to use sdiv/udiv after checking the CPU
model properly, but that doesn't cover 64-bit division and is usable on
only a few CPUs anyway.

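As a rough illustration of what "checking the CPU properly" involves: on
ARMv7 the presence of sdiv/udiv is reported in the Divide_instrs field,
bits [27:24], of the ID_ISAR0 register. The sketch below only decodes a
register value that has already been read; the enum and function names
are made up for this example, and reading the register itself (via
"mrc p15, 0, <Rt>, c0, c2, 0" in a privileged mode) is left out so that
the code compiles on any host.

```c
#include <stdint.h>

/* Illustrative names, not part of GRUB.  */
enum div_support
{
  DIV_NONE,           /* no hardware divide, use the software helpers */
  DIV_THUMB_ONLY,     /* sdiv/udiv in the Thumb instruction set only */
  DIV_ARM_AND_THUMB,  /* sdiv/udiv in both ARM and Thumb */
  DIV_UNKNOWN         /* field value not covered by this sketch */
};

/* Decode ID_ISAR0.Divide_instrs, bits [27:24].  */
static enum div_support
decode_divide_instrs (uint32_t id_isar0)
{
  switch ((id_isar0 >> 24) & 0xf)
    {
    case 0:
      return DIV_NONE;
    case 1:
      return DIV_THUMB_ONLY;
    case 2:
      return DIV_ARM_AND_THUMB;
    default:
      return DIV_UNKNOWN;
    }
}
```

Even where this reports DIV_ARM_AND_THUMB, the instructions are 32-bit
only, which matches the point above that 64-bit division still needs a
software routine.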

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Image scaling performance
  2015-02-25 20:48               ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2015-02-25 20:54                 ` Michael Zimmermann
  0 siblings, 0 replies; 24+ messages in thread
From: Michael Zimmermann @ 2015-02-25 20:54 UTC (permalink / raw)
  To: The development of GNU GRUB

The latest patch works just fine. Even bilinear scaling is fast :D

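For reference, the bilinear part of that patch keeps the source
coordinate as a .8 fixed-point number, takes the low 8 bits as the
interpolation fraction, and replaces the final division by 256 * 256
with a shift. A small standalone sketch of those two steps, with
illustrative function names that do not exist in GRUB:

```c
#include <assert.h>

/* Split a .8 fixed-point coordinate into pixel index and fraction.  */
static void
split_fixed (unsigned sxf, unsigned *sx, unsigned *u)
{
  *sx = sxf >> 8;    /* integer source pixel */
  *u = sxf & 0xff;   /* fraction within the pixel, 0..255 */
}

/* Blend the four corner samples of one colour component.
   u and v are .8 fractions in the x and y direction.  */
static unsigned
bilinear_blend (unsigned f00, unsigned f10, unsigned f01, unsigned f11,
                unsigned u, unsigned v)
{
  unsigned c00, c10, c01, c11;

  assert (u < 256 && v < 256);
  c00 = (256 - u) * (256 - v);
  c10 = u * (256 - v);
  c01 = (256 - u) * v;
  c11 = u * v;

  /* The four weights always sum to 65536, so shifting right by 16
     is the same as dividing by 256 * 256.  */
  return (c00 * f00 + c01 * f01 + c10 * f10 + c11 * f11) >> 16;
}
```

With 8-bit samples the weighted sum is at most 255 * 65536, which fits
in a 32-bit unsigned, so no wider type is needed.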
On Wed, Feb 25, 2015 at 9:48 PM, Vladimir 'φ-coder/phcoder' Serbinenko
<phcoder@gmail.com> wrote:
> On 25.02.2015 19:46, Michael Zimmermann wrote:
>>
>> oh ok so linux's div/mod/... assembler is as slow/fast as grub's code?
>> Linux uses armv5>= ifdefs. Maybe we could optimized things a little :)
>
> maintaining optimised asm routines is a lot of burden. You'll get more bugs
> than speedup. It's possible to use sdiv/udiv after checking CPU model
> properly but it doesn't cover 64-bit division and usable only on few cpus
> anyway.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Image scaling performance
  2015-02-25 18:46             ` Michael Zimmermann
  2015-02-25 18:56               ` Vladimir 'φ-coder/phcoder' Serbinenko
  2015-02-25 20:48               ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2015-02-26 16:44               ` Vladimir 'φ-coder/phcoder' Serbinenko
  2015-02-26 17:10                 ` Michael Zimmermann
  2 siblings, 1 reply; 24+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2015-02-26 16:44 UTC (permalink / raw)
  To: The development of GNU GRUB

On 25.02.2015 19:46, Michael Zimmermann wrote:
> I'm still questioning the efficiency
> math operations because on slow devices there are other bottlenecks of
> the same kind(like de/compression).
That's pure speculation at this point. GRUB has 3 compression algorithms:
- minilzo. Has some divisions in parts which GRUB doesn't use. Those
parts are easy to disable and I'll just do so.
- gzip. Uses division only in the zlib header check. I'll optimise it a
little, but it's only one division in the header check, not in the
compressed data body.
- xz. No divisions.

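The zlib header check mentioned above is presumably the FCHECK test
from RFC 1950: the two header bytes, read as a big-endian 16-bit number,
must be a multiple of 31. The sketch below is a standalone illustration
of that rule, not GRUB's code; zlib_header_ok is a made-up name.

```c
#include <stdint.h>

/* Return 1 if cmf/flg form a plausible zlib (RFC 1950) header.  */
static int
zlib_header_ok (uint8_t cmf, uint8_t flg)
{
  if ((cmf & 0x0f) != 8)   /* CM: only method 8 (deflate) is defined */
    return 0;
  if ((cmf >> 4) > 7)      /* CINFO: window size above 32K is not allowed */
    return 0;

  /* FCHECK: this is the single modulo, executed once per stream.  */
  return ((unsigned) cmf * 256 + flg) % 31 == 0;
}
```

Since this runs once per stream rather than once per byte, its cost is
negligible next to the inflate loop, which is consistent with the
assessment above.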

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Image scaling performance
  2015-02-26 16:44               ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2015-02-26 17:10                 ` Michael Zimmermann
  2015-02-26 17:16                   ` Vladimir 'φ-coder/phcoder' Serbinenko
  0 siblings, 1 reply; 24+ messages in thread
From: Michael Zimmermann @ 2015-02-26 17:10 UTC (permalink / raw)
  To: The development of GNU GRUB

Is there a way to create a performance profile so I can see what
exactly takes so much time? I don't have JTAG, but maybe UART+GDB could
help with that.

Adding prints is kind of annoying :D

On Thu, Feb 26, 2015 at 5:44 PM, Vladimir 'φ-coder/phcoder' Serbinenko
<phcoder@gmail.com> wrote:
> On 25.02.2015 19:46, Michael Zimmermann wrote:
>>
>> I'm still questioning the efficiency
>> math operations because on slow devices there are other bottlenecks of
>> the same kind(like de/compression).
>
> That's pure speculation at that point. GRUB has 3 compression algorithms:
> - minilzo. Has some divisions in parts which GRUB doesn't use. Those parts
> are easily disablable and I'll just do so.
> - gzip. Uses division only in zlib header check. I'll optimise it a little
> but it's only one division in header check, not in compressed data body.
> - xz. No divisions


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Image scaling performance
  2015-02-26 17:10                 ` Michael Zimmermann
@ 2015-02-26 17:16                   ` Vladimir 'φ-coder/phcoder' Serbinenko
  2015-02-26 20:27                     ` Michael Zimmermann
  0 siblings, 1 reply; 24+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2015-02-26 17:16 UTC (permalink / raw)
  To: The development of GNU GRUB

On 26.02.2015 18:10, Michael Zimmermann wrote:
> Is there a way to create a performance profile so I can see what
> exactly needs so much time? I don't have JTAG but maybe UART+GDB could
> help with that.
>
Have a look at boot_time.
> adding prints is kind of annoying :D
>
> On Thu, Feb 26, 2015 at 5:44 PM, Vladimir 'φ-coder/phcoder' Serbinenko
> <phcoder@gmail.com> wrote:
>> On 25.02.2015 19:46, Michael Zimmermann wrote:
>>>
>>> I'm still questioning the efficiency
>>> math operations because on slow devices there are other bottlenecks of
>>> the same kind(like de/compression).
>>
>> That's pure speculation at that point. GRUB has 3 compression algorithms:
>> - minilzo. Has some divisions in parts which GRUB doesn't use. Those parts
>> are easily disablable and I'll just do so.
>> - gzip. Uses division only in zlib header check. I'll optimise it a little
>> but it's only one division in header check, not in compressed data body.
>> - xz. No divisions



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Image scaling performance
  2015-02-26 17:16                   ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2015-02-26 20:27                     ` Michael Zimmermann
  2015-02-26 20:35                       ` Vladimir 'φ-coder/phcoder' Serbinenko
  0 siblings, 1 reply; 24+ messages in thread
From: Michael Zimmermann @ 2015-02-26 20:27 UTC (permalink / raw)
  To: The development of GNU GRUB

Well, as you can see, boottime isn't detailed enough:
http://puu.sh/gdRXp/fc8fc176ce.png

Maybe I can hack printf to act as boottime.

On Thu, Feb 26, 2015 at 6:16 PM, Vladimir 'φ-coder/phcoder' Serbinenko
<phcoder@gmail.com> wrote:
> On 26.02.2015 18:10, Michael Zimmermann wrote:
>>
>> Is there a way to create a performance profile so I can see what
>> exactly needs so much time? I don't have JTAG but maybe UART+GDB could
>> help with that.
>>
> Have a look at boot_time.
>
>> adding prints is kind of annoying :D
>>
>> On Thu, Feb 26, 2015 at 5:44 PM, Vladimir 'φ-coder/phcoder' Serbinenko
>> <phcoder@gmail.com> wrote:
>>>
>>> On 25.02.2015 19:46, Michael Zimmermann wrote:
>>>>
>>>>
>>>> I'm still questioning the efficiency
>>>> math operations because on slow devices there are other bottlenecks of
>>>> the same kind(like de/compression).
>>>
>>>
>>> That's pure speculation at that point. GRUB has 3 compression algorithms:
>>> - minilzo. Has some divisions in parts which GRUB doesn't use. Those
>>> parts
>>> are easily disablable and I'll just do so.
>>> - gzip. Uses division only in zlib header check. I'll optimise it a
>>> little
>>> but it's only one division in header check, not in compressed data body.
>>> - xz. No divisions


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Image scaling performance
  2015-02-26 20:27                     ` Michael Zimmermann
@ 2015-02-26 20:35                       ` Vladimir 'φ-coder/phcoder' Serbinenko
  0 siblings, 0 replies; 24+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2015-02-26 20:35 UTC (permalink / raw)
  To: The development of GNU GRUB

On 26.02.2015 21:27, Michael Zimmermann wrote:
> well as u can see, boottime isn't detailed enough:
> http://puu.sh/gdRXp/fc8fc176ce.png
>
Just add more boottime checkpoints.
> Maybe I can hack printf to act a boottime.
>
> On Thu, Feb 26, 2015 at 6:16 PM, Vladimir 'φ-coder/phcoder' Serbinenko
> <phcoder@gmail.com> wrote:
>> On 26.02.2015 18:10, Michael Zimmermann wrote:
>>>
>>> Is there a way to create a performance profile so I can see what
>>> exactly needs so much time? I don't have JTAG but maybe UART+GDB could
>>> help with that.
>>>
>> Have a look at boot_time.
>>
>>> adding prints is kind of annoying :D
>>>
>>> On Thu, Feb 26, 2015 at 5:44 PM, Vladimir 'φ-coder/phcoder' Serbinenko
>>> <phcoder@gmail.com> wrote:
>>>>
>>>> On 25.02.2015 19:46, Michael Zimmermann wrote:
>>>>>
>>>>>
>>>>> I'm still questioning the efficiency
>>>>> math operations because on slow devices there are other bottlenecks of
>>>>> the same kind(like de/compression).
>>>>
>>>>
>>>> That's pure speculation at that point. GRUB has 3 compression algorithms:
>>>> - minilzo. Has some divisions in parts which GRUB doesn't use. Those
>>>> parts
>>>> are easily disablable and I'll just do so.
>>>> - gzip. Uses division only in zlib header check. I'll optimise it a
>>>> little
>>>> but it's only one division in header check, not in compressed data body.
>>>> - xz. No divisions

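The checkpoint approach suggested above amounts to recording a label and
a timestamp at each point of interest and looking at the differences
afterwards. In GRUB the hook for this is, as far as I know, the
grub_boot_time() macro that becomes active when the build has boot-time
statistics enabled; treat that as an assumption and check the tree. The
sketch below shows only the general idea in standalone C with
hypothetical names, and takes the timestamp as a parameter so it does
not depend on any particular timer.

```c
#include <stddef.h>

#define MAX_CHECKPOINTS 32

/* Hypothetical structures for illustration, not GRUB's.  */
struct checkpoint
{
  const char *label;
  unsigned long tick;
};

static struct checkpoint checkpoints[MAX_CHECKPOINTS];
static size_t n_checkpoints;

/* Record a labelled timestamp.  Entries are dropped once the table
   is full, so recording never allocates or fails.  */
static void
checkpoint_add (const char *label, unsigned long now)
{
  if (n_checkpoints < MAX_CHECKPOINTS)
    {
      checkpoints[n_checkpoints].label = label;
      checkpoints[n_checkpoints].tick = now;
      n_checkpoints++;
    }
}

/* Ticks elapsed between checkpoint i - 1 and checkpoint i.
   Returns 0 for the first entry and for out-of-range indices.  */
static unsigned long
checkpoint_delta (size_t i)
{
  if (i == 0 || i >= n_checkpoints)
    return 0;
  return checkpoints[i].tick - checkpoints[i - 1].tick;
}
```

Placing one checkpoint before and one after the scaling call, and a few
more around image decoding, would narrow down which stage dominates
without flooding the console with prints.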


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2015-02-26 20:35 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-24  9:39 Image scaling performance Michael Zimmermann
2015-02-24  9:51 ` Vladimir 'phcoder' Serbinenko
2015-02-24 10:00   ` Michael Zimmermann
2015-02-24 11:27     ` Vladimir 'phcoder' Serbinenko
2015-02-24 11:47       ` Michael Zimmermann
2015-02-24 12:39         ` Vladimir 'phcoder' Serbinenko
2015-02-24 18:01           ` Michael Zimmermann
2015-02-24 18:22             ` Andrei Borzenkov
2015-02-25 16:20           ` Leif Lindholm
2015-02-25 15:45     ` Leif Lindholm
2015-02-25 16:23       ` Leif Lindholm
2015-02-25 18:38         ` Michael Zimmermann
2015-02-25 18:41           ` Vladimir 'phcoder' Serbinenko
2015-02-25 18:46             ` Michael Zimmermann
2015-02-25 18:56               ` Vladimir 'φ-coder/phcoder' Serbinenko
2015-02-25 19:28                 ` Michael Zimmermann
2015-02-25 20:39                   ` Vladimir 'φ-coder/phcoder' Serbinenko
2015-02-25 20:48               ` Vladimir 'φ-coder/phcoder' Serbinenko
2015-02-25 20:54                 ` Michael Zimmermann
2015-02-26 16:44               ` Vladimir 'φ-coder/phcoder' Serbinenko
2015-02-26 17:10                 ` Michael Zimmermann
2015-02-26 17:16                   ` Vladimir 'φ-coder/phcoder' Serbinenko
2015-02-26 20:27                     ` Michael Zimmermann
2015-02-26 20:35                       ` Vladimir 'φ-coder/phcoder' Serbinenko
