All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 4/4] sm501: Optimise 1 pixel 2d ops
  2020-06-06 19:17 [PATCH 0/4] More sm501 fixes and optimisations BALATON Zoltan
  2020-06-06 19:17 ` [PATCH 1/4] sm501: Fix bounds checks BALATON Zoltan
@ 2020-06-06 19:17 ` BALATON Zoltan
  2020-06-15 15:42   ` Peter Maydell
  2020-06-06 19:17 ` [PATCH 2/4] sm501: Drop unneded variable BALATON Zoltan
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: BALATON Zoltan @ 2020-06-06 19:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Sebastian Bauer, Magnus Damm, Gerd Hoffmann,
	Aurelien Jarno

Some guests do 1x1 blits which is faster to do directly than calling a
function for it so avoid overhead in this case.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
---
 hw/display/sm501.c | 40 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/hw/display/sm501.c b/hw/display/sm501.c
index 3397ca9fbf..59693fbb5c 100644
--- a/hw/display/sm501.c
+++ b/hw/display/sm501.c
@@ -793,6 +793,25 @@ static void sm501_2d_operation(SM501State *s)
                 src_x == dst_x && src_y == dst_y) {
                 break;
             }
+            /* Some clients also do 1 pixel blits, avoid overhead for these */
+            if (width == 1 && height == 1) {
+                unsigned int si = (src_x + src_y * src_pitch) * (1 << format);
+                unsigned int di = (dst_x + dst_y * dst_pitch) * (1 << format);
+                switch (format) {
+                case 0:
+                    s->local_mem[dst_base + di] = s->local_mem[src_base + si];
+                    break;
+                case 1:
+                    *(uint16_t *)&s->local_mem[dst_base + di] =
+                                    *(uint16_t *)&s->local_mem[src_base + si];
+                    break;
+                case 2:
+                    *(uint32_t *)&s->local_mem[dst_base + di] =
+                                    *(uint32_t *)&s->local_mem[src_base + si];
+                    break;
+                }
+                break;
+            }
             /* Check for overlaps, this could be made more exact */
             uint32_t sb, se, db, de;
             sb = src_base + src_x + src_y * (width + src_pitch);
@@ -841,9 +860,24 @@ static void sm501_2d_operation(SM501State *s)
             color = cpu_to_le16(color);
         }
 
-        pixman_fill((uint32_t *)&s->local_mem[dst_base],
-                    dst_pitch * (1 << format) / sizeof(uint32_t),
-                    8 * (1 << format), dst_x, dst_y, width, height, color);
+        if (width == 1 && height == 1) {
+            unsigned int i = (dst_x + dst_y * dst_pitch) * (1 << format);
+            switch (format) {
+            case 0:
+                s->local_mem[dst_base + i] = color & 0xff;
+                break;
+            case 1:
+                *(uint16_t *)&s->local_mem[dst_base + i] = color & 0xffff;
+                break;
+            case 2:
+                *(uint32_t *)&s->local_mem[dst_base + i] = color;
+                break;
+            }
+        } else {
+            pixman_fill((uint32_t *)&s->local_mem[dst_base],
+                        dst_pitch * (1 << format) / sizeof(uint32_t),
+                        8 * (1 << format), dst_x, dst_y, width, height, color);
+        }
         break;
     }
     default:
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/4] sm501: Ignore no-op blits
  2020-06-06 19:17 [PATCH 0/4] More sm501 fixes and optimisations BALATON Zoltan
                   ` (2 preceding siblings ...)
  2020-06-06 19:17 ` [PATCH 2/4] sm501: Drop unneded variable BALATON Zoltan
@ 2020-06-06 19:17 ` BALATON Zoltan
  2020-06-15 14:56   ` Peter Maydell
  2020-06-12 13:10 ` [PATCH 0/4] More sm501 fixes and optimisations BALATON Zoltan
  4 siblings, 1 reply; 15+ messages in thread
From: BALATON Zoltan @ 2020-06-06 19:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Sebastian Bauer, Magnus Damm, Gerd Hoffmann,
	Aurelien Jarno

Some guests seem to try source copy blits with same source and dest
which are no-op so avoid calling pixman for these.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
---
 hw/display/sm501.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/hw/display/sm501.c b/hw/display/sm501.c
index 85d54b598f..3397ca9fbf 100644
--- a/hw/display/sm501.c
+++ b/hw/display/sm501.c
@@ -788,6 +788,11 @@ static void sm501_2d_operation(SM501State *s)
                               (rop2_source_is_pattern ?
                                   " with pattern source" : ""));
             }
+            /* Ignore no-op blits, some guests seem to do this */
+            if (src_base == dst_base && src_pitch == dst_pitch &&
+                src_x == dst_x && src_y == dst_y) {
+                break;
+            }
             /* Check for overlaps, this could be made more exact */
             uint32_t sb, se, db, de;
             sb = src_base + src_x + src_y * (width + src_pitch);
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 0/4] More sm501 fixes and optimisations
@ 2020-06-06 19:17 BALATON Zoltan
  2020-06-06 19:17 ` [PATCH 1/4] sm501: Fix bounds checks BALATON Zoltan
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: BALATON Zoltan @ 2020-06-06 19:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Sebastian Bauer, Magnus Damm, Gerd Hoffmann,
	Aurelien Jarno

Some small fixes and optimisations for hw/display/sm501.c.

BALATON Zoltan (4):
  sm501: Fix bounds checks
  sm501: Drop unneded variable
  sm501: Ignore no-op blits
  sm501: Optimise 1 pixel 2d ops

 hw/display/sm501.c | 58 ++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 48 insertions(+), 10 deletions(-)

-- 
2.21.3



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/4] sm501: Fix bounds checks
  2020-06-06 19:17 [PATCH 0/4] More sm501 fixes and optimisations BALATON Zoltan
@ 2020-06-06 19:17 ` BALATON Zoltan
  2020-06-15 15:23   ` Peter Maydell
  2020-06-06 19:17 ` [PATCH 4/4] sm501: Optimise 1 pixel 2d ops BALATON Zoltan
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: BALATON Zoltan @ 2020-06-06 19:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Sebastian Bauer, Magnus Damm, Gerd Hoffmann,
	Aurelien Jarno

We don't need to add width to pitch when calculating last point, that
would reject valid ops within the card's local_mem.

Fixes: b15a22bbcbe6a78dc3d88fe3134985e4cdd87de4
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
---
 hw/display/sm501.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/display/sm501.c b/hw/display/sm501.c
index edd8d24a76..5ae320ddc3 100644
--- a/hw/display/sm501.c
+++ b/hw/display/sm501.c
@@ -723,8 +723,8 @@ static void sm501_2d_operation(SM501State *s)
         dst_y -= height - 1;
     }
 
-    if (dst_base >= get_local_mem_size(s) || dst_base +
-        (dst_x + width + (dst_y + height) * (dst_pitch + width)) *
+    if (dst_base >= get_local_mem_size(s) ||
+        dst_base + (dst_x + width + (dst_y + height) * dst_pitch) *
         (1 << format) >= get_local_mem_size(s)) {
         qemu_log_mask(LOG_GUEST_ERROR, "sm501: 2D op dest is outside vram.\n");
         return;
@@ -749,8 +749,8 @@ static void sm501_2d_operation(SM501State *s)
             src_y -= height - 1;
         }
 
-        if (src_base >= get_local_mem_size(s) || src_base +
-            (src_x + width + (src_y + height) * (src_pitch + width)) *
+        if (src_base >= get_local_mem_size(s) ||
+            src_base + (src_x + width + (src_y + height) * src_pitch) *
             (1 << format) >= get_local_mem_size(s)) {
             qemu_log_mask(LOG_GUEST_ERROR,
                           "sm501: 2D op src is outside vram.\n");
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/4] sm501: Drop unneded variable
  2020-06-06 19:17 [PATCH 0/4] More sm501 fixes and optimisations BALATON Zoltan
  2020-06-06 19:17 ` [PATCH 1/4] sm501: Fix bounds checks BALATON Zoltan
  2020-06-06 19:17 ` [PATCH 4/4] sm501: Optimise 1 pixel 2d ops BALATON Zoltan
@ 2020-06-06 19:17 ` BALATON Zoltan
  2020-06-15 14:50   ` Peter Maydell
  2020-06-06 19:17 ` [PATCH 3/4] sm501: Ignore no-op blits BALATON Zoltan
  2020-06-12 13:10 ` [PATCH 0/4] More sm501 fixes and optimisations BALATON Zoltan
  4 siblings, 1 reply; 15+ messages in thread
From: BALATON Zoltan @ 2020-06-06 19:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Sebastian Bauer, Magnus Damm, Gerd Hoffmann,
	Aurelien Jarno

We don't need a separate variable to keep track if we allocated memory
that needs to be freed as we can test the pointer itself.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
---
 hw/display/sm501.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/display/sm501.c b/hw/display/sm501.c
index 5ae320ddc3..85d54b598f 100644
--- a/hw/display/sm501.c
+++ b/hw/display/sm501.c
@@ -796,13 +796,12 @@ static void sm501_2d_operation(SM501State *s)
             de = db + width + height * (width + dst_pitch);
             if (rtl && ((db >= sb && db <= se) || (de >= sb && de <= se))) {
                 /* regions may overlap: copy via temporary */
-                int free_buf = 0, llb = width * (1 << format);
+                int llb = width * (1 << format);
                 int tmp_stride = DIV_ROUND_UP(llb, sizeof(uint32_t));
                 uint32_t *tmp = tmp_buf;
 
                 if (tmp_stride * sizeof(uint32_t) * height > sizeof(tmp_buf)) {
                     tmp = g_malloc(tmp_stride * sizeof(uint32_t) * height);
-                    free_buf = 1;
                 }
                 pixman_blt((uint32_t *)&s->local_mem[src_base], tmp,
                            src_pitch * (1 << format) / sizeof(uint32_t),
@@ -813,7 +812,7 @@ static void sm501_2d_operation(SM501State *s)
                            dst_pitch * (1 << format) / sizeof(uint32_t),
                            8 * (1 << format), 8 * (1 << format),
                            0, 0, dst_x, dst_y, width, height);
-                if (free_buf) {
+                if (tmp != tmp_buf) {
                     g_free(tmp);
                 }
             } else {
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/4] More sm501 fixes and optimisations
  2020-06-06 19:17 [PATCH 0/4] More sm501 fixes and optimisations BALATON Zoltan
                   ` (3 preceding siblings ...)
  2020-06-06 19:17 ` [PATCH 3/4] sm501: Ignore no-op blits BALATON Zoltan
@ 2020-06-12 13:10 ` BALATON Zoltan
  4 siblings, 0 replies; 15+ messages in thread
From: BALATON Zoltan @ 2020-06-12 13:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Sebastian Bauer, Magnus Damm, Gerd Hoffmann,
	Aurelien Jarno

On Sat, 6 Jun 2020, BALATON Zoltan wrote:
> Some small fixes and optimisations for hw/display/sm501.c.

Ping?

Regards,
BALATON Zoltan

> BALATON Zoltan (4):
>  sm501: Fix bounds checks
>  sm501: Drop unneded variable
>  sm501: Ignore no-op blits
>  sm501: Optimise 1 pixel 2d ops
>
> hw/display/sm501.c | 58 ++++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 48 insertions(+), 10 deletions(-)
>
>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] sm501: Drop unneded variable
  2020-06-06 19:17 ` [PATCH 2/4] sm501: Drop unneded variable BALATON Zoltan
@ 2020-06-15 14:50   ` Peter Maydell
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Maydell @ 2020-06-15 14:50 UTC (permalink / raw)
  To: BALATON Zoltan
  Cc: Sebastian Bauer, Magnus Damm, QEMU Developers, Aurelien Jarno,
	Gerd Hoffmann

On Sat, 6 Jun 2020 at 20:22, BALATON Zoltan <balaton@eik.bme.hu> wrote:
>
> We don't need a separate variable to keep track if we allocated memory
> that needs to be freed as we can test the pointer itself.
>
> Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
> ---
>  hw/display/sm501.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] sm501: Ignore no-op blits
  2020-06-06 19:17 ` [PATCH 3/4] sm501: Ignore no-op blits BALATON Zoltan
@ 2020-06-15 14:56   ` Peter Maydell
  2020-06-15 15:14     ` BALATON Zoltan
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Maydell @ 2020-06-15 14:56 UTC (permalink / raw)
  To: BALATON Zoltan
  Cc: Sebastian Bauer, Magnus Damm, QEMU Developers, Aurelien Jarno,
	Gerd Hoffmann

On Sat, 6 Jun 2020 at 20:22, BALATON Zoltan <balaton@eik.bme.hu> wrote:
>
> Some guests seem to try source copy blits with same source and dest
> which are no-op so avoid calling pixman for these.

Are they doing actual source copy blits, or one of the other
currently-unimplemented op types which we currently fall
back to doing as source-copy ?

> Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] sm501: Ignore no-op blits
  2020-06-15 14:56   ` Peter Maydell
@ 2020-06-15 15:14     ` BALATON Zoltan
  0 siblings, 0 replies; 15+ messages in thread
From: BALATON Zoltan @ 2020-06-15 15:14 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Sebastian Bauer, Magnus Damm, QEMU Developers, Aurelien Jarno,
	Gerd Hoffmann

On Mon, 15 Jun 2020, Peter Maydell wrote:
> On Sat, 6 Jun 2020 at 20:22, BALATON Zoltan <balaton@eik.bme.hu> wrote:
>>
>> Some guests seem to try source copy blits with same source and dest
>> which are no-op so avoid calling pixman for these.
>
> Are they doing actual source copy blits, or one of the other
> currently-unimplemented op types which we currently fall
> back to doing as source-copy ?

I thought the same but in the cases I've seen the rop was 0xcc which is 
copy source so it looks like an actual source copy blit (this seems to be 
an issue in the guest driver, I've notified appropriate people but not 
sure it's still maintained). Even if it was fallback we wouldn't need to 
do the copy in this case as it changes nothing so we can optimise this 
out. When we implement the missing op that would come before this case so 
this is only for source copy case.

>> Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu
>
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

Thank you,
BALATON Zoltan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] sm501: Fix bounds checks
  2020-06-06 19:17 ` [PATCH 1/4] sm501: Fix bounds checks BALATON Zoltan
@ 2020-06-15 15:23   ` Peter Maydell
  2020-06-15 16:40     ` BALATON Zoltan
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Maydell @ 2020-06-15 15:23 UTC (permalink / raw)
  To: BALATON Zoltan
  Cc: Sebastian Bauer, Magnus Damm, QEMU Developers, Aurelien Jarno,
	Gerd Hoffmann

On Sat, 6 Jun 2020 at 20:22, BALATON Zoltan <balaton@eik.bme.hu> wrote:
>
> We don't need to add width to pitch when calculating last point, that
> would reject valid ops within the card's local_mem.
>
> Fixes: b15a22bbcbe6a78dc3d88fe3134985e4cdd87de4
> Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
> ---
>  hw/display/sm501.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

The calculations for sb/se/db/de all have a term which
multiplies by (width + pitch), which makes me suspect
they also need a similar fix ?

thanks
-- PMM


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] sm501: Optimise 1 pixel 2d ops
  2020-06-06 19:17 ` [PATCH 4/4] sm501: Optimise 1 pixel 2d ops BALATON Zoltan
@ 2020-06-15 15:42   ` Peter Maydell
  2020-06-15 17:00     ` BALATON Zoltan
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Maydell @ 2020-06-15 15:42 UTC (permalink / raw)
  To: BALATON Zoltan
  Cc: Sebastian Bauer, Magnus Damm, QEMU Developers, Aurelien Jarno,
	Gerd Hoffmann

On Sat, 6 Jun 2020 at 20:22, BALATON Zoltan <balaton@eik.bme.hu> wrote:
>
> Some guests do 1x1 blits which is faster to do directly than calling a
> function for it so avoid overhead in this case.

How much does the performance improve by ?

> Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
> ---
>  hw/display/sm501.c | 40 +++++++++++++++++++++++++++++++++++++---
>  1 file changed, 37 insertions(+), 3 deletions(-)
>
> diff --git a/hw/display/sm501.c b/hw/display/sm501.c
> index 3397ca9fbf..59693fbb5c 100644
> --- a/hw/display/sm501.c
> +++ b/hw/display/sm501.c
> @@ -793,6 +793,25 @@ static void sm501_2d_operation(SM501State *s)
>                  src_x == dst_x && src_y == dst_y) {
>                  break;
>              }
> +            /* Some clients also do 1 pixel blits, avoid overhead for these */
> +            if (width == 1 && height == 1) {
> +                unsigned int si = (src_x + src_y * src_pitch) * (1 << format);
> +                unsigned int di = (dst_x + dst_y * dst_pitch) * (1 << format);
> +                switch (format) {
> +                case 0:
> +                    s->local_mem[dst_base + di] = s->local_mem[src_base + si];
> +                    break;
> +                case 1:
> +                    *(uint16_t *)&s->local_mem[dst_base + di] =
> +                                    *(uint16_t *)&s->local_mem[src_base + si];
> +                    break;
> +                case 2:
> +                    *(uint32_t *)&s->local_mem[dst_base + di] =
> +                                    *(uint32_t *)&s->local_mem[src_base + si];
> +                    break;
> +                }

You could write this more compactly as
                   stn_he_p(&s->local_mem[dst_base + di], 1 << format,
                            ldn_he_p(&s->local_mem[src_base + si], 1
<< format));

(which handles the length-cases for you and also doesn't rely on
casting a uint8_t* giving you something correctly aligned for a
wider access).

> +                break;
> +            }
>              /* Check for overlaps, this could be made more exact */
>              uint32_t sb, se, db, de;
>              sb = src_base + src_x + src_y * (width + src_pitch);
> @@ -841,9 +860,24 @@ static void sm501_2d_operation(SM501State *s)
>              color = cpu_to_le16(color);
>          }
>
> -        pixman_fill((uint32_t *)&s->local_mem[dst_base],
> -                    dst_pitch * (1 << format) / sizeof(uint32_t),
> -                    8 * (1 << format), dst_x, dst_y, width, height, color);
> +        if (width == 1 && height == 1) {
> +            unsigned int i = (dst_x + dst_y * dst_pitch) * (1 << format);
> +            switch (format) {
> +            case 0:
> +                s->local_mem[dst_base + i] = color & 0xff;
> +                break;
> +            case 1:
> +                *(uint16_t *)&s->local_mem[dst_base + i] = color & 0xffff;
> +                break;
> +            case 2:
> +                *(uint32_t *)&s->local_mem[dst_base + i] = color;
> +                break;
> +            }

               stn_he_p(&s->local_mem[dst_base + i], 1 << format, color);


> +        } else {
> +            pixman_fill((uint32_t *)&s->local_mem[dst_base],
> +                        dst_pitch * (1 << format) / sizeof(uint32_t),
> +                        8 * (1 << format), dst_x, dst_y, width, height, color);
> +        }
>          break;
>      }
>      default:

thanks
-- PMM


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] sm501: Fix bounds checks
  2020-06-15 15:23   ` Peter Maydell
@ 2020-06-15 16:40     ` BALATON Zoltan
  2020-06-15 16:53       ` Peter Maydell
  0 siblings, 1 reply; 15+ messages in thread
From: BALATON Zoltan @ 2020-06-15 16:40 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Sebastian Bauer, Magnus Damm, QEMU Developers, Aurelien Jarno,
	Gerd Hoffmann

On Mon, 15 Jun 2020, Peter Maydell wrote:
> On Sat, 6 Jun 2020 at 20:22, BALATON Zoltan <balaton@eik.bme.hu> wrote:
>>
>> We don't need to add width to pitch when calculating last point, that
>> would reject valid ops within the card's local_mem.
>>
>> Fixes: b15a22bbcbe6a78dc3d88fe3134985e4cdd87de4
>> Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
>> ---
>>  hw/display/sm501.c | 8 ++++----
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
>
> The calculations for sb/se/db/de all have a term which
> multiplies by (width + pitch), which makes me suspect
> they also need a similar fix ?

Maybe. I'll have to check again. Actually is there a simpler way to check 
if two rectangles overlap when they are given with base, x, y, w, h, pitch 
where base is the first byte of the screen, pitch is length of one line 
and x,y is coordinates of top left corner and w,h is dimensions of the 
rect. Now that I think about it we also need to take into accounf the 
bytes per pixel value (1 << format) because base is given in bytes while 
others are in pixels so these formulae likely need some fixes. Pixman has 
some functions for these but those assume common base so to use those we 
would need to bring the two rectangles to common base which I could not 
find out how to do. Probably this is really simple for someone who already 
did a lot of these before.

Regards,
BALATON Zoltan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] sm501: Fix bounds checks
  2020-06-15 16:40     ` BALATON Zoltan
@ 2020-06-15 16:53       ` Peter Maydell
  2020-06-15 17:43         ` BALATON Zoltan
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Maydell @ 2020-06-15 16:53 UTC (permalink / raw)
  To: BALATON Zoltan
  Cc: Sebastian Bauer, Magnus Damm, QEMU Developers, Aurelien Jarno,
	Gerd Hoffmann

On Mon, 15 Jun 2020 at 17:40, BALATON Zoltan <balaton@eik.bme.hu> wrote:
>
> On Mon, 15 Jun 2020, Peter Maydell wrote:
> > The calculations for sb/se/db/de all have a term which
> > multiplies by (width + pitch), which makes me suspect
> > they also need a similar fix ?
>
> Maybe. I'll have to check again. Actually is there a simpler way to check
> if two rectangles overlap when they are given with base, x, y, w, h, pitch
> where base is the first byte of the screen, pitch is length of one line
> and x,y is coordinates of top left corner and w,h is dimensions of the
> rect. Now that I think about it we also need to take into accounf the
> bytes per pixel value (1 << format) because base is given in bytes while
> others are in pixels so these formulae likely need some fixes. Pixman has
> some functions for these but those assume common base so to use those we
> would need to bring the two rectangles to common base which I could not
> find out how to do. Probably this is really simple for someone who already
> did a lot of these before.

I think the thing that makes it particularly awkward is that
the source and dest can have different pitches. That means it's
not a simple "do two rectangles overlap" test because the dest
area might not be a rectangle at all when looked at from the
POV of the source.

Do guests usually set src and dst pitch identical? If so it
might be worth having a more accurate rectangle-overlap test
for the common case and a looser check for the hard-to-handle
case.

I might have a think about this and draw some diagrams tomorrow :-)

thanks
-- PMM


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] sm501: Optimise 1 pixel 2d ops
  2020-06-15 15:42   ` Peter Maydell
@ 2020-06-15 17:00     ` BALATON Zoltan
  0 siblings, 0 replies; 15+ messages in thread
From: BALATON Zoltan @ 2020-06-15 17:00 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Sebastian Bauer, Magnus Damm, QEMU Developers, Aurelien Jarno,
	Gerd Hoffmann

On Mon, 15 Jun 2020, Peter Maydell wrote:
> On Sat, 6 Jun 2020 at 20:22, BALATON Zoltan <balaton@eik.bme.hu> wrote:
>>
>> Some guests do 1x1 blits which is faster to do directly than calling a
>> function for it so avoid overhead in this case.
>
> How much does the performance improve by ?

I haven't actually measured due to lack of time experimenting with it and 
those who I've asked to check it only reported it felt a bit faster but no 
numerical measurements so that does not prove anything. Anyway these 
special cases should not hurt and simple enough to have it here even if 
may not improve performance very much. The biggest loss is probably 
converting 16 bit screen on every display update to 32 bit because the 
guest driver does not allow higher bit depth than 16 for some reason. But 
I don't plan to spend more time with improving sm501, only done it because 
I had to touch it anyway. Longer term plan is to finish ati-vga which 
should have better guest drivers and better performance.

>> Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
>> ---
>>  hw/display/sm501.c | 40 +++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 37 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/display/sm501.c b/hw/display/sm501.c
>> index 3397ca9fbf..59693fbb5c 100644
>> --- a/hw/display/sm501.c
>> +++ b/hw/display/sm501.c
>> @@ -793,6 +793,25 @@ static void sm501_2d_operation(SM501State *s)
>>                  src_x == dst_x && src_y == dst_y) {
>>                  break;
>>              }
>> +            /* Some clients also do 1 pixel blits, avoid overhead for these */
>> +            if (width == 1 && height == 1) {
>> +                unsigned int si = (src_x + src_y * src_pitch) * (1 << format);
>> +                unsigned int di = (dst_x + dst_y * dst_pitch) * (1 << format);
>> +                switch (format) {
>> +                case 0:
>> +                    s->local_mem[dst_base + di] = s->local_mem[src_base + si];
>> +                    break;
>> +                case 1:
>> +                    *(uint16_t *)&s->local_mem[dst_base + di] =
>> +                                    *(uint16_t *)&s->local_mem[src_base + si];
>> +                    break;
>> +                case 2:
>> +                    *(uint32_t *)&s->local_mem[dst_base + di] =
>> +                                    *(uint32_t *)&s->local_mem[src_base + si];
>> +                    break;
>> +                }
>
> You could write this more compactly as
>                   stn_he_p(&s->local_mem[dst_base + di], 1 << format,
>                            ldn_he_p(&s->local_mem[src_base + si], 1
> << format));
>
> (which handles the length-cases for you and also doesn't rely on
> casting a uint8_t* giving you something correctly aligned for a
> wider access).

OK, I never know these and they are a bit hard to find because they are 
defined as glued together macros so grepping for it does not show the 
definition. Maybe adding a comment with the names where these are defined 
in bswap.h might help. stn_he_p seems to ultimately call __builtin_memcpy 
that probably does the same direct assignment for short values. I wonder 
how much overhead that has going through all the function calls but 
hopefully those are inlined.

Regards,
BALATON Zoltan

>> +                break;
>> +            }
>>              /* Check for overlaps, this could be made more exact */
>>              uint32_t sb, se, db, de;
>>              sb = src_base + src_x + src_y * (width + src_pitch);
>> @@ -841,9 +860,24 @@ static void sm501_2d_operation(SM501State *s)
>>              color = cpu_to_le16(color);
>>          }
>>
>> -        pixman_fill((uint32_t *)&s->local_mem[dst_base],
>> -                    dst_pitch * (1 << format) / sizeof(uint32_t),
>> -                    8 * (1 << format), dst_x, dst_y, width, height, color);
>> +        if (width == 1 && height == 1) {
>> +            unsigned int i = (dst_x + dst_y * dst_pitch) * (1 << format);
>> +            switch (format) {
>> +            case 0:
>> +                s->local_mem[dst_base + i] = color & 0xff;
>> +                break;
>> +            case 1:
>> +                *(uint16_t *)&s->local_mem[dst_base + i] = color & 0xffff;
>> +                break;
>> +            case 2:
>> +                *(uint32_t *)&s->local_mem[dst_base + i] = color;
>> +                break;
>> +            }
>
>               stn_he_p(&s->local_mem[dst_base + i], 1 << format, color);


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] sm501: Fix bounds checks
  2020-06-15 16:53       ` Peter Maydell
@ 2020-06-15 17:43         ` BALATON Zoltan
  0 siblings, 0 replies; 15+ messages in thread
From: BALATON Zoltan @ 2020-06-15 17:43 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Sebastian Bauer, Magnus Damm, QEMU Developers, Aurelien Jarno,
	Gerd Hoffmann

On Mon, 15 Jun 2020, Peter Maydell wrote:
> On Mon, 15 Jun 2020 at 17:40, BALATON Zoltan <balaton@eik.bme.hu> wrote:
>>
>> On Mon, 15 Jun 2020, Peter Maydell wrote:
>>> The calculations for sb/se/db/de all have a term which
>>> multiplies by (width + pitch), which makes me suspect
>>> they also need a similar fix ?
>>
>> Maybe. I'll have to check again. Actually is there a simpler way to check
>> if two rectangles overlap when they are given with base, x, y, w, h, pitch
>> where base is the first byte of the screen, pitch is length of one line
>> and x,y is coordinates of top left corner and w,h is dimensions of the
>> rect. Now that I think about it we also need to take into accounf the
>> bytes per pixel value (1 << format) because base is given in bytes while
>> others are in pixels so these formulae likely need some fixes. Pixman has
>> some functions for these but those assume common base so to use those we
>> would need to bring the two rectangles to common base which I could not
>> find out how to do. Probably this is really simple for someone who already
>> did a lot of these before.
>
> I think the thing that makes it particularly awkward is that
> the source and dest can have different pitches. That means it's

Yes, different bases and different pitches make it difficult to compare 
and also to transform them to a common base to be able to do rectangle 
overlap test (I haven't checked but maybe the pitches could be relative 
prime so it's possible there's no common pitch to transform them to a 
single view).

> not a simple "do two rectangles overlap" test because the dest
> area might not be a rectangle at all when looked at from the
> POV of the source.
>
> Do guests usually set src and dst pitch identical? If so it
> might be worth having a more accurate rectangle-overlap test
> for the common case and a looser check for the hard-to-handle
> case.

Quick test (just booted AmigaOS, opened some windows and moved them around 
a bit) shows same pitch is common but not always:

    7610 sm501 2d engine regs : write addr=10, val=200020
    4249 sm501 2d engine regs : write addr=10, val=4000400
    2340 sm501 2d engine regs : write addr=10, val=4000080
     462 sm501 2d engine regs : write addr=10, val=40000a0
     453 sm501 2d engine regs : write addr=10, val=800080
     434 sm501 2d engine regs : write addr=10, val=1200120
     233 sm501 2d engine regs : write addr=10, val=4000020
     162 sm501 2d engine regs : write addr=10, val=400040
     144 sm501 2d engine regs : write addr=10, val=200080
     141 sm501 2d engine regs : write addr=10, val=2000a0
      75 sm501 2d engine regs : write addr=10, val=600060
      62 sm501 2d engine regs : write addr=10, val=a000a0
      62 sm501 2d engine regs : write addr=10, val=200400
      38 sm501 2d engine regs : write addr=10, val=e000e0
      38 sm501 2d engine regs : write addr=10, val=1000100
      38 sm501 2d engine regs : write addr=10, val=10000a0
      37 sm501 2d engine regs : write addr=10, val=4000040
      36 sm501 2d engine regs : write addr=10, val=40000e0
      34 sm501 2d engine regs : write addr=10, val=a00080
      34 sm501 2d engine regs : write addr=10, val=4000120
      33 sm501 2d engine regs : write addr=10, val=400080
      25 sm501 2d engine regs : write addr=10, val=600080
      23 sm501 2d engine regs : write addr=10, val=4000060
      22 sm501 2d engine regs : write addr=10, val=1a001a0

and may depend on what the guest is doing so not sure it's worth the 
effort. (Probably same pitch is used when moving window and different when 
drawing something like text or icons but different pitch likely uses 
off-screen bitmap as source so overlap is not likely for those.)

> I might have a think about this and draw some diagrams tomorrow :-)

I'll wait for that to see if you can come up with something clever. I've 
tried to find a solution but could not come up with anything simpler than 
a loop checking lines which is probably as much effort as doing the blit 
itself and then doing blit with a temporary is probably the same so I went 
for just checking the obviously far apart blits (and even that seems to be 
easy to miss). So I'm happy if we can just fix the checks we have but if 
there are better suggestions please tell me.

Regards,
BALATON Zoltan


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-06-15 17:44 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-06 19:17 [PATCH 0/4] More sm501 fixes and optimisations BALATON Zoltan
2020-06-06 19:17 ` [PATCH 1/4] sm501: Fix bounds checks BALATON Zoltan
2020-06-15 15:23   ` Peter Maydell
2020-06-15 16:40     ` BALATON Zoltan
2020-06-15 16:53       ` Peter Maydell
2020-06-15 17:43         ` BALATON Zoltan
2020-06-06 19:17 ` [PATCH 4/4] sm501: Optimise 1 pixel 2d ops BALATON Zoltan
2020-06-15 15:42   ` Peter Maydell
2020-06-15 17:00     ` BALATON Zoltan
2020-06-06 19:17 ` [PATCH 2/4] sm501: Drop unneded variable BALATON Zoltan
2020-06-15 14:50   ` Peter Maydell
2020-06-06 19:17 ` [PATCH 3/4] sm501: Ignore no-op blits BALATON Zoltan
2020-06-15 14:56   ` Peter Maydell
2020-06-15 15:14     ` BALATON Zoltan
2020-06-12 13:10 ` [PATCH 0/4] More sm501 fixes and optimisations BALATON Zoltan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.