qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: BALATON Zoltan <balaton@eik.bme.hu>
To: Gerd Hoffmann <kraxel@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>,
	Sebastian Bauer <mail@sebastianbauer.info>,
	Magnus Damm <magnus.damm@gmail.com>,
	qemu-devel@nongnu.org, Aurelien Jarno <aurelien@aurel32.net>
Subject: Re: [PATCH v2 5/7] sm501: Replace hand written implementation with pixman where possible
Date: Tue, 26 May 2020 15:35:21 +0200 (CEST)	[thread overview]
Message-ID: <alpine.BSF.2.22.395.2005261434540.87757@zero.eik.bme.hu> (raw)
In-Reply-To: <20200526104318.wmsqqtia3h52l454@sirius.home.kraxel.org>

On Tue, 26 May 2020, Gerd Hoffmann wrote:
> On Thu, May 21, 2020 at 09:39:44PM +0200, BALATON Zoltan wrote:
>> Besides being faster this should also prevent malicious guests to
>> abuse 2D engine to overwrite data or cause a crash.
>
>>          uint32_t src_base = s->twoD_source_base & 0x03FFFFFF;
>> -        uint8_t *src = s->local_mem + src_base;
>
>> -                    val = *(_pixel_type *)&src[index_s];                      \
>
> Well, the advantage of *not* using pixman is that you can easily switch
> the code to use offsets instead of pointers, then apply the mask to the
> *final* offset to avoid oob data access:

The mask applied to src_base is not to prevent overflow but to implement 
register limits. Only these bits are valid if I remember correctly, so 
even if I use offsets I need to check for overflow. This patch basically 
does that by changing parameters to unsigned to prevent them being 
negative, checking values we multiply by to prevent them to be zero and 
then calculating first and last offset and check if they are within vram. 
(Unless of course I've made a mistake somewhere.) This should prevent 
overflow with one check and does not need to apply a mask at every step. 
The vram size can also be different so it's not a fixed mask anyway.

If not using pixman then I'd need to reimplement optimised 2D ops that 
will likely never be as good as pixman and no point in doing it several 
times for every device model so I'd rather try to use pixman where 
possible unless a better library is available.

>    val = *(_pixel_type*)(&s->local_mem[(s->twoD_source_base + index_s) & 0x03FFFFFF]);
>
>> +        if ((rop_mode && rop == 0x5) || (!rop_mode && rop == 0x55)) {
>> +            /* Invert dest, is there a way to do this with pixman? */
>
> PIXMAN_OP_XOR maybe?

Maybe, but looking at the pixman source I couldn't decide if

UN8x4_MUL_UN8_ADD_UN8x4_MUL_UN8 (s, dest_ia, d, src_ia);

seen here:
https://cgit.freedesktop.org/pixman/tree/pixman/pixman-combine32.c#n396
is really the same as s ^ d.

>> +            if (rtl && ((db >= sb && db <= se) || (de >= sb && de <= se))) {
>> +                /* regions may overlap: copy via temporary */
>
> The usual way for a hardware blitter is to have a direction bit, i.e.
> the guest os can ask to blit in top->bottom or bottom->top scanline
> ordering.  The guest can use that to make sure the blit does not

Yes, this is the rtl above (right to left) and AmigaOS sets this most of 
the time so only relying on that to detect overlaps is not efficient.

> overwrite things.  But note the guest can also intentionally use
> overlapping regions, i.e. memset(0) the first scanline, then use a blit
> with overlap to clear the whole screen.  The later will surely break if
> you blit via temporary image ...

Fortunately no guest code seems to do that so unless we find one needing 
it I don't worry much about such rare cases. It would be best if pixman 
supported this but while I've found patches were submitted they did not 
get merged so far so using a temporary seems to be the simplest way that 
works well enough for now.

>> +                pixman_blt((uint32_t *)&s->local_mem[src_base],
>> +                           (uint32_t *)&s->local_mem[dst_base],
>> +                           src_pitch * (1 << format) / sizeof(uint32_t),
>> +                           dst_pitch * (1 << format) / sizeof(uint32_t),
>> +                           8 * (1 << format), 8 * (1 << format),
>> +                           src_x, src_y, dst_x, dst_y, width, height);
>
> See above, i'm not convinced pixman is the best way here.
> When using pixman I'd suggest:
>
>  (1) src = pixman_image_create_bits_no_clear(...);
>  (2) dst = pixman_image_create_bits_no_clear(...);
>  (3) pixman_image_composite(PIXMAN_OP_SRC, src, NULL, dst, ...);
>  (4) pixman_image_unref(src);
>  (5) pixman_image_unref(dst);
>
> pixman_blt() is probably doing basically the same.

Actually not the same, pixman_blt is faster operating directly on pointers 
while we need all the pixman_image overhead to use pixman_image_composite. 
Blitter is used for a lot of small ops (I've seen AmigaOS even call it 
with 1 pixel regions) so going through pixman_image every time does not 
seem to be efficient. To implement more complex ops this may be needed so 
I may try to figure that out later but I'd need some test cases to test if 
the results are correct. The current patches do the same as before (except 
for some rare overlapping cases as you noted above but we haven't observed 
any yet) and fix the overflows so this was the best I could do in the time 
I had. Maybe I try to improve this later but don't plan to rewrite it now.

>  The advantage of not
> using pixman_blt() is that
>
>  (a) you can also use pixman ops other than PIXMAN_OP_SRC, and
>  (b) you can have a helper function for (1)+(2) which very carefully
>      applies sanity checks to make sure the pixman image created stays
>      completely inside s->local_mem.
>  (c) you have the option to completely rearrange the code flow, for
>      example update the src pixman image whenever the guest touches
>      src_base or src_pitch or format instead of having a
>      create/op/unref cycle on every blitter op.

From traces I think most guest would write bltter related regs on every op 
so probably not worth the hassle to try to update regions on register 
access and we could do it on every op, possibly optimising 1 pixel blits 
and small regions via some special cases but even then simple copy image 
is probably the most common op that might worth doing via pixman_blt as 
it's expected to be frequently used so the less overhead is the better. 
Therefore I'd only use image_composite for more complex ops but that's too 
much effort for a relatively unused device model. Maybe for ati-vga I'll 
try to make it better but first should fix microengine for that so drivers 
can talk to it. I'd rather spend my limited free time on that than further 
improving sm501 unless some bugs show up.

>> +        pixman_fill((uint32_t *)&s->local_mem[dst_base],
>> +                    dst_pitch * (1 << format) / sizeof(uint32_t),
>> +                    8 * (1 << format), dst_x, dst_y, width, height, color);
>
>  (1) src = pixman_image_create_solid(...), otherwise same as above ;)

Same argument as composite_image and for fill we don't even have any 
advantage so while for composite implementing other ops is a reason to not 
use pixman_blt I see no reason to not go the fastest way for fill.

Regards,
BALATON Zoltan


  reply	other threads:[~2020-05-26 13:38 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-21 19:39 [PATCH v2 0/7] Misc display/sm501 clean ups and fixes BALATON Zoltan
2020-05-21 19:39 ` [PATCH v2 4/7] sm501: Clean up local variables in sm501_2d_operation BALATON Zoltan
2020-05-21 19:39 ` [PATCH v2 6/7] sm501: Optimize small overlapping blits BALATON Zoltan
2020-05-21 19:39 ` [PATCH v2 1/7] sm501: Convert printf + abort to qemu_log_mask BALATON Zoltan
2020-05-21 19:39 ` [PATCH v2 2/7] sm501: Shorten long variable names in sm501_2d_operation BALATON Zoltan
2020-05-21 19:39 ` [PATCH v2 3/7] sm501: Use BIT(x) macro to shorten constant BALATON Zoltan
2020-05-21 19:39 ` [PATCH v2 7/7] sm501: Remove obsolete changelog and todo comment BALATON Zoltan
2020-05-21 19:39 ` [PATCH v2 5/7] sm501: Replace hand written implementation with pixman where possible BALATON Zoltan
2020-05-26 10:43   ` Gerd Hoffmann
2020-05-26 13:35     ` BALATON Zoltan [this message]
2020-05-27  9:15       ` Gerd Hoffmann
2020-05-27 11:05         ` BALATON Zoltan
2020-05-22  1:50 ` [PATCH v2 0/7] Misc display/sm501 clean ups and fixes no-reply
2020-05-28  7:21 ` Gerd Hoffmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.BSF.2.22.395.2005261434540.87757@zero.eik.bme.hu \
    --to=balaton@eik.bme.hu \
    --cc=aurelien@aurel32.net \
    --cc=kraxel@redhat.com \
    --cc=magnus.damm@gmail.com \
    --cc=mail@sebastianbauer.info \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).