[PATCH] drm/vmwgfx: Filter modes which exceed 3/4 of graphics memory.

dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] drm/vmwgfx: Filter modes which exceed 3/4 of graphics memory.
@ 2024-01-12 20:38 Ian Forbes
  2024-01-15  8:21 ` Thomas Zimmermann
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Ian Forbes @ 2024-01-12 20:38 UTC (permalink / raw)
  To: dri-devel
  Cc: Ian Forbes, maaz.mombasawala, bcm-kernel-feedback-list, martin.krastev

SVGA requires surfaces to fit within graphics memory (max_mob_pages) which
means that modes with a final buffer size that would exceed graphics memory
must be pruned otherwise creation will fail.

Additionally, device commands which use multiple graphics resources must
have all their resources fit within graphics memory for the duration of the
command. Thus we need a small carve out of 1/4 of graphics memory to ensure
commands likes surface copies to the primary framebuffer for cursor
composition or damage clips can fit within graphics memory.

This fixes an issue where VMs with low graphics memory (< 64MiB) configured
with high resolution mode boot to a black screen because surface creation
fails.

Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
---
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
index 28ff30e32fab..39d6d17fc488 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -2854,12 +2854,12 @@ enum drm_mode_status vmw_connector_mode_valid(struct drm_connector *connector,
 	struct vmw_private *dev_priv = vmw_priv(dev);
 	u32 max_width = dev_priv->texture_max_width;
 	u32 max_height = dev_priv->texture_max_height;
-	u32 assumed_cpp = 4;
+	u32 assumed_cpp = dev_priv->assume_16bpp ? 2 : 4;
+	u32 pitch = mode->hdisplay * assumed_cpp;
+	u64 total = mode->vdisplay * pitch;
+	bool using_stdu = dev_priv->active_display_unit == vmw_du_screen_target;
 
-	if (dev_priv->assume_16bpp)
-		assumed_cpp = 2;
-
-	if (dev_priv->active_display_unit == vmw_du_screen_target) {
+	if (using_stdu) {
 		max_width  = min(dev_priv->stdu_max_width,  max_width);
 		max_height = min(dev_priv->stdu_max_height, max_height);
 	}
@@ -2870,9 +2870,13 @@ enum drm_mode_status vmw_connector_mode_valid(struct drm_connector *connector,
 	if (max_height < mode->vdisplay)
 		return MODE_BAD_VVALUE;
 
-	if (!vmw_kms_validate_mode_vram(dev_priv,
-			mode->hdisplay * assumed_cpp,
-			mode->vdisplay))
+	if (using_stdu &&
+		(total > (dev_priv->max_mob_pages * PAGE_SIZE * 3 / 4) ||
+		total > dev_priv->max_mob_size)) {
+		return MODE_MEM;
+	}
+
+	if (!vmw_kms_validate_mode_vram(dev_priv, pitch, mode->vdisplay))
 		return MODE_MEM;
 
 	return MODE_OK;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/vmwgfx: Filter modes which exceed 3/4 of graphics memory.
  2024-01-12 20:38 [PATCH] drm/vmwgfx: Filter modes which exceed 3/4 of graphics memory Ian Forbes
@ 2024-01-15  8:21 ` Thomas Zimmermann
  2024-01-18 18:25   ` Zack Rusin
  2024-01-30 19:38 ` Zack Rusin
  2024-02-01  5:13 ` [PATCH v2] " Ian Forbes
  2 siblings, 1 reply; 14+ messages in thread
From: Thomas Zimmermann @ 2024-01-15  8:21 UTC (permalink / raw)
  To: Ian Forbes, dri-devel
  Cc: maaz.mombasawala, bcm-kernel-feedback-list, martin.krastev


[-- Attachment #1.1: Type: text/plain, Size: 3381 bytes --]

Hi

Am 12.01.24 um 21:38 schrieb Ian Forbes:
> SVGA requires surfaces to fit within graphics memory (max_mob_pages) which
> means that modes with a final buffer size that would exceed graphics memory
> must be pruned otherwise creation will fail.
> 
> Additionally, device commands which use multiple graphics resources must
> have all their resources fit within graphics memory for the duration of the
> command. Thus we need a small carve out of 1/4 of graphics memory to ensure
> commands likes surface copies to the primary framebuffer for cursor
> composition or damage clips can fit within graphics memory.
> 
> This fixes an issue where VMs with low graphics memory (< 64MiB) configured
> with high resolution mode boot to a black screen because surface creation
> fails.

That is a long-standing problem, which we have observed with other 
drivers as well. On low-memory devices, TTM doesn't play well. The real 
fix would be to export all modes that possibly fit and sort out the 
invalid configurations in atomic_check. It's just a lot more work.

Did you consider simply ignoring vmwgfx devices with less than 64 MiB of 
VRAM?

> 
> Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
> ---
>   drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 20 ++++++++++++--------
>   1 file changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> index 28ff30e32fab..39d6d17fc488 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> @@ -2854,12 +2854,12 @@ enum drm_mode_status vmw_connector_mode_valid(struct drm_connector *connector,
>   	struct vmw_private *dev_priv = vmw_priv(dev);
>   	u32 max_width = dev_priv->texture_max_width;
>   	u32 max_height = dev_priv->texture_max_height;
> -	u32 assumed_cpp = 4;
> +	u32 assumed_cpp = dev_priv->assume_16bpp ? 2 : 4;
> +	u32 pitch = mode->hdisplay * assumed_cpp;
> +	u64 total = mode->vdisplay * pitch;
> +	bool using_stdu = dev_priv->active_display_unit == vmw_du_screen_target;
>   
> -	if (dev_priv->assume_16bpp)
> -		assumed_cpp = 2;
> -
> -	if (dev_priv->active_display_unit == vmw_du_screen_target) {
> +	if (using_stdu) {
>   		max_width  = min(dev_priv->stdu_max_width,  max_width);
>   		max_height = min(dev_priv->stdu_max_height, max_height);
>   	}
> @@ -2870,9 +2870,13 @@ enum drm_mode_status vmw_connector_mode_valid(struct drm_connector *connector,
>   	if (max_height < mode->vdisplay)
>   		return MODE_BAD_VVALUE;
>   
> -	if (!vmw_kms_validate_mode_vram(dev_priv,
> -			mode->hdisplay * assumed_cpp,
> -			mode->vdisplay))
> +	if (using_stdu &&
> +		(total > (dev_priv->max_mob_pages * PAGE_SIZE * 3 / 4) ||

IDK the details of vmwgfx's memory management, but instead of this '3/4 
test', wouldn't it be better to partition the VRAM via TTM and test 
against the partition sizes?

Best regards
Thomas

> +		total > dev_priv->max_mob_size)) {
> +		return MODE_MEM;
> +	}
> +
> +	if (!vmw_kms_validate_mode_vram(dev_priv, pitch, mode->vdisplay))
>   		return MODE_MEM;
>   
>   	return MODE_OK;

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/vmwgfx: Filter modes which exceed 3/4 of graphics memory.
  2024-01-15  8:21 ` Thomas Zimmermann
@ 2024-01-18 18:25   ` Zack Rusin
  2024-01-19  9:22     ` Thomas Zimmermann
  0 siblings, 1 reply; 14+ messages in thread
From: Zack Rusin @ 2024-01-18 18:25 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: Ian Forbes, maaz.mombasawala, bcm-kernel-feedback-list,
	dri-devel, martin.krastev

On Mon, Jan 15, 2024 at 3:21 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>
> Hi
>
> Am 12.01.24 um 21:38 schrieb Ian Forbes:
> > SVGA requires surfaces to fit within graphics memory (max_mob_pages) which
> > means that modes with a final buffer size that would exceed graphics memory
> > must be pruned otherwise creation will fail.
> >
> > Additionally, device commands which use multiple graphics resources must
> > have all their resources fit within graphics memory for the duration of the
> > command. Thus we need a small carve out of 1/4 of graphics memory to ensure
> > commands likes surface copies to the primary framebuffer for cursor
> > composition or damage clips can fit within graphics memory.
> >
> > This fixes an issue where VMs with low graphics memory (< 64MiB) configured
> > with high resolution mode boot to a black screen because surface creation
> > fails.
>
> That is a long-standing problem, which we have observed with other
> drivers as well. On low-memory devices, TTM doesn't play well. The real
> fix would be to export all modes that possibly fit and sort out the
> invalid configurations in atomic_check. It's just a lot more work.
>
> Did you consider simply ignoring vmwgfx devices with less than 64 MiB of
> VRAM?

Unfortunately we can't do that because on new esx servers without
gpu's the default is 16MB. A lot of people are still running their esx
boxes with 4MB, which is in general the most common problem because
with 4MB people still tend to like to set 1280x800 which with 32bpp fb
takes 4096000 bytes and with 4MB available that leaves only 96KB
available and we need more to also allocate things like the cursor.
Even if ttm did everything right technically 1280x800 @ 32bpp
resolution will fit in a 4MB graphics memory, but then the system will
not be able to have a hardware (well, virtualized) cursor. It's
extremely unlikely people would even be aware of this tradeoff when
making the decision to increase resolution.

So the driver either needs to preallocate all the memory it possibly
might need for all the basic functionality before modesetting or the
available modes need to be validated with some constraints.

z

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/vmwgfx: Filter modes which exceed 3/4 of graphics memory.
  2024-01-18 18:25   ` Zack Rusin
@ 2024-01-19  9:22     ` Thomas Zimmermann
  2024-01-30 18:38       ` Zack Rusin
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Zimmermann @ 2024-01-19  9:22 UTC (permalink / raw)
  To: Zack Rusin
  Cc: Ian Forbes, maaz.mombasawala, bcm-kernel-feedback-list,
	dri-devel, martin.krastev


[-- Attachment #1.1: Type: text/plain, Size: 3038 bytes --]

Hi

Am 18.01.24 um 19:25 schrieb Zack Rusin:
> On Mon, Jan 15, 2024 at 3:21 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>>
>> Hi
>>
>> Am 12.01.24 um 21:38 schrieb Ian Forbes:
>>> SVGA requires surfaces to fit within graphics memory (max_mob_pages) which
>>> means that modes with a final buffer size that would exceed graphics memory
>>> must be pruned otherwise creation will fail.
>>>
>>> Additionally, device commands which use multiple graphics resources must
>>> have all their resources fit within graphics memory for the duration of the
>>> command. Thus we need a small carve out of 1/4 of graphics memory to ensure
>>> commands likes surface copies to the primary framebuffer for cursor
>>> composition or damage clips can fit within graphics memory.
>>>
>>> This fixes an issue where VMs with low graphics memory (< 64MiB) configured
>>> with high resolution mode boot to a black screen because surface creation
>>> fails.
>>
>> That is a long-standing problem, which we have observed with other
>> drivers as well. On low-memory devices, TTM doesn't play well. The real
>> fix would be to export all modes that possibly fit and sort out the
>> invalid configurations in atomic_check. It's just a lot more work.
>>
>> Did you consider simply ignoring vmwgfx devices with less than 64 MiB of
>> VRAM?
> 
> Unfortunately we can't do that because on new esx servers without
> gpu's the default is 16MB. A lot of people are still running their esx
> boxes with 4MB, which is in general the most common problem because
> with 4MB people still tend to like to set 1280x800 which with 32bpp fb
> takes 4096000 bytes and with 4MB available that leaves only 96KB
> available and we need more to also allocate things like the cursor.
> Even if ttm did everything right technically 1280x800 @ 32bpp
> resolution will fit in a 4MB graphics memory, but then the system will
> not be able to have a hardware (well, virtualized) cursor. It's
> extremely unlikely people would even be aware of this tradeoff when
> making the decision to increase resolution.

Do you allocate buffer storage directly in the provided VRAM? If so how 
do you do page flips then? You'd need for the example of 1280x800-32, 
you'd need around 8 MiB to keep front and back buffer in VRAM. I guess, 
you only support the framebuffer console (which doesn't do pageflips)?

In mgag200 and ast, I had the luxury for replacing TTM with SHMEM 
helpers, which worked around the problem easily. Maybe that's an option 
for low-memory systems?

Best regards
Thomas

> 
> So the driver either needs to preallocate all the memory it possibly
> might need for all the basic functionality before modesetting or the
> available modes need to be validated with some constraints.
> 
> z

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/vmwgfx: Filter modes which exceed 3/4 of graphics memory.
  2024-01-19  9:22     ` Thomas Zimmermann
@ 2024-01-30 18:38       ` Zack Rusin
  2024-01-30 23:50         ` Daniel Stone
  0 siblings, 1 reply; 14+ messages in thread
From: Zack Rusin @ 2024-01-30 18:38 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: Ian Forbes, maaz.mombasawala, bcm-kernel-feedback-list,
	dri-devel, martin.krastev

On Fri, Jan 19, 2024 at 4:22 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>
> Hi
>
> Am 18.01.24 um 19:25 schrieb Zack Rusin:
> > On Mon, Jan 15, 2024 at 3:21 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
> >>
> >> Hi
> >>
> >> Am 12.01.24 um 21:38 schrieb Ian Forbes:
> >>> SVGA requires surfaces to fit within graphics memory (max_mob_pages) which
> >>> means that modes with a final buffer size that would exceed graphics memory
> >>> must be pruned otherwise creation will fail.
> >>>
> >>> Additionally, device commands which use multiple graphics resources must
> >>> have all their resources fit within graphics memory for the duration of the
> >>> command. Thus we need a small carve out of 1/4 of graphics memory to ensure
> >>> commands likes surface copies to the primary framebuffer for cursor
> >>> composition or damage clips can fit within graphics memory.
> >>>
> >>> This fixes an issue where VMs with low graphics memory (< 64MiB) configured
> >>> with high resolution mode boot to a black screen because surface creation
> >>> fails.
> >>
> >> That is a long-standing problem, which we have observed with other
> >> drivers as well. On low-memory devices, TTM doesn't play well. The real
> >> fix would be to export all modes that possibly fit and sort out the
> >> invalid configurations in atomic_check. It's just a lot more work.
> >>
> >> Did you consider simply ignoring vmwgfx devices with less than 64 MiB of
> >> VRAM?
> >
> > Unfortunately we can't do that because on new esx servers without
> > gpu's the default is 16MB. A lot of people are still running their esx
> > boxes with 4MB, which is in general the most common problem because
> > with 4MB people still tend to like to set 1280x800 which with 32bpp fb
> > takes 4096000 bytes and with 4MB available that leaves only 96KB
> > available and we need more to also allocate things like the cursor.
> > Even if ttm did everything right technically 1280x800 @ 32bpp
> > resolution will fit in a 4MB graphics memory, but then the system will
> > not be able to have a hardware (well, virtualized) cursor. It's
> > extremely unlikely people would even be aware of this tradeoff when
> > making the decision to increase resolution.
>
> Do you allocate buffer storage directly in the provided VRAM? If so how
> do you do page flips then? You'd need for the example of 1280x800-32,
> you'd need around 8 MiB to keep front and back buffer in VRAM. I guess,
> you only support the framebuffer console (which doesn't do pageflips)?

In general, yes. Of course it's a little more convoluted because we'll
act like OpenGL runtime here (i.e. glXSwapBuffers), i.e. our driver
will fake page-flips because the only memory we'll have is a single
buffer as the actual page-flipping happens in the presentation code on
the host. So the guest is not aware of the actual presentation (it's
also why we don't have any sort of vblank signaling in vmwgfx, the
concept just doesn't exist for us). i.e. on para-virtualized drivers
the actual page-flips will be property of the presentation code that's
outside of the guest. It's definitely one those things that I wanted
to have a good solution for in a while, in particular to have a better
story behind vblank handling, but it's difficult because
"presentation" on vm's is in general difficult to define - it might be
some vnc connected host on the other continent. Having said that
that's basically a wonky VRR display so we should be able to handle
our presentation as VRR and give more control of updates to the guest,
but we haven't done it yet.

> In mgag200 and ast, I had the luxury for replacing TTM with SHMEM
> helpers, which worked around the problem easily. Maybe that's an option
> for low-memory systems?

Our current device doesn't have the ability to present out of
unspecified memory in the guest, i.e. the host, which is doing the
presentation, is not aware of how guest kernel lays out the memory so
we need to basically create a page-table for every graphics object
(VMW_PL_MOB placement in vmwgfx) so that the host can actually find
the memory it needs to read. So the shmem helpers would need something
extra for us to be able to generate those page tables for the
drm_gem_object's it deals with.

z

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/vmwgfx: Filter modes which exceed 3/4 of graphics memory.
  2024-01-12 20:38 [PATCH] drm/vmwgfx: Filter modes which exceed 3/4 of graphics memory Ian Forbes
  2024-01-15  8:21 ` Thomas Zimmermann
@ 2024-01-30 19:38 ` Zack Rusin
  2024-01-31  9:11   ` Thomas Zimmermann
  2024-02-01  5:13 ` [PATCH v2] " Ian Forbes
  2 siblings, 1 reply; 14+ messages in thread
From: Zack Rusin @ 2024-01-30 19:38 UTC (permalink / raw)
  To: Ian Forbes
  Cc: martin.krastev, bcm-kernel-feedback-list, dri-devel, maaz.mombasawala

On Fri, Jan 12, 2024 at 4:20 PM Ian Forbes <ian.forbes@broadcom.com> wrote:
>
> SVGA requires surfaces to fit within graphics memory (max_mob_pages) which
> means that modes with a final buffer size that would exceed graphics memory
> must be pruned otherwise creation will fail.
>
> Additionally, device commands which use multiple graphics resources must
> have all their resources fit within graphics memory for the duration of the
> command. Thus we need a small carve out of 1/4 of graphics memory to ensure
> commands likes surface copies to the primary framebuffer for cursor
> composition or damage clips can fit within graphics memory.
>
> This fixes an issue where VMs with low graphics memory (< 64MiB) configured
> with high resolution mode boot to a black screen because surface creation
> fails.
>
> Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> index 28ff30e32fab..39d6d17fc488 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> @@ -2854,12 +2854,12 @@ enum drm_mode_status vmw_connector_mode_valid(struct drm_connector *connector,
>         struct vmw_private *dev_priv = vmw_priv(dev);
>         u32 max_width = dev_priv->texture_max_width;
>         u32 max_height = dev_priv->texture_max_height;
> -       u32 assumed_cpp = 4;
> +       u32 assumed_cpp = dev_priv->assume_16bpp ? 2 : 4;
> +       u32 pitch = mode->hdisplay * assumed_cpp;
> +       u64 total = mode->vdisplay * pitch;
> +       bool using_stdu = dev_priv->active_display_unit == vmw_du_screen_target;
>
> -       if (dev_priv->assume_16bpp)
> -               assumed_cpp = 2;
> -
> -       if (dev_priv->active_display_unit == vmw_du_screen_target) {
> +       if (using_stdu) {
>                 max_width  = min(dev_priv->stdu_max_width,  max_width);
>                 max_height = min(dev_priv->stdu_max_height, max_height);
>         }
> @@ -2870,9 +2870,13 @@ enum drm_mode_status vmw_connector_mode_valid(struct drm_connector *connector,
>         if (max_height < mode->vdisplay)
>                 return MODE_BAD_VVALUE;
>
> -       if (!vmw_kms_validate_mode_vram(dev_priv,
> -                       mode->hdisplay * assumed_cpp,
> -                       mode->vdisplay))
> +       if (using_stdu &&
> +               (total > (dev_priv->max_mob_pages * PAGE_SIZE * 3 / 4) ||

Could you export that computation somewhere where we could document
why we're doing this? Just to not leave the awkward "* 3 /4" that
everyone reading this code will wonder about?
And also make sure you indent this correctly, "dim checkpatch" should
warn about this.

z

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/vmwgfx: Filter modes which exceed 3/4 of graphics memory.
  2024-01-30 18:38       ` Zack Rusin
@ 2024-01-30 23:50         ` Daniel Stone
  2024-01-31  2:31           ` Zack Rusin
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Stone @ 2024-01-30 23:50 UTC (permalink / raw)
  To: Zack Rusin
  Cc: dri-devel, Ian Forbes, maaz.mombasawala,
	bcm-kernel-feedback-list, Thomas Zimmermann, martin.krastev

Hi,

On Tue, 30 Jan 2024 at 18:39, Zack Rusin <zack.rusin@broadcom.com> wrote:
> In general, yes. Of course it's a little more convoluted because we'll
> act like OpenGL runtime here (i.e. glXSwapBuffers), i.e. our driver
> will fake page-flips because the only memory we'll have is a single
> buffer as the actual page-flipping happens in the presentation code on
> the host. So the guest is not aware of the actual presentation (it's
> also why we don't have any sort of vblank signaling in vmwgfx, the
> concept just doesn't exist for us). i.e. on para-virtualized drivers
> the actual page-flips will be property of the presentation code that's
> outside of the guest. It's definitely one those things that I wanted
> to have a good solution for in a while, in particular to have a better
> story behind vblank handling, but it's difficult because
> "presentation" on vm's is in general difficult to define - it might be
> some vnc connected host on the other continent. Having said that
> that's basically a wonky VRR display so we should be able to handle
> our presentation as VRR and give more control of updates to the guest,
> but we haven't done it yet.

Please don't.

Photon time is _a_ useful metric, but only backwards-informational.
It's nice to give userspace a good forward estimate of when pixels
will hit retinas, but as it's not fully reliable, the main part is
being able to let it know when it did happen so it can adjust. Given
that it's not reliable, we can't use it as a basis for preparing
submissions though, so we don't, even on bare-metal drivers.

As you've noted though, it really falls apart on non-bare-metal cases,
especially where latency vastly exceeds throughput, or when either is
hugely variable. So we don't ever use it as a basis.

VRR is worse though. The FRR model is 'you can display new content
every $period, and here's your basis so you can calibrate phase'. The
VRR model is 'you can display new content so rapidly it's not worth
trying to quantise, just fire it as rapidly as possible'. That's a
world away from 'errrr ... might be 16ms, might be 500? dunno really'.

The entire model we have is that basis timing flows backwards. The
'hardware' gives us a deadline, KMS angles to meet that with a small
margin, the compositor angles to meet that with a margin again, and it
lines up client repaints to hit that window too. Everything works on
that model, so it's not super surprising that using svga is - to quote
one of Weston's DRM-backend people who uses ESXi - 'a juddery mess'.

Given that the entire ecosystem is based on this model, I don't think
there's an easy way out where svga just does something wildly
different. The best way to fix it is to probably work on predictable
quantisation with updates: pick 5/12/47/60Hz to quantise to based on
your current throughput, with something similar to hotplug/LINK_STATUS
and faked EDID to let userspace know when the period changes. If you
have variability within the cycle, e.g. dropped frames, then just suck
it up and keep the illusion alive to userspace that it's presenting to
a fixed period, and if/when you calculate there's a better
quantisation then let userspace know what it is so it can adjust.

But there's really no future in just doing random presentation rates,
because that's not the API anyone has written for.

Cheers,
Daniel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/vmwgfx: Filter modes which exceed 3/4 of graphics memory.
  2024-01-30 23:50         ` Daniel Stone
@ 2024-01-31  2:31           ` Zack Rusin
  2024-02-06 14:58             ` Daniel Stone
  0 siblings, 1 reply; 14+ messages in thread
From: Zack Rusin @ 2024-01-31  2:31 UTC (permalink / raw)
  To: Daniel Stone
  Cc: dri-devel, Ian Forbes, maaz.mombasawala,
	bcm-kernel-feedback-list, Thomas Zimmermann, martin.krastev

On Tue, Jan 30, 2024 at 6:50 PM Daniel Stone <daniel@fooishbar.org> wrote:
>
> Hi,
>
> On Tue, 30 Jan 2024 at 18:39, Zack Rusin <zack.rusin@broadcom.com> wrote:
> > In general, yes. Of course it's a little more convoluted because we'll
> > act like OpenGL runtime here (i.e. glXSwapBuffers), i.e. our driver
> > will fake page-flips because the only memory we'll have is a single
> > buffer as the actual page-flipping happens in the presentation code on
> > the host. So the guest is not aware of the actual presentation (it's
> > also why we don't have any sort of vblank signaling in vmwgfx, the
> > concept just doesn't exist for us). i.e. on para-virtualized drivers
> > the actual page-flips will be property of the presentation code that's
> > outside of the guest. It's definitely one those things that I wanted
> > to have a good solution for in a while, in particular to have a better
> > story behind vblank handling, but it's difficult because
> > "presentation" on vm's is in general difficult to define - it might be
> > some vnc connected host on the other continent. Having said that
> > that's basically a wonky VRR display so we should be able to handle
> > our presentation as VRR and give more control of updates to the guest,
> > but we haven't done it yet.
>
> Please don't.
>
> Photon time is _a_ useful metric, but only backwards-informational.
> It's nice to give userspace a good forward estimate of when pixels
> will hit retinas, but as it's not fully reliable, the main part is
> being able to let it know when it did happen so it can adjust. Given
> that it's not reliable, we can't use it as a basis for preparing
> submissions though, so we don't, even on bare-metal drivers.
>
> As you've noted though, it really falls apart on non-bare-metal cases,
> especially where latency vastly exceeds throughput, or when either is
> hugely variable. So we don't ever use it as a basis.
>
> VRR is worse though. The FRR model is 'you can display new content
> every $period, and here's your basis so you can calibrate phase'. The
> VRR model is 'you can display new content so rapidly it's not worth
> trying to quantise, just fire it as rapidly as possible'. That's a
> world away from 'errrr ... might be 16ms, might be 500? dunno really'.
>
> The entire model we have is that basis timing flows backwards. The
> 'hardware' gives us a deadline, KMS angles to meet that with a small
> margin, the compositor angles to meet that with a margin again, and it
> lines up client repaints to hit that window too. Everything works on
> that model, so it's not super surprising that using svga is - to quote
> one of Weston's DRM-backend people who uses ESXi - 'a juddery mess'.

That's very hurtful. Or it would be but of course you didn't believe
them because they're working on Weston so clearly don't make good
choices in general, right? The presentation on esxi is just as smooth
as it is by default on Ubuntu on new hardware...

> Given that the entire ecosystem is based on this model, I don't think
> there's an easy way out where svga just does something wildly
> different. The best way to fix it is to probably work on predictable
> quantisation with updates: pick 5/12/47/60Hz to quantise to based on
> your current throughput, with something similar to hotplug/LINK_STATUS
> and faked EDID to let userspace know when the period changes. If you
> have variability within the cycle, e.g. dropped frames, then just suck
> it up and keep the illusion alive to userspace that it's presenting to
> a fixed period, and if/when you calculate there's a better
> quantisation then let userspace know what it is so it can adjust.
>
> But there's really no future in just doing random presentation rates,
> because that's not the API anyone has written for.

See, my hope was that with vrr we could layer the weird remote
presentation semantics of virtualized guest on top of the same
infrastructure that would be used on real hardware. If you're saying
that it's not the way userspace will work, then yea, that doesn't
help. My issue, that's general for para-virtualized drivers, is that
any behavior that differs from hw drivers means that it's going to
break at some point, we see that even for basic things like the
update-layout hotplug events that have been largely standardized for
many years. I'm assuming that refresh-rate-changed will result in the
same regressions, but fwiw if I can implement FRR correctly and punt
any issues that arise due to changes in the FRR as issues in userspace
then that does make my life a lot easier, so I'm not going to object
to that.

z

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/vmwgfx: Filter modes which exceed 3/4 of graphics memory.
  2024-01-30 19:38 ` Zack Rusin
@ 2024-01-31  9:11   ` Thomas Zimmermann
  0 siblings, 0 replies; 14+ messages in thread
From: Thomas Zimmermann @ 2024-01-31  9:11 UTC (permalink / raw)
  To: Zack Rusin, Ian Forbes
  Cc: bcm-kernel-feedback-list, martin.krastev, dri-devel, maaz.mombasawala


[-- Attachment #1.1: Type: text/plain, Size: 3664 bytes --]

Hi Zack

Am 30.01.24 um 20:38 schrieb Zack Rusin:
> On Fri, Jan 12, 2024 at 4:20 PM Ian Forbes <ian.forbes@broadcom.com> wrote:
>>
>> SVGA requires surfaces to fit within graphics memory (max_mob_pages) which
>> means that modes with a final buffer size that would exceed graphics memory
>> must be pruned otherwise creation will fail.
>>
>> Additionally, device commands which use multiple graphics resources must
>> have all their resources fit within graphics memory for the duration of the
>> command. Thus we need a small carve out of 1/4 of graphics memory to ensure
>> commands likes surface copies to the primary framebuffer for cursor
>> composition or damage clips can fit within graphics memory.
>>
>> This fixes an issue where VMs with low graphics memory (< 64MiB) configured
>> with high resolution mode boot to a black screen because surface creation
>> fails.
>>
>> Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
>> ---
>>   drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 20 ++++++++++++--------
>>   1 file changed, 12 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
>> index 28ff30e32fab..39d6d17fc488 100644
>> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
>> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
>> @@ -2854,12 +2854,12 @@ enum drm_mode_status vmw_connector_mode_valid(struct drm_connector *connector,
>>          struct vmw_private *dev_priv = vmw_priv(dev);
>>          u32 max_width = dev_priv->texture_max_width;
>>          u32 max_height = dev_priv->texture_max_height;
>> -       u32 assumed_cpp = 4;
>> +       u32 assumed_cpp = dev_priv->assume_16bpp ? 2 : 4;
>> +       u32 pitch = mode->hdisplay * assumed_cpp;
>> +       u64 total = mode->vdisplay * pitch;
>> +       bool using_stdu = dev_priv->active_display_unit == vmw_du_screen_target;
>>
>> -       if (dev_priv->assume_16bpp)
>> -               assumed_cpp = 2;
>> -
>> -       if (dev_priv->active_display_unit == vmw_du_screen_target) {
>> +       if (using_stdu) {
>>                  max_width  = min(dev_priv->stdu_max_width,  max_width);
>>                  max_height = min(dev_priv->stdu_max_height, max_height);
>>          }
>> @@ -2870,9 +2870,13 @@ enum drm_mode_status vmw_connector_mode_valid(struct drm_connector *connector,
>>          if (max_height < mode->vdisplay)
>>                  return MODE_BAD_VVALUE;
>>
>> -       if (!vmw_kms_validate_mode_vram(dev_priv,
>> -                       mode->hdisplay * assumed_cpp,
>> -                       mode->vdisplay))
>> +       if (using_stdu &&
>> +               (total > (dev_priv->max_mob_pages * PAGE_SIZE * 3 / 4) ||
> 
> Could you export that computation somewhere where we could document
> why we're doing this? Just to not leave the awkward "* 3 /4" that
> everyone reading this code will wonder about?

There's code in VRAM helpers that does a similar test. [1] But the VRAM 
helpers are obsolete. I guess that TTM could contain such a test 
somewhere by testing against a max_bo_size for each TTM resource. 
Whether that is feasible in practice is a different question. Maybe ask 
the TTM maintainers.

Best regards
Thomas

[1] 
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_gem_vram_helper.c#L1156

> And also make sure you indent this correctly, "dim checkpatch" should
> warn about this.
> 
> z

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2] drm/vmwgfx: Filter modes which exceed 3/4 of graphics memory.
  2024-01-12 20:38 [PATCH] drm/vmwgfx: Filter modes which exceed 3/4 of graphics memory Ian Forbes
  2024-01-15  8:21 ` Thomas Zimmermann
  2024-01-30 19:38 ` Zack Rusin
@ 2024-02-01  5:13 ` Ian Forbes
  2024-02-02 19:29   ` Zack Rusin
  2 siblings, 1 reply; 14+ messages in thread
From: Ian Forbes @ 2024-02-01  5:13 UTC (permalink / raw)
  To: dri-devel
  Cc: bcm-kernel-feedback-list, maaz.mombasawala, martin.krastev,
	zack.rusin, Ian Forbes

SVGA requires surfaces to fit within graphics memory (max_mob_pages) which
means that modes with a final buffer size that would exceed graphics memory
must be pruned otherwise creation will fail.

Additionally, device commands which use multiple graphics resources must
have all their resources fit within graphics memory for the duration of the
command. Thus we need a small carve out of 1/4 of graphics memory to ensure
commands likes surface copies to the primary framebuffer for cursor
composition or damage clips can fit within graphics memory.

This fixes an issue where VMs with low graphics memory (< 64MiB) configured
with high resolution mode boot to a black screen because surface creation
fails.

Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
---
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
index cd4925346ed4..84e1b765cda3 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -2858,12 +2858,17 @@ enum drm_mode_status vmw_connector_mode_valid(struct drm_connector *connector,
 	struct vmw_private *dev_priv = vmw_priv(dev);
 	u32 max_width = dev_priv->texture_max_width;
 	u32 max_height = dev_priv->texture_max_height;
-	u32 assumed_cpp = 4;
-
-	if (dev_priv->assume_16bpp)
-		assumed_cpp = 2;
+	u32 assumed_cpp = dev_priv->assume_16bpp ? 2 : 4;
+	u32 pitch = mode->hdisplay * assumed_cpp;
+	u64 total = mode->vdisplay * pitch;
+	bool using_stdu = dev_priv->active_display_unit == vmw_du_screen_target;
+	u64 max_mem_for_st = dev_priv->max_mob_pages * PAGE_SIZE * 3 / 4;
+	/* ^^^ Max memory for the mode fb when using Screen Target / MOBs.
+	 * We need a carveout (1/4) to account for other gfx resources that are
+	 * required in gfx mem for an fb update to complete with low gfx mem (<64MiB).
+	 */
 
-	if (dev_priv->active_display_unit == vmw_du_screen_target) {
+	if (using_stdu) {
 		max_width  = min(dev_priv->stdu_max_width,  max_width);
 		max_height = min(dev_priv->stdu_max_height, max_height);
 	}
@@ -2874,9 +2879,10 @@ enum drm_mode_status vmw_connector_mode_valid(struct drm_connector *connector,
 	if (max_height < mode->vdisplay)
 		return MODE_BAD_VVALUE;
 
-	if (!vmw_kms_validate_mode_vram(dev_priv,
-					mode->hdisplay * assumed_cpp,
-					mode->vdisplay))
+	if (using_stdu && (total > max_mem_for_st || total > dev_priv->max_mob_size))
+		return MODE_MEM;
+
+	if (!vmw_kms_validate_mode_vram(dev_priv, pitch, mode->vdisplay))
 		return MODE_MEM;
 
 	return MODE_OK;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] drm/vmwgfx: Filter modes which exceed 3/4 of graphics memory.
  2024-02-01  5:13 ` [PATCH v2] " Ian Forbes
@ 2024-02-02 19:29   ` Zack Rusin
  2024-02-06 21:30     ` Ian Forbes
  0 siblings, 1 reply; 14+ messages in thread
From: Zack Rusin @ 2024-02-02 19:29 UTC (permalink / raw)
  To: Ian Forbes
  Cc: dri-devel, bcm-kernel-feedback-list, maaz.mombasawala, martin.krastev

On Fri, Feb 2, 2024 at 11:58 AM Ian Forbes <ian.forbes@broadcom.com> wrote:
>
> SVGA requires surfaces to fit within graphics memory (max_mob_pages) which
> means that modes with a final buffer size that would exceed graphics memory
> must be pruned otherwise creation will fail.

Sorry, I didn't notice this originally but that's not quite true. svga
doesn't require all mob memory to stay within max_mob_pages (which is
SVGA_REG_GBOBJECT_MEM_SIZE_KB). max_mob_pages is really max resident
memory or suggested-guest-memory-for-best-performance. we can grow
that memory (and we do). I think what's causing problems on systems
with low memory is that cursor mobs and the fb's need to be both
resident but can't. Now SVGA_REG_MAX_PRIMARY_MEM is the max memory in
which our topology needs to fit in (which is max_primary_mem on
vmwgfx) but afaict that's not the issue here and it's checked later in
vmw_kms_validate_mode_vram

> Additionally, device commands which use multiple graphics resources must
> have all their resources fit within graphics memory for the duration of the
> command. Thus we need a small carve out of 1/4 of graphics memory to ensure
> commands likes surface copies to the primary framebuffer for cursor
> composition or damage clips can fit within graphics memory.

Yes, we should probably rename max_mob_pages to max_resident_memory
instead to make this obvious.

> This fixes an issue where VMs with low graphics memory (< 64MiB) configured
> with high resolution mode boot to a black screen because surface creation
> fails.

Does this work if you disable gbobjects? Without gbobject's we won't
have screen targets and thus won't be offsetting by 1/4 so I wonder if
4mb vram with legacy display would work with 1280x800 resolution.

Also, you want to add a "V2" section to your change to describe what
changed in v2 vs v1 (and same for any subsequent change).

>
> Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 22 ++++++++++++++--------
>  1 file changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> index cd4925346ed4..84e1b765cda3 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> @@ -2858,12 +2858,17 @@ enum drm_mode_status vmw_connector_mode_valid(struct drm_connector *connector,
>         struct vmw_private *dev_priv = vmw_priv(dev);
>         u32 max_width = dev_priv->texture_max_width;
>         u32 max_height = dev_priv->texture_max_height;
> -       u32 assumed_cpp = 4;
> -
> -       if (dev_priv->assume_16bpp)
> -               assumed_cpp = 2;
> +       u32 assumed_cpp = dev_priv->assume_16bpp ? 2 : 4;
> +       u32 pitch = mode->hdisplay * assumed_cpp;
> +       u64 total = mode->vdisplay * pitch;
> +       bool using_stdu = dev_priv->active_display_unit == vmw_du_screen_target;
> +       u64 max_mem_for_st = dev_priv->max_mob_pages * PAGE_SIZE * 3 / 4;
> +       /* ^^^ Max memory for the mode fb when using Screen Target / MOBs.
> +        * We need a carveout (1/4) to account for other gfx resources that are
> +        * required in gfx mem for an fb update to complete with low gfx mem (<64MiB).
> +        */

Same wording issue as mentioned above and lets use normal comment
style (i.e. comments attach to the code below). max_mem_for_st should
probably be max_mem_for_mode or max_mem_for_mode_st.

> -       if (dev_priv->active_display_unit == vmw_du_screen_target) {
> +       if (using_stdu) {
>                 max_width  = min(dev_priv->stdu_max_width,  max_width);
>                 max_height = min(dev_priv->stdu_max_height, max_height);
>         }
> @@ -2874,9 +2879,10 @@ enum drm_mode_status vmw_connector_mode_valid(struct drm_connector *connector,
>         if (max_height < mode->vdisplay)
>                 return MODE_BAD_VVALUE;
>
> -       if (!vmw_kms_validate_mode_vram(dev_priv,
> -                                       mode->hdisplay * assumed_cpp,
> -                                       mode->vdisplay))
> +       if (using_stdu && (total > max_mem_for_st || total > dev_priv->max_mob_size))
> +               return MODE_MEM;
> +
> +       if (!vmw_kms_validate_mode_vram(dev_priv, pitch, mode->vdisplay))
>                 return MODE_MEM;

It might make sense to just reuse vmw_kms_validate_mode_vram , it does
what we're claiming to do here and even though it's called
vmw_kms_validate_mode_vram it does actually validate st primary
memory.

z

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/vmwgfx: Filter modes which exceed 3/4 of graphics memory.
  2024-01-31  2:31           ` Zack Rusin
@ 2024-02-06 14:58             ` Daniel Stone
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel Stone @ 2024-02-06 14:58 UTC (permalink / raw)
  To: Zack Rusin
  Cc: Thomas Zimmermann, Ian Forbes, maaz.mombasawala,
	bcm-kernel-feedback-list, dri-devel, martin.krastev

On Wed, 31 Jan 2024 at 02:31, Zack Rusin <zack.rusin@broadcom.com> wrote:
> On Tue, Jan 30, 2024 at 6:50 PM Daniel Stone <daniel@fooishbar.org> wrote:
> > The entire model we have is that basis timing flows backwards. The
> > 'hardware' gives us a deadline, KMS angles to meet that with a small
> > margin, the compositor angles to meet that with a margin again, and it
> > lines up client repaints to hit that window too. Everything works on
> > that model, so it's not super surprising that using svga is - to quote
> > one of Weston's DRM-backend people who uses ESXi - 'a juddery mess'.
>
> That's very hurtful. Or it would be but of course you didn't believe
> them because they're working on Weston so clearly don't make good
> choices in general, right? The presentation on esxi is just as smooth
> as it is by default on Ubuntu on new hardware...

Yeah sorry, that wasn't a 'VMware is bad' dig, that was a 'oh that
explains so much if you're deliberately doing the other thing'
realisation. I'm not suggesting anyone else use my life choices as a
template :)

> > Given that the entire ecosystem is based on this model, I don't think
> > there's an easy way out where svga just does something wildly
> > different. The best way to fix it is to probably work on predictable
> > quantisation with updates: pick 5/12/47/60Hz to quantise to based on
> > your current throughput, with something similar to hotplug/LINK_STATUS
> > and faked EDID to let userspace know when the period changes. If you
> > have variability within the cycle, e.g. dropped frames, then just suck
> > it up and keep the illusion alive to userspace that it's presenting to
> > a fixed period, and if/when you calculate there's a better
> > quantisation then let userspace know what it is so it can adjust.
> >
> > But there's really no future in just doing random presentation rates,
> > because that's not the API anyone has written for.
>
> See, my hope was that with vrr we could layer the weird remote
> presentation semantics of virtualized guest on top of the same
> infrastructure that would be used on real hardware. If you're saying
> that it's not the way userspace will work, then yea, that doesn't
> help. My issue, that's general for para-virtualized drivers, is that
> any behavior that differs from hw drivers means that it's going to
> break at some point, we see that even for basic things like the
> update-layout hotplug events that have been largely standardized for
> many years. I'm assuming that refresh-rate-changed will result in the
> same regressions, but fwiw if I can implement FRR correctly and punt
> any issues that arise due to changes in the FRR as issues in userspace
> then that does make my life a lot easier, so I'm not going to object
> to that.

Yeah, I think that's the best way forward ... modelling the full
pipeline in all its glory starts to look way less like KMS and way
more like something like GStreamer. Trying to encapsulate all that
reasonably in the kernel would've required - at the very least - a
KMS-side queue with target display times in order to be at all useful,
and that seemed like way too much complexity when the majority of
hardware could be handled with 'I'll fire an ioctl at you and you
update at the next 16ms boundary'.

I'd be super happy to review any uAPI extensions which added feedback
to userspace to let it know that the optimal presentation cadence had
changed.

Cheers,
Daniel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] drm/vmwgfx: Filter modes which exceed 3/4 of graphics memory.
  2024-02-02 19:29   ` Zack Rusin
@ 2024-02-06 21:30     ` Ian Forbes
  2024-02-07 18:03       ` Zack Rusin
  0 siblings, 1 reply; 14+ messages in thread
From: Ian Forbes @ 2024-02-06 21:30 UTC (permalink / raw)
  To: Zack Rusin
  Cc: dri-devel, bcm-kernel-feedback-list, maaz.mombasawala, martin.krastev

So the issue is that SVGA_3D_CMD_DX_PRED_COPY_REGION between 2
surfaces that are the size of the mode fails. Technically for this to
work the filter will have to be 1/2 of graphics mem. I was just lucky
that the next mode in the list was already less than 1/2. 3/4 is not
actually going to work. Also this only happens on X/Gnome and seems
more like an issue with the compositor. Wayland/Gnome displays the
desktop but it's unusable and glitches even with the 1/2 limit. I
don't think wayland even abides by the mode limits as I see it trying
to create surfaces larger than the mode. It might be using texture
limits instead.

On Fri, Feb 2, 2024 at 1:29 PM Zack Rusin <zack.rusin@broadcom.com> wrote:
>
> On Fri, Feb 2, 2024 at 11:58 AM Ian Forbes <ian.forbes@broadcom.com> wrote:
> >
> > SVGA requires surfaces to fit within graphics memory (max_mob_pages) which
> > means that modes with a final buffer size that would exceed graphics memory
> > must be pruned otherwise creation will fail.
>
> Sorry, I didn't notice this originally but that's not quite true. svga
> doesn't require all mob memory to stay within max_mob_pages (which is
> SVGA_REG_GBOBJECT_MEM_SIZE_KB). max_mob_pages is really max resident
> memory or suggested-guest-memory-for-best-performance. we can grow
> that memory (and we do). I think what's causing problems on systems
> with low memory is that cursor mobs and the fb's need to be both
> resident but can't. Now SVGA_REG_MAX_PRIMARY_MEM is the max memory in
> which our topology needs to fit in (which is max_primary_mem on
> vmwgfx) but afaict that's not the issue here and it's checked later in
> vmw_kms_validate_mode_vram
>
> > Additionally, device commands which use multiple graphics resources must
> > have all their resources fit within graphics memory for the duration of the
> > command. Thus we need a small carve out of 1/4 of graphics memory to ensure
> > commands likes surface copies to the primary framebuffer for cursor
> > composition or damage clips can fit within graphics memory.
>
> Yes, we should probably rename max_mob_pages to max_resident_memory
> instead to make this obvious.
>
> > This fixes an issue where VMs with low graphics memory (< 64MiB) configured
> > with high resolution mode boot to a black screen because surface creation
> > fails.
>
> Does this work if you disable gbobjects? Without gbobject's we won't
> have screen targets and thus won't be offsetting by 1/4 so I wonder if
> 4mb vram with legacy display would work with 1280x800 resolution.
>
> Also, you want to add a "V2" section to your change to describe what
> changed in v2 vs v1 (and same for any subsequent change).
>
> >
> > Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
> > ---
> >  drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 22 ++++++++++++++--------
> >  1 file changed, 14 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> > index cd4925346ed4..84e1b765cda3 100644
> > --- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> > +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> > @@ -2858,12 +2858,17 @@ enum drm_mode_status vmw_connector_mode_valid(struct drm_connector *connector,
> >         struct vmw_private *dev_priv = vmw_priv(dev);
> >         u32 max_width = dev_priv->texture_max_width;
> >         u32 max_height = dev_priv->texture_max_height;
> > -       u32 assumed_cpp = 4;
> > -
> > -       if (dev_priv->assume_16bpp)
> > -               assumed_cpp = 2;
> > +       u32 assumed_cpp = dev_priv->assume_16bpp ? 2 : 4;
> > +       u32 pitch = mode->hdisplay * assumed_cpp;
> > +       u64 total = mode->vdisplay * pitch;
> > +       bool using_stdu = dev_priv->active_display_unit == vmw_du_screen_target;
> > +       u64 max_mem_for_st = dev_priv->max_mob_pages * PAGE_SIZE * 3 / 4;
> > +       /* ^^^ Max memory for the mode fb when using Screen Target / MOBs.
> > +        * We need a carveout (1/4) to account for other gfx resources that are
> > +        * required in gfx mem for an fb update to complete with low gfx mem (<64MiB).
> > +        */
>
> Same wording issue as mentioned above and lets use normal comment
> style (i.e. comments attach to the code below). max_mem_for_st should
> probably be max_mem_for_mode or max_mem_for_mode_st.
>
> > -       if (dev_priv->active_display_unit == vmw_du_screen_target) {
> > +       if (using_stdu) {
> >                 max_width  = min(dev_priv->stdu_max_width,  max_width);
> >                 max_height = min(dev_priv->stdu_max_height, max_height);
> >         }
> > @@ -2874,9 +2879,10 @@ enum drm_mode_status vmw_connector_mode_valid(struct drm_connector *connector,
> >         if (max_height < mode->vdisplay)
> >                 return MODE_BAD_VVALUE;
> >
> > -       if (!vmw_kms_validate_mode_vram(dev_priv,
> > -                                       mode->hdisplay * assumed_cpp,
> > -                                       mode->vdisplay))
> > +       if (using_stdu && (total > max_mem_for_st || total > dev_priv->max_mob_size))
> > +               return MODE_MEM;
> > +
> > +       if (!vmw_kms_validate_mode_vram(dev_priv, pitch, mode->vdisplay))
> >                 return MODE_MEM;
>
> It might make sense to just reuse vmw_kms_validate_mode_vram , it does
> what we're claiming to do here and even though it's called
> vmw_kms_validate_mode_vram it does actually validate st primary
> memory.
>
> z

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] drm/vmwgfx: Filter modes which exceed 3/4 of graphics memory.
  2024-02-06 21:30     ` Ian Forbes
@ 2024-02-07 18:03       ` Zack Rusin
  0 siblings, 0 replies; 14+ messages in thread
From: Zack Rusin @ 2024-02-07 18:03 UTC (permalink / raw)
  To: Ian Forbes
  Cc: dri-devel, bcm-kernel-feedback-list, maaz.mombasawala, martin.krastev

On Tue, Feb 6, 2024 at 4:30 PM Ian Forbes <ian.forbes@broadcom.com> wrote:
>
> So the issue is that SVGA_3D_CMD_DX_PRED_COPY_REGION between 2
> surfaces that are the size of the mode fails. Technically for this to
> work the filter will have to be 1/2 of graphics mem. I was just lucky
> that the next mode in the list was already less than 1/2. 3/4 is not
> actually going to work. Also this only happens on X/Gnome and seems
> more like an issue with the compositor. Wayland/Gnome displays the
> desktop but it's unusable and glitches even with the 1/2 limit. I
> don't think wayland even abides by the mode limits as I see it trying
> to create surfaces larger than the mode. It might be using texture
> limits instead.

So the SVGA_3D_CMD_DX_PRED_COPY_REGION is only available with dx
contexts/3d enabled/gb surfaces. With 3d or gb objects disabled we
should fall back to legacy display and that command shouldn't have
been used. Is that the case? Does it work with 3d/gb objects disabled?

There's a few bugs there:
- SVGA_3D_CMD_DX_PRED_COPY_REGION should only come from userspace, the
userspace should validate that the max amount of resident memory
hasn't been exceeded before issuing those copies
- vmwgfx should be a lot better about determining whether the amount
of resident memory required by the current command buffers hasn't been
exceeded
- In case of high memory pressure vmwgfx should explicitly disable 3d
support. There's no way to run 3d workloads with anything less than
64mb of ram especially given that we do not adjust our texture limits
and they will remain either 4k, 8k or more depending on what we're
running on.

But those are secondary to making resolution switch work correctly on
basic system, i.e.:
1) Disable 3D and gb objects
2) Check if in the kernel log vmwgx says that it's using "legacy display"
3) Check if the resolution switching works correctly
4) If not lets fix that first (fix #1)
5) Disable 3D and keep gb objects active
6) Check that the kernel log select "screen target display unit" and
have 3d disabled (i.e. no SVGA_3D_CMD_DX_PRED_COPY_REGION is coming
through)
7) If that doesn't work lets fix that next (fix #2)
8) Enabled 3d and gb objects (your current default)
9) Check if max_mob_pages (i.e. max_resident_memory) is smaller than
what we'd need to hold even a single a texture limits * 4bpp, print a
warning and disable 3d (this should bring us in line with what we
fixed in point #7) (fix #3)

So basically we want to make sure that on vmwgfx all three
configurations work: 1) 3d and gb objects disabled, 2) 3d disabled, gb
objects enabled, 3) 3d and gb object enabled.

z

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2024-02-07 18:04 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-12 20:38 [PATCH] drm/vmwgfx: Filter modes which exceed 3/4 of graphics memory Ian Forbes
2024-01-15  8:21 ` Thomas Zimmermann
2024-01-18 18:25   ` Zack Rusin
2024-01-19  9:22     ` Thomas Zimmermann
2024-01-30 18:38       ` Zack Rusin
2024-01-30 23:50         ` Daniel Stone
2024-01-31  2:31           ` Zack Rusin
2024-02-06 14:58             ` Daniel Stone
2024-01-30 19:38 ` Zack Rusin
2024-01-31  9:11   ` Thomas Zimmermann
2024-02-01  5:13 ` [PATCH v2] " Ian Forbes
2024-02-02 19:29   ` Zack Rusin
2024-02-06 21:30     ` Ian Forbes
2024-02-07 18:03       ` Zack Rusin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).