* [PATCH 0/6] Add video damage tracking
@ 2022-06-06 23:43 Alexander Graf
  2022-06-06 23:43 ` [PATCH 1/6] dm: video: Add damage tracking API Alexander Graf
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Alexander Graf @ 2022-06-06 23:43 UTC (permalink / raw)
  To: u-boot
  Cc: Heinrich Schuchardt, Anatolij Gustschin, Simon Glass,
	Matthias Brugger, Da Xue

This patch set speeds up graphics output on ARM by a factor of 60.

On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
but need it accessible by the display controller which reads directly
from a later point of consistency. Hence, we flush the frame buffer to
DRAM on every change. The full frame buffer.

Unfortunately, with the advent of 4k displays, we are seeing frame buffers
that can take a while to flush out. This was reported by Da Xue with grub,
which happily prints thousands of spaces on the screen to draw a menu. Every
printed space triggers a cache flush.

This patch set implements the easiest mitigation against this problem:
Damage tracking. We remember the lowest common denominator region that was
touched since the last video_sync() call and only flush that.

With this patch set applied, we reduce drawing a large grub menu (with
serial console attached for size information) on an RK3399-ROC system
at 1440p from 55 seconds to less than 1 second.
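
For a sense of scale, the arithmetic behind that difference can be sketched as follows. The values used in the note after the block (4 bytes per pixel, an 8x16 font cell, 64-byte cache lines) are assumptions, not figures from this series, and both function names are made up for the example.

```c
#include <assert.h>

/*
 * Upper bound on the bytes cleaned when flushing a w x h pixel rectangle
 * line by line. Each scan line covers w * bytes_per_pixel bytes, which can
 * straddle one extra cache line when its start is not aligned.
 */
static unsigned long rect_flush_bytes_max(unsigned long w, unsigned long h,
					  unsigned long bytes_per_pixel,
					  unsigned long cacheline)
{
	unsigned long row_bytes = w * bytes_per_pixel;
	unsigned long lines_per_row;

	assert(cacheline > 0);
	lines_per_row = (row_bytes + cacheline - 1) / cacheline + 1;

	return h * lines_per_row * cacheline;
}

/* Bytes cleaned when the whole frame buffer is flushed as one range */
static unsigned long full_flush_bytes(unsigned long xsize, unsigned long ysize,
				      unsigned long bytes_per_pixel)
{
	return xsize * ysize * bytes_per_pixel;
}
```

With those assumed values, a full flush at 2560x1440 cleans 14,745,600 bytes, while one glyph cell needs at most 2,048 bytes, so the per-character cost drops by more than three orders of magnitude.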


Alternatives considered:

  1) Lazy sync - Sandbox does this. It only calls video_sync(true) every
     so often. We are missing timers to do this generically.

  2) Double buffering - We could try to identify whether anything changed
     at all and only draw to the FB if it did. That would require
     maintaining a second buffer that we need to scan.

  3) Text buffer - Maintain a buffer of all text printed on the screen with
     respective location. Don't write if the old and new character are
     identical. This would limit applicability to text only and is an
     optimization on top of this patch set.

  4) Hash screen lines - Create a hash (sha256?) over every line when it
     changes. Only flush when it does. I'm not sure if this would waste
     more time, memory and cache than the current approach. It would make
     full screen updates much more expensive.
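
To make alternative 3 more concrete, here is a minimal sketch of a text shadow buffer. It is not part of this series; the structure, the function names and the 80x25 size are hypothetical, and a real console would also have to compare colours, not just the character.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SHADOW_COLS 80	/* hypothetical console width in characters */
#define SHADOW_ROWS 25	/* hypothetical console height in characters */

/* Hypothetical shadow copy of the characters currently on screen */
struct text_shadow {
	char cell[SHADOW_ROWS][SHADOW_COLS];
};

/* Start with a screen full of spaces */
static void text_shadow_init(struct text_shadow *ts)
{
	memset(ts->cell, ' ', sizeof(ts->cell));
}

/*
 * Record ch at (col, row). Returns true when the cell changed and the
 * glyph therefore has to be drawn, false when the draw can be skipped.
 */
static bool text_shadow_put(struct text_shadow *ts, int col, int row, char ch)
{
	assert(col >= 0 && col < SHADOW_COLS);
	assert(row >= 0 && row < SHADOW_ROWS);

	if (ts->cell[row][col] == ch)
		return false;

	ts->cell[row][col] = ch;
	return true;
}
```

In the grub case described above, printing a space over an existing space would return false, so neither the draw nor the flush would happen for that cell.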

Alexander Graf (6):
  dm: video: Add damage tracking API
  dm: video: Add damage notification on display clear
  vidconsole: Add damage notifications to all vidconsole drivers
  video: Add damage notification on bmp display
  efi_loader: GOP: Add damage notification on BLT
  video: Only dcache flush damaged lines

 drivers/video/Kconfig            | 15 ++++++
 drivers/video/console_normal.c   | 10 ++++
 drivers/video/console_rotate.c   | 18 +++++++
 drivers/video/console_truetype.c | 12 +++++
 drivers/video/video-uclass.c     | 89 +++++++++++++++++++++++++++++---
 drivers/video/video_bmp.c        |  2 +
 include/video.h                  | 39 +++++++++++++-
 lib/efi_loader/efi_gop.c         | 11 ++++
 8 files changed, 187 insertions(+), 9 deletions(-)

-- 
2.32.1 (Apple Git-133)



* [PATCH 1/6] dm: video: Add damage tracking API
  2022-06-06 23:43 [PATCH 0/6] Add video damage tracking Alexander Graf
@ 2022-06-06 23:43 ` Alexander Graf
  2022-06-06 23:43 ` [PATCH 2/6] dm: video: Add damage notification on display clear Alexander Graf
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Alexander Graf @ 2022-06-06 23:43 UTC (permalink / raw)
  To: u-boot
  Cc: Heinrich Schuchardt, Anatolij Gustschin, Simon Glass,
	Matthias Brugger, Da Xue

We are going to introduce image damage tracking to speed up screen
refresh on large displays. This patch adds damage tracking for up to
one rectangle of the screen which is typically enough to hold blt or
text print updates. Callers into this API and a reduced dcache flush
code path will follow in later patches.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reported-by: Da Xue <da@libre.computer>
---
 drivers/video/Kconfig        | 15 ++++++++++++++
 drivers/video/video-uclass.c | 40 ++++++++++++++++++++++++++++++++++++
 include/video.h              | 39 +++++++++++++++++++++++++++++++++--
 3 files changed, 92 insertions(+), 2 deletions(-)

diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
index 965b587927..9e1c409b37 100644
--- a/drivers/video/Kconfig
+++ b/drivers/video/Kconfig
@@ -64,6 +64,21 @@ config VIDEO_COPY
 	  To use this, your video driver must set @copy_base in
 	  struct video_uc_plat.
 
+config VIDEO_DAMAGE
+	bool "Enable damage tracking of frame buffer regions"
+	depends on DM_VIDEO
+	default y if ARM && !SYS_DCACHE_OFF
+	help
+	  On some machines (most ARM), the display frame buffer resides in
+	  RAM. To make the display controller pick up screen updates, we
+	  have to flush frame buffer contents from CPU caches into RAM which
+	  can be a slow operation.
+
+	  This option adds damage tracking to collect information about regions
+	  that received updates. When we want to sync, we then only flush
+	  regions of the frame buffer that were modified before, speeding up
+	  screen refreshes significantly.
+
 config BACKLIGHT_PWM
 	bool "Generic PWM based Backlight Driver"
 	depends on BACKLIGHT && DM_PWM
diff --git a/drivers/video/video-uclass.c b/drivers/video/video-uclass.c
index 01e8af5ac6..496aa56843 100644
--- a/drivers/video/video-uclass.c
+++ b/drivers/video/video-uclass.c
@@ -21,6 +21,8 @@
 #include <dm/device_compat.h>
 #include <dm/device-internal.h>
 #include <dm/uclass-internal.h>
+#include <linux/types.h>
+#include <linux/bitmap.h>
 #ifdef CONFIG_SANDBOX
 #include <asm/sdl.h>
 #endif
@@ -180,6 +182,44 @@ void video_set_default_colors(struct udevice *dev, bool invert)
 	priv->colour_bg = vid_console_color(priv, back);
 }
 
+#ifdef CONFIG_VIDEO_DAMAGE
+/* Notify about changes in the frame buffer */
+int video_damage(struct udevice *vid, int x, int y, int width, int height)
+{
+	struct video_priv *priv = dev_get_uclass_priv(vid);
+	int endx = x + width;
+	int endy = y + height;
+
+	if (x > priv->xsize)
+		return 0;
+
+	if (y > priv->ysize)
+		return 0;
+
+	if (endx > priv->xsize)
+		endx = priv->xsize;
+
+	if (endy > priv->ysize)
+		endy = priv->ysize;
+
+	if (priv->damage.endx && priv->damage.endy) {
+		/* Span a rectangle across all old and new damage */
+		priv->damage.x = min(x, priv->damage.x);
+		priv->damage.y = min(y, priv->damage.y);
+		priv->damage.endx = max(endx, priv->damage.endx);
+		priv->damage.endy = max(endy, priv->damage.endy);
+	} else {
+		/* First damage, setting the rectangle to span it */
+		priv->damage.x = x;
+		priv->damage.y = y;
+		priv->damage.endx = endx;
+		priv->damage.endy = endy;
+	}
+
+	return 0;
+}
+#endif
+
 /* Flush video activity to the caches */
 int video_sync(struct udevice *vid, bool force)
 {
diff --git a/include/video.h b/include/video.h
index 43e2c89977..98592eb19a 100644
--- a/include/video.h
+++ b/include/video.h
@@ -109,6 +109,14 @@ struct video_priv {
 	void *fb;
 	int fb_size;
 	void *copy_fb;
+#ifdef CONFIG_VIDEO_DAMAGE
+	struct {
+		int x;
+		int y;
+		int endx;
+		int endy;
+	} damage;
+#endif
 	int line_length;
 	u32 colour_fg;
 	u32 colour_bg;
@@ -167,8 +175,9 @@ int video_clear(struct udevice *dev);
  * @return: 0 on success, error code otherwise
  *
  * Some frame buffers are cached or have a secondary frame buffer. This
- * function syncs these up so that the current contents of the U-Boot frame
- * buffer are displayed to the user.
+ * function syncs the damaged parts of them up so that the current contents
+ * of the U-Boot frame buffer are displayed to the user. It clears the damage
+ * buffer.
  */
 int video_sync(struct udevice *vid, bool force);
 
@@ -268,6 +277,32 @@ static inline int video_sync_copy_all(struct udevice *dev)
 
 #endif
 
+#ifdef CONFIG_VIDEO_DAMAGE
+/**
+ * video_damage() - Notify the video subsystem about screen updates.
+ *
+ * @vid:	Device to sync
+ * @x:	        Upper left X coordinate of the damaged rectangle
+ * @y:	        Upper left Y coordinate of the damaged rectangle
+ * @width:	Width of the damaged rectangle
+ * @height:	Height of the damaged rectangle
+ *
+ * @return: 0
+ *
+ * Some frame buffers are cached or have a secondary frame buffer. This
+ * function notifies the video subsystem about rectangles that were updated
+ * within the frame buffer. They may only get written to the screen on the
+ * next call to video_sync().
+ */
+int video_damage(struct udevice *vid, int x, int y, int width, int height);
+#else
+static inline int video_damage(struct udevice *vid, int x, int y, int width,
+			       int height)
+{
+	return 0;
+}
+#endif /* CONFIG_VIDEO_DAMAGE */
+
 /**
  * video_is_active() - Test if one video device it active
  *
-- 
2.32.1 (Apple Git-133)
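
The clamp-and-merge logic of video_damage() can be tried outside U-Boot with a standalone sketch like the one below. struct damage_rect and damage_add() are hypothetical stand-ins for the fields and function added above. Unlike the posted hunk, the sketch also ignores rectangles that are empty after clamping, which is an assumption about the desired behaviour.

```c
#include <assert.h>

/* Hypothetical stand-in for the damage fields added to struct video_priv */
struct damage_rect {
	int x, y;	/* inclusive upper left corner */
	int endx, endy;	/* exclusive lower right corner, 0 means "empty" */
};

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* Clamp the new rectangle to the screen and merge it into *d */
static void damage_add(struct damage_rect *d, int xsize, int ysize,
		       int x, int y, int width, int height)
{
	int endx = min_int(x + width, xsize);
	int endy = min_int(y + height, ysize);

	assert(x >= 0 && y >= 0 && width >= 0 && height >= 0);

	/* Entirely off screen or empty: nothing to record */
	if (x >= endx || y >= endy)
		return;

	if (d->endx && d->endy) {
		/* Span a rectangle across all old and new damage */
		d->x = min_int(d->x, x);
		d->y = min_int(d->y, y);
		d->endx = max_int(d->endx, endx);
		d->endy = max_int(d->endy, endy);
	} else {
		/* First damage since the last sync */
		d->x = x;
		d->y = y;
		d->endx = endx;
		d->endy = endy;
	}
}
```

Resetting endx and endy to 0 marks the rectangle as empty again, which is what a later sync step is expected to do.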



* [PATCH 2/6] dm: video: Add damage notification on display clear
  2022-06-06 23:43 [PATCH 0/6] Add video damage tracking Alexander Graf
  2022-06-06 23:43 ` [PATCH 1/6] dm: video: Add damage tracking API Alexander Graf
@ 2022-06-06 23:43 ` Alexander Graf
  2022-06-06 23:43 ` [PATCH 3/6] vidconsole: Add damage notifications to all vidconsole drivers Alexander Graf
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Alexander Graf @ 2022-06-06 23:43 UTC (permalink / raw)
  To: u-boot
  Cc: Heinrich Schuchardt, Anatolij Gustschin, Simon Glass,
	Matthias Brugger, Da Xue

Let's report the video damage when we clear the screen. This
way we can later lazily flush only relevant regions to hardware.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reported-by: Da Xue <da@libre.computer>
---
 drivers/video/video-uclass.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/video/video-uclass.c b/drivers/video/video-uclass.c
index 496aa56843..9ac1974670 100644
--- a/drivers/video/video-uclass.c
+++ b/drivers/video/video-uclass.c
@@ -153,6 +153,8 @@ int video_clear(struct udevice *dev)
 	if (ret)
 		return ret;
 
+	video_damage(dev, 0, 0, priv->xsize, priv->ysize);
+
 	return video_sync(dev, false);
 }
 
-- 
2.32.1 (Apple Git-133)



* [PATCH 3/6] vidconsole: Add damage notifications to all vidconsole drivers
  2022-06-06 23:43 [PATCH 0/6] Add video damage tracking Alexander Graf
  2022-06-06 23:43 ` [PATCH 1/6] dm: video: Add damage tracking API Alexander Graf
  2022-06-06 23:43 ` [PATCH 2/6] dm: video: Add damage notification on display clear Alexander Graf
@ 2022-06-06 23:43 ` Alexander Graf
  2022-06-06 23:43 ` [PATCH 4/6] video: Add damage notification on bmp display Alexander Graf
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Alexander Graf @ 2022-06-06 23:43 UTC (permalink / raw)
  To: u-boot
  Cc: Heinrich Schuchardt, Anatolij Gustschin, Simon Glass,
	Matthias Brugger, Da Xue

Now that we have a damage tracking API, let's populate damage done by
vidconsole drivers. We try to declare as little memory as damaged as
possible, with the exception of rotated screens that I couldn't get my
head wrapped around. On those, we revert to the old behavior and mark
the full screen as damaged on every update.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reported-by: Da Xue <da@libre.computer>
---
 drivers/video/console_normal.c   | 10 ++++++++++
 drivers/video/console_rotate.c   | 18 ++++++++++++++++++
 drivers/video/console_truetype.c | 12 ++++++++++++
 3 files changed, 40 insertions(+)

diff --git a/drivers/video/console_normal.c b/drivers/video/console_normal.c
index 04f022491e..5b5586fd3e 100644
--- a/drivers/video/console_normal.c
+++ b/drivers/video/console_normal.c
@@ -57,6 +57,9 @@ static int console_normal_set_row(struct udevice *dev, uint row, int clr)
 	if (ret)
 		return ret;
 
+	video_damage(dev->parent, 0, VIDEO_FONT_HEIGHT * row, vid_priv->xsize,
+		     VIDEO_FONT_HEIGHT);
+
 	return 0;
 }
 
@@ -76,6 +79,9 @@ static int console_normal_move_rows(struct udevice *dev, uint rowdst,
 	if (ret)
 		return ret;
 
+	video_damage(dev->parent, 0, VIDEO_FONT_HEIGHT * rowdst, vid_priv->xsize,
+		     VIDEO_FONT_HEIGHT * count);
+
 	return 0;
 }
 
@@ -143,6 +149,10 @@ static int console_normal_putc_xy(struct udevice *dev, uint x_frac, uint y,
 		}
 		line += vid_priv->line_length;
 	}
+
+	video_damage(dev->parent, VID_TO_PIXEL(x_frac), y, VIDEO_FONT_WIDTH,
+		     VIDEO_FONT_HEIGHT);
+
 	ret = vidconsole_sync_copy(dev, start, line);
 	if (ret)
 		return ret;
diff --git a/drivers/video/console_rotate.c b/drivers/video/console_rotate.c
index 36c8d0609d..4d5084e8d1 100644
--- a/drivers/video/console_rotate.c
+++ b/drivers/video/console_rotate.c
@@ -57,6 +57,8 @@ static int console_set_row_1(struct udevice *dev, uint row, int clr)
 	if (ret)
 		return ret;
 
+	video_damage(dev->parent, 0, 0, vid_priv->xsize, vid_priv->ysize);
+
 	return 0;
 }
 
@@ -83,6 +85,8 @@ static int console_move_rows_1(struct udevice *dev, uint rowdst, uint rowsrc,
 		dst += vid_priv->line_length;
 	}
 
+	video_damage(dev->parent, 0, 0, vid_priv->xsize, vid_priv->ysize);
+
 	return 0;
 }
 
@@ -150,6 +154,8 @@ static int console_putc_xy_1(struct udevice *dev, uint x_frac, uint y, char ch)
 	if (ret)
 		return ret;
 
+	video_damage(dev->parent, 0, 0, vid_priv->xsize, vid_priv->ysize);
+
 	return VID_TO_POS(VIDEO_FONT_WIDTH);
 }
 
@@ -199,6 +205,8 @@ static int console_set_row_2(struct udevice *dev, uint row, int clr)
 	if (ret)
 		return ret;
 
+	video_damage(dev->parent, 0, 0, vid_priv->xsize, vid_priv->ysize);
+
 	return 0;
 }
 
@@ -218,6 +226,8 @@ static int console_move_rows_2(struct udevice *dev, uint rowdst, uint rowsrc,
 	vidconsole_memmove(dev, dst, src,
 			   VIDEO_FONT_HEIGHT * vid_priv->line_length * count);
 
+	video_damage(dev->parent, 0, 0, vid_priv->xsize, vid_priv->ysize);
+
 	return 0;
 }
 
@@ -288,6 +298,8 @@ static int console_putc_xy_2(struct udevice *dev, uint x_frac, uint y, char ch)
 	if (ret)
 		return ret;
 
+	video_damage(dev->parent, 0, 0, vid_priv->xsize, vid_priv->ysize);
+
 	return VID_TO_POS(VIDEO_FONT_WIDTH);
 }
 
@@ -335,6 +347,8 @@ static int console_set_row_3(struct udevice *dev, uint row, int clr)
 	if (ret)
 		return ret;
 
+	video_damage(dev->parent, 0, 0, vid_priv->xsize, vid_priv->ysize);
+
 	return 0;
 }
 
@@ -359,6 +373,8 @@ static int console_move_rows_3(struct udevice *dev, uint rowdst, uint rowsrc,
 		dst += vid_priv->line_length;
 	}
 
+	video_damage(dev->parent, 0, 0, vid_priv->xsize, vid_priv->ysize);
+
 	return 0;
 }
 
@@ -424,6 +440,8 @@ static int console_putc_xy_3(struct udevice *dev, uint x_frac, uint y, char ch)
 	if (ret)
 		return ret;
 
+	video_damage(dev->parent, 0, 0, vid_priv->xsize, vid_priv->ysize);
+
 	return VID_TO_POS(VIDEO_FONT_WIDTH);
 }
 
diff --git a/drivers/video/console_truetype.c b/drivers/video/console_truetype.c
index c04b449a6d..8fab28fd15 100644
--- a/drivers/video/console_truetype.c
+++ b/drivers/video/console_truetype.c
@@ -168,6 +168,9 @@ static int console_truetype_set_row(struct udevice *dev, uint row, int clr)
 	if (ret)
 		return ret;
 
+	video_damage(dev->parent, 0, priv->y_charsize * row, vid_priv->xsize,
+		     priv->y_charsize);
+
 	return 0;
 }
 
@@ -192,6 +195,9 @@ static int console_truetype_move_rows(struct udevice *dev, uint rowdst,
 	for (i = 0; i < priv->pos_ptr; i++)
 		priv->pos[i].ypos -= diff;
 
+	video_damage(dev->parent, 0, priv->y_charsize * rowdst, vid_priv->xsize,
+		     priv->y_charsize * count);
+
 	return 0;
 }
 
@@ -348,6 +354,9 @@ static int console_truetype_putc_xy(struct udevice *dev, uint x, uint y,
 
 		line += vid_priv->line_length;
 	}
+
+	video_damage(dev->parent, x, y, width, height);
+
 	ret = vidconsole_sync_copy(dev, start, line);
 	if (ret)
 		return ret;
@@ -415,6 +424,9 @@ static int console_truetype_erase(struct udevice *dev, int xstart, int ystart,
 		}
 		line += vid_priv->line_length;
 	}
+
+	video_damage(dev->parent, xstart, ystart, xend - xstart, yend - ystart);
+
 	ret = vidconsole_sync_copy(dev, start, line);
 	if (ret)
 		return ret;
-- 
2.32.1 (Apple Git-133)



* [PATCH 4/6] video: Add damage notification on bmp display
  2022-06-06 23:43 [PATCH 0/6] Add video damage tracking Alexander Graf
                   ` (2 preceding siblings ...)
  2022-06-06 23:43 ` [PATCH 3/6] vidconsole: Add damage notifications to all vidconsole drivers Alexander Graf
@ 2022-06-06 23:43 ` Alexander Graf
  2022-06-06 23:43 ` [PATCH 5/6] efi_loader: GOP: Add damage notification on BLT Alexander Graf
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Alexander Graf @ 2022-06-06 23:43 UTC (permalink / raw)
  To: u-boot
  Cc: Heinrich Schuchardt, Anatolij Gustschin, Simon Glass,
	Matthias Brugger, Da Xue

Let's report the video damage when we draw a bitmap on the screen. This
way we can later lazily flush only relevant regions to hardware.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reported-by: Da Xue <da@libre.computer>
---
 drivers/video/video_bmp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/video/video_bmp.c b/drivers/video/video_bmp.c
index 4d2d961696..da8a7b3701 100644
--- a/drivers/video/video_bmp.c
+++ b/drivers/video/video_bmp.c
@@ -416,6 +416,8 @@ int video_bmp_display(struct udevice *dev, ulong bmp_image, int x, int y,
 		break;
 	};
 
+	video_damage(dev, x, y, width, height);
+
 	/* Find the position of the top left of the image in the framebuffer */
 	fb = (uchar *)(priv->fb + y * priv->line_length + x * bpix / 8);
 	ret = video_sync_copy(dev, start, fb);
-- 
2.32.1 (Apple Git-133)



* [PATCH 5/6] efi_loader: GOP: Add damage notification on BLT
  2022-06-06 23:43 [PATCH 0/6] Add video damage tracking Alexander Graf
                   ` (3 preceding siblings ...)
  2022-06-06 23:43 ` [PATCH 4/6] video: Add damage notification on bmp display Alexander Graf
@ 2022-06-06 23:43 ` Alexander Graf
  2022-06-07  7:12   ` Heinrich Schuchardt
  2022-06-06 23:43 ` [PATCH 6/6] video: Only dcache flush damaged lines Alexander Graf
  2022-06-07  8:28 ` [PATCH 0/6] Add video damage tracking Heinrich Schuchardt
  6 siblings, 1 reply; 16+ messages in thread
From: Alexander Graf @ 2022-06-06 23:43 UTC (permalink / raw)
  To: u-boot
  Cc: Heinrich Schuchardt, Anatolij Gustschin, Simon Glass,
	Matthias Brugger, Da Xue

Now that we have a damage tracking API, let's populate damage done by
UEFI payloads when they BLT data onto the screen.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reported-by: Da Xue <da@libre.computer>
---
 lib/efi_loader/efi_gop.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/lib/efi_loader/efi_gop.c b/lib/efi_loader/efi_gop.c
index 2c81859807..67286c9a60 100644
--- a/lib/efi_loader/efi_gop.c
+++ b/lib/efi_loader/efi_gop.c
@@ -33,6 +33,9 @@ struct efi_gop_obj {
 	struct efi_gop ops;
 	struct efi_gop_mode_info info;
 	struct efi_gop_mode mode;
+#ifdef CONFIG_DM_VIDEO
+	struct udevice *vdev;
+#endif
 	/* Fields we only have access to during init */
 	u32 bpix;
 	void *fb;
@@ -244,6 +247,10 @@ static __always_inline efi_status_t gop_blt_int(struct efi_gop *this,
 		dlineoff += dwidth;
 	}
 
+#ifdef CONFIG_DM_VIDEO
+	video_damage(gopobj->vdev, dx, dy, width, height);
+#endif
+
 	return EFI_SUCCESS;
 }
 
@@ -583,5 +590,9 @@ efi_status_t efi_gop_register(void)
 	gopobj->bpix = bpix;
 	gopobj->fb = fb;
 
+#ifdef CONFIG_DM_VIDEO
+	gopobj->vdev = vdev;
+#endif
+
 	return EFI_SUCCESS;
 }
-- 
2.32.1 (Apple Git-133)



* [PATCH 6/6] video: Only dcache flush damaged lines
  2022-06-06 23:43 [PATCH 0/6] Add video damage tracking Alexander Graf
                   ` (4 preceding siblings ...)
  2022-06-06 23:43 ` [PATCH 5/6] efi_loader: GOP: Add damage notification on BLT Alexander Graf
@ 2022-06-06 23:43 ` Alexander Graf
  2022-06-07  8:00   ` Heinrich Schuchardt
  2022-06-07  8:28 ` [PATCH 0/6] Add video damage tracking Heinrich Schuchardt
  6 siblings, 1 reply; 16+ messages in thread
From: Alexander Graf @ 2022-06-06 23:43 UTC (permalink / raw)
  To: u-boot
  Cc: Heinrich Schuchardt, Anatolij Gustschin, Simon Glass,
	Matthias Brugger, Da Xue

Now that we have a damage area that tells us which parts of the frame buffer
actually need updating, let's only dcache flush those on video_sync()
calls. With this optimization in place, frame buffer updates - especially
on large screens such as 4k displays - speed up significantly.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reported-by: Da Xue <da@libre.computer>
---
 drivers/video/video-uclass.c | 49 ++++++++++++++++++++++++++++++------
 1 file changed, 42 insertions(+), 7 deletions(-)

diff --git a/drivers/video/video-uclass.c b/drivers/video/video-uclass.c
index 9ac1974670..5661beea38 100644
--- a/drivers/video/video-uclass.c
+++ b/drivers/video/video-uclass.c
@@ -222,6 +222,39 @@ int video_damage(struct udevice *vid, int x, int y, int width, int height)
 }
 #endif
 
+#if defined(CONFIG_ARM) && !CONFIG_IS_ENABLED(SYS_DCACHE_OFF)
+static void video_flush_dcache(struct udevice *vid)
+{
+	struct video_priv *priv = dev_get_uclass_priv(vid);
+
+	if (!priv->flush_dcache)
+		return;
+
+#ifdef CONFIG_VIDEO_DAMAGE
+	if (priv->damage.endx && priv->damage.endy) {
+		int lstart = priv->damage.x * VNBYTES(priv->bpix);
+		int lend = priv->damage.endx * VNBYTES(priv->bpix);
+		int y;
+
+		for (y = priv->damage.y; y < priv->damage.endy; y++) {
+			ulong fb = (ulong)priv->fb;
+			ulong start = fb + (y * priv->line_length) + lstart;
+			ulong end = fb + (y * priv->line_length) + lend;
+
+			start = ALIGN_DOWN(start, CONFIG_SYS_CACHELINE_SIZE);
+			end = ALIGN(end, CONFIG_SYS_CACHELINE_SIZE);
+
+			flush_dcache_range(start, end);
+		}
+	}
+#else
+	flush_dcache_range((ulong)priv->fb,
+			   ALIGN((ulong)priv->fb + priv->fb_size,
+				 CONFIG_SYS_CACHELINE_SIZE));
+#endif
+}
+#endif
+
 /* Flush video activity to the caches */
 int video_sync(struct udevice *vid, bool force)
 {
@@ -240,13 +273,7 @@ int video_sync(struct udevice *vid, bool force)
 	 * out whether it exists? For now, ARM is safe.
 	 */
 #if defined(CONFIG_ARM) && !CONFIG_IS_ENABLED(SYS_DCACHE_OFF)
-	struct video_priv *priv = dev_get_uclass_priv(vid);
-
-	if (priv->flush_dcache) {
-		flush_dcache_range((ulong)priv->fb,
-				   ALIGN((ulong)priv->fb + priv->fb_size,
-					 CONFIG_SYS_CACHELINE_SIZE));
-	}
+	video_flush_dcache(vid);
 #elif defined(CONFIG_VIDEO_SANDBOX_SDL)
 	struct video_priv *priv = dev_get_uclass_priv(vid);
 	static ulong last_sync;
@@ -256,6 +283,14 @@ int video_sync(struct udevice *vid, bool force)
 		last_sync = get_timer(0);
 	}
 #endif
+
+#ifdef CONFIG_VIDEO_DAMAGE
+	struct video_priv *priv = dev_get_uclass_priv(vid);
+
+	priv->damage.endx = 0;
+	priv->damage.endy = 0;
+#endif
+
 	return 0;
 }
 
-- 
2.32.1 (Apple Git-133)
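
The per-line address computation in video_flush_dcache() can be checked in isolation with a sketch like the following. It is a standalone approximation: struct flush_range, line_flush_range() and the two alignment helpers are hypothetical and stand in for U-Boot's ALIGN_DOWN(), ALIGN() and the arguments passed to flush_dcache_range(). One detail worth checking when reading the loop is that the end offset, like the start offset, is measured from the beginning of the scan line, so the sketch derives both addresses from the line base.

```c
#include <assert.h>

/* Hypothetical [start, end) address range handed to a cache flush */
struct flush_range {
	unsigned long start;
	unsigned long end;
};

/* Round addr down to a multiple of align, which must be a power of two */
static unsigned long align_down(unsigned long addr, unsigned long align)
{
	return addr & ~(align - 1);
}

/* Round addr up to a multiple of align, which must be a power of two */
static unsigned long align_up(unsigned long addr, unsigned long align)
{
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Compute the cache-line aligned range covering pixels [x, endx) of scan
 * line y. Both byte offsets are taken from the start of the scan line.
 */
static struct flush_range line_flush_range(unsigned long fb, int line_length,
					   int bytes_per_pixel, int y,
					   int x, int endx,
					   unsigned long cacheline)
{
	unsigned long line = fb + (unsigned long)y * line_length;
	unsigned long first = line + (unsigned long)x * bytes_per_pixel;
	unsigned long last = line + (unsigned long)endx * bytes_per_pixel;
	struct flush_range r;

	assert(cacheline && !(cacheline & (cacheline - 1)));
	assert(x >= 0 && x < endx);

	r.start = align_down(first, cacheline);
	r.end = align_up(last, cacheline);

	return r;
}
```

For example, with a frame buffer at 0x1000, a line length of 10240 bytes, 4 bytes per pixel and 64-byte cache lines, pixels 10 to 17 of line 2 occupy bytes 24616 to 24647, and the aligned range is 24576 to 24704.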



* Re: [PATCH 5/6] efi_loader: GOP: Add damage notification on BLT
  2022-06-06 23:43 ` [PATCH 5/6] efi_loader: GOP: Add damage notification on BLT Alexander Graf
@ 2022-06-07  7:12   ` Heinrich Schuchardt
  2022-06-09 14:55     ` Alexander Graf
  0 siblings, 1 reply; 16+ messages in thread
From: Heinrich Schuchardt @ 2022-06-07  7:12 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Anatolij Gustschin, Simon Glass, Matthias Brugger, Da Xue, u-boot

On 6/7/22 01:43, Alexander Graf wrote:
> Now that we have a damage tracking API, let's populate damage done by
> UEFI payloads when they BLT data onto the screen.
>
> Signed-off-by: Alexander Graf <agraf@csgraf.de>
> Reported-by: Da Xue <da@libre.computer>
> ---
>   lib/efi_loader/efi_gop.c | 11 +++++++++++
>   1 file changed, 11 insertions(+)
>
> diff --git a/lib/efi_loader/efi_gop.c b/lib/efi_loader/efi_gop.c
> index 2c81859807..67286c9a60 100644
> --- a/lib/efi_loader/efi_gop.c
> +++ b/lib/efi_loader/efi_gop.c
> @@ -33,6 +33,9 @@ struct efi_gop_obj {
>   	struct efi_gop ops;
>   	struct efi_gop_mode_info info;
>   	struct efi_gop_mode mode;
> +#ifdef CONFIG_DM_VIDEO

Please, heed the warnings provided by scripts/checkpatch.pl:

WARNING: Use 'if (IS_ENABLED(CONFIG...))' instead of '#if or #ifdef'
where possible
#174: FILE: lib/efi_loader/efi_gop.c:36:
+#ifdef CONFIG_DM_VIDEO


> +	struct udevice *vdev;
> +#endif
>   	/* Fields we only have access to during init */
>   	u32 bpix;
>   	void *fb;
> @@ -244,6 +247,10 @@ static __always_inline efi_status_t gop_blt_int(struct efi_gop *this,
>   		dlineoff += dwidth;
>   	}
>
> +#ifdef CONFIG_DM_VIDEO

WARNING: Use 'if (IS_ENABLED(CONFIG...))' instead of '#if or #ifdef'
where possible
#184: FILE: lib/efi_loader/efi_gop.c:250:
+#ifdef CONFIG_DM_VIDEO

> +	video_damage(gopobj->vdev, dx, dy, width, height);
> +#endif
> +
>   	return EFI_SUCCESS;
>   }
>
> @@ -583,5 +590,9 @@ efi_status_t efi_gop_register(void)
>   	gopobj->bpix = bpix;
>   	gopobj->fb = fb;
>
> +#ifdef CONFIG_DM_VIDEO

WARNING: Use 'if (IS_ENABLED(CONFIG...))' instead of '#if or #ifdef'
where possible
#195: FILE: lib/efi_loader/efi_gop.c:593:
+#ifdef CONFIG_DM_VIDEO

Best regards

Heinrich

> +	gopobj->vdev = vdev;
> +#endif
> +
>   	return EFI_SUCCESS;
>   }
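
The pattern checkpatch.pl asks for can be illustrated with a small standalone sketch. DEMO_VIDEO_ENABLED is a simplified stand-in for IS_ENABLED(CONFIG_DM_VIDEO), and struct demo_gop, demo_damage() and demo_blt() are hypothetical names. The sketch assumes the struct member is declared unconditionally, since a plain C condition cannot hide a member declaration the way #ifdef does.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified stand-in for a Kconfig symbol. In U-Boot the IS_ENABLED()
 * macro evaluates CONFIG_ symbols; here a plain 0/1 constant plays
 * that role.
 */
#define DEMO_VIDEO_ENABLED 1

struct demo_gop {
	void *vdev;		/* always present so the code always compiles */
	int damage_calls;	/* counts calls for demonstration purposes */
};

/* Hypothetical stand-in for video_damage() */
static void demo_damage(struct demo_gop *gop)
{
	assert(gop->vdev != NULL);
	gop->damage_calls++;
}

static void demo_blt(struct demo_gop *gop)
{
	/*
	 * The call is parsed and type-checked in every configuration;
	 * the compiler can drop it when the condition is a constant 0.
	 */
	if (DEMO_VIDEO_ENABLED)
		demo_damage(gop);
}
```

The benefit over #ifdef is that the guarded code keeps being compiled in all configurations, so it cannot silently rot.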



* Re: [PATCH 6/6] video: Only dcache flush damaged lines
  2022-06-06 23:43 ` [PATCH 6/6] video: Only dcache flush damaged lines Alexander Graf
@ 2022-06-07  8:00   ` Heinrich Schuchardt
  2022-06-09 14:56     ` Alexander Graf
  0 siblings, 1 reply; 16+ messages in thread
From: Heinrich Schuchardt @ 2022-06-07  8:00 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Anatolij Gustschin, Simon Glass, Matthias Brugger, Da Xue, u-boot

On 6/7/22 01:43, Alexander Graf wrote:
> Now that we have a damage area that tells us which parts of the frame buffer
> actually need updating, let's only dcache flush those on video_sync()
> calls. With this optimization in place, frame buffer updates - especially
> on large screens such as 4k displays - speed up significantly.
> 
> Signed-off-by: Alexander Graf <agraf@csgraf.de>
> Reported-by: Da Xue <da@libre.computer>
> ---
>   drivers/video/video-uclass.c | 49 ++++++++++++++++++++++++++++++------
>   1 file changed, 42 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/video/video-uclass.c b/drivers/video/video-uclass.c
> index 9ac1974670..5661beea38 100644
> --- a/drivers/video/video-uclass.c
> +++ b/drivers/video/video-uclass.c
> @@ -222,6 +222,39 @@ int video_damage(struct udevice *vid, int x, int y, int width, int height)
>   }
>   #endif
>   
> +#if defined(CONFIG_ARM) && !CONFIG_IS_ENABLED(SYS_DCACHE_OFF)

Why should this be ARM specific?

Best regards

Heinrich

> +static void video_flush_dcache(struct udevice *vid)
> +{
> +	struct video_priv *priv = dev_get_uclass_priv(vid);
> +
> +	if (!priv->flush_dcache)
> +		return;
> +
> +#ifdef CONFIG_VIDEO_DAMAGE
> +	if (priv->damage.endx && priv->damage.endy) {
> +		int lstart = priv->damage.x * VNBYTES(priv->bpix);
> +		int lend = priv->damage.endx * VNBYTES(priv->bpix);
> +		int y;
> +
> +		for (y = priv->damage.y; y < priv->damage.endy; y++) {
> +			ulong fb = (ulong)priv->fb;
> +			ulong start = fb + (y * priv->line_length) + lstart;
> +			ulong end = fb + (y * priv->line_length) + lend;
> +
> +			start = ALIGN_DOWN(start, CONFIG_SYS_CACHELINE_SIZE);
> +			end = ALIGN(end, CONFIG_SYS_CACHELINE_SIZE);
> +
> +			flush_dcache_range(start, end);
> +		}
> +	}
> +#else
> +	flush_dcache_range((ulong)priv->fb,
> +			   ALIGN((ulong)priv->fb + priv->fb_size,
> +				 CONFIG_SYS_CACHELINE_SIZE));
> +#endif
> +}
> +#endif
> +
>   /* Flush video activity to the caches */
>   int video_sync(struct udevice *vid, bool force)
>   {
> @@ -240,13 +273,7 @@ int video_sync(struct udevice *vid, bool force)
>   	 * out whether it exists? For now, ARM is safe.
>   	 */
>   #if defined(CONFIG_ARM) && !CONFIG_IS_ENABLED(SYS_DCACHE_OFF)
> -	struct video_priv *priv = dev_get_uclass_priv(vid);
> -
> -	if (priv->flush_dcache) {
> -		flush_dcache_range((ulong)priv->fb,
> -				   ALIGN((ulong)priv->fb + priv->fb_size,
> -					 CONFIG_SYS_CACHELINE_SIZE));
> -	}
> +	video_flush_dcache(vid);
>   #elif defined(CONFIG_VIDEO_SANDBOX_SDL)
>   	struct video_priv *priv = dev_get_uclass_priv(vid);
>   	static ulong last_sync;
> @@ -256,6 +283,14 @@ int video_sync(struct udevice *vid, bool force)
>   		last_sync = get_timer(0);
>   	}
>   #endif
> +
> +#ifdef CONFIG_VIDEO_DAMAGE
> +	struct video_priv *priv = dev_get_uclass_priv(vid);
> +
> +	priv->damage.endx = 0;
> +	priv->damage.endy = 0;
> +#endif
> +
>   	return 0;
>   }
>   



* Re: [PATCH 0/6] Add video damage tracking
  2022-06-06 23:43 [PATCH 0/6] Add video damage tracking Alexander Graf
                   ` (5 preceding siblings ...)
  2022-06-06 23:43 ` [PATCH 6/6] video: Only dcache flush damaged lines Alexander Graf
@ 2022-06-07  8:28 ` Heinrich Schuchardt
  2022-06-09 19:04   ` Alexander Graf
  6 siblings, 1 reply; 16+ messages in thread
From: Heinrich Schuchardt @ 2022-06-07  8:28 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Anatolij Gustschin, Simon Glass, Matthias Brugger, Da Xue, u-boot

On 6/7/22 01:43, Alexander Graf wrote:
> This patch set speeds up graphics output on ARM by a factor of 60.
>
> On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
> but need it accessible by the display controller which reads directly
> from a later point of consistency. Hence, we flush the frame buffer to
> DRAM on every change. The full frame buffer.

Isn't a similar problem already solved by CONFIG_VIDEO_COPY?

Leaving the frame buffer uncached would convert the ARM problem into the
X86 case?

>
> Unfortunately, with the advent of 4k displays, we are seeing frame buffers
> that can take a while to flush out. This was reported by Da Xue with grub,
> which happily prints thousands of spaces on the screen to draw a menu. Every
> printed space triggers a cache flush.
>
> This patch set implements the easiest mitigation against this problem:
> Damage tracking. We remember the lowest common denominator region that was
> touched since the last video_sync() call and only flush that.

If by "lowest common denominator region" you mean a rectangle,
drawing a point in the upper left corner and another in the lower right
corner would require a full flush. So nothing gained in this case.
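
This worst case is easy to reproduce with a standalone sketch. struct rect, rect_area() and rect_span() are hypothetical helpers, and the 2560x1440 screen size in the note after the block is an assumption taken from the 1440p figure in the cover letter.

```c
#include <assert.h>

/* Hypothetical rectangle with an exclusive lower right corner */
struct rect {
	int x, y, endx, endy;
};

/* Number of pixels covered by r */
static long rect_area(struct rect r)
{
	return (long)(r.endx - r.x) * (r.endy - r.y);
}

/* Smallest rectangle that contains both a and b */
static struct rect rect_span(struct rect a, struct rect b)
{
	struct rect r;

	assert(a.x < a.endx && a.y < a.endy);
	assert(b.x < b.endx && b.y < b.endy);

	r.x = a.x < b.x ? a.x : b.x;
	r.y = a.y < b.y ? a.y : b.y;
	r.endx = a.endx > b.endx ? a.endx : b.endx;
	r.endy = a.endy > b.endy ? a.endy : b.endy;

	return r;
}
```

Two single-pixel updates at (0, 0) and (2559, 1439) touch 2 pixels in total, but the spanning rectangle covers all 3,686,400 pixels of the screen.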

>
> With this patch set applied, we reduce drawing a large grub menu (with
> serial console attached for size information) on an RK3399-ROC system
> at 1440p from 55 seconds to less than 1 second.
>
>
> Alternatives considered:
>
>    1) Lazy sync - Sandbox does this. It only calls video_sync(true) every
>       so often. We are missing timers to do this generically.
>
>    2) Double buffering - We could try to identify whether anything changed
>       at all and only draw to the FB if it did. That would require
>       maintaining a second buffer that we need to scan.
>
>    3) Text buffer - Maintain a buffer of all text printed on the screen with
>       respective location. Don't write if the old and new character are
>       identical. This would limit applicability to text only and is an
>       optimization on top of this patch set.
>
>    4) Hash screen lines - Create a hash (sha256?) over every line when it
>       changes. Only flush when it does. I'm not sure if this would waste
>       more time, memory and cache than the current approach. It would make
>       full screen updates much more expensive.
>
> Alexander Graf (6):
>    dm: video: Add damage tracking API
>    dm: video: Add damage notification on display clear
>    vidconsole: Add damage notifications to all vidconsole drivers
>    video: Add damage notification on bmp display
>    efi_loader: GOP: Add damage notification on BLT
>    video: Only dcache flush damaged lines

We need documentation describing the difference between
CONFIG_VIDEO_COPY and CONFIG_VIDEO_DAMAGE.

Best regards

Heinrich

>
>   drivers/video/Kconfig            | 15 ++++++
>   drivers/video/console_normal.c   | 10 ++++
>   drivers/video/console_rotate.c   | 18 +++++++
>   drivers/video/console_truetype.c | 12 +++++
>   drivers/video/video-uclass.c     | 89 +++++++++++++++++++++++++++++---
>   drivers/video/video_bmp.c        |  2 +
>   include/video.h                  | 39 +++++++++++++-
>   lib/efi_loader/efi_gop.c         | 11 ++++
>   8 files changed, 187 insertions(+), 9 deletions(-)
>



* Re: [PATCH 5/6] efi_loader: GOP: Add damage notification on BLT
  2022-06-07  7:12   ` Heinrich Schuchardt
@ 2022-06-09 14:55     ` Alexander Graf
  0 siblings, 0 replies; 16+ messages in thread
From: Alexander Graf @ 2022-06-09 14:55 UTC (permalink / raw)
  To: Heinrich Schuchardt
  Cc: Anatolij Gustschin, Simon Glass, Matthias Brugger, Da Xue, u-boot

Hey Heinrich,

On 07.06.22 09:12, Heinrich Schuchardt wrote:
> On 6/7/22 01:43, Alexander Graf wrote:
>> Now that we have a damage tracking API, let's populate damage done by
>> UEFI payloads when they BLT data onto the screen.
>>
>> Signed-off-by: Alexander Graf <agraf@csgraf.de>
>> Reported-by: Da Xue <da@libre.computer>
>> ---
>>   lib/efi_loader/efi_gop.c | 11 +++++++++++
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/lib/efi_loader/efi_gop.c b/lib/efi_loader/efi_gop.c
>> index 2c81859807..67286c9a60 100644
>> --- a/lib/efi_loader/efi_gop.c
>> +++ b/lib/efi_loader/efi_gop.c
>> @@ -33,6 +33,9 @@ struct efi_gop_obj {
>>       struct efi_gop ops;
>>       struct efi_gop_mode_info info;
>>       struct efi_gop_mode mode;
>> +#ifdef CONFIG_DM_VIDEO
>
> Please, heed the warnings provided by scripts/checkpatch.pl:
>
> WARNING: Use 'if (IS_ENABLED(CONFIG...))' instead of '#if or #ifdef'
> where possible
> #174: FILE: lib/efi_loader/efi_gop.c:36:
> +#ifdef CONFIG_DM_VIDEO


I was mostly afraid of adding a dependency on struct udevice here. But 
since we already include video.h, I believe we're good. Happy to change 
it to only runtime checks.
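
For readers unfamiliar with the checkpatch warning, here is a minimal
sketch of the two styles. It is not code from this patch:
DEMO_IS_ENABLED() is a simplified stand-in for the real IS_ENABLED()
macro, and CONFIG_DEMO_VIDEO, CONFIG_DEMO_OTHER, struct demo_gop_obj and
demo_notify_damage() are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified stand-in for IS_ENABLED(): expands to 1 if the option is
 * defined as 1, and to 0 if it is not defined at all.
 */
#define DEMO_PLACEHOLDER_1 0,
#define DEMO_SECOND_ARG(ignored, val, ...) val
#define DEMO_IS_DEFINED3(arg1_or_junk) DEMO_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define DEMO_IS_DEFINED2(val) DEMO_IS_DEFINED3(DEMO_PLACEHOLDER_##val)
#define DEMO_IS_DEFINED(x) DEMO_IS_DEFINED2(x)
#define DEMO_IS_ENABLED(option) DEMO_IS_DEFINED(option)

#define CONFIG_DEMO_VIDEO 1	/* hypothetical option, enabled */
/* CONFIG_DEMO_OTHER is intentionally left undefined */

struct demo_gop_obj {
	void *vdev;	/* always present, no #ifdef around the member */
};

/* Returns 1 if a damage notification would be sent, 0 otherwise. */
int demo_notify_damage(const struct demo_gop_obj *obj)
{
	assert(obj != NULL);

	/*
	 * The condition is a compile-time constant, so the compiler can
	 * drop the branch when the option is off, yet the code inside is
	 * still parsed and type-checked in every configuration.
	 */
	if (DEMO_IS_ENABLED(CONFIG_DEMO_VIDEO) && obj->vdev)
		return 1;

	return 0;
}
```

The cost of this style is that the struct member exists in all
configurations, which is the dependency concern mentioned above.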

Alex




* Re: [PATCH 6/6] video: Only dcache flush damaged lines
  2022-06-07  8:00   ` Heinrich Schuchardt
@ 2022-06-09 14:56     ` Alexander Graf
  0 siblings, 0 replies; 16+ messages in thread
From: Alexander Graf @ 2022-06-09 14:56 UTC (permalink / raw)
  To: Heinrich Schuchardt
  Cc: Anatolij Gustschin, Simon Glass, Matthias Brugger, Da Xue, u-boot


On 07.06.22 10:00, Heinrich Schuchardt wrote:
> On 6/7/22 01:43, Alexander Graf wrote:
>> Now that we have a damage area tells us which parts of the frame buffer
>> actually need updating, let's only dcache flush those on video_sync()
>> calls. With this optimization in place, frame buffer updates -
>> especially on large screen such as 4k displays - speed up significantly.
>>
>> Signed-off-by: Alexander Graf <agraf@csgraf.de>
>> Reported-by: Da Xue <da@libre.computer>
>> ---
>>   drivers/video/video-uclass.c | 49 ++++++++++++++++++++++++++++++------
>>   1 file changed, 42 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/video/video-uclass.c b/drivers/video/video-uclass.c
>> index 9ac1974670..5661beea38 100644
>> --- a/drivers/video/video-uclass.c
>> +++ b/drivers/video/video-uclass.c
>> @@ -222,6 +222,39 @@ int video_damage(struct udevice *vid, int x, int y, int width, int height)
>>   }
>>   #endif
>>   +#if defined(CONFIG_ARM) && !CONFIG_IS_ENABLED(SYS_DCACHE_OFF)
>
> Why should this be ARM specific?


I don't believe it should - and that's what the existing comment also 
says. But currently it is because the dcache API isn't available on all 
platforms; I'm merely preserving the existing logic :).


Thanks,

Alex




* Re: [PATCH 0/6] Add video damage tracking
  2022-06-07  8:28 ` [PATCH 0/6] Add video damage tracking Heinrich Schuchardt
@ 2022-06-09 19:04   ` Alexander Graf
  2022-06-09 19:26     ` Mark Kettenis
  2022-06-09 20:32     ` Heinrich Schuchardt
  0 siblings, 2 replies; 16+ messages in thread
From: Alexander Graf @ 2022-06-09 19:04 UTC (permalink / raw)
  To: Heinrich Schuchardt
  Cc: Anatolij Gustschin, Simon Glass, Matthias Brugger, Da Xue, u-boot


On 07.06.22 10:28, Heinrich Schuchardt wrote:
> On 6/7/22 01:43, Alexander Graf wrote:
>> This patch set speeds up graphics output on ARM by a factor of 60x.
>>
>> On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
>> but need it accessible by the display controller which reads directly
>> from a later point of consistency. Hence, we flush the frame buffer to
>> DRAM on every change. The full frame buffer.
>
> Isn't a similar problem already solved by CONFIG_VIDEO_COPY?
>
> Leaving the frame buffer uncached would convert the ARM problem into the
> X86 case?


It solves a similar problem, yes. However, it requires us to allocate
the frame buffer twice, and we would need to dynamically switch the
MMU mappings of the frame buffer from cached to write-combining (WC).
That's code we don't have today.

VIDEO_COPY is also terribly inefficient in the most common case: Drawing 
one or multiple characters. It basically copies every line that contains 
the character, for every character printed. The damage code in this 
patch set only flushes the relevant rectangles after a string is fully 
printed.

I think overall, damage tracking with cached memory is simple enough 
that it gives us the best of all worlds.
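
As a rough illustration of the idea (a minimal sketch under stated
assumptions, not the code from this series): the damage state can be a
single bounding box that grows with every draw, and the sync step turns
it into the byte range of the damaged lines. struct demo_damage,
demo_damage_add() and demo_damage_flush_range() are hypothetical names.
The caller would hand the resulting range to whatever dcache flush
routine the platform provides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounding box of everything drawn since the last sync. */
struct demo_damage {
	int valid;		/* 0 while nothing has been drawn */
	int x0, y0;		/* inclusive top-left corner */
	int x1, y1;		/* exclusive bottom-right corner */
};

static int demo_min(int a, int b) { return a < b ? a : b; }
static int demo_max(int a, int b) { return a > b ? a : b; }

/* Grow the bounding box so that it also covers the given rectangle. */
void demo_damage_add(struct demo_damage *d, int x, int y, int w, int h)
{
	assert(w > 0 && h > 0);

	if (!d->valid) {
		d->x0 = x;
		d->y0 = y;
		d->x1 = x + w;
		d->y1 = y + h;
		d->valid = 1;
		return;
	}

	d->x0 = demo_min(d->x0, x);
	d->y0 = demo_min(d->y0, y);
	d->x1 = demo_max(d->x1, x + w);
	d->y1 = demo_max(d->y1, y + h);
}

/*
 * Compute the byte range [*start, *end) inside the frame buffer that
 * covers all damaged lines in full. Returns 0 if there is no damage.
 */
int demo_damage_flush_range(const struct demo_damage *d, size_t line_length,
			    size_t *start, size_t *end)
{
	if (!d->valid)
		return 0;

	*start = (size_t)d->y0 * line_length;
	*end = (size_t)d->y1 * line_length;
	return 1;
}
```

Two adjacent 8x16 glyphs drawn before one sync therefore cost one flush
of 16 lines, rather than two flushes of the whole frame buffer.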


>
>>
>> Unfortunately, with the advent of 4k displays, we are seeing frame buffers
>> that can take a while to flush out. This was reported by Da Xue with grub,
>> which happily print 1000s of spaces on the screen to draw a menu. Every
>> printed space triggers a cache flush.
>>
>> This patch set implements the easiest mitigation against this problem:
>> Damage tracking. We remember the lowest common denominator region that was
>> touched since the last video_sync() call and only flush that.
>
> If by "lowest common denominator region" you mean a bounding rectangle,
> then drawing a point in the upper left corner and another in the lower
> right corner would require a full flush. So nothing is gained in this case.


Glad you asked! :)

While theoretically possible, this is a case that just never happens in 
U-Boot's code flow. All code that draws to the screen is either blt 
based (like gop, character drawing or logo display) or moves large 
portions of the screen (scrolling). The largest granularity we have 
between syncs is when printing strings. So the worst case you'll have 
today is a wrap around where you'd end up flushing full lines.
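
To put rough numbers on the two cases (a back-of-the-envelope sketch;
the 2560 pixel width, 4 bytes per pixel and 16 pixel glyph height are
assumptions for illustration, and both function names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of scan lines covered by the bounding box of two damaged
 * rectangles, given the top line and height of each.
 */
size_t demo_union_lines(int y_a, int h_a, int y_b, int h_b)
{
	int top, bottom;

	assert(y_a >= 0 && y_b >= 0 && h_a > 0 && h_b > 0);

	top = y_a < y_b ? y_a : y_b;
	bottom = (y_a + h_a) > (y_b + h_b) ? (y_a + h_a) : (y_b + h_b);

	return (size_t)(bottom - top);
}

/* Bytes to flush when the damaged lines are flushed in full. */
size_t demo_flush_bytes(size_t lines, size_t xsize, size_t bytes_per_pixel)
{
	return lines * xsize * bytes_per_pixel;
}
```

With these assumptions, a string that wraps from one text row to the
next damages demo_union_lines(160, 16, 176, 16) = 32 lines, which is
demo_flush_bytes(32, 2560, 4) = 327680 bytes. The two-corner case
damages all 1440 lines, which is 14745600 bytes, the same as a full
flush today.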


>
>>
>> With this patch set applied, we reduce drawing a large grub menu (with
>> serial console attached for size information) on an RK3399-ROC system
>> at 1440p from 55 seconds to less than 1 second.
>>
>>
>> Alternatives considered:
>>
>>    1) Lazy sync - Sandbox does this. It only calls video_sync(true) ever
>>       so often. We are missing timers to do this generically.
>>
>>    2) Double buffering - We could try to identify whether anything changed
>>       at all and only draw to the FB if it did. That would require
>>       maintaining a second buffer that we need to scan.
>>
>>    3) Text buffer - Maintain a buffer of all text printed on the screen with
>>       respective location. Don't write if the old and new character are
>>       identical. This would limit applicability to text only and is an
>>       optimization on top of this patch set.
>>
>>    4) Hash screen lines - Create a hash (sha256?) over every line when it
>>       changes. Only flush when it does. I'm not sure if this would waste
>>       more time, memory and cache than the current approach. It would make
>>       full screen updates much more expensive.
>>
>> Alexander Graf (6):
>>    dm: video: Add damage tracking API
>>    dm: video: Add damage notification on display clear
>>    vidconsole: Add damage notifications to all vidconsole drivers
>>    video: Add damage notification on bmp display
>>    efi_loader: GOP: Add damage notification on BLT
>>    video: Only dcache flush damaged lines
>
> We need documentation describing the difference between
> CONFIG_VIDEO_COPY and CONFIG_VIDEO_DAMAGE.


Hm, maybe we should implement CONFIG_VIDEO_COPY as a flush mechanism 
behind CONFIG_VIDEO_DAMAGE? That way we only have a single code path for 
producers left and in addition also optimize drawing individual 
characters. It would also make the feature useful beyond ARM dcache 
flushing.
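
A minimal sketch of what such a copy backend might look like, assuming
the damage is available as a rectangle and both buffers share the same
line length. This is not existing U-Boot code; demo_copy_damaged() and
its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy only the damaged rectangle [x0, x1) x [y0, y1) from the cached
 * shadow buffer to the hardware frame buffer, one line segment at a
 * time, instead of copying whole lines for every character.
 */
void demo_copy_damaged(unsigned char *hw_fb, const unsigned char *shadow_fb,
		       size_t line_length, size_t bytes_per_pixel,
		       int x0, int y0, int x1, int y1)
{
	size_t offset, len;
	int y;

	assert(x0 >= 0 && y0 >= 0 && x1 >= x0 && y1 >= y0);

	offset = (size_t)x0 * bytes_per_pixel;
	len = (size_t)(x1 - x0) * bytes_per_pixel;
	assert(offset + len <= line_length);

	for (y = y0; y < y1; y++)
		memcpy(hw_fb + (size_t)y * line_length + offset,
		       shadow_fb + (size_t)y * line_length + offset, len);
}
```

Producers would then only ever report damage, and the sync step would
pick either a dcache flush or a copy like this one as its backend.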


Alex




* Re: [PATCH 0/6] Add video damage tracking
  2022-06-09 19:04   ` Alexander Graf
@ 2022-06-09 19:26     ` Mark Kettenis
  2022-06-09 20:32     ` Heinrich Schuchardt
  1 sibling, 0 replies; 16+ messages in thread
From: Mark Kettenis @ 2022-06-09 19:26 UTC (permalink / raw)
  To: Alexander Graf; +Cc: xypron.glpk, agust, sjg, mbrugger, da, u-boot

> Date: Thu, 9 Jun 2022 21:04:37 +0200
> From: Alexander Graf <agraf@csgraf.de>
> 
> On 07.06.22 10:28, Heinrich Schuchardt wrote:
> > On 6/7/22 01:43, Alexander Graf wrote:
> >> This patch set speeds up graphics output on ARM by a factor of 60x.
> >>
> >> On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
> >> but need it accessible by the display controller which reads directly
> >> from a later point of consistency. Hence, we flush the frame buffer to
> >> DRAM on every change. The full frame buffer.
> >
> > Isn't a similar problem already solved by CONFIG_VIDEO_COPY?
> >
> > Leaving the frame buffer uncached would convert the ARM problem into the
> > X86 case?
> 
> 
> It solves a similar problem, yes. However, it requires us to allocate 
> the frame buffer size twice, and we would need to dynamically toggle the 
> MMU mappings of the frame buffer to WC instead of cached. That's code we 
> don't have today.

For the Apple M1, the framebuffer is covered by the "memory map", which
maps it as Normal-NC, but that is because the framebuffer is already
set up at the point where U-Boot takes control.


* Re: [PATCH 0/6] Add video damage tracking
  2022-06-09 19:04   ` Alexander Graf
  2022-06-09 19:26     ` Mark Kettenis
@ 2022-06-09 20:32     ` Heinrich Schuchardt
  2022-06-09 21:09       ` Alexander Graf
  1 sibling, 1 reply; 16+ messages in thread
From: Heinrich Schuchardt @ 2022-06-09 20:32 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Anatolij Gustschin, Simon Glass, Matthias Brugger, Da Xue, u-boot



On 9 June 2022 21:04:37 CEST, Alexander Graf <agraf@csgraf.de> wrote:
>
>On 07.06.22 10:28, Heinrich Schuchardt wrote:
>> We need documentation describing the difference between
>> CONFIG_VIDEO_COPY and CONFIG_VIDEO_DAMAGE.
>
>
>Hm, maybe we should implement CONFIG_VIDEO_COPY as a flush mechanism behind CONFIG_VIDEO_DAMAGE? That way we only have a single code path for producers left and in addition also optimize drawing individual characters. It would also make the feature useful beyond ARM dcache flushing.


Please, consider that RISC-V has no instruction for flushing the data cache. So CONFIG_VIDEO_DAMAGE is not applicable.

Best regards

Heinrich




* Re: [PATCH 0/6] Add video damage tracking
  2022-06-09 20:32     ` Heinrich Schuchardt
@ 2022-06-09 21:09       ` Alexander Graf
  0 siblings, 0 replies; 16+ messages in thread
From: Alexander Graf @ 2022-06-09 21:09 UTC (permalink / raw)
  To: Heinrich Schuchardt
  Cc: Anatolij Gustschin, Simon Glass, Matthias Brugger, Da Xue, u-boot


On 09.06.22 22:32, Heinrich Schuchardt wrote:
>
> On 9 June 2022 21:04:37 CEST, Alexander Graf <agraf@csgraf.de> wrote:
>> On 07.06.22 10:28, Heinrich Schuchardt wrote:
>>> We need documentation describing the difference between
>>> CONFIG_VIDEO_COPY and CONFIG_VIDEO_DAMAGE.
>>
>> Hm, maybe we should implement CONFIG_VIDEO_COPY as a flush mechanism behind CONFIG_VIDEO_DAMAGE? That way we only have a single code path for producers left and in addition also optimize drawing individual characters. It would also make the feature useful beyond ARM dcache flushing.
>
> Please, consider that RISC-V has no instruction for flushing the data cache. So CONFIG_VIDEO_DAMAGE is not applicable.


I think we'll have to see what SoCs people come up with. My hope would
be that anything that shares DRAM between the display IP block and the
CPU sits on a fully cache-coherent bus. Then you don't need any of
this trickery.


Alex




end of thread

Thread overview: 16+ messages
2022-06-06 23:43 [PATCH 0/6] Add video damage tracking Alexander Graf
2022-06-06 23:43 ` [PATCH 1/6] dm: video: Add damage tracking API Alexander Graf
2022-06-06 23:43 ` [PATCH 2/6] dm: video: Add damage notification on display clear Alexander Graf
2022-06-06 23:43 ` [PATCH 3/6] vidconsole: Add damage notifications to all vidconsole drivers Alexander Graf
2022-06-06 23:43 ` [PATCH 4/6] video: Add damage notification on bmp display Alexander Graf
2022-06-06 23:43 ` [PATCH 5/6] efi_loader: GOP: Add damage notification on BLT Alexander Graf
2022-06-07  7:12   ` Heinrich Schuchardt
2022-06-09 14:55     ` Alexander Graf
2022-06-06 23:43 ` [PATCH 6/6] video: Only dcache flush damaged lines Alexander Graf
2022-06-07  8:00   ` Heinrich Schuchardt
2022-06-09 14:56     ` Alexander Graf
2022-06-07  8:28 ` [PATCH 0/6] Add video damage tracking Heinrich Schuchardt
2022-06-09 19:04   ` Alexander Graf
2022-06-09 19:26     ` Mark Kettenis
2022-06-09 20:32     ` Heinrich Schuchardt
2022-06-09 21:09       ` Alexander Graf
