* [PATCH v3 0/5] fbdev: Improve performance of fbdev console
@ 2022-02-23 19:37 ` Thomas Zimmermann
  0 siblings, 0 replies; 49+ messages in thread
From: Thomas Zimmermann @ 2022-02-23 19:37 UTC (permalink / raw)
  To: daniel, deller, javierm, geert, sam, kraxel, ppaalanen
  Cc: dri-devel, linux-fbdev, Thomas Zimmermann

Optimize performance of the fbdev console for the common case of
software-based clearing and image blitting.

The commit description of each patch contains results of a simple
microbenchmark. I also tested the full patchset's effect on the
console output by printing directory listings (i7-4790, FullHD,
simpledrm, kernel with debugging).

  > time find /usr/share/doc -type f

In the unoptimized case:

  real    0m6.173s
  user    0m0.044s
  sys     0m6.107s

With optimizations applied:

  real    0m4.754s
  user    0m0.044s
  sys     0m4.698s

In the optimized case, printing the directory listing is ~25% faster
than before.

In v2 of the patchset, after implementing Sam's suggestion to update
cfb_imageblit() as well, it turns out that the compiled code in
sys_imageblit() is still significantly slower than the CFB version. A
fix is probably a larger task and would include architecture-specific
changes. A new TODO item suggests investigating the performance of the
various helpers and format-conversion functions in DRM and fbdev.

v3:
	* fix description of cfb_imageblit() patch (Pekka)
v2:
	* improve readability for sys_imageblit() (Gerd, Sam)
	* new TODO item for further optimization

Thomas Zimmermann (5):
  fbdev: Improve performance of sys_fillrect()
  fbdev: Improve performance of sys_imageblit()
  fbdev: Remove trailing whitespaces from cfbimgblt.c
  fbdev: Improve performance of cfb_imageblit()
  drm: Add TODO item for optimizing format helpers

 Documentation/gpu/todo.rst             |  22 +++++
 drivers/video/fbdev/core/cfbimgblt.c   | 107 ++++++++++++++++---------
 drivers/video/fbdev/core/sysfillrect.c |  16 +---
 drivers/video/fbdev/core/sysimgblt.c   |  49 ++++++++---
 4 files changed, 133 insertions(+), 61 deletions(-)

-- 
2.35.1
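
The two timings in the cover letter can be read either as wall-clock time
saved or as a throughput ratio, and the quoted ~25% sits between the two.
The sketch below works out both numbers; the function names are
hypothetical and not part of the series.

```c
#include <assert.h>

/* Percentage of wall-clock time saved: (before - after) / before. */
double time_saved_percent(double before_s, double after_s)
{
	assert(before_s > 0.0);
	return 100.0 * (before_s - after_s) / before_s;
}

/* Throughput ratio: how many times faster the optimized run is. */
double speedup_factor(double before_s, double after_s)
{
	assert(after_s > 0.0);
	return before_s / after_s;
}
```

With the "real" values from above (6.173 s before, 4.754 s after), the first
function gives roughly 23% less time and the second roughly 1.3x throughput.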


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v3 1/5] fbdev: Improve performance of sys_fillrect()
  2022-02-23 19:37 ` Thomas Zimmermann
@ 2022-02-23 19:38   ` Thomas Zimmermann
  -1 siblings, 0 replies; 49+ messages in thread
From: Thomas Zimmermann @ 2022-02-23 19:38 UTC (permalink / raw)
  To: daniel, deller, javierm, geert, sam, kraxel, ppaalanen
  Cc: dri-devel, linux-fbdev, Thomas Zimmermann

Improve the performance of sys_fillrect() by using word-aligned
32/64-bit mov instructions. While the code tried to implement this,
the compiler failed to create fast instructions. The resulting
binary instructions were even slower than cfb_fillrect(), which
uses the same algorithm, but operates on I/O memory.

A microbenchmark measures the average number of CPU cycles
for sys_fillrect() after a stabilizing period of a few minutes
(i7-4790, FullHD, simpledrm, kernel with debugging). The value
for CFB is given as a reference.

  sys_fillrect(), new:  26586 cycles
  sys_fillrect(), old: 166603 cycles
  cfb_fillrect():       41012 cycles

In the optimized case, sys_fillrect() is now ~6x faster than before
and ~1.5x faster than the CFB implementation.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
---
 drivers/video/fbdev/core/sysfillrect.c | 16 +++-------------
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/drivers/video/fbdev/core/sysfillrect.c b/drivers/video/fbdev/core/sysfillrect.c
index 33ee3d34f9d2..bcdcaeae6538 100644
--- a/drivers/video/fbdev/core/sysfillrect.c
+++ b/drivers/video/fbdev/core/sysfillrect.c
@@ -50,19 +50,9 @@ bitfill_aligned(struct fb_info *p, unsigned long *dst, int dst_idx,
 
 		/* Main chunk */
 		n /= bits;
-		while (n >= 8) {
-			*dst++ = pat;
-			*dst++ = pat;
-			*dst++ = pat;
-			*dst++ = pat;
-			*dst++ = pat;
-			*dst++ = pat;
-			*dst++ = pat;
-			*dst++ = pat;
-			n -= 8;
-		}
-		while (n--)
-			*dst++ = pat;
+		memset_l(dst, pat, n);
+		dst += n;
+
 		/* Trailing bits */
 		if (last)
 			*dst = comp(pat, *dst, last);
-- 
2.35.1
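
The change in the diff above can be tried outside the kernel with a small
userspace sketch. Here fill_words() is a hypothetical stand-in for the
kernel's memset_l(), and fill_main_chunk() is an illustrative name for the
"main chunk" step; neither is code from the patch. In the kernel, memset_l()
forwards to memset32() or memset64() depending on the word size, which
architectures may implement with optimized store instructions; the plain
loop below only models the semantics, not the speed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for memset_l(): store v into n consecutive words. */
void *fill_words(unsigned long *p, unsigned long v, size_t n)
{
	unsigned long *q = p;

	assert(p != NULL || n == 0);
	while (n--)
		*q++ = v;
	return p;
}

/*
 * Main chunk of an aligned fill: one call replaces the hand-unrolled
 * loop, then the destination pointer is advanced past the filled words
 * so that the caller can handle trailing bits.
 */
unsigned long *fill_main_chunk(unsigned long *dst, unsigned long pat, size_t n)
{
	fill_words(dst, pat, n);
	return dst + n;
}
```

The returned pointer corresponds to dst after "dst += n" in the patch, which
is where the trailing-bits handling continues.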


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v3 2/5] fbdev: Improve performance of sys_imageblit()
  2022-02-23 19:37 ` Thomas Zimmermann
@ 2022-02-23 19:38   ` Thomas Zimmermann
  -1 siblings, 0 replies; 49+ messages in thread
From: Thomas Zimmermann @ 2022-02-23 19:38 UTC (permalink / raw)
  To: daniel, deller, javierm, geert, sam, kraxel, ppaalanen
  Cc: dri-devel, linux-fbdev, Thomas Zimmermann

Improve the performance of sys_imageblit() by manually unrolling
the inner blitting loop and moving some invariants out. The compiler
failed to do this automatically. The resulting binary code was even
slower than the cfb_imageblit() helper, which uses the same algorithm,
but operates on I/O memory.

A microbenchmark measures the average number of CPU cycles
for sys_imageblit() after a stabilizing period of a few minutes
(i7-4790, FullHD, simpledrm, kernel with debugging). The value
for CFB is given as a reference.

  sys_imageblit(), new: 25934 cycles
  sys_imageblit(), old: 35944 cycles
  cfb_imageblit():      30566 cycles

In the optimized case, sys_imageblit() is now ~30% faster than before
and ~20% faster than cfb_imageblit().

v2:
	* move switch out of inner loop (Gerd)
	* remove test for alignment of dst1 (Sam)

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
---
 drivers/video/fbdev/core/sysimgblt.c | 49 +++++++++++++++++++++-------
 1 file changed, 38 insertions(+), 11 deletions(-)

diff --git a/drivers/video/fbdev/core/sysimgblt.c b/drivers/video/fbdev/core/sysimgblt.c
index a4d05b1b17d7..722c327a381b 100644
--- a/drivers/video/fbdev/core/sysimgblt.c
+++ b/drivers/video/fbdev/core/sysimgblt.c
@@ -188,23 +188,29 @@ static void fast_imageblit(const struct fb_image *image, struct fb_info *p,
 {
 	u32 fgx = fgcolor, bgx = bgcolor, bpp = p->var.bits_per_pixel;
 	u32 ppw = 32/bpp, spitch = (image->width + 7)/8;
-	u32 bit_mask, end_mask, eorx, shift;
+	u32 bit_mask, eorx;
 	const char *s = image->data, *src;
 	u32 *dst;
-	const u32 *tab = NULL;
+	const u32 *tab;
+	size_t tablen;
+	u32 colortab[16];
 	int i, j, k;
 
 	switch (bpp) {
 	case 8:
 		tab = fb_be_math(p) ? cfb_tab8_be : cfb_tab8_le;
+		tablen = 16;
 		break;
 	case 16:
 		tab = fb_be_math(p) ? cfb_tab16_be : cfb_tab16_le;
+		tablen = 4;
 		break;
 	case 32:
-	default:
 		tab = cfb_tab32;
+		tablen = 2;
 		break;
+	default:
+		return;
 	}
 
 	for (i = ppw-1; i--; ) {
@@ -218,19 +224,40 @@ static void fast_imageblit(const struct fb_image *image, struct fb_info *p,
 	eorx = fgx ^ bgx;
 	k = image->width/ppw;
 
+	for (i = 0; i < tablen; ++i)
+		colortab[i] = (tab[i] & eorx) ^ bgx;
+
 	for (i = image->height; i--; ) {
 		dst = dst1;
-		shift = 8;
 		src = s;
 
-		for (j = k; j--; ) {
-			shift -= ppw;
-			end_mask = tab[(*src >> shift) & bit_mask];
-			*dst++ = (end_mask & eorx) ^ bgx;
-			if (!shift) {
-				shift = 8;
-				src++;
+		switch (ppw) {
+		case 4: /* 8 bpp */
+			for (j = k; j; j -= 2, ++src) {
+				*dst++ = colortab[(*src >> 4) & bit_mask];
+				*dst++ = colortab[(*src >> 0) & bit_mask];
+			}
+			break;
+		case 2: /* 16 bpp */
+			for (j = k; j; j -= 4, ++src) {
+				*dst++ = colortab[(*src >> 6) & bit_mask];
+				*dst++ = colortab[(*src >> 4) & bit_mask];
+				*dst++ = colortab[(*src >> 2) & bit_mask];
+				*dst++ = colortab[(*src >> 0) & bit_mask];
+			}
+			break;
+		case 1: /* 32 bpp */
+			for (j = k; j; j -= 8, ++src) {
+				*dst++ = colortab[(*src >> 7) & bit_mask];
+				*dst++ = colortab[(*src >> 6) & bit_mask];
+				*dst++ = colortab[(*src >> 5) & bit_mask];
+				*dst++ = colortab[(*src >> 4) & bit_mask];
+				*dst++ = colortab[(*src >> 3) & bit_mask];
+				*dst++ = colortab[(*src >> 2) & bit_mask];
+				*dst++ = colortab[(*src >> 1) & bit_mask];
+				*dst++ = colortab[(*src >> 0) & bit_mask];
 			}
+			break;
 		}
 		dst1 += p->fix.line_length;
 		s += spitch;
-- 
2.35.1
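
The two ideas in the diff above, hoisting the color computation into a
precomputed colortab[] and unrolling the per-byte expansion, can be
illustrated with a userspace sketch for the 32 bpp case. This is a
simplified model under stated assumptions, not the kernel function:
expand_row_32bpp() is a hypothetical name, the row width is assumed to be a
multiple of 8 pixels, and the two-entry mask table mirrors the role of
cfb_tab32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand one row of a 1-bit-per-pixel bitmap into 32 bpp pixels. */
void expand_row_32bpp(uint32_t *dst, const uint8_t *src, size_t width,
		      uint32_t fg, uint32_t bg)
{
	static const uint32_t tab32[2] = { 0x00000000u, 0xffffffffu };
	uint32_t eorx = fg ^ bg;
	uint32_t colortab[2];
	size_t i, j;

	/* Assumption of this sketch: whole source bytes only. */
	assert(width % 8 == 0);

	/* Loop invariant moved out: colortab[0] == bg, colortab[1] == fg. */
	for (i = 0; i < 2; ++i)
		colortab[i] = (tab32[i] & eorx) ^ bg;

	/* One source byte yields eight pixels, most significant bit first. */
	for (j = width / 8; j; --j, ++src) {
		uint8_t bits = *src;

		*dst++ = colortab[(bits >> 7) & 1];
		*dst++ = colortab[(bits >> 6) & 1];
		*dst++ = colortab[(bits >> 5) & 1];
		*dst++ = colortab[(bits >> 4) & 1];
		*dst++ = colortab[(bits >> 3) & 1];
		*dst++ = colortab[(bits >> 2) & 1];
		*dst++ = colortab[(bits >> 1) & 1];
		*dst++ = colortab[(bits >> 0) & 1];
	}
}
```

Compared with the old inner loop, each pixel now costs one shift, one mask,
and one table load; the AND/XOR with eorx and bgx happens once per table
entry instead of once per pixel.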


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v3 3/5] fbdev: Remove trailing whitespaces from cfbimgblt.c
  2022-02-23 19:37 ` Thomas Zimmermann
@ 2022-02-23 19:38   ` Thomas Zimmermann
  -1 siblings, 0 replies; 49+ messages in thread
From: Thomas Zimmermann @ 2022-02-23 19:38 UTC (permalink / raw)
  To: daniel, deller, javierm, geert, sam, kraxel, ppaalanen
  Cc: dri-devel, linux-fbdev, Thomas Zimmermann

Fix coding style. No functional changes.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
---
 drivers/video/fbdev/core/cfbimgblt.c | 60 ++++++++++++++--------------
 1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/drivers/video/fbdev/core/cfbimgblt.c b/drivers/video/fbdev/core/cfbimgblt.c
index a2bb276a8b24..01b01a279681 100644
--- a/drivers/video/fbdev/core/cfbimgblt.c
+++ b/drivers/video/fbdev/core/cfbimgblt.c
@@ -16,15 +16,15 @@
  *  must be laid out exactly in the same format as the framebuffer. Yes I know
  *  their are cards with hardware that coverts images of various depths to the
  *  framebuffer depth. But not every card has this. All images must be rounded
- *  up to the nearest byte. For example a bitmap 12 bits wide must be two 
- *  bytes width. 
+ *  up to the nearest byte. For example a bitmap 12 bits wide must be two
+ *  bytes width.
  *
- *  Tony: 
- *  Incorporate mask tables similar to fbcon-cfb*.c in 2.4 API.  This speeds 
+ *  Tony:
+ *  Incorporate mask tables similar to fbcon-cfb*.c in 2.4 API.  This speeds
  *  up the code significantly.
- *  
+ *
  *  Code for depths not multiples of BITS_PER_LONG is still kludgy, which is
- *  still processed a bit at a time.   
+ *  still processed a bit at a time.
  *
  *  Also need to add code to deal with cards endians that are different than
  *  the native cpu endians. I also need to deal with MSB position in the word.
@@ -72,8 +72,8 @@ static const u32 cfb_tab32[] = {
 #define FB_WRITEL fb_writel
 #define FB_READL  fb_readl
 
-static inline void color_imageblit(const struct fb_image *image, 
-				   struct fb_info *p, u8 __iomem *dst1, 
+static inline void color_imageblit(const struct fb_image *image,
+				   struct fb_info *p, u8 __iomem *dst1,
 				   u32 start_index,
 				   u32 pitch_index)
 {
@@ -92,7 +92,7 @@ static inline void color_imageblit(const struct fb_image *image,
 		dst = (u32 __iomem *) dst1;
 		shift = 0;
 		val = 0;
-		
+
 		if (start_index) {
 			u32 start_mask = ~fb_shifted_pixels_mask_u32(p,
 						start_index, bswapmask);
@@ -109,8 +109,8 @@ static inline void color_imageblit(const struct fb_image *image,
 			val |= FB_SHIFT_HIGH(p, color, shift ^ bswapmask);
 			if (shift >= null_bits) {
 				FB_WRITEL(val, dst++);
-	
-				val = (shift == null_bits) ? 0 : 
+
+				val = (shift == null_bits) ? 0 :
 					FB_SHIFT_LOW(p, color, 32 - shift);
 			}
 			shift += bpp;
@@ -134,9 +134,9 @@ static inline void color_imageblit(const struct fb_image *image,
 	}
 }
 
-static inline void slow_imageblit(const struct fb_image *image, struct fb_info *p, 
+static inline void slow_imageblit(const struct fb_image *image, struct fb_info *p,
 				  u8 __iomem *dst1, u32 fgcolor,
-				  u32 bgcolor, 
+				  u32 bgcolor,
 				  u32 start_index,
 				  u32 pitch_index)
 {
@@ -172,7 +172,7 @@ static inline void slow_imageblit(const struct fb_image *image, struct fb_info *
 			l--;
 			color = (*s & (1 << l)) ? fgcolor : bgcolor;
 			val |= FB_SHIFT_HIGH(p, color, shift ^ bswapmask);
-			
+
 			/* Did the bitshift spill bits to the next long? */
 			if (shift >= null_bits) {
 				FB_WRITEL(val, dst++);
@@ -191,16 +191,16 @@ static inline void slow_imageblit(const struct fb_image *image, struct fb_info *
 
 			FB_WRITEL((FB_READL(dst) & end_mask) | val, dst);
 		}
-		
+
 		dst1 += pitch;
-		src += spitch;	
+		src += spitch;
 		if (pitch_index) {
 			dst2 += pitch;
 			dst1 = (u8 __iomem *)((long __force)dst2 & ~(sizeof(u32) - 1));
 			start_index += pitch_index;
 			start_index &= 32 - 1;
 		}
-		
+
 	}
 }
 
@@ -212,9 +212,9 @@ static inline void slow_imageblit(const struct fb_image *image, struct fb_info *
  *           fix->line_legth is divisible by 4;
  *           beginning and end of a scanline is dword aligned
  */
-static inline void fast_imageblit(const struct fb_image *image, struct fb_info *p, 
-				  u8 __iomem *dst1, u32 fgcolor, 
-				  u32 bgcolor) 
+static inline void fast_imageblit(const struct fb_image *image, struct fb_info *p,
+				  u8 __iomem *dst1, u32 fgcolor,
+				  u32 bgcolor)
 {
 	u32 fgx = fgcolor, bgx = bgcolor, bpp = p->var.bits_per_pixel;
 	u32 ppw = 32/bpp, spitch = (image->width + 7)/8;
@@ -243,25 +243,25 @@ static inline void fast_imageblit(const struct fb_image *image, struct fb_info *
 		fgx |= fgcolor;
 		bgx |= bgcolor;
 	}
-	
+
 	bit_mask = (1 << ppw) - 1;
 	eorx = fgx ^ bgx;
 	k = image->width/ppw;
 
 	for (i = image->height; i--; ) {
 		dst = (u32 __iomem *) dst1, shift = 8; src = s;
-		
+
 		for (j = k; j--; ) {
 			shift -= ppw;
 			end_mask = tab[(*src >> shift) & bit_mask];
 			FB_WRITEL((end_mask & eorx)^bgx, dst++);
-			if (!shift) { shift = 8; src++; }		
+			if (!shift) { shift = 8; src++; }
 		}
 		dst1 += p->fix.line_length;
 		s += spitch;
 	}
-}	
-	
+}
+
 void cfb_imageblit(struct fb_info *p, const struct fb_image *image)
 {
 	u32 fgcolor, bgcolor, start_index, bitstart, pitch_index = 0;
@@ -292,13 +292,13 @@ void cfb_imageblit(struct fb_info *p, const struct fb_image *image)
 		} else {
 			fgcolor = image->fg_color;
 			bgcolor = image->bg_color;
-		}	
-		
-		if (32 % bpp == 0 && !start_index && !pitch_index && 
+		}
+
+		if (32 % bpp == 0 && !start_index && !pitch_index &&
 		    ((width & (32/bpp-1)) == 0) &&
-		    bpp >= 8 && bpp <= 32) 			
+		    bpp >= 8 && bpp <= 32)
 			fast_imageblit(image, p, dst1, fgcolor, bgcolor);
-		else 
+		else
 			slow_imageblit(image, p, dst1, fgcolor, bgcolor,
 					start_index, pitch_index);
 	} else
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v3 4/5] fbdev: Improve performance of cfb_imageblit()
  2022-02-23 19:37 ` Thomas Zimmermann
@ 2022-02-23 19:38   ` Thomas Zimmermann
  -1 siblings, 0 replies; 49+ messages in thread
From: Thomas Zimmermann @ 2022-02-23 19:38 UTC (permalink / raw)
  To: daniel, deller, javierm, geert, sam, kraxel, ppaalanen
  Cc: dri-devel, linux-fbdev, Thomas Zimmermann

Improve the performance of cfb_imageblit() by manually unrolling
the inner blitting loop and moving some invariants out. The compiler
failed to do this automatically. This change keeps cfb_imageblit()
in sync with sys_imageblit().

A microbenchmark measures the average number of CPU cycles
for cfb_imageblit() after a stabilizing period of a few minutes
(i7-4790, FullHD, simpledrm, kernel with debugging).

  cfb_imageblit(), new: 15724 cycles
  cfb_imageblit(), old: 30566 cycles

In the optimized case, cfb_imageblit() is now ~2x faster than before.

v3:
	* fix commit description (Pekka)

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
---
 drivers/video/fbdev/core/cfbimgblt.c | 51 +++++++++++++++++++++++-----
 1 file changed, 42 insertions(+), 9 deletions(-)

diff --git a/drivers/video/fbdev/core/cfbimgblt.c b/drivers/video/fbdev/core/cfbimgblt.c
index 01b01a279681..7361cfabdd85 100644
--- a/drivers/video/fbdev/core/cfbimgblt.c
+++ b/drivers/video/fbdev/core/cfbimgblt.c
@@ -218,23 +218,29 @@ static inline void fast_imageblit(const struct fb_image *image, struct fb_info *
 {
 	u32 fgx = fgcolor, bgx = bgcolor, bpp = p->var.bits_per_pixel;
 	u32 ppw = 32/bpp, spitch = (image->width + 7)/8;
-	u32 bit_mask, end_mask, eorx, shift;
+	u32 bit_mask, eorx;
 	const char *s = image->data, *src;
 	u32 __iomem *dst;
 	const u32 *tab = NULL;
+	size_t tablen;
+	u32 colortab[16];
 	int i, j, k;
 
 	switch (bpp) {
 	case 8:
 		tab = fb_be_math(p) ? cfb_tab8_be : cfb_tab8_le;
+		tablen = 16;
 		break;
 	case 16:
 		tab = fb_be_math(p) ? cfb_tab16_be : cfb_tab16_le;
+		tablen = 4;
 		break;
 	case 32:
-	default:
 		tab = cfb_tab32;
+		tablen = 2;
 		break;
+	default:
+		return;
 	}
 
 	for (i = ppw-1; i--; ) {
@@ -248,15 +254,42 @@ static inline void fast_imageblit(const struct fb_image *image, struct fb_info *
 	eorx = fgx ^ bgx;
 	k = image->width/ppw;
 
-	for (i = image->height; i--; ) {
-		dst = (u32 __iomem *) dst1, shift = 8; src = s;
+	for (i = 0; i < tablen; ++i)
+		colortab[i] = (tab[i] & eorx) ^ bgx;
 
-		for (j = k; j--; ) {
-			shift -= ppw;
-			end_mask = tab[(*src >> shift) & bit_mask];
-			FB_WRITEL((end_mask & eorx)^bgx, dst++);
-			if (!shift) { shift = 8; src++; }
+	for (i = image->height; i--; ) {
+		dst = (u32 __iomem *)dst1;
+		src = s;
+
+		switch (ppw) {
+		case 4: /* 8 bpp */
+			for (j = k; j; j -= 2, ++src) {
+				FB_WRITEL(colortab[(*src >> 4) & bit_mask], dst++);
+				FB_WRITEL(colortab[(*src >> 0) & bit_mask], dst++);
+			}
+			break;
+		case 2: /* 16 bpp */
+			for (j = k; j; j -= 4, ++src) {
+				FB_WRITEL(colortab[(*src >> 6) & bit_mask], dst++);
+				FB_WRITEL(colortab[(*src >> 4) & bit_mask], dst++);
+				FB_WRITEL(colortab[(*src >> 2) & bit_mask], dst++);
+				FB_WRITEL(colortab[(*src >> 0) & bit_mask], dst++);
+			}
+			break;
+		case 1: /* 32 bpp */
+			for (j = k; j; j -= 8, ++src) {
+				FB_WRITEL(colortab[(*src >> 7) & bit_mask], dst++);
+				FB_WRITEL(colortab[(*src >> 6) & bit_mask], dst++);
+				FB_WRITEL(colortab[(*src >> 5) & bit_mask], dst++);
+				FB_WRITEL(colortab[(*src >> 4) & bit_mask], dst++);
+				FB_WRITEL(colortab[(*src >> 3) & bit_mask], dst++);
+				FB_WRITEL(colortab[(*src >> 2) & bit_mask], dst++);
+				FB_WRITEL(colortab[(*src >> 1) & bit_mask], dst++);
+				FB_WRITEL(colortab[(*src >> 0) & bit_mask], dst++);
+			}
+			break;
 		}
+
 		dst1 += p->fix.line_length;
 		s += spitch;
 	}
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v3 5/5] drm: Add TODO item for optimizing format helpers
  2022-02-23 19:37 ` Thomas Zimmermann
@ 2022-02-23 19:38   ` Thomas Zimmermann
  -1 siblings, 0 replies; 49+ messages in thread
From: Thomas Zimmermann @ 2022-02-23 19:38 UTC (permalink / raw)
  To: daniel, deller, javierm, geert, sam, kraxel, ppaalanen
  Cc: dri-devel, linux-fbdev, Thomas Zimmermann

Add a TODO item for optimizing blitting and format-conversion helpers
in DRM and fbdev. There's always demand for faster graphics output.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
---
 Documentation/gpu/todo.rst | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/Documentation/gpu/todo.rst b/Documentation/gpu/todo.rst
index 7bf7f2111696..7f113c6a02dd 100644
--- a/Documentation/gpu/todo.rst
+++ b/Documentation/gpu/todo.rst
@@ -241,6 +241,28 @@ Contact: Thomas Zimmermann <tzimmermann@suse.de>, Daniel Vetter
 
 Level: Advanced
 
+Benchmark and optimize blitting and format-conversion function
+--------------------------------------------------------------
+
+Drawing to dispay memory quickly is crucial for many applications'
+performance.
+
+On at least x86-64, sys_imageblit() is significantly slower than
+cfb_imageblit(), even though both use the same blitting algorithm and
+the latter is written for I/O memory. It turns out that cfb_imageblit()
+uses movl instructions, while sys_imageblit apparently does not. This
+seems to be a problem with gcc's optimizer. DRM's format-conversion
+heleprs might be subject to similar issues.
+
+Benchmark and optimize fbdev's sys_() helpers and DRM's format-conversion
+helpers. In cases that can be further optimized, maybe implement a different
+algorithm, For micro-optimizations, use movl/movq instructions explicitly.
+That might possibly require architecture specific helpers (e.g., storel()
+storeq()).
+
+Contact: Thomas Zimmermann <tzimmermann@suse.de>
+
+Level: Intermediate
 
 drm_framebuffer_funcs and drm_mode_config_funcs.fb_create cleanup
 -----------------------------------------------------------------
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/5] fbdev: Remove trailing whitespaces from cfbimgblt.c
  2022-02-23 19:38   ` Thomas Zimmermann
@ 2022-02-23 20:23     ` Sam Ravnborg
  -1 siblings, 0 replies; 49+ messages in thread
From: Sam Ravnborg @ 2022-02-23 20:23 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: daniel, deller, javierm, geert, kraxel, ppaalanen, dri-devel,
	linux-fbdev

On Wed, Feb 23, 2022 at 08:38:02PM +0100, Thomas Zimmermann wrote:
> Fix coding style. No functional changes.
> 
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Sam Ravnborg <sam@ravnborg.org>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 4/5] fbdev: Improve performance of cfb_imageblit()
  2022-02-23 19:38   ` Thomas Zimmermann
@ 2022-02-23 20:25     ` Sam Ravnborg
  -1 siblings, 0 replies; 49+ messages in thread
From: Sam Ravnborg @ 2022-02-23 20:25 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: daniel, deller, javierm, geert, kraxel, ppaalanen, dri-devel,
	linux-fbdev

On Wed, Feb 23, 2022 at 08:38:03PM +0100, Thomas Zimmermann wrote:
> Improve the performance of cfb_imageblit() by manually unrolling
> the inner blitting loop and moving some invariants out. The compiler
> failed to do this automatically. This change keeps cfb_imageblit()
> in sync with sys_imageblit().
> 
> A microbenchmark measures the average number of CPU cycles
> for cfb_imageblit() after a stabilizing period of a few minutes
> (i7-4790, FullHD, simpledrm, kernel with debugging).
> 
> cfb_imageblit(), new: 15724 cycles
> cfb_imageblit(), old: 30566 cycles
> 
> In the optimized case, cfb_imageblit() is now ~2x faster than before.
> 
> v3:
> 	* fix commit description (Pekka)
> 
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Sam Ravnborg <sam@ravnborg.org>

The code looks equally complicated now in the sys and cfb variants.

Question: What is cfb an abbreviation for anyway?
Not related to the patch - but if I ever knew, the memory is lost..

	Sam

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 5/5] drm: Add TODO item for optimizing format helpers
  2022-02-23 19:38   ` Thomas Zimmermann
@ 2022-02-23 20:34     ` Sam Ravnborg
  -1 siblings, 0 replies; 49+ messages in thread
From: Sam Ravnborg @ 2022-02-23 20:34 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: daniel, deller, javierm, geert, kraxel, ppaalanen, dri-devel,
	linux-fbdev

On Wed, Feb 23, 2022 at 08:38:04PM +0100, Thomas Zimmermann wrote:
> Add a TODO item for optimizing blitting and format-conversion helpers
> in DRM and fbdev. There's always demand for faster graphics output.
> 
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> ---
>  Documentation/gpu/todo.rst | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/Documentation/gpu/todo.rst b/Documentation/gpu/todo.rst
> index 7bf7f2111696..7f113c6a02dd 100644
> --- a/Documentation/gpu/todo.rst
> +++ b/Documentation/gpu/todo.rst
> @@ -241,6 +241,28 @@ Contact: Thomas Zimmermann <tzimmermann@suse.de>, Daniel Vetter
>  
>  Level: Advanced
>  
> +Benchmark and optimize blitting and format-conversion function
> +--------------------------------------------------------------
> +
> +Drawing to dispay memory quickly is crucial for many applications'
              display
> +performance.
> +
> +On at least x86-64, sys_imageblit() is significantly slower than
   On, at least x86-64, ...
   To me the extra comma makes sense, but grammar is not my strong side.
 
> +cfb_imageblit(), even though both use the same blitting algorithm and
> +the latter is written for I/O memory. It turns out that cfb_imageblit()
> +uses movl instructions, while sys_imageblit apparently does not. This
> +seems to be a problem with gcc's optimizer. DRM's format-conversion
> +heleprs might be subject to similar issues.
   helpers
> +
> +Benchmark and optimize fbdev's sys_() helpers and DRM's format-conversion
> +helpers. In cases that can be further optimized, maybe implement a different
> +algorithm, For micro-optimizations, use movl/movq instructions explicitly.
   algorithm. (period, not comma)
> +That might possibly require architecture specific helpers (e.g., storel()
> +storeq()).
> +
> +Contact: Thomas Zimmermann <tzimmermann@suse.de>
> +
> +Level: Intermediate

With the small fixes above:
Acked-by: Sam Ravnborg <sam@ravnborg.org>

Another option would be to re-implement imageblit() to be DRM-specific.
Maybe we could then throw out some legacy code and optimize only for the
DRM use case. Then perhaps only a small part of the code would differ
depending on whether this is I/O memory or directly accessible memory.
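
To illustrate the last point, here is a minimal user-space sketch. It
assumes a hypothetical store32_fn callback; store32_sys(), store32_io(),
expand_row32() and stores_agree() are names invented for this example and
are not existing kernel or DRM helpers. The blit loop is shared and only
the store differs between the two kinds of memory. The volatile store
merely models an I/O accessor such as FB_WRITEL(); it is not one.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-memory-type store; not an existing kernel interface. */
typedef void (*store32_fn)(uint32_t *dst, uint32_t val);

/* System memory: an ordinary store the compiler may combine or reorder. */
void store32_sys(uint32_t *dst, uint32_t val)
{
	*dst = val;
}

/* Stand-in for an I/O accessor: a volatile store is not elided or merged. */
void store32_io(uint32_t *dst, uint32_t val)
{
	*(volatile uint32_t *)dst = val;
}

/* Shared core: expand one row of a 1 bpp bitmap to 32 bpp pixels.
 * The most significant bit of each source byte is the leftmost pixel. */
void expand_row32(const uint8_t *src, uint32_t *dst, size_t width,
		  uint32_t fg, uint32_t bg, store32_fn store)
{
	for (size_t x = 0; x < width; x++) {
		unsigned bit = (src[x / 8] >> (7 - x % 8)) & 1u;

		store(&dst[x], bit ? fg : bg);
	}
}

/* Returns 1 if both store flavors yield the same, expected pixels. */
int stores_agree(void)
{
	const uint8_t row[1] = { 0x81 };	/* leftmost and rightmost pixel set */
	uint32_t a[8], b[8];

	expand_row32(row, a, 8, 0x00ffffff, 0, store32_sys);
	expand_row32(row, b, 8, 0x00ffffff, 0, store32_io);
	return a[0] == 0x00ffffff && a[1] == 0 && a[7] == 0x00ffffff &&
	       memcmp(a, b, sizeof(a)) == 0;
}
```

A per-pixel indirect call would likely be too slow in practice; an inline
helper or macro selected at compile time would be the more realistic
shape. The callback only keeps the sketch short.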

	Sam

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/5] fbdev: Remove trailing whitespaces from cfbimgblt.c
  2022-02-23 19:38   ` Thomas Zimmermann
@ 2022-02-24  8:22     ` Javier Martinez Canillas
  -1 siblings, 0 replies; 49+ messages in thread
From: Javier Martinez Canillas @ 2022-02-24  8:22 UTC (permalink / raw)
  To: Thomas Zimmermann, daniel, deller, geert, sam, kraxel, ppaalanen
  Cc: dri-devel, linux-fbdev

On 2/23/22 20:38, Thomas Zimmermann wrote:
> Fix coding style. No functional changes.
> 
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> ---
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>

Best regards,
-- 
Javier Martinez Canillas
Linux Engineering
Red Hat


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 4/5] fbdev: Improve performance of cfb_imageblit()
  2022-02-23 19:38   ` Thomas Zimmermann
@ 2022-02-24  8:31     ` Javier Martinez Canillas
  -1 siblings, 0 replies; 49+ messages in thread
From: Javier Martinez Canillas @ 2022-02-24  8:31 UTC (permalink / raw)
  To: Thomas Zimmermann, daniel, deller, geert, sam, kraxel, ppaalanen
  Cc: linux-fbdev, dri-devel

On 2/23/22 20:38, Thomas Zimmermann wrote:
> Improve the performance of cfb_imageblit() by manually unrolling
> the inner blitting loop and moving some invariants out. The compiler
> failed to do this automatically. This change keeps cfb_imageblit()
> in sync with sys_imageblit().
> 
> A microbenchmark measures the average number of CPU cycles
> for cfb_imageblit() after a stabilizing period of a few minutes
> (i7-4790, FullHD, simpledrm, kernel with debugging).
> 
> cfb_imageblit(), new: 15724 cycles
> cfb_imageblit(), old: 30566 cycles
> 
> In the optimized case, cfb_imageblit() is now ~2x faster than before.
> 
> v3:
> 	* fix commit description (Pekka)
> 
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> ---

Makes sense, improves perf and makes the two more consistent as you mention.
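
For readers following along, the two changes named in the commit message
(moving the color math out of the loop and unrolling the inner loop) can
be sketched in user-space C for the 8 bpp case. This is a minimal sketch,
not the kernel code: build_tab8_le(), replicate(), the two row functions
and blit_demo() are names made up for illustration, plain stores stand in
for FB_WRITEL(), and the row width is assumed to be a multiple of 8
pixels.

```c
#include <stdint.h>
#include <string.h>

enum { BPP = 8, PPW = 32 / BPP, TABLEN = 1 << PPW };

/* Mask table for 8 bpp: bit 3 of the index is the leftmost pixel, which
 * this sketch places in the lowest byte (little-endian layout). */
void build_tab8_le(uint32_t tab[TABLEN])
{
	for (unsigned i = 0; i < TABLEN; i++) {
		uint32_t v = 0;

		for (unsigned px = 0; px < PPW; px++)
			if (i & (1u << (PPW - 1 - px)))
				v |= (uint32_t)0xff << (px * BPP);
		tab[i] = v;
	}
}

/* Copy an 8-bit color into every pixel slot of a 32-bit word. */
uint32_t replicate(uint32_t color)
{
	uint32_t v = color;

	for (int i = PPW - 1; i--; )
		v = (v << BPP) | color;
	return v;
}

/* Old shape: mask lookup plus color math for every word written. */
void blit_row_old(const uint8_t *src, uint32_t *dst, unsigned k,
		  const uint32_t *tab, uint32_t eorx, uint32_t bgx)
{
	unsigned shift = 8;

	for (unsigned j = k; j--; ) {
		shift -= PPW;
		*dst++ = (tab[(*src >> shift) & (TABLEN - 1)] & eorx) ^ bgx;
		if (!shift) {
			shift = 8;
			src++;
		}
	}
}

/* New shape: colors precomputed once, two words per source byte. */
void blit_row_new(const uint8_t *src, uint32_t *dst, unsigned k,
		  const uint32_t *colortab)
{
	for (unsigned j = k; j; j -= 2, ++src) {
		*dst++ = colortab[(*src >> 4) & (TABLEN - 1)];
		*dst++ = colortab[(*src >> 0) & (TABLEN - 1)];
	}
}

/* Returns 1 if both shapes produce the same row; stores the first word. */
int blit_demo(uint32_t *first_word)
{
	const uint8_t row[2] = { 0xa5, 0x3c };	/* 16 pixels, 1 bit each */
	uint32_t tab[TABLEN], colortab[TABLEN], a[4], b[4];
	uint32_t fgx = replicate(0x0f), bgx = replicate(0x01);
	uint32_t eorx = fgx ^ bgx;
	unsigned k = 16 / PPW;			/* 32-bit words per row */

	build_tab8_le(tab);
	for (unsigned i = 0; i < TABLEN; i++)
		colortab[i] = (tab[i] & eorx) ^ bgx;

	blit_row_old(row, a, k, tab, eorx, bgx);
	blit_row_new(row, b, k, colortab);
	*first_word = b[0];
	return memcmp(a, b, sizeof(a)) == 0;
}
```

The unrolled loop steps j by 2, so it only terminates cleanly when k is
even; the sketch sidesteps this by using a 16-pixel row.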

Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>

Best regards,
-- 
Javier Martinez Canillas
Linux Engineering
Red Hat


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 5/5] drm: Add TODO item for optimizing format helpers
  2022-02-23 19:38   ` Thomas Zimmermann
@ 2022-02-24  8:39     ` Javier Martinez Canillas
  -1 siblings, 0 replies; 49+ messages in thread
From: Javier Martinez Canillas @ 2022-02-24  8:39 UTC (permalink / raw)
  To: Thomas Zimmermann, daniel, deller, geert, sam, kraxel, ppaalanen
  Cc: dri-devel, linux-fbdev

On 2/23/22 20:38, Thomas Zimmermann wrote:
> Add a TODO item for optimizing blitting and format-conversion helpers
> in DRM and fbdev. There's always demand for faster graphics output.
> 
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> ---

After fixing the typos mentioned by Sam:

Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>

Best regards,
-- 
Javier Martinez Canillas
Linux Engineering
Red Hat


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 4/5] fbdev: Improve performance of cfb_imageblit()
  2022-02-23 20:25     ` Sam Ravnborg
@ 2022-02-24  9:02       ` Javier Martinez Canillas
  -1 siblings, 0 replies; 49+ messages in thread
From: Javier Martinez Canillas @ 2022-02-24  9:02 UTC (permalink / raw)
  To: Sam Ravnborg, Thomas Zimmermann
  Cc: daniel, deller, geert, kraxel, ppaalanen, dri-devel, linux-fbdev

Hello Sam,

On 2/23/22 21:25, Sam Ravnborg wrote:

[snip]

> 
> Question: What is cfb an abbreviation for anyway?
> Not related to the patch - but if I ever knew, the memory is lost..
> 

I was curious so I dug into this. It seems CFB stands for Color Frame Buffer.
Doing a `git grep "(CFB)"` in the linux history repo [0], I get this:

  Documentation/isdn/README.diversion:   (CFB). 
  drivers/video/pmag-ba-fb.c: *   PMAG-BA TURBOchannel Color Frame Buffer (CFB) card support,
  include/video/pmag-ba-fb.h: *   TURBOchannel PMAG-BA Color Frame Buffer (CFB) card support,

Probably the helpers are named like this because they were meant for any fbdev
driver but assumed that the framebuffer was always in I/O memory. Later, some
drivers allocated the framebuffer in system memory and still used the helpers,
which use I/O memory accessors; that is illegal on some arches.

So the sys_* variants were introduced by commit 68648ed1f58d ("fbdev: add
drawing functions for framebuffers in system RAM") to fix this. The old
ones just kept their name, but they probably should have been renamed to io_*
for the naming to be consistent with the sys_* functions.

[0]: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/

Best regards,
-- 
Javier Martinez Canillas
Linux Engineering
Red Hat


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 4/5] fbdev: Improve performance of cfb_imageblit()
  2022-02-24  9:02       ` Javier Martinez Canillas
@ 2022-02-24 10:29         ` Sam Ravnborg
  -1 siblings, 0 replies; 49+ messages in thread
From: Sam Ravnborg @ 2022-02-24 10:29 UTC (permalink / raw)
  To: Javier Martinez Canillas
  Cc: Thomas Zimmermann, daniel, deller, geert, kraxel, ppaalanen,
	dri-devel, linux-fbdev

Hi Javier,
On Thu, Feb 24, 2022 at 10:02:59AM +0100, Javier Martinez Canillas wrote:
> Hello Sam,
> 
> On 2/23/22 21:25, Sam Ravnborg wrote:
> 
> [snip]
> 
> > 
> > Question: What is cfb an abbreviation for anyway?
> > Not related to the patch - but if I ever knew, the memory is lost..
> > 
> 
> I was curious so I dug into this. It seems CFB stands for Color Frame Buffer.
> Doing a `git grep "(CFB)"` in the linux history repo [0], I get this:
> 
>   Documentation/isdn/README.diversion:   (CFB). 
>   drivers/video/pmag-ba-fb.c: *   PMAG-BA TURBOchannel Color Frame Buffer (CFB) card support,
>   include/video/pmag-ba-fb.h: *   TURBOchannel PMAG-BA Color Frame Buffer (CFB) card support,
> 
> Probably the helpers are named like this because they were meant for any fbdev
> driver but assumed that the framebuffer was always in I/O memory. Later, some
> drivers allocated the framebuffer in system memory and still used the helpers,
> which use I/O memory accessors; that is illegal on some arches.
> 
> So the sys_* variants were introduced by commit 68648ed1f58d ("fbdev: add
> drawing functions for framebuffers in system RAM") to fix this. The old
> ones just kept their name, but they probably should have been renamed to io_*
> for the naming to be consistent with the sys_* functions.
> 
> [0]: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/

Interesting - thanks for the history lesson and thanks for taking your
time to share your findings too.

	Sam

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 4/5] fbdev: Improve performance of cfb_imageblit()
  2022-02-24  9:02       ` Javier Martinez Canillas
@ 2022-02-24 10:31         ` Geert Uytterhoeven
  -1 siblings, 0 replies; 49+ messages in thread
From: Geert Uytterhoeven @ 2022-02-24 10:31 UTC (permalink / raw)
  To: Javier Martinez Canillas
  Cc: Sam Ravnborg, Thomas Zimmermann, Daniel Vetter, Helge Deller,
	Gerd Hoffmann, Pekka Paalanen, DRI Development,
	Linux Fbdev development list

Hi Javier,

On Thu, Feb 24, 2022 at 10:03 AM Javier Martinez Canillas
<javierm@redhat.com> wrote:
> On 2/23/22 21:25, Sam Ravnborg wrote:
> > Question: What is cfb an abbreviation for anyway?
> > Not related to the patch - but if I ever knew, the memory is lost..
>
> I was curious so I dug into this. It seems CFB stands for Color Frame Buffer.
> Doing a `git grep "(CFB)"` in the linux history repo [0], I get this:

The naming actually comes from X11.
"mfb" is a monochrome frame buffer (bpp = 1).
"cfb" is a color frame buffer (bpp > 1), which uses a chunky format.

> Probably the helpers are named like this because they were meant for any fbdev
> driver but assumed that the framebuffer was always in I/O memory. Later, some
> drivers allocated the framebuffer in system memory and still used the helpers,
> which use I/O memory accessors; that is illegal on some arches.

Yep.  Graphics memory used to be on a graphics card.
On systems (usually non-x86) where it was part of main memory, usually
it didn't matter at all whether you used I/O memory or plain memory
accessors anyway.

Then x86 got unified memory...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 0/5] fbdev: Improve performance of fbdev console
  2022-02-23 19:37 ` Thomas Zimmermann
@ 2022-03-02 19:30   ` Thomas Zimmermann
  -1 siblings, 0 replies; 49+ messages in thread
From: Thomas Zimmermann @ 2022-03-02 19:30 UTC (permalink / raw)
  To: daniel, deller, javierm, geert, sam, kraxel, ppaalanen
  Cc: dri-devel, linux-fbdev


[-- Attachment #1.1: Type: text/plain, Size: 2276 bytes --]

Hi,

Merged, with fixes for the typos in the final patch. Thanks for reviewing.

Best regards
Thomas

Am 23.02.22 um 20:37 schrieb Thomas Zimmermann:
> Optimize performance of the fbdev console for the common case of
> software-based clearing and image blitting.
> 
> The commit description of each patch contains results of a simple
> microbenchmark. I also tested the full patchset's effect on the
> console output by printing directory listings (i7-4790, FullHD,
> simpledrm, kernel with debugging).
> 
>    > time find /usr/share/doc -type f
> 
> In the unoptimized case:
> 
>    real    0m6.173s
>    user    0m0.044s
>    sys     0m6.107s
> 
> With optimizations applied:
> 
>    real    0m4.754s
>    user    0m0.044s
>    sys     0m4.698s
> 
> In the optimized case, printing the directory listing is ~25% faster
> than before.
> 
> In v2 of the patchset, after implementing Sam's suggestion to update
> cfb_imageblit() as well, it turns out that the compiled code in
> sys_imageblit() is still significantly slower than the CFB version. A
> fix is probably a larger task and would include architecture-specific
> changes. A new TODO item suggests investigating the performance of the
> various helpers and format-conversion functions in DRM and fbdev.
> 
> v3:
> 	* fix description of cfb_imageblit() patch (Pekka)
> v2:
> 	* improve readability for sys_imageblit() (Gerd, Sam)
> 	* new TODO item for further optimization
> 
> Thomas Zimmermann (5):
>    fbdev: Improve performance of sys_fillrect()
>    fbdev: Improve performance of sys_imageblit()
>    fbdev: Remove trailing whitespaces from cfbimgblt.c
>    fbdev: Improve performance of cfb_imageblit()
>    drm: Add TODO item for optimizing format helpers
> 
>   Documentation/gpu/todo.rst             |  22 +++++
>   drivers/video/fbdev/core/cfbimgblt.c   | 107 ++++++++++++++++---------
>   drivers/video/fbdev/core/sysfillrect.c |  16 +---
>   drivers/video/fbdev/core/sysimgblt.c   |  49 ++++++++---
>   4 files changed, 133 insertions(+), 61 deletions(-)
> 
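
A side note on the "~25% faster" figure quoted above: it can be read as
time saved or as throughput gained, and the two give different numbers.
The sketch below computes both from the quoted `real` times. The helper
names are made up for this note and are not part of the patch set.

```c
#include <assert.h>

/* Fraction of wall-clock time saved: 1 - new/old. */
double time_saved(double old_s, double new_s)
{
	assert(old_s > 0.0);
	return 1.0 - new_s / old_s;
}

/* Throughput gain: old/new - 1, i.e. extra work done per second. */
double speedup(double old_s, double new_s)
{
	assert(new_s > 0.0);
	return old_s / new_s - 1.0;
}
```

With the quoted values, time_saved(6.173, 4.754) is about 0.23 and
speedup(6.173, 4.754) is about 0.30, so "~25%" sits between the two
readings.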

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [v3,4/5] fbdev: Improve performance of cfb_imageblit()
       [not found]   ` <CGME20220308225225eucas1p12fcdd6e5dc83308b19d51ad7b2a13141@eucas1p1.samsung.com>
@ 2022-03-08 22:52     ` Marek Szyprowski
  2022-03-09  8:22       ` Thomas Zimmermann
  0 siblings, 1 reply; 49+ messages in thread
From: Marek Szyprowski @ 2022-03-08 22:52 UTC (permalink / raw)
  To: Thomas Zimmermann, daniel, deller, javierm, geert, sam, kraxel,
	ppaalanen
  Cc: linux-fbdev, dri-devel

Hi Thomas,

On 23.02.2022 20:38, Thomas Zimmermann wrote:
> Improve the performance of cfb_imageblit() by manually unrolling
> the inner blitting loop and moving some invariants out. The compiler
> failed to do this automatically. This change keeps cfb_imageblit()
> in sync with sys_imageblit().
>
> A microbenchmark measures the average number of CPU cycles
> for cfb_imageblit() after a stabilizing period of a few minutes
> (i7-4790, FullHD, simpledrm, kernel with debugging).
>
> cfb_imageblit(), new: 15724 cycles
> cfb_imageblit(), old: 30566 cycles
>
> In the optimized case, cfb_imageblit() is now ~2x faster than before.
>
> v3:
> 	* fix commit description (Pekka)
>
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> Acked-by: Sam Ravnborg <sam@ravnborg.org>
> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
This patch recently landed in linux-next (next-20220308) as commit
0d03011894d2 ("fbdev: Improve performance of cfb_imageblit()"). Sadly, it
causes a freeze after DRM and emulated fbdev initialization on various
Samsung Exynos 32-bit ARM boards. This happens when the kernel is built
from exynos_defconfig. Surprisingly, when the kernel is built from
multi_v7_defconfig, all those boards boot fine, so this appears to be
caused by one of the debugging options enabled in exynos_defconfig. I will
try to analyze this further and share the results. Reverting $subject on
top of next-20220308 fixes the boot issue.
> ---
>   drivers/video/fbdev/core/cfbimgblt.c | 51 +++++++++++++++++++++++-----
>   1 file changed, 42 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/video/fbdev/core/cfbimgblt.c b/drivers/video/fbdev/core/cfbimgblt.c
> index 01b01a279681..7361cfabdd85 100644
> --- a/drivers/video/fbdev/core/cfbimgblt.c
> +++ b/drivers/video/fbdev/core/cfbimgblt.c
> @@ -218,23 +218,29 @@ static inline void fast_imageblit(const struct fb_image *image, struct fb_info *
>   {
>   	u32 fgx = fgcolor, bgx = bgcolor, bpp = p->var.bits_per_pixel;
>   	u32 ppw = 32/bpp, spitch = (image->width + 7)/8;
> -	u32 bit_mask, end_mask, eorx, shift;
> +	u32 bit_mask, eorx;
>   	const char *s = image->data, *src;
>   	u32 __iomem *dst;
>   	const u32 *tab = NULL;
> +	size_t tablen;
> +	u32 colortab[16];
>   	int i, j, k;
>   
>   	switch (bpp) {
>   	case 8:
>   		tab = fb_be_math(p) ? cfb_tab8_be : cfb_tab8_le;
> +		tablen = 16;
>   		break;
>   	case 16:
>   		tab = fb_be_math(p) ? cfb_tab16_be : cfb_tab16_le;
> +		tablen = 4;
>   		break;
>   	case 32:
> -	default:
>   		tab = cfb_tab32;
> +		tablen = 2;
>   		break;
> +	default:
> +		return;
>   	}
>   
>   	for (i = ppw-1; i--; ) {
> @@ -248,15 +254,42 @@ static inline void fast_imageblit(const struct fb_image *image, struct fb_info *
>   	eorx = fgx ^ bgx;
>   	k = image->width/ppw;
>   
> -	for (i = image->height; i--; ) {
> -		dst = (u32 __iomem *) dst1, shift = 8; src = s;
> +	for (i = 0; i < tablen; ++i)
> +		colortab[i] = (tab[i] & eorx) ^ bgx;
>   
> -		for (j = k; j--; ) {
> -			shift -= ppw;
> -			end_mask = tab[(*src >> shift) & bit_mask];
> -			FB_WRITEL((end_mask & eorx)^bgx, dst++);
> -			if (!shift) { shift = 8; src++; }
> +	for (i = image->height; i--; ) {
> +		dst = (u32 __iomem *)dst1;
> +		src = s;
> +
> +		switch (ppw) {
> +		case 4: /* 8 bpp */
> +			for (j = k; j; j -= 2, ++src) {
> +				FB_WRITEL(colortab[(*src >> 4) & bit_mask], dst++);
> +				FB_WRITEL(colortab[(*src >> 0) & bit_mask], dst++);
> +			}
> +			break;
> +		case 2: /* 16 bpp */
> +			for (j = k; j; j -= 4, ++src) {
> +				FB_WRITEL(colortab[(*src >> 6) & bit_mask], dst++);
> +				FB_WRITEL(colortab[(*src >> 4) & bit_mask], dst++);
> +				FB_WRITEL(colortab[(*src >> 2) & bit_mask], dst++);
> +				FB_WRITEL(colortab[(*src >> 0) & bit_mask], dst++);
> +			}
> +			break;
> +		case 1: /* 32 bpp */
> +			for (j = k; j; j -= 8, ++src) {
> +				FB_WRITEL(colortab[(*src >> 7) & bit_mask], dst++);
> +				FB_WRITEL(colortab[(*src >> 6) & bit_mask], dst++);
> +				FB_WRITEL(colortab[(*src >> 5) & bit_mask], dst++);
> +				FB_WRITEL(colortab[(*src >> 4) & bit_mask], dst++);
> +				FB_WRITEL(colortab[(*src >> 3) & bit_mask], dst++);
> +				FB_WRITEL(colortab[(*src >> 2) & bit_mask], dst++);
> +				FB_WRITEL(colortab[(*src >> 1) & bit_mask], dst++);
> +				FB_WRITEL(colortab[(*src >> 0) & bit_mask], dst++);
> +			}
> +			break;
>   		}
> +
>   		dst1 += p->fix.line_length;
>   		s += spitch;
>   	}

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [v3,4/5] fbdev: Improve performance of cfb_imageblit()
  2022-03-08 22:52     ` [v3,4/5] " Marek Szyprowski
@ 2022-03-09  8:22       ` Thomas Zimmermann
  2022-03-09  9:22         ` Marek Szyprowski
  0 siblings, 1 reply; 49+ messages in thread
From: Thomas Zimmermann @ 2022-03-09  8:22 UTC (permalink / raw)
  To: Marek Szyprowski, daniel, deller, javierm, geert, sam, kraxel, ppaalanen
  Cc: linux-fbdev, dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 5244 bytes --]

Hi

Am 08.03.22 um 23:52 schrieb Marek Szyprowski:
> Hi Thomas,
> 
> On 23.02.2022 20:38, Thomas Zimmermann wrote:
>> Improve the performance of cfb_imageblit() by manually unrolling
>> the inner blitting loop and moving some invariants out. The compiler
>> failed to do this automatically. This change keeps cfb_imageblit()
>> in sync with sys_imageblit().
>>
>> A microbenchmark measures the average number of CPU cycles
>> for cfb_imageblit() after a stabilizing period of a few minutes
>> (i7-4790, FullHD, simpledrm, kernel with debugging).
>>
>> cfb_imageblit(), new: 15724 cycles
>> cfb_imageblit(): old: 30566 cycles
>>
>> In the optimized case, cfb_imageblit() is now ~2x faster than before.
>>
>> v3:
>> 	* fix commit description (Pekka)
>>
>> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
>> Acked-by: Sam Ravnborg <sam@ravnborg.org>
>> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
> This patch landed recently in linux next-20220308 as commit 0d03011894d2
> ("fbdev: Improve performance of cfb_imageblit()"). Sadly it causes a
> freeze after DRM and emulated fbdev initialization on various Samsung
> Exynos ARM 32bit based boards. This happens when kernel is compiled from
> exynos_defconfig. Surprisingly when kernel is compiled from
> multi_v7_defconfig all those boards boot fine, so this is a matter of
> one of the debugging options enabled in the exynos_defconfig. I will try
> to analyze this further and share the results. Reverting $subject on top
> of next-20220308 fixes the boot issue.

Thanks for reporting. I don't have the hardware to reproduce it, and
there's no obvious difference from the original version. It's supposed to
be the same algorithm with a different implementation. If you can't
figure out the issue, we can also easily revert the patch.

Best regards
Thomas

>> ---
>>    drivers/video/fbdev/core/cfbimgblt.c | 51 +++++++++++++++++++++++-----
>>    1 file changed, 42 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/video/fbdev/core/cfbimgblt.c b/drivers/video/fbdev/core/cfbimgblt.c
>> index 01b01a279681..7361cfabdd85 100644
>> --- a/drivers/video/fbdev/core/cfbimgblt.c
>> +++ b/drivers/video/fbdev/core/cfbimgblt.c
>> @@ -218,23 +218,29 @@ static inline void fast_imageblit(const struct fb_image *image, struct fb_info *
>>    {
>>    	u32 fgx = fgcolor, bgx = bgcolor, bpp = p->var.bits_per_pixel;
>>    	u32 ppw = 32/bpp, spitch = (image->width + 7)/8;
>> -	u32 bit_mask, end_mask, eorx, shift;
>> +	u32 bit_mask, eorx;
>>    	const char *s = image->data, *src;
>>    	u32 __iomem *dst;
>>    	const u32 *tab = NULL;
>> +	size_t tablen;
>> +	u32 colortab[16];
>>    	int i, j, k;
>>    
>>    	switch (bpp) {
>>    	case 8:
>>    		tab = fb_be_math(p) ? cfb_tab8_be : cfb_tab8_le;
>> +		tablen = 16;
>>    		break;
>>    	case 16:
>>    		tab = fb_be_math(p) ? cfb_tab16_be : cfb_tab16_le;
>> +		tablen = 4;
>>    		break;
>>    	case 32:
>> -	default:
>>    		tab = cfb_tab32;
>> +		tablen = 2;
>>    		break;
>> +	default:
>> +		return;
>>    	}
>>    
>>    	for (i = ppw-1; i--; ) {
>> @@ -248,15 +254,42 @@ static inline void fast_imageblit(const struct fb_image *image, struct fb_info *
>>    	eorx = fgx ^ bgx;
>>    	k = image->width/ppw;
>>    
>> -	for (i = image->height; i--; ) {
>> -		dst = (u32 __iomem *) dst1, shift = 8; src = s;
>> +	for (i = 0; i < tablen; ++i)
>> +		colortab[i] = (tab[i] & eorx) ^ bgx;
>>    
>> -		for (j = k; j--; ) {
>> -			shift -= ppw;
>> -			end_mask = tab[(*src >> shift) & bit_mask];
>> -			FB_WRITEL((end_mask & eorx)^bgx, dst++);
>> -			if (!shift) { shift = 8; src++; }
>> +	for (i = image->height; i--; ) {
>> +		dst = (u32 __iomem *)dst1;
>> +		src = s;
>> +
>> +		switch (ppw) {
>> +		case 4: /* 8 bpp */
>> +			for (j = k; j; j -= 2, ++src) {
>> +				FB_WRITEL(colortab[(*src >> 4) & bit_mask], dst++);
>> +				FB_WRITEL(colortab[(*src >> 0) & bit_mask], dst++);
>> +			}
>> +			break;
>> +		case 2: /* 16 bpp */
>> +			for (j = k; j; j -= 4, ++src) {
>> +				FB_WRITEL(colortab[(*src >> 6) & bit_mask], dst++);
>> +				FB_WRITEL(colortab[(*src >> 4) & bit_mask], dst++);
>> +				FB_WRITEL(colortab[(*src >> 2) & bit_mask], dst++);
>> +				FB_WRITEL(colortab[(*src >> 0) & bit_mask], dst++);
>> +			}
>> +			break;
>> +		case 1: /* 32 bpp */
>> +			for (j = k; j; j -= 8, ++src) {
>> +				FB_WRITEL(colortab[(*src >> 7) & bit_mask], dst++);
>> +				FB_WRITEL(colortab[(*src >> 6) & bit_mask], dst++);
>> +				FB_WRITEL(colortab[(*src >> 5) & bit_mask], dst++);
>> +				FB_WRITEL(colortab[(*src >> 4) & bit_mask], dst++);
>> +				FB_WRITEL(colortab[(*src >> 3) & bit_mask], dst++);
>> +				FB_WRITEL(colortab[(*src >> 2) & bit_mask], dst++);
>> +				FB_WRITEL(colortab[(*src >> 1) & bit_mask], dst++);
>> +				FB_WRITEL(colortab[(*src >> 0) & bit_mask], dst++);
>> +			}
>> +			break;
>>    		}
>> +
>>    		dst1 += p->fix.line_length;
>>    		s += spitch;
>>    	}
> 
> Best regards

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [v3,4/5] fbdev: Improve performance of cfb_imageblit()
  2022-03-09  8:22       ` Thomas Zimmermann
@ 2022-03-09  9:22         ` Marek Szyprowski
  2022-03-09 10:39             ` Geert Uytterhoeven
  0 siblings, 1 reply; 49+ messages in thread
From: Marek Szyprowski @ 2022-03-09  9:22 UTC (permalink / raw)
  To: Thomas Zimmermann, daniel, deller, javierm, geert, sam, kraxel,
	ppaalanen
  Cc: linux-fbdev, dri-devel

Hi,

On 09.03.2022 09:22, Thomas Zimmermann wrote:
> Am 08.03.22 um 23:52 schrieb Marek Szyprowski:
>> On 23.02.2022 20:38, Thomas Zimmermann wrote:
>>> Improve the performance of cfb_imageblit() by manually unrolling
>>> the inner blitting loop and moving some invariants out. The compiler
>>> failed to do this automatically. This change keeps cfb_imageblit()
>>> in sync with sys_imageblit().
>>>
>>> A microbenchmark measures the average number of CPU cycles
>>> for cfb_imageblit() after a stabilizing period of a few minutes
>>> (i7-4790, FullHD, simpledrm, kernel with debugging).
>>>
>>> cfb_imageblit(), new: 15724 cycles
>>> cfb_imageblit(): old: 30566 cycles
>>>
>>> In the optimized case, cfb_imageblit() is now ~2x faster than before.
>>>
>>> v3:
>>>     * fix commit description (Pekka)
>>>
>>> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
>>> Acked-by: Sam Ravnborg <sam@ravnborg.org>
>>> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
>> This patch landed recently in linux next-20220308 as commit 0d03011894d2
>> ("fbdev: Improve performance of cfb_imageblit()"). Sadly it causes a
>> freeze after DRM and emulated fbdev initialization on various Samsung
>> Exynos ARM 32bit based boards. This happens when kernel is compiled from
>> exynos_defconfig. Surprisingly when kernel is compiled from
>> multi_v7_defconfig all those boards boot fine, so this is a matter of
>> one of the debugging options enabled in the exynos_defconfig. I will try
>> to analyze this further and share the results. Reverting $subject on top
>> of next-20220308 fixes the boot issue.
>
> Thanks for reporting. I don't have the hardware to reproduce it and 
> there's no obvious difference to the original version. It's supposed 
> to be the same algorithm with a different implementation. Unless you 
> can figure out the issue, we can also revert the patch easily.

I've played a bit with .config options and found that the issue is
caused by the compiled-in fonts used for the framebuffer. For some
reason (so far unknown to me), exynos_defconfig has the following odd
setup:

CONFIG_FONT_SUPPORT=y
CONFIG_FONTS=y
# CONFIG_FONT_8x8 is not set
# CONFIG_FONT_8x16 is not set
# CONFIG_FONT_6x11 is not set
CONFIG_FONT_7x14=y
# CONFIG_FONT_PEARL_8x8 is not set
# CONFIG_FONT_ACORN_8x8 is not set
# CONFIG_FONT_MINI_4x6 is not set
# CONFIG_FONT_6x10 is not set
# CONFIG_FONT_10x18 is not set
# CONFIG_FONT_SUN8x16 is not set
# CONFIG_FONT_SUN12x22 is not set
# CONFIG_FONT_TER16x32 is not set
# CONFIG_FONT_6x8 is not set

Such a setup causes a freeze during framebuffer initialization (or just
after it got registered). I've reproduced this even on a Raspberry Pi 3B
with multi_v7_defconfig and a changed font configuration (this also
required disabling the vivid driver, which forces the 8x16 font), where
I got the following panic:

simple-framebuffer 3eace000.framebuffer: framebuffer at 0x3eace000, 0x12c000 bytes
simple-framebuffer 3eace000.framebuffer: format=a8r8g8b8, mode=640x480x32, linelength=2560
8<--- cut here ---
Unable to handle kernel paging request at virtual address f0aac000
[f0aac000] *pgd=01d8b811, *pte=00000000, *ppte=00000000
Internal error: Oops: 807 [#1] SMP ARM
Modules linked in:
CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc7-next-20220308-00002-g9e9894c98f8c #11471
Hardware name: BCM2835
PC is at cfb_imageblit+0x52c/0x64c
LR is at 0x1
pc : [<c0603dd8>]    lr : [<00000001>]    psr: a0000013
sp : f081da68  ip : c1d5ffff  fp : f081dad8
r10: f0980000  r9 : c1d69600  r8 : fffb5007
r7 : 00000000  r6 : 00000001  r5 : 00000a00  r4 : 00000001
r3 : 00000055  r2 : f0aac000  r1 : f081dad8  r0 : 00000007
Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5383d  Table: 0000406a  DAC: 00000051
Register r0 information: non-paged memory
Register r1 information: 2-page vmalloc region starting at 0xf081c000 allocated at kernel_clone+0xc0/0x428
Register r2 information: 0-page vmalloc region starting at 0xf0980000 allocated at simplefb_probe+0x284/0x9b0
Register r3 information: non-paged memory
Register r4 information: non-paged memory
Register r5 information: non-paged memory
Register r6 information: non-paged memory
Register r7 information: NULL pointer
Register r8 information: non-paged memory
Register r9 information: non-slab/vmalloc memory
Register r10 information: 0-page vmalloc region starting at 0xf0980000 allocated at simplefb_probe+0x284/0x9b0
Register r11 information: 2-page vmalloc region starting at 0xf081c000 allocated at kernel_clone+0xc0/0x428
Register r12 information: non-slab/vmalloc memory
Process swapper/0 (pid: 1, stack limit = 0x(ptrval))
Stack: (0xf081da68 to 0xf081e000)
...
  cfb_imageblit from soft_cursor+0x164/0x1cc
  soft_cursor from bit_cursor+0x4c0/0x4fc
  bit_cursor from fbcon_cursor+0xf8/0x108
  fbcon_cursor from hide_cursor+0x34/0x94
  hide_cursor from redraw_screen+0x13c/0x22c
  redraw_screen from fbcon_prepare_logo+0x164/0x444
  fbcon_prepare_logo from fbcon_init+0x38c/0x4bc
  fbcon_init from visual_init+0xc0/0x108
  visual_init from do_bind_con_driver+0x1ac/0x38c
  do_bind_con_driver from do_take_over_console+0x13c/0x1c8
  do_take_over_console from do_fbcon_takeover+0x74/0xcc
  do_fbcon_takeover from register_framebuffer+0x1bc/0x2cc
  register_framebuffer from simplefb_probe+0x8dc/0x9b0
  simplefb_probe from platform_probe+0x80/0xc0
  platform_probe from really_probe+0xc0/0x304
  really_probe from __driver_probe_device+0x88/0xe0
  __driver_probe_device from driver_probe_device+0x34/0xd4
  driver_probe_device from __driver_attach+0x8c/0xe0
  __driver_attach from bus_for_each_dev+0x64/0xb0
  bus_for_each_dev from bus_add_driver+0x160/0x1e4
  bus_add_driver from driver_register+0x78/0x10c
  driver_register from do_one_initcall+0x44/0x1e0
  do_one_initcall from kernel_init_freeable+0x1bc/0x20c
  kernel_init_freeable from kernel_init+0x18/0x12c
  kernel_init from ret_from_fork+0x14/0x2c
Code: e28db070 e00473a3 e08b7107 e5177044 (e5827000)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
CPU0: stopping
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G      D 
5.17.0-rc7-next-20220308-00002-g9e9894c98f8c #11471
Hardware name: BCM2835
  unwind_backtrace from show_stack+0x10/0x14
  show_stack from 0xc1201e64
CPU2: stopping
CPU: 2 PID: 0 Comm: swapper/2 Tainted: G      D 
5.17.0-rc7-next-20220308-00002-g9e9894c98f8c #11471
Hardware name: BCM2835
  unwind_backtrace from show_stack+0x10/0x14
  show_stack from 0xf0809f5c
CPU1: stopping
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D 
5.17.0-rc7-next-20220308-00002-g9e9894c98f8c #11471
Hardware name: BCM2835
  unwind_backtrace from show_stack+0x10/0x14
  show_stack from 0xf0805f5c
---[ end Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x0000000b ]---
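
One observation from the log above (not a diagnosis stated in the
thread): the faulting address lines up with the end of the framebuffer
mapping. The sketch below redoes that arithmetic with the values from
the log; the helper names are invented for this note.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of a framebuffer with the given line length and height. */
uint32_t fb_size_bytes(uint32_t line_length, uint32_t height)
{
	/* Guard against 32-bit overflow in the multiplication. */
	assert(height == 0 || line_length <= UINT32_MAX / height);
	return line_length * height;
}

/* First address past a mapping that starts at base and spans size bytes. */
uint32_t mapping_end(uint32_t base, uint32_t size)
{
	assert(size <= UINT32_MAX - base);
	return base + size;
}
```

fb_size_bytes(2560, 480) is 0x12c000, matching the size simplefb
reports, and mapping_end(0xf0980000, 0x12c000) is 0xf0aac000, which is
the address in the "Unable to handle kernel paging request" line and in
r2. That would fit a store landing on the first byte past the mapping.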

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [v3,4/5] fbdev: Improve performance of cfb_imageblit()
  2022-03-09  9:22         ` Marek Szyprowski
@ 2022-03-09 10:39             ` Geert Uytterhoeven
  0 siblings, 0 replies; 49+ messages in thread
From: Geert Uytterhoeven @ 2022-03-09 10:39 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Thomas Zimmermann, Daniel Vetter, Helge Deller,
	Javier Martinez Canillas, Sam Ravnborg, Gerd Hoffmann,
	Pekka Paalanen, Linux Fbdev development list, DRI Development

Hi Marek,

On Wed, Mar 9, 2022 at 10:22 AM Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
> On 09.03.2022 09:22, Thomas Zimmermann wrote:
> > Am 08.03.22 um 23:52 schrieb Marek Szyprowski:
> >> On 23.02.2022 20:38, Thomas Zimmermann wrote:
> >>> Improve the performance of cfb_imageblit() by manually unrolling
> >>> the inner blitting loop and moving some invariants out. The compiler
> >>> failed to do this automatically. This change keeps cfb_imageblit()
> >>> in sync with sys_imageblit().
> >>>
> >>> A microbenchmark measures the average number of CPU cycles
> >>> for cfb_imageblit() after a stabilizing period of a few minutes
> >>> (i7-4790, FullHD, simpledrm, kernel with debugging).
> >>>
> >>> cfb_imageblit(), new: 15724 cycles
> >>> cfb_imageblit(): old: 30566 cycles
> >>>
> >>> In the optimized case, cfb_imageblit() is now ~2x faster than before.
> >>>
> >>> v3:
> >>>     * fix commit description (Pekka)
> >>>
> >>> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> >>> Acked-by: Sam Ravnborg <sam@ravnborg.org>
> >>> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
> >> This patch landed recently in linux next-20220308 as commit 0d03011894d2
> >> ("fbdev: Improve performance of cfb_imageblit()"). Sadly it causes a
> >> freeze after DRM and emulated fbdev initialization on various Samsung
> >> Exynos ARM 32bit based boards. This happens when kernel is compiled from
> >> exynos_defconfig. Surprisingly when kernel is compiled from
> >> multi_v7_defconfig all those boards boot fine, so this is a matter of
> >> one of the debugging options enabled in the exynos_defconfig. I will try
> >> to analyze this further and share the results. Reverting $subject on top
> >> of next-20220308 fixes the boot issue.
> >
> > Thanks for reporting. I don't have the hardware to reproduce it and
> > there's no obvious difference to the original version. It's supposed
> > to be the same algorithm with a different implementation. Unless you
> > can figure out the issue, we can also revert the patch easily.
>
> I've played a bit with .config options and found that the issue is
> caused by the compiled-in fonts used for the framebuffer. For some
> reasons (so far unknown to me), exynos_defconfig has the following odd
> setup:
>
> CONFIG_FONT_SUPPORT=y
> CONFIG_FONTS=y
> # CONFIG_FONT_8x8 is not set
> # CONFIG_FONT_8x16 is not set
> # CONFIG_FONT_6x11 is not set
> CONFIG_FONT_7x14=y
> # CONFIG_FONT_PEARL_8x8 is not set
> # CONFIG_FONT_ACORN_8x8 is not set
> # CONFIG_FONT_MINI_4x6 is not set
> # CONFIG_FONT_6x10 is not set
> # CONFIG_FONT_10x18 is not set
> # CONFIG_FONT_SUN8x16 is not set
> # CONFIG_FONT_SUN12x22 is not set
> # CONFIG_FONT_TER16x32 is not set
> # CONFIG_FONT_6x8 is not set
>
> Such setup causes a freeze during framebuffer initialization (or just
> after it got registered). I've reproduced this even on Raspberry Pi 3B
> with multi_v7_defconfig and changed fonts configuration (this also
> required to disable vivid driver, which forces 8x16 font), where I got
> the following panic:
>
> simple-framebuffer 3eace000.framebuffer: framebuffer at 0x3eace000,
> 0x12c000 bytes
> simple-framebuffer 3eace000.framebuffer: format=a8r8g8b8,
> mode=640x480x32, linelength=2560
> 8<--- cut here ---
> Unable to handle kernel paging request at virtual address f0aac000

So support for images with offsets or widths that are not a multiple
of 8 got broken in cfb_imageblit(). Oops...
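
For illustration, here is one reading of the quoted diff, under the
assumption that the 32 bpp path is taken with a 7-pixel-wide image: the
unrolled loop `for (j = k; j; j -= 8, ++src)` only stops when j hits
exactly zero, so with k = 7 it steps to -1 and keeps going. The
user-space sketch below counts the words each loop shape would write
and gives up at a limit; it writes nothing, and the function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Words written per scanline by the unrolled 32 bpp loop shape
 * `for (j = k; j; j -= 8, ++src)`, capped at `limit` words.
 */
size_t unrolled_words_written(int k, size_t limit)
{
	size_t words = 0;
	int j;

	assert(k >= 0);
	for (j = k; j && words < limit; j -= 8)
		words += 8;	/* eight FB_WRITEL() calls per source byte */

	return words;
}

/* Same count for the original loop shape `for (j = k; j--; )`. */
size_t original_words_written(int k, size_t limit)
{
	size_t words = 0;
	int j;

	assert(k >= 0);
	for (j = k; j-- && words < limit; )
		words++;	/* one FB_WRITEL() call per step */

	return words;
}
```

With k = 8 the unrolled shape writes 8 words and stops. With k = 7 it
only stops because of the limit, whereas the original shape writes
exactly 7 words.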

BTW, the various drawing routines used to set a bitmask indicating
which alignments were supported (see blit_x), but most of them no
longer do, presumably because all alignments are now supported
(for about 20 years?).
So you can (temporarily) work around this by filling in blit_x,
preventing the use of the 7x14 font.
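
A rough sketch of what such a mask could look like, assuming the
convention I recall from the struct fb_pixmap comment in
include/linux/fb.h (blit_x = 1 << (width - 1), with 0 meaning "all
widths"). The helpers are hypothetical user-space code, not kernel API;
in a driver the resulting value would presumably be assigned to
info->pixmap.blit_x before the framebuffer is registered.

```c
#include <assert.h>
#include <stdint.h>

/* Bit for one supported blit width; valid widths are 1..32. */
uint32_t blit_width_bit(unsigned int width)
{
	assert(width >= 1 && width <= 32);
	return UINT32_C(1) << (width - 1);
}

/* Mask advertising only widths that are multiples of 8. */
uint32_t blit_x_multiples_of_8(void)
{
	return blit_width_bit(8) | blit_width_bit(16) |
	       blit_width_bit(24) | blit_width_bit(32);
}

/* Nonzero if a font of the given width passes the mask. */
int blit_width_allowed(uint32_t blit_x, unsigned int width)
{
	return (blit_x & blit_width_bit(width)) != 0;
}
```

The mask comes out as 0x80808080; a 7-pixel-wide font does not pass it,
while an 8-pixel-wide font does.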

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [v3,4/5] fbdev: Improve performance of cfb_imageblit()
  2022-03-09 10:39             ` Geert Uytterhoeven
@ 2022-03-10 19:21               ` Thomas Zimmermann
  -1 siblings, 0 replies; 49+ messages in thread
From: Thomas Zimmermann @ 2022-03-10 19:21 UTC (permalink / raw)
  To: Geert Uytterhoeven, Marek Szyprowski
  Cc: Daniel Vetter, Helge Deller, Javier Martinez Canillas,
	Sam Ravnborg, Gerd Hoffmann, Pekka Paalanen,
	Linux Fbdev development list, DRI Development


[-- Attachment #1.1: Type: text/plain, Size: 4651 bytes --]

Hi

Am 09.03.22 um 11:39 schrieb Geert Uytterhoeven:
> Hi Marek,
> 
> On Wed, Mar 9, 2022 at 10:22 AM Marek Szyprowski
> <m.szyprowski@samsung.com> wrote:
>> On 09.03.2022 09:22, Thomas Zimmermann wrote:
>>> Am 08.03.22 um 23:52 schrieb Marek Szyprowski:
>>>> On 23.02.2022 20:38, Thomas Zimmermann wrote:
>>>>> Improve the performance of cfb_imageblit() by manually unrolling
>>>>> the inner blitting loop and moving some invariants out. The compiler
>>>>> failed to do this automatically. This change keeps cfb_imageblit()
>>>>> in sync with sys_imageblit().
>>>>>
>>>>> A microbenchmark measures the average number of CPU cycles
>>>>> for cfb_imageblit() after a stabilizing period of a few minutes
>>>>> (i7-4790, FullHD, simpledrm, kernel with debugging).
>>>>>
>>>>> cfb_imageblit(), new: 15724 cycles
>>>>> cfb_imageblit(), old: 30566 cycles
>>>>>
>>>>> In the optimized case, cfb_imageblit() is now ~2x faster than before.
>>>>>
>>>>> v3:
>>>>>      * fix commit description (Pekka)
>>>>>
>>>>> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
>>>>> Acked-by: Sam Ravnborg <sam@ravnborg.org>
>>>>> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
>>>> This patch landed recently in linux next-20220308 as commit 0d03011894d2
>>>> ("fbdev: Improve performance of cfb_imageblit()"). Sadly it causes a
>>>> freeze after DRM and emulated fbdev initialization on various Samsung
>>>> Exynos ARM 32-bit boards. This happens when the kernel is built from
>>>> exynos_defconfig. Surprisingly, when the kernel is built from
>>>> multi_v7_defconfig all those boards boot fine, so this is a matter of
>>>> one of the debugging options enabled in exynos_defconfig. I will try
>>>> to analyze this further and share the results. Reverting $subject on top
>>>> of next-20220308 fixes the boot issue.
>>>
>>> Thanks for reporting. I don't have the hardware to reproduce it and
>>> there's no obvious difference to the original version. It's supposed
>>> to be the same algorithm with a different implementation. Unless you
>>> can figure out the issue, we can also revert the patch easily.
>>
>> I've played a bit with .config options and found that the issue is
>> caused by the compiled-in fonts used for the framebuffer. For some
>> reason (so far unknown to me), exynos_defconfig has the following odd
>> setup:
>>
>> CONFIG_FONT_SUPPORT=y
>> CONFIG_FONTS=y
>> # CONFIG_FONT_8x8 is not set
>> # CONFIG_FONT_8x16 is not set
>> # CONFIG_FONT_6x11 is not set
>> CONFIG_FONT_7x14=y
>> # CONFIG_FONT_PEARL_8x8 is not set
>> # CONFIG_FONT_ACORN_8x8 is not set
>> # CONFIG_FONT_MINI_4x6 is not set
>> # CONFIG_FONT_6x10 is not set
>> # CONFIG_FONT_10x18 is not set
>> # CONFIG_FONT_SUN8x16 is not set
>> # CONFIG_FONT_SUN12x22 is not set
>> # CONFIG_FONT_TER16x32 is not set
>> # CONFIG_FONT_6x8 is not set
>>
>> Such a setup causes a freeze during framebuffer initialization (or just
>> after it got registered). I've reproduced this even on a Raspberry Pi 3B
>> with multi_v7_defconfig and a changed font configuration (this also
>> required disabling the vivid driver, which forces the 8x16 font), where I
>> got the following panic:
>>
>> simple-framebuffer 3eace000.framebuffer: framebuffer at 0x3eace000,
>> 0x12c000 bytes
>> simple-framebuffer 3eace000.framebuffer: format=a8r8g8b8,
>> mode=640x480x32, linelength=2560
>> 8<--- cut here ---
>> Unable to handle kernel paging request at virtual address f0aac000
> 
> So support for images with offsets or widths that are not a multiple
> of 8 got broken in cfb_imageblit(). Oops...
> 
> BTW, the various drawing routines used to set a bitmask indicating
> which alignments were supported (see blit_x), but most of them no
> longer do, presumably because all alignments are now supported
> (since ca. 20 years?).
> So you can (temporarily) work around this by filling in blit_x,
> preventing the use of the 7x14 font.

How do I activate the 7x14 font? It's compiled into the kernel already
(CONFIG_FONT_7x14=y).

Best regards
Thomas

> 
> Gr{oetje,eeting}s,
> 
>                          Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                  -- Linus Torvalds

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [v3,4/5] fbdev: Improve performance of cfb_imageblit()
  2022-03-10 19:21               ` Thomas Zimmermann
@ 2022-03-10 19:23                 ` Geert Uytterhoeven
  -1 siblings, 0 replies; 49+ messages in thread
From: Geert Uytterhoeven @ 2022-03-10 19:23 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: Marek Szyprowski, Daniel Vetter, Helge Deller,
	Javier Martinez Canillas, Sam Ravnborg, Gerd Hoffmann,
	Pekka Paalanen, Linux Fbdev development list, DRI Development

Hi Thomas,

On Thu, Mar 10, 2022 at 8:22 PM Thomas Zimmermann <tzimmermann@suse.de> wrote:
> Am 09.03.22 um 11:39 schrieb Geert Uytterhoeven:
> > On Wed, Mar 9, 2022 at 10:22 AM Marek Szyprowski
> > <m.szyprowski@samsung.com> wrote:
> >> On 09.03.2022 09:22, Thomas Zimmermann wrote:
> >>> Am 08.03.22 um 23:52 schrieb Marek Szyprowski:
> >>>> On 23.02.2022 20:38, Thomas Zimmermann wrote:
> >>>>> Improve the performance of cfb_imageblit() by manually unrolling
> >>>>> the inner blitting loop and moving some invariants out. The compiler
> >>>>> failed to do this automatically. This change keeps cfb_imageblit()
> >>>>> in sync with sys_imageblit().
> >>>>>
> >>>>> A microbenchmark measures the average number of CPU cycles
> >>>>> for cfb_imageblit() after a stabilizing period of a few minutes
> >>>>> (i7-4790, FullHD, simpledrm, kernel with debugging).
> >>>>>
> >>>>> cfb_imageblit(), new: 15724 cycles
> >>>>> cfb_imageblit(), old: 30566 cycles
> >>>>>
> >>>>> In the optimized case, cfb_imageblit() is now ~2x faster than before.
> >>>>>
> >>>>> v3:
> >>>>>      * fix commit description (Pekka)
> >>>>>
> >>>>> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> >>>>> Acked-by: Sam Ravnborg <sam@ravnborg.org>
> >>>>> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
> >>>> This patch landed recently in linux next-20220308 as commit 0d03011894d2
> >>>> ("fbdev: Improve performance of cfb_imageblit()"). Sadly it causes a
> >>>> freeze after DRM and emulated fbdev initialization on various Samsung
> >>>> Exynos ARM 32-bit boards. This happens when the kernel is built from
> >>>> exynos_defconfig. Surprisingly, when the kernel is built from
> >>>> multi_v7_defconfig all those boards boot fine, so this is a matter of
> >>>> one of the debugging options enabled in exynos_defconfig. I will try
> >>>> to analyze this further and share the results. Reverting $subject on top
> >>>> of next-20220308 fixes the boot issue.
> >>>
> >>> Thanks for reporting. I don't have the hardware to reproduce it and
> >>> there's no obvious difference to the original version. It's supposed
> >>> to be the same algorithm with a different implementation. Unless you
> >>> can figure out the issue, we can also revert the patch easily.
> >>
> >> I've played a bit with .config options and found that the issue is
> >> caused by the compiled-in fonts used for the framebuffer. For some
> >> reason (so far unknown to me), exynos_defconfig has the following odd
> >> setup:
> >>
> >> CONFIG_FONT_SUPPORT=y
> >> CONFIG_FONTS=y
> >> # CONFIG_FONT_8x8 is not set
> >> # CONFIG_FONT_8x16 is not set
> >> # CONFIG_FONT_6x11 is not set
> >> CONFIG_FONT_7x14=y
> >> # CONFIG_FONT_PEARL_8x8 is not set
> >> # CONFIG_FONT_ACORN_8x8 is not set
> >> # CONFIG_FONT_MINI_4x6 is not set
> >> # CONFIG_FONT_6x10 is not set
> >> # CONFIG_FONT_10x18 is not set
> >> # CONFIG_FONT_SUN8x16 is not set
> >> # CONFIG_FONT_SUN12x22 is not set
> >> # CONFIG_FONT_TER16x32 is not set
> >> # CONFIG_FONT_6x8 is not set
> >>
> >> Such a setup causes a freeze during framebuffer initialization (or just
> >> after it got registered). I've reproduced this even on a Raspberry Pi 3B
> >> with multi_v7_defconfig and a changed font configuration (this also
> >> required disabling the vivid driver, which forces the 8x16 font), where I
> >> got the following panic:
> >>
> >> simple-framebuffer 3eace000.framebuffer: framebuffer at 0x3eace000,
> >> 0x12c000 bytes
> >> simple-framebuffer 3eace000.framebuffer: format=a8r8g8b8,
> >> mode=640x480x32, linelength=2560
> >> 8<--- cut here ---
> >> Unable to handle kernel paging request at virtual address f0aac000
> >
> > So support for images with offsets or widths that are not a multiple
> > of 8 got broken in cfb_imageblit(). Oops...
> >
> > BTW, the various drawing routines used to set a bitmask indicating
> > which alignments were supported (see blit_x), but most of them no
> > longer do, presumably because all alignments are now supported
> > (since ca. 20 years?).
> > So you can (temporarily) work around this by filling in blit_x,
> > preventing the use of the 7x14 font.
>
> How do I activate the 7x14 font? It's compiled into the kernel already
> (CONFIG_FONT_7x14=y).

Documentation/fb/fbcon.rst:1. fbcon=font:<name>

Or just disable all other fonts.
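
For example, assuming the font is registered under the name "7x14" (the
name listed among the compiled-in fonts in fbcon.rst), the kernel
command line would contain:

```
fbcon=font:7x14
```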

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [v3,4/5] fbdev: Improve performance of cfb_imageblit()
  2022-03-10 19:23                 ` Geert Uytterhoeven
@ 2022-03-13 19:23                   ` Thomas Zimmermann
  -1 siblings, 0 replies; 49+ messages in thread
From: Thomas Zimmermann @ 2022-03-13 19:23 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Marek Szyprowski, Daniel Vetter, Helge Deller,
	Javier Martinez Canillas, Sam Ravnborg, Gerd Hoffmann,
	Pekka Paalanen, Linux Fbdev development list, DRI Development


[-- Attachment #1.1: Type: text/plain, Size: 941 bytes --]

Hi Geert

Am 10.03.22 um 20:23 schrieb Geert Uytterhoeven:
[...]
>>
>> How do I activate the 7x14 font? It's compiled into the kernel already
>> (CONFIG_FONT_7x14=y).
> 
> Documentation/fb/fbcon.rst:1. fbcon=font:<name>
> 
> Or just disable all other fonts.

Thanks. I've been able to reproduce the problem and will send a patch soon.

Best regards
Thomas

> 
> Gr{oetje,eeting}s,
> 
>                          Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                  -- Linus Torvalds

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 4/5] fbdev: Improve performance of cfb_imageblit()
  2022-02-23 19:38   ` Thomas Zimmermann
@ 2022-03-24 19:11     ` Guenter Roeck
  -1 siblings, 0 replies; 49+ messages in thread
From: Guenter Roeck @ 2022-03-24 19:11 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: daniel, deller, javierm, geert, sam, kraxel, ppaalanen,
	dri-devel, linux-fbdev

Hi,

On Wed, Feb 23, 2022 at 08:38:03PM +0100, Thomas Zimmermann wrote:
> Improve the performance of cfb_imageblit() by manually unrolling
> the inner blitting loop and moving some invariants out. The compiler
> failed to do this automatically. This change keeps cfb_imageblit()
> in sync with sys_imageblit().
> 
> A microbenchmark measures the average number of CPU cycles
> for cfb_imageblit() after a stabilizing period of a few minutes
> (i7-4790, FullHD, simpledrm, kernel with debugging).
> 
> cfb_imageblit(), new: 15724 cycles
> cfb_imageblit(), old: 30566 cycles
> 
> In the optimized case, cfb_imageblit() is now ~2x faster than before.
> 
> v3:
> 	* fix commit description (Pekka)
> 
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>

This patch causes crashes with arm mainstone, z2, and collie emulations.
Reverting it fixes the problem.

collie crash log and bisect log attached.

Guenter

---
8<--- cut here ---
Unable to handle kernel paging request at virtual address e090d000
[e090d000] *pgd=c0c0b811c0c0b811, *pte=c0c0b000, *ppte=00000000
Internal error: Oops: 807 [#1] ARM
CPU: 0 PID: 1 Comm: swapper Not tainted 5.17.0-next-20220324 #1
Hardware name: Sharp-Collie
PC is at cfb_imageblit+0x58c/0x6e0
LR is at 0x5
pc : [<c040eab0>]    lr : [<00000005>]    psr: a0000153
sp : e0809958  ip : e090d000  fp : e08099f4
r10: e08099c8  r9 : c0c70600  r8 : ffff6802
r7 : c0c6e000  r6 : 00000000  r5 : e08e7000  r4 : 00000280
r3 : 00000020  r2 : 00000003  r1 : 00000002  r0 : 00000002
Flags: NzCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment none
Control: 0000717f  Table: c0004000  DAC: 00000053
Register r0 information: non-paged memory
Register r1 information: non-paged memory
Register r2 information: non-paged memory
Register r3 information: non-paged memory
Register r4 information: non-paged memory
Register r5 information: 0-page vmalloc region starting at 0xe08e6000 allocated at dma_common_contiguous_remap+0x94/0xb0
Register r6 information: NULL pointer
Register r7 information: non-slab/vmalloc memory
Register r8 information: non-paged memory
Register r9 information: non-slab/vmalloc memory
Register r10 information: 2-page vmalloc region starting at 0xe0808000 allocated at kernel_clone+0x78/0x4e4
Register r11 information: 2-page vmalloc region starting at 0xe0808000 allocated at kernel_clone+0x78/0x4e4
Register r12 information: 0-page vmalloc region starting at 0xe08e6000 allocated at dma_common_contiguous_remap+0x94/0xb0
Process swapper (pid: 1, stack limit = 0x(ptrval))
Stack: (0xe0809958 to 0xe080a000)
9940:                                                       80000153 0000005e
9960: dfb1b424 00000020 00000000 00000000 00000001 00000002 00000003 00000004
9980: dfb1b420 00000000 00000000 00000000 00000000 c067f338 e08099ab 00000026
99a0: 80000153 00000820 007fe178 c07db82c e08099d4 0000003e 00000820 c0e32b00
99c0: 00000006 c07db82c 00000001 c0da1e40 e0809a54 c0e32b00 00000006 00000001
99e0: 00000001 c0c6e000 e0809a34 e08099f8 c040a3f8 c040e530 00000006 00000001
9a00: c0e61920 c0da1e78 00000000 c0e61920 00000000 e0809a54 c06ad89c c0e32b00
9a20: c0da1e00 00000020 e0809acc e0809a38 c040a040 c040a26c e0809a7c 00000140
9a40: 00000002 00000002 00000001 00000007 00000000 00000039 00000001 c0da1e00
9a60: 00000000 00000000 00000000 00000004 00000006 00000007 00000000 00000001
9a80: c06ad89c 00000000 00000000 00000000 00000000 ffffffff ffffffff c07db82c
9aa0: e0809acc c0c0c3c0 c0e32b00 00000007 00000002 00000720 c0409cf0 00000028
9ac0: e0809afc e0809ad0 c040665c c0409cfc 00000000 00000000 c0c0c3c0 c0807584
9ae0: 00000000 00000000 ffffff60 c0c70000 e0809b1c e0809b00 c0439a50 c040656c
9b00: c0c0c3c0 00000000 00000000 00000000 e0809b54 e0809b20 c043a798 c0439a24
9b20: c04095c8 c0c6ff60 00000000 c07db82c e0809b54 c0c0c3c0 c0c6ff60 00000000
9b40: 00000000 ffffff60 e0809ba4 e0809b58 c0407254 c043a5ac e0809b7c e0809b68
9b60: c04145d8 00000000 00000000 00000000 00000720 00000000 00000050 c0c0c3c0
9b80: c0e32b00 c0e61920 00000050 00000028 c0a00df8 00000028 e0809bec e0809ba8
9ba0: c0407748 c0406f04 00000050 00000028 00000050 00000001 c0a02f70 00000000
9bc0: 00000000 c0c0c3c0 c0c0c624 00000000 c0a02f84 0000003e 00000000 c0a03080
9be0: e0809c0c e0809bf0 c0438b10 c040734c c0c0c3c0 c06affbc 00000001 c0a02f84
9c00: e0809c54 e0809c10 c043be28 c0438a80 0000003e 00000001 00000000 c0779d88
9c20: 00000000 00000001 c08075a8 c06affbc 00000000 00000001 00000000 0000003e
9c40: 00000001 c0a02f8c e0809c9c e0809c58 c043c6ec c043bc98 c08075a8 c077c29c
9c60: 00000001 00000000 c0e32b44 c0a03a58 c067f354 c0805a24 c0a00cc8 c0805a24
9c80: 00000000 c07dbabc c0e32da4 fffff000 e0809cb4 e0809ca0 c0405d5c c043c5d4
9ca0: c0a00dac 00000000 e0809cd4 e0809cb8 c0408f48 c0405cfc c0e32b00 00000000
9cc0: c0a00ca8 c0e32b10 e0809d44 e0809cd8 c03ff9e4 c0408e70 c0779a14 00000000
9ce0: c000ea7c 00000000 00000041 00000140 000000f0 00029e01 0000000b 0000001e
9d00: 00000002 00000000 00000005 00000001 00000003 00000000 00000020 c07db82c
9d20: c0e32b00 00000000 c07dfe08 00000004 0000000d 00000000 e0809d84 e0809d48
9d40: c040f550 c03ff7e8 00000004 c077a1b8 c0e32b00 c0180a04 c07dfe18 00000000
9d60: c07dfe18 c0805abc 00000000 00000000 c07cb87c c07cb85c e0809da4 e0809d88
9d80: c045c2c8 c040f228 00000000 c07dfe18 c0805abc 00000000 e0809dc4 e0809da8
9da0: c045a304 c045c288 c07dfe18 c0805abc c07dfe18 00000000 e0809ddc e0809dc8
9dc0: c045a548 c045a250 c0a04c6c 60000153 e0809e04 e0809de0 c045a5f4 c045a4d0
9de0: e0809e04 e0809df0 c07dfe18 c0805abc c045a7d0 c080af60 e0809e24 e0809e08
9e00: c045a860 c045a5b4 00000000 c0805abc c045a7d0 c080af60 e0809e54 e0809e28
9e20: c0458694 c045a7dc c0c30e20 c0c30dec c0c9f3b4 c07db82c c06559b8 c0805abc
9e40: c0e5d340 00000000 e0809e64 e0809e58 c045adf0 c0458620 e0809e8c e0809e68
9e60: c0458fb8 c045addc c0749e14 e0809e78 c0805abc c0c19000 c07bc340 c07a4bc8
9e80: e0809ea4 e0809e90 c045b684 c0458e84 c0818000 c0c19000 e0809eb4 e0809ea8
9ea0: c045d10c c045b614 e0809ec4 e0809eb8 c07bc368 c045d0f8 e0809f4c e0809ec8
9ec0: c07a821c c07bc34c e0809eec e0809ed8 c00374f4 c065f390 c0c427da c07a5600
9ee0: e0809f4c e0809ef0 c00376f4 c07a7448 e0809f03 00000006 00000006 00000000
9f00: 00000000 c07a743c c0793b5c c07a4bc8 00000dc0 c0c427cc c0c427d4 c07db82c
9f20: 00000000 c07d2060 c0c427a0 00000007 c07a4bc8 c0818000 c07cb87c c07cb85c
9f40: e0809f94 e0809f50 c07a85a0 c07a81b0 00000006 00000006 00000000 c07a743c
9f60: c0c19000 0000008c e0809f8c 00000000 c0675804 00000000 00000000 00000000
9f80: 00000000 00000000 e0809fac e0809f98 c067581c c07a842c 00000000 c0675804
9fa0: 00000000 e0809fb0 c0008328 c0675810 00000000 00000000 00000000 00000000
9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
Backtrace:
 cfb_imageblit from soft_cursor+0x198/0x1fc
 r10:c0c6e000 r9:00000001 r8:00000001 r7:00000006 r6:c0e32b00 r5:e0809a54
 r4:c0da1e40
 soft_cursor from bit_cursor+0x350/0x4fc
 r10:00000020 r9:c0da1e00 r8:c0e32b00 r7:c06ad89c r6:e0809a54 r5:00000000
 r4:c0e61920
 bit_cursor from fbcon_cursor+0xfc/0x110
 r10:00000028 r9:c0409cf0 r8:00000720 r7:00000002 r6:00000007 r5:c0e32b00
 r4:c0c0c3c0
 fbcon_cursor from hide_cursor+0x38/0xac
 r9:c0c70000 r8:ffffff60 r7:00000000 r6:00000000 r5:c0807584 r4:c0c0c3c0
 hide_cursor from redraw_screen+0x1f8/0x258
 r7:00000000 r6:00000000 r5:00000000 r4:c0c0c3c0
 redraw_screen from fbcon_prepare_logo+0x35c/0x448
 r8:ffffff60 r7:00000000 r6:00000000 r5:c0c6ff60 r4:c0c0c3c0
 fbcon_prepare_logo from fbcon_init+0x408/0x4f8
 r10:00000028 r9:c0a00df8 r8:00000028 r7:00000050 r6:c0e61920 r5:c0e32b00
 r4:c0c0c3c0
 fbcon_init from visual_init+0x9c/0xe0
 r10:c0a03080 r9:00000000 r8:0000003e r7:c0a02f84 r6:00000000 r5:c0c0c624
 r4:c0c0c3c0
 visual_init from do_bind_con_driver+0x19c/0x370
 r7:c0a02f84 r6:00000001 r5:c06affbc r4:c0c0c3c0
 do_bind_con_driver from do_take_over_console+0x124/0x1b8
 r10:c0a02f8c r9:00000001 r8:0000003e r7:00000000 r6:00000001 r5:00000000
 r4:c06affbc
 do_take_over_console from do_fbcon_takeover+0x6c/0xcc
 r10:fffff000 r9:c0e32da4 r8:c07dbabc r7:00000000 r6:c0805a24 r5:c0a00cc8
 r4:c0805a24
 do_fbcon_takeover from fbcon_fb_registered+0xe4/0x128
 r5:00000000 r4:c0a00dac
 fbcon_fb_registered from register_framebuffer+0x208/0x318
 r7:c0e32b10 r6:c0a00ca8 r5:00000000 r4:c0e32b00
 register_framebuffer from sa1100fb_probe+0x334/0x420
 r9:00000000 r8:0000000d r7:00000004 r6:c07dfe08 r5:00000000 r4:c0e32b00
 sa1100fb_probe from platform_probe+0x4c/0xac
 r10:c07cb85c r9:c07cb87c r8:00000000 r7:00000000 r6:c0805abc r5:c07dfe18
 r4:00000000
 platform_probe from really_probe+0xc0/0x280
 r7:00000000 r6:c0805abc r5:c07dfe18 r4:00000000
 really_probe from __driver_probe_device+0x84/0xe4
 r7:00000000 r6:c07dfe18 r5:c0805abc r4:c07dfe18
 __driver_probe_device from driver_probe_device+0x4c/0x10c
 r5:60000153 r4:c0a04c6c
 driver_probe_device from __driver_attach+0x90/0x104
 r7:c080af60 r6:c045a7d0 r5:c0805abc r4:c07dfe18
 __driver_attach from bus_for_each_dev+0x80/0xcc
 r7:c080af60 r6:c045a7d0 r5:c0805abc r4:00000000
 bus_for_each_dev from driver_attach+0x20/0x28
 r6:00000000 r5:c0e5d340 r4:c0805abc
 driver_attach from bus_add_driver+0x140/0x1c8
 bus_add_driver from driver_register+0x7c/0x110
 r7:c07a4bc8 r6:c07bc340 r5:c0c19000 r4:c0805abc
 driver_register from __platform_driver_register+0x20/0x28
 r5:c0c19000 r4:c0818000
 __platform_driver_register from sa1100fb_init+0x28/0x3c
 sa1100fb_init from do_one_initcall+0x78/0x220
 do_one_initcall from kernel_init_freeable+0x180/0x1fc
 r10:c07cb85c r9:c07cb87c r8:c0818000 r7:c07a4bc8 r6:00000007 r5:c0c427a0
 r4:c07d2060
 kernel_init_freeable from kernel_init+0x18/0x10c
 r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c0675804
 r4:00000000
 kernel_init from ret_from_fork+0x14/0x2c
Exception stack(0xe0809fb0 to 0xe0809ff8)
9fa0:                                     00000000 00000000 00000000 00000000
9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
 r5:c0675804 r4:00000000
Code: e24ba02c e0026323 e08a6106 e5166044 (e58c6000)
---[ end trace 00000000c08187d8 ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
Reboot failed -- System halted
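
A note on reading the oops above (a sketch added for illustration, not part of the original report): the word in parentheses on the "Code:" line, e58c6000, is the instruction at the faulting PC. Decoded with the A32 immediate-offset load/store encoding, it is "str r6, [ip]", a word store through r12 (ip), and ip holds e090d000, the address the kernel could not map. The helper below performs that decoding. The struct and function names are made up for this example, and it handles only this one encoding class.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded fields of an A32 load/store (word or byte) with immediate offset. */
struct a32_ldst_imm {
	int is_load;     /* L, bit 20: 1 = LDR, 0 = STR */
	int is_byte;     /* B, bit 22: 1 = byte access, 0 = word access */
	int add;         /* U, bit 23: 1 = add the offset, 0 = subtract it */
	int pre_index;   /* P, bit 24: 1 = offset applied before the access */
	int writeback;   /* W, bit 21: 1 = base register is updated */
	unsigned rn;     /* base register number, bits 19:16 */
	unsigned rd;     /* transferred register number, bits 15:12 */
	unsigned imm12;  /* unsigned offset, bits 11:0 */
};

/*
 * Decode one instruction word. Returns 0 on success, or -1 if the word is
 * not in the immediate-offset load/store class (bits 27:25 == 0b010) or
 * uses the 0b1111 condition field, which this sketch does not handle.
 */
int a32_decode_ldst_imm(uint32_t insn, struct a32_ldst_imm *out)
{
	assert(out != NULL);

	if ((insn >> 28) == 0xfu)
		return -1;
	if (((insn >> 25) & 0x7u) != 0x2u)
		return -1;

	out->pre_index = (insn >> 24) & 1u;
	out->add       = (insn >> 23) & 1u;
	out->is_byte   = (insn >> 22) & 1u;
	out->writeback = (insn >> 21) & 1u;
	out->is_load   = (insn >> 20) & 1u;
	out->rn        = (insn >> 16) & 0xfu;
	out->rd        = (insn >> 12) & 0xfu;
	out->imm12     = insn & 0xfffu;
	return 0;
}
```

For 0xe58c6000 the function returns 0 with is_load = 0, rn = 12, rd = 6 and imm12 = 0. The log itself does not say which buffer the store was aimed at; that ip lies 0x26000 bytes above r5 (e08e7000) is only arithmetic on the register dump.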

---
# bad: [dd315b5800612e6913343524aa9b993f9a8bb0cf] Add linux-next specific files for 20220324
# good: [f443e374ae131c168a065ea1748feac6b2e76613] Linux 5.17
git bisect start 'HEAD' 'v5.17'
# good: [6788381e2f3c20c25cf7ab91df9cf0d6bec153f9] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git
git bisect good 6788381e2f3c20c25cf7ab91df9cf0d6bec153f9
# bad: [59c7e0caa3e7bc21dd1b6c681c87d2b307f399ee] Merge branch 'drm-next' of git://git.freedesktop.org/git/drm/drm.git
git bisect bad 59c7e0caa3e7bc21dd1b6c681c87d2b307f399ee
# good: [4d17d43de9d186150b3289ce99d7a79fcff202f9] net: usb: asix: suspend embedded PHY if external is used
git bisect good 4d17d43de9d186150b3289ce99d7a79fcff202f9
# good: [6c64ae228f0826859c56711ce133aff037d6205f] Backmerge tag 'v5.17-rc6' into drm-next
git bisect good 6c64ae228f0826859c56711ce133aff037d6205f
# good: [01fd8d2522c49a333b0ee46ba19a6fedfc1c9a60] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git
git bisect good 01fd8d2522c49a333b0ee46ba19a6fedfc1c9a60
# bad: [6de7e4f02640fba2ffa6ac04e2be13785d614175] Merge tag 'drm-msm-next-2022-03-01' of https://gitlab.freedesktop.org/drm/msm into drm-next
git bisect bad 6de7e4f02640fba2ffa6ac04e2be13785d614175
# bad: [c9e9ce0b6f85ac330adee912745048a0af5f315d] Merge tag 'drm-misc-next-2022-03-03' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
git bisect bad c9e9ce0b6f85ac330adee912745048a0af5f315d
# good: [e2573d5f2a5cebe789bbf415e484b589d8eebad7] drm/amd/display: limit unbounded requesting to 5k
git bisect good e2573d5f2a5cebe789bbf415e484b589d8eebad7
# good: [3c54c95bd917d43d12fe1b192df9aa4c5973449b] fbdev: Remove trailing whitespaces from cfbimgblt.c
git bisect good 3c54c95bd917d43d12fe1b192df9aa4c5973449b
# good: [ed6e76676b2657b71a0b9e5e847d96e4de0b394b] drm: rcar-du: lvds: Add r8a77961 support
git bisect good ed6e76676b2657b71a0b9e5e847d96e4de0b394b
# good: [66a8af1f6e3c10190dff14a5668661c092a2a85f] Merge tag 'drm/tegra/for-5.18-rc1' of https://gitlab.freedesktop.org/drm/tegra into drm-next
git bisect good 66a8af1f6e3c10190dff14a5668661c092a2a85f
# bad: [701920ca9822eb63b420b3bcb627f2c1ec759903] drm/ssd130x: remove redundant initialization of pointer mode
git bisect bad 701920ca9822eb63b420b3bcb627f2c1ec759903
# bad: [9ae2ac4d31a85ce59cc560d514a31b95f4ace154] drm: Add TODO item for optimizing format helpers
git bisect bad 9ae2ac4d31a85ce59cc560d514a31b95f4ace154
# bad: [0d03011894d23241db1a1cad5c12aede60897d5e] fbdev: Improve performance of cfb_imageblit()
git bisect bad 0d03011894d23241db1a1cad5c12aede60897d5e
# first bad commit: [0d03011894d23241db1a1cad5c12aede60897d5e] fbdev: Improve performance of cfb_imageblit()

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 4/5] fbdev: Improve performance of cfb_imageblit()
  2022-03-24 19:11     ` Guenter Roeck
@ 2022-03-24 19:18       ` Thomas Zimmermann
  -1 siblings, 0 replies; 49+ messages in thread
From: Thomas Zimmermann @ 2022-03-24 19:18 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: daniel, deller, javierm, geert, sam, kraxel, ppaalanen,
	dri-devel, linux-fbdev


Hi

On 24.03.22 at 20:11, Guenter Roeck wrote:
> Hi,
> 
> On Wed, Feb 23, 2022 at 08:38:03PM +0100, Thomas Zimmermann wrote:
>> Improve the performance of cfb_imageblit() by manually unrolling
>> the inner blitting loop and moving some invariants out. The compiler
>> failed to do this automatically. This change keeps cfb_imageblit()
>> in sync with sys_imageblit().
>>
>> A microbenchmark measures the average number of CPU cycles
>> for cfb_imageblit() after a stabilizing period of a few minutes
>> (i7-4790, FullHD, simpledrm, kernel with debugging).
>>
>> cfb_imageblit(), new: 15724 cycles
>> cfb_imageblit(), old: 30566 cycles
>>
>> In the optimized case, cfb_imageblit() is now ~2x faster than before.
>>
>> v3:
>> 	* fix commit description (Pekka)
>>
>> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> 
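
The optimization described in the quoted commit message, unrolling the inner loop by hand and hoisting invariants out of it, can be illustrated as follows. This is a generic sketch added for illustration, not the code from the patch: the function names, the 1-bpp-to-32-bpp expansion, and the unroll factor of 8 are all assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference version: one pixel per iteration, everything recomputed. */
void blit_mono_simple(uint32_t *dst, const uint8_t *src, size_t width,
		      uint32_t fg, uint32_t bg)
{
	assert(dst != NULL && src != NULL);

	for (size_t x = 0; x < width; x++) {
		unsigned bit = (src[x / 8] >> (7 - (x % 8))) & 1u;

		dst[x] = bit ? fg : bg;
	}
}

/* Same result: invariant hoisted, body unrolled to 8 pixels per source byte. */
void blit_mono_unrolled(uint32_t *dst, const uint8_t *src, size_t width,
			uint32_t fg, uint32_t bg)
{
	const uint32_t eor = fg ^ bg;	/* loop invariant, computed once */
	size_t x = 0;

	assert(dst != NULL && src != NULL);

	/* 0u - bit is 0x00000000 or 0xffffffff, so bg ^ (eor & mask) selects bg or fg. */
	for (; x + 8 <= width; x += 8) {
		const uint32_t bits = *src++;

		*dst++ = bg ^ (eor & (0u - ((bits >> 7) & 1u)));
		*dst++ = bg ^ (eor & (0u - ((bits >> 6) & 1u)));
		*dst++ = bg ^ (eor & (0u - ((bits >> 5) & 1u)));
		*dst++ = bg ^ (eor & (0u - ((bits >> 4) & 1u)));
		*dst++ = bg ^ (eor & (0u - ((bits >> 3) & 1u)));
		*dst++ = bg ^ (eor & (0u - ((bits >> 2) & 1u)));
		*dst++ = bg ^ (eor & (0u - ((bits >> 1) & 1u)));
		*dst++ = bg ^ (eor & (0u - (bits & 1u)));
	}

	/* Tail: widths that are not a multiple of 8 finish one pixel at a time. */
	for (size_t i = 0; x < width; x++, i++)
		*dst++ = bg ^ (eor & (0u - (((uint32_t)*src >> (7 - i)) & 1u)));
}
```

An unrolled loop only writes the right number of pixels if widths that are not a multiple of the unroll factor are handled separately; the tail loop does that here. Whether this relates to the crash reported in this thread is not something the sketch claims.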
> This patch causes crashes with arm mainstone, z2, and collie emulations.
> Reverting it fixes the problem.
> 
> collie crash log and bisect log attached.

Does it work if you apply the fixes at

https://patchwork.freedesktop.org/series/101321/

?

Best regards
Thomas

> 
> Guenter
> 
> ---
> 8<--- cut here ---
> Unable to handle kernel paging request at virtual address e090d000
> [e090d000] *pgd=c0c0b811c0c0b811, *pte=c0c0b000, *ppte=00000000
> Internal error: Oops: 807 [#1] ARM
> CPU: 0 PID: 1 Comm: swapper Not tainted 5.17.0-next-20220324 #1
> Hardware name: Sharp-Collie
> PC is at cfb_imageblit+0x58c/0x6e0
> LR is at 0x5
> pc : [<c040eab0>]    lr : [<00000005>]    psr: a0000153
> sp : e0809958  ip : e090d000  fp : e08099f4
> r10: e08099c8  r9 : c0c70600  r8 : ffff6802
> r7 : c0c6e000  r6 : 00000000  r5 : e08e7000  r4 : 00000280
> r3 : 00000020  r2 : 00000003  r1 : 00000002  r0 : 00000002
> Flags: NzCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment none
> Control: 0000717f  Table: c0004000  DAC: 00000053
> Register r0 information: non-paged memory
> Register r1 information: non-paged memory
> Register r2 information: non-paged memory
> Register r3 information: non-paged memory
> Register r4 information: non-paged memory
> Register r5 information: 0-page vmalloc region starting at 0xe08e6000 allocated at dma_common_contiguous_remap+0x94/0xb0
> Register r6 information: NULL pointer
> Register r7 information: non-slab/vmalloc memory
> Register r8 information: non-paged memory
> Register r9 information: non-slab/vmalloc memory
> Register r10 information: 2-page vmalloc region starting at 0xe0808000 allocated at kernel_clone+0x78/0x4e4
> Register r11 information: 2-page vmalloc region starting at 0xe0808000 allocated at kernel_clone+0x78/0x4e4
> Register r12 information: 0-page vmalloc region starting at 0xe08e6000 allocated at dma_common_contiguous_remap+0x94/0xb0
> Process swapper (pid: 1, stack limit = 0x(ptrval))
> Stack: (0xe0809958 to 0xe080a000)
> 9940:                                                       80000153 0000005e
> 9960: dfb1b424 00000020 00000000 00000000 00000001 00000002 00000003 00000004
> 9980: dfb1b420 00000000 00000000 00000000 00000000 c067f338 e08099ab 00000026
> 99a0: 80000153 00000820 007fe178 c07db82c e08099d4 0000003e 00000820 c0e32b00
> 99c0: 00000006 c07db82c 00000001 c0da1e40 e0809a54 c0e32b00 00000006 00000001
> 99e0: 00000001 c0c6e000 e0809a34 e08099f8 c040a3f8 c040e530 00000006 00000001
> 9a00: c0e61920 c0da1e78 00000000 c0e61920 00000000 e0809a54 c06ad89c c0e32b00
> 9a20: c0da1e00 00000020 e0809acc e0809a38 c040a040 c040a26c e0809a7c 00000140
> 9a40: 00000002 00000002 00000001 00000007 00000000 00000039 00000001 c0da1e00
> 9a60: 00000000 00000000 00000000 00000004 00000006 00000007 00000000 00000001
> 9a80: c06ad89c 00000000 00000000 00000000 00000000 ffffffff ffffffff c07db82c
> 9aa0: e0809acc c0c0c3c0 c0e32b00 00000007 00000002 00000720 c0409cf0 00000028
> 9ac0: e0809afc e0809ad0 c040665c c0409cfc 00000000 00000000 c0c0c3c0 c0807584
> 9ae0: 00000000 00000000 ffffff60 c0c70000 e0809b1c e0809b00 c0439a50 c040656c
> 9b00: c0c0c3c0 00000000 00000000 00000000 e0809b54 e0809b20 c043a798 c0439a24
> 9b20: c04095c8 c0c6ff60 00000000 c07db82c e0809b54 c0c0c3c0 c0c6ff60 00000000
> 9b40: 00000000 ffffff60 e0809ba4 e0809b58 c0407254 c043a5ac e0809b7c e0809b68
> 9b60: c04145d8 00000000 00000000 00000000 00000720 00000000 00000050 c0c0c3c0
> 9b80: c0e32b00 c0e61920 00000050 00000028 c0a00df8 00000028 e0809bec e0809ba8
> 9ba0: c0407748 c0406f04 00000050 00000028 00000050 00000001 c0a02f70 00000000
> 9bc0: 00000000 c0c0c3c0 c0c0c624 00000000 c0a02f84 0000003e 00000000 c0a03080
> 9be0: e0809c0c e0809bf0 c0438b10 c040734c c0c0c3c0 c06affbc 00000001 c0a02f84
> 9c00: e0809c54 e0809c10 c043be28 c0438a80 0000003e 00000001 00000000 c0779d88
> 9c20: 00000000 00000001 c08075a8 c06affbc 00000000 00000001 00000000 0000003e
> 9c40: 00000001 c0a02f8c e0809c9c e0809c58 c043c6ec c043bc98 c08075a8 c077c29c
> 9c60: 00000001 00000000 c0e32b44 c0a03a58 c067f354 c0805a24 c0a00cc8 c0805a24
> 9c80: 00000000 c07dbabc c0e32da4 fffff000 e0809cb4 e0809ca0 c0405d5c c043c5d4
> 9ca0: c0a00dac 00000000 e0809cd4 e0809cb8 c0408f48 c0405cfc c0e32b00 00000000
> 9cc0: c0a00ca8 c0e32b10 e0809d44 e0809cd8 c03ff9e4 c0408e70 c0779a14 00000000
> 9ce0: c000ea7c 00000000 00000041 00000140 000000f0 00029e01 0000000b 0000001e
> 9d00: 00000002 00000000 00000005 00000001 00000003 00000000 00000020 c07db82c
> 9d20: c0e32b00 00000000 c07dfe08 00000004 0000000d 00000000 e0809d84 e0809d48
> 9d40: c040f550 c03ff7e8 00000004 c077a1b8 c0e32b00 c0180a04 c07dfe18 00000000
> 9d60: c07dfe18 c0805abc 00000000 00000000 c07cb87c c07cb85c e0809da4 e0809d88
> 9d80: c045c2c8 c040f228 00000000 c07dfe18 c0805abc 00000000 e0809dc4 e0809da8
> 9da0: c045a304 c045c288 c07dfe18 c0805abc c07dfe18 00000000 e0809ddc e0809dc8
> 9dc0: c045a548 c045a250 c0a04c6c 60000153 e0809e04 e0809de0 c045a5f4 c045a4d0
> 9de0: e0809e04 e0809df0 c07dfe18 c0805abc c045a7d0 c080af60 e0809e24 e0809e08
> 9e00: c045a860 c045a5b4 00000000 c0805abc c045a7d0 c080af60 e0809e54 e0809e28
> 9e20: c0458694 c045a7dc c0c30e20 c0c30dec c0c9f3b4 c07db82c c06559b8 c0805abc
> 9e40: c0e5d340 00000000 e0809e64 e0809e58 c045adf0 c0458620 e0809e8c e0809e68
> 9e60: c0458fb8 c045addc c0749e14 e0809e78 c0805abc c0c19000 c07bc340 c07a4bc8
> 9e80: e0809ea4 e0809e90 c045b684 c0458e84 c0818000 c0c19000 e0809eb4 e0809ea8
> 9ea0: c045d10c c045b614 e0809ec4 e0809eb8 c07bc368 c045d0f8 e0809f4c e0809ec8
> 9ec0: c07a821c c07bc34c e0809eec e0809ed8 c00374f4 c065f390 c0c427da c07a5600
> 9ee0: e0809f4c e0809ef0 c00376f4 c07a7448 e0809f03 00000006 00000006 00000000
> 9f00: 00000000 c07a743c c0793b5c c07a4bc8 00000dc0 c0c427cc c0c427d4 c07db82c
> 9f20: 00000000 c07d2060 c0c427a0 00000007 c07a4bc8 c0818000 c07cb87c c07cb85c
> 9f40: e0809f94 e0809f50 c07a85a0 c07a81b0 00000006 00000006 00000000 c07a743c
> 9f60: c0c19000 0000008c e0809f8c 00000000 c0675804 00000000 00000000 00000000
> 9f80: 00000000 00000000 e0809fac e0809f98 c067581c c07a842c 00000000 c0675804
> 9fa0: 00000000 e0809fb0 c0008328 c0675810 00000000 00000000 00000000 00000000
> 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
> Backtrace:
>   cfb_imageblit from soft_cursor+0x198/0x1fc
>   r10:c0c6e000 r9:00000001 r8:00000001 r7:00000006 r6:c0e32b00 r5:e0809a54
>   r4:c0da1e40
>   soft_cursor from bit_cursor+0x350/0x4fc
>   r10:00000020 r9:c0da1e00 r8:c0e32b00 r7:c06ad89c r6:e0809a54 r5:00000000
>   r4:c0e61920
>   bit_cursor from fbcon_cursor+0xfc/0x110
>   r10:00000028 r9:c0409cf0 r8:00000720 r7:00000002 r6:00000007 r5:c0e32b00
>   r4:c0c0c3c0
>   fbcon_cursor from hide_cursor+0x38/0xac
>   r9:c0c70000 r8:ffffff60 r7:00000000 r6:00000000 r5:c0807584 r4:c0c0c3c0
>   hide_cursor from redraw_screen+0x1f8/0x258
>   r7:00000000 r6:00000000 r5:00000000 r4:c0c0c3c0
>   redraw_screen from fbcon_prepare_logo+0x35c/0x448
>   r8:ffffff60 r7:00000000 r6:00000000 r5:c0c6ff60 r4:c0c0c3c0
>   fbcon_prepare_logo from fbcon_init+0x408/0x4f8
>   r10:00000028 r9:c0a00df8 r8:00000028 r7:00000050 r6:c0e61920 r5:c0e32b00
>   r4:c0c0c3c0
>   fbcon_init from visual_init+0x9c/0xe0
>   r10:c0a03080 r9:00000000 r8:0000003e r7:c0a02f84 r6:00000000 r5:c0c0c624
>   r4:c0c0c3c0
>   visual_init from do_bind_con_driver+0x19c/0x370
>   r7:c0a02f84 r6:00000001 r5:c06affbc r4:c0c0c3c0
>   do_bind_con_driver from do_take_over_console+0x124/0x1b8
>   r10:c0a02f8c r9:00000001 r8:0000003e r7:00000000 r6:00000001 r5:00000000
>   r4:c06affbc
>   do_take_over_console from do_fbcon_takeover+0x6c/0xcc
>   r10:fffff000 r9:c0e32da4 r8:c07dbabc r7:00000000 r6:c0805a24 r5:c0a00cc8
>   r4:c0805a24
>   do_fbcon_takeover from fbcon_fb_registered+0xe4/0x128
>   r5:00000000 r4:c0a00dac
>   fbcon_fb_registered from register_framebuffer+0x208/0x318
>   r7:c0e32b10 r6:c0a00ca8 r5:00000000 r4:c0e32b00
>   register_framebuffer from sa1100fb_probe+0x334/0x420
>   r9:00000000 r8:0000000d r7:00000004 r6:c07dfe08 r5:00000000 r4:c0e32b00
>   sa1100fb_probe from platform_probe+0x4c/0xac
>   r10:c07cb85c r9:c07cb87c r8:00000000 r7:00000000 r6:c0805abc r5:c07dfe18
>   r4:00000000
>   platform_probe from really_probe+0xc0/0x280
>   r7:00000000 r6:c0805abc r5:c07dfe18 r4:00000000
>   really_probe from __driver_probe_device+0x84/0xe4
>   r7:00000000 r6:c07dfe18 r5:c0805abc r4:c07dfe18
>   __driver_probe_device from driver_probe_device+0x4c/0x10c
>   r5:60000153 r4:c0a04c6c
>   driver_probe_device from __driver_attach+0x90/0x104
>   r7:c080af60 r6:c045a7d0 r5:c0805abc r4:c07dfe18
>   __driver_attach from bus_for_each_dev+0x80/0xcc
>   r7:c080af60 r6:c045a7d0 r5:c0805abc r4:00000000
>   bus_for_each_dev from driver_attach+0x20/0x28
>   r6:00000000 r5:c0e5d340 r4:c0805abc
>   driver_attach from bus_add_driver+0x140/0x1c8
>   bus_add_driver from driver_register+0x7c/0x110
>   r7:c07a4bc8 r6:c07bc340 r5:c0c19000 r4:c0805abc
>   driver_register from __platform_driver_register+0x20/0x28
>   r5:c0c19000 r4:c0818000
>   __platform_driver_register from sa1100fb_init+0x28/0x3c
>   sa1100fb_init from do_one_initcall+0x78/0x220
>   do_one_initcall from kernel_init_freeable+0x180/0x1fc
>   r10:c07cb85c r9:c07cb87c r8:c0818000 r7:c07a4bc8 r6:00000007 r5:c0c427a0
>   r4:c07d2060
>   kernel_init_freeable from kernel_init+0x18/0x10c
>   r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c0675804
>   r4:00000000
>   kernel_init from ret_from_fork+0x14/0x2c
> Exception stack(0xe0809fb0 to 0xe0809ff8)
> 9fa0:                                     00000000 00000000 00000000 00000000
> 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
>   r5:c0675804 r4:00000000
> Code: e24ba02c e0026323 e08a6106 e5166044 (e58c6000)
> ---[ end trace 00000000c08187d8 ]---
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> Reboot failed -- System halted
> 
> ---
> # bad: [dd315b5800612e6913343524aa9b993f9a8bb0cf] Add linux-next specific files for 20220324
> # good: [f443e374ae131c168a065ea1748feac6b2e76613] Linux 5.17
> git bisect start 'HEAD' 'v5.17'
> # good: [6788381e2f3c20c25cf7ab91df9cf0d6bec153f9] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git
> git bisect good 6788381e2f3c20c25cf7ab91df9cf0d6bec153f9
> # bad: [59c7e0caa3e7bc21dd1b6c681c87d2b307f399ee] Merge branch 'drm-next' of git://git.freedesktop.org/git/drm/drm.git
> git bisect bad 59c7e0caa3e7bc21dd1b6c681c87d2b307f399ee
> # good: [4d17d43de9d186150b3289ce99d7a79fcff202f9] net: usb: asix: suspend embedded PHY if external is used
> git bisect good 4d17d43de9d186150b3289ce99d7a79fcff202f9
> # good: [6c64ae228f0826859c56711ce133aff037d6205f] Backmerge tag 'v5.17-rc6' into drm-next
> git bisect good 6c64ae228f0826859c56711ce133aff037d6205f
> # good: [01fd8d2522c49a333b0ee46ba19a6fedfc1c9a60] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git
> git bisect good 01fd8d2522c49a333b0ee46ba19a6fedfc1c9a60
> # bad: [6de7e4f02640fba2ffa6ac04e2be13785d614175] Merge tag 'drm-msm-next-2022-03-01' of https://gitlab.freedesktop.org/drm/msm into drm-next
> git bisect bad 6de7e4f02640fba2ffa6ac04e2be13785d614175
> # bad: [c9e9ce0b6f85ac330adee912745048a0af5f315d] Merge tag 'drm-misc-next-2022-03-03' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
> git bisect bad c9e9ce0b6f85ac330adee912745048a0af5f315d
> # good: [e2573d5f2a5cebe789bbf415e484b589d8eebad7] drm/amd/display: limit unbounded requesting to 5k
> git bisect good e2573d5f2a5cebe789bbf415e484b589d8eebad7
> # good: [3c54c95bd917d43d12fe1b192df9aa4c5973449b] fbdev: Remove trailing whitespaces from cfbimgblt.c
> git bisect good 3c54c95bd917d43d12fe1b192df9aa4c5973449b
> # good: [ed6e76676b2657b71a0b9e5e847d96e4de0b394b] drm: rcar-du: lvds: Add r8a77961 support
> git bisect good ed6e76676b2657b71a0b9e5e847d96e4de0b394b
> # good: [66a8af1f6e3c10190dff14a5668661c092a2a85f] Merge tag 'drm/tegra/for-5.18-rc1' of https://gitlab.freedesktop.org/drm/tegra into drm-next
> git bisect good 66a8af1f6e3c10190dff14a5668661c092a2a85f
> # bad: [701920ca9822eb63b420b3bcb627f2c1ec759903] drm/ssd130x: remove redundant initialization of pointer mode
> git bisect bad 701920ca9822eb63b420b3bcb627f2c1ec759903
> # bad: [9ae2ac4d31a85ce59cc560d514a31b95f4ace154] drm: Add TODO item for optimizing format helpers
> git bisect bad 9ae2ac4d31a85ce59cc560d514a31b95f4ace154
> # bad: [0d03011894d23241db1a1cad5c12aede60897d5e] fbdev: Improve performance of cfb_imageblit()
> git bisect bad 0d03011894d23241db1a1cad5c12aede60897d5e
> # first bad commit: [0d03011894d23241db1a1cad5c12aede60897d5e] fbdev: Improve performance of cfb_imageblit()

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Managing Director: Ivo Totev

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 4/5] fbdev: Improve performance of cfb_imageblit()
  2022-03-24 19:18       ` Thomas Zimmermann
@ 2022-03-24 21:18         ` Guenter Roeck
  -1 siblings, 0 replies; 49+ messages in thread
From: Guenter Roeck @ 2022-03-24 21:18 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: daniel, deller, javierm, geert, sam, kraxel, ppaalanen,
	dri-devel, linux-fbdev

On 3/24/22 12:18, Thomas Zimmermann wrote:
> Hi
> 
> On 24.03.22 at 20:11, Guenter Roeck wrote:
>> Hi,
>>
>> On Wed, Feb 23, 2022 at 08:38:03PM +0100, Thomas Zimmermann wrote:
>>> Improve the performance of cfb_imageblit() by manually unrolling
>>> the inner blitting loop and moving some invariants out. The compiler
>>> failed to do this automatically. This change keeps cfb_imageblit()
>>> in sync with sys_imageblit().
>>>
>>> A microbenchmark measures the average number of CPU cycles
>>> for cfb_imageblit() after a stabilizing period of a few minutes
>>> (i7-4790, FullHD, simpledrm, kernel with debugging).
>>>
>>> cfb_imageblit(), new: 15724 cycles
>>> cfb_imageblit(), old: 30566 cycles
>>>
>>> In the optimized case, cfb_imageblit() is now ~2x faster than before.
>>>
>>> v3:
>>>     * fix commit description (Pekka)
>>>
>>> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
>>
>> This patch causes crashes with arm mainstone, z2, and collie emulations.
>> Reverting it fixes the problem.
>>
>> collie crash log and bisect log attached.
> 
> Does it work if you apply the fixes at
> 
> https://patchwork.freedesktop.org/series/101321/
> 
> ?
> 

Yes, it does, specifically the cfb-related patch. I sent a Tested-by: tag.

Thanks,
Guenter
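
For reference, the technique named in the quoted commit message (manually
unrolling the inner blitting loop and moving invariants out of it) can be
sketched as below. This is only an illustration under stated assumptions, not
the code from cfb_imageblit() or from the fix series: the function names
expand_simple(), expand_unrolled() and pick(), and the 1-bpp source expanded
to 32-bit pixels, are hypothetical and chosen to keep the example small.

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline: one destination pixel per iteration; the byte index, the bit
 * position and the fg/bg choice are all recomputed for every pixel. */
static void expand_simple(const uint8_t *src, uint32_t *dst, size_t nbytes,
                          uint32_t fg, uint32_t bg)
{
    for (size_t i = 0; i < nbytes * 8; i++) {
        uint8_t bit = (src[i / 8] >> (7 - (i % 8))) & 1;
        dst[i] = bit ? fg : bg;
    }
}

/* Branch-free select: yields fg if the chosen bit of b is set, else bg.
 * eorx must be fg ^ bg. */
static inline uint32_t pick(uint8_t b, unsigned shift, uint32_t eorx,
                            uint32_t bg)
{
    return (-(uint32_t)((b >> shift) & 1u) & eorx) ^ bg;
}

/* Unrolled: the invariant fg ^ bg is computed once, each source byte is
 * loaded once, and eight pixels are written per loop iteration. */
static void expand_unrolled(const uint8_t *src, uint32_t *dst, size_t nbytes,
                            uint32_t fg, uint32_t bg)
{
    const uint32_t eorx = fg ^ bg; /* loop invariant, hoisted */

    for (size_t i = 0; i < nbytes; i++) {
        const uint8_t b = src[i];

        dst[0] = pick(b, 7, eorx, bg);
        dst[1] = pick(b, 6, eorx, bg);
        dst[2] = pick(b, 5, eorx, bg);
        dst[3] = pick(b, 4, eorx, bg);
        dst[4] = pick(b, 3, eorx, bg);
        dst[5] = pick(b, 2, eorx, bg);
        dst[6] = pick(b, 1, eorx, bg);
        dst[7] = pick(b, 0, eorx, bg);
        dst += 8;
    }
}
```

Both functions should produce identical output for the same input. Note that
the unrolled variant writes exactly eight pixels per source byte, so a caller
has to size the destination accordingly; whether the crash reported above is
related to such bounds is not something this sketch can show.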

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2022-03-24 22:13 UTC | newest]

Thread overview: 49+ messages
2022-02-23 19:37 [PATCH v3 0/5] fbdev: Improve performance of fbdev console Thomas Zimmermann
2022-02-23 19:37 ` Thomas Zimmermann
2022-02-23 19:38 ` [PATCH v3 1/5] fbdev: Improve performance of sys_fillrect() Thomas Zimmermann
2022-02-23 19:38   ` Thomas Zimmermann
2022-02-23 19:38 ` [PATCH v3 2/5] fbdev: Improve performance of sys_imageblit() Thomas Zimmermann
2022-02-23 19:38   ` Thomas Zimmermann
2022-02-23 19:38 ` [PATCH v3 3/5] fbdev: Remove trailing whitespaces from cfbimgblt.c Thomas Zimmermann
2022-02-23 19:38   ` Thomas Zimmermann
2022-02-23 20:23   ` Sam Ravnborg
2022-02-23 20:23     ` Sam Ravnborg
2022-02-24  8:22   ` Javier Martinez Canillas
2022-02-24  8:22     ` Javier Martinez Canillas
2022-02-23 19:38 ` [PATCH v3 4/5] fbdev: Improve performance of cfb_imageblit() Thomas Zimmermann
2022-02-23 19:38   ` Thomas Zimmermann
2022-02-23 20:25   ` Sam Ravnborg
2022-02-23 20:25     ` Sam Ravnborg
2022-02-24  9:02     ` Javier Martinez Canillas
2022-02-24  9:02       ` Javier Martinez Canillas
2022-02-24 10:29       ` Sam Ravnborg
2022-02-24 10:29         ` Sam Ravnborg
2022-02-24 10:31       ` Geert Uytterhoeven
2022-02-24 10:31         ` Geert Uytterhoeven
2022-02-24  8:31   ` Javier Martinez Canillas
2022-02-24  8:31     ` Javier Martinez Canillas
     [not found]   ` <CGME20220308225225eucas1p12fcdd6e5dc83308b19d51ad7b2a13141@eucas1p1.samsung.com>
2022-03-08 22:52     ` [v3,4/5] " Marek Szyprowski
2022-03-09  8:22       ` Thomas Zimmermann
2022-03-09  9:22         ` Marek Szyprowski
2022-03-09 10:39           ` Geert Uytterhoeven
2022-03-09 10:39             ` Geert Uytterhoeven
2022-03-10 19:21             ` Thomas Zimmermann
2022-03-10 19:21               ` Thomas Zimmermann
2022-03-10 19:23               ` Geert Uytterhoeven
2022-03-10 19:23                 ` Geert Uytterhoeven
2022-03-13 19:23                 ` Thomas Zimmermann
2022-03-13 19:23                   ` Thomas Zimmermann
2022-03-24 19:11   ` [PATCH v3 4/5] " Guenter Roeck
2022-03-24 19:11     ` Guenter Roeck
2022-03-24 19:18     ` Thomas Zimmermann
2022-03-24 19:18       ` Thomas Zimmermann
2022-03-24 21:18       ` Guenter Roeck
2022-03-24 21:18         ` Guenter Roeck
2022-02-23 19:38 ` [PATCH v3 5/5] drm: Add TODO item for optimizing format helpers Thomas Zimmermann
2022-02-23 19:38   ` Thomas Zimmermann
2022-02-23 20:34   ` Sam Ravnborg
2022-02-23 20:34     ` Sam Ravnborg
2022-02-24  8:39   ` Javier Martinez Canillas
2022-02-24  8:39     ` Javier Martinez Canillas
2022-03-02 19:30 ` [PATCH v3 0/5] fbdev: Improve performance of fbdev console Thomas Zimmermann
2022-03-02 19:30   ` Thomas Zimmermann
