* [PATCH v2 0/5] fbdev: Improve performance of fbdev console
@ 2022-02-21 19:54 Thomas Zimmermann
2022-02-21 19:54 ` [PATCH v2 1/5] fbdev: Improve performance of sys_fillrect() Thomas Zimmermann
` (4 more replies)
0 siblings, 5 replies; 8+ messages in thread
From: Thomas Zimmermann @ 2022-02-21 19:54 UTC
To: daniel, deller, javierm, geert, sam, kraxel
Cc: linux-fbdev, Thomas Zimmermann, dri-devel
Optimize performance of the fbdev console for the common case of
software-based clearing and image blitting.
The commit description of each patch contains the results of a simple
microbenchmark. I also tested the full patchset's effect on the
console output by printing directory listings (i7-4790, FullHD,
simpledrm, kernel with debugging).
> time find /usr/share/doc -type f
In the unoptimized case:
real 0m6.173s
user 0m0.044s
sys 0m6.107s
With optimizations applied:
real 0m4.754s
user 0m0.044s
sys 0m4.698s
In the optimized case, printing the directory listing is ~25% faster
than before.
In v2 of the patchset, after implementing Sam's suggestion to update
cfb_imageblit() as well, it turns out that the compiled code in
sys_imageblit() is still significantly slower than the CFB version. A
fix is probably a larger task and would include architecture-specific
changes. A new TODO item suggests investigating the performance of the
various helpers and format-conversion functions in DRM and fbdev.
v2:
* improve readability for sys_imageblit() (Gerd, Sam)
* new TODO item for further optimization
Thomas Zimmermann (5):
fbdev: Improve performance of sys_fillrect()
fbdev: Improve performance of sys_imageblit()
fbdev: Remove trailing whitespaces from cfbimgblt.c
fbdev: Improve performance of cfb_imageblit()
drm: Add TODO item for optimizing format helpers
Documentation/gpu/todo.rst | 22 +++++
drivers/video/fbdev/core/cfbimgblt.c | 107 ++++++++++++++++---------
drivers/video/fbdev/core/sysfillrect.c | 16 +---
drivers/video/fbdev/core/sysimgblt.c | 49 ++++++++---
4 files changed, 133 insertions(+), 61 deletions(-)
--
2.35.1
* [PATCH v2 1/5] fbdev: Improve performance of sys_fillrect()
From: Thomas Zimmermann @ 2022-02-21 19:54 UTC
To: daniel, deller, javierm, geert, sam, kraxel
Cc: linux-fbdev, Thomas Zimmermann, dri-devel
Improve the performance of sys_fillrect() by using word-aligned
32/64-bit mov instructions. While the code tried to implement this,
the compiler failed to create fast instructions. The resulting
binary instructions were even slower than cfb_fillrect(), which
uses the same algorithm, but operates on I/O memory.
A microbenchmark measures the average number of CPU cycles
for sys_fillrect() after a stabilizing period of a few minutes
(i7-4790, FullHD, simpledrm, kernel with debugging). The value
for CFB is given as a reference.
sys_fillrect(), new: 26586 cycles
sys_fillrect(), old: 166603 cycles
cfb_fillrect(): 41012 cycles
In the optimized case, sys_fillrect() is now ~6x faster than before
and ~1.5x faster than the CFB implementation.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
---
drivers/video/fbdev/core/sysfillrect.c | 16 +++-------------
1 file changed, 3 insertions(+), 13 deletions(-)
diff --git a/drivers/video/fbdev/core/sysfillrect.c b/drivers/video/fbdev/core/sysfillrect.c
index 33ee3d34f9d2..bcdcaeae6538 100644
--- a/drivers/video/fbdev/core/sysfillrect.c
+++ b/drivers/video/fbdev/core/sysfillrect.c
@@ -50,19 +50,9 @@ bitfill_aligned(struct fb_info *p, unsigned long *dst, int dst_idx,
/* Main chunk */
n /= bits;
- while (n >= 8) {
- *dst++ = pat;
- *dst++ = pat;
- *dst++ = pat;
- *dst++ = pat;
- *dst++ = pat;
- *dst++ = pat;
- *dst++ = pat;
- *dst++ = pat;
- n -= 8;
- }
- while (n--)
- *dst++ = pat;
+ memset_l(dst, pat, n);
+ dst += n;
+
/* Trailing bits */
if (last)
*dst = comp(pat, *dst, last);
--
2.35.1
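The change above swaps an eight-way unrolled store loop for memset_l() followed by advancing dst. The userspace sketch below only checks that both forms leave the same words in memory and return the same end pointer. fill_unrolled(), fill_words() and fill_main_chunk() are hypothetical names, and fill_words() is a portable stand-in for the kernel's memset_l(), so it does not reproduce any speedup. In the kernel, memset_l() forwards to memset32() or memset64() depending on BITS_PER_LONG, and an architecture may provide faster implementations of those.

```c
#include <stddef.h>

/* Old main chunk of bitfill_aligned(): unrolled by 8, then a tail loop.
 * Returns the pointer just past the last word written. */
static unsigned long *fill_unrolled(unsigned long *dst, unsigned long pat,
				    size_t n)
{
	while (n >= 8) {
		dst[0] = pat; dst[1] = pat; dst[2] = pat; dst[3] = pat;
		dst[4] = pat; dst[5] = pat; dst[6] = pat; dst[7] = pat;
		dst += 8;
		n -= 8;
	}
	while (n--)
		*dst++ = pat;
	return dst;
}

/* Hypothetical portable stand-in for memset_l(): store pat into n
 * consecutive unsigned longs. */
static void fill_words(unsigned long *dst, unsigned long pat, size_t n)
{
	while (n--)
		*dst++ = pat;
}

/* New main chunk: fill n words in one call, then advance dst past them
 * so the trailing-bits code can continue from there. */
static unsigned long *fill_main_chunk(unsigned long *dst, unsigned long pat,
				      size_t n)
{
	fill_words(dst, pat, n);
	return dst + n;
}
```

Because the patched code advances dst by n after the call, the trailing-bits step that follows (`if (last) *dst = comp(pat, *dst, last);`) still sees the same pointer it did with the old loop.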
* [PATCH v2 2/5] fbdev: Improve performance of sys_imageblit()
From: Thomas Zimmermann @ 2022-02-21 19:54 UTC
To: daniel, deller, javierm, geert, sam, kraxel
Cc: linux-fbdev, Thomas Zimmermann, dri-devel
Improve the performance of sys_imageblit() by manually unrolling
the inner blitting loop and moving some invariants out. The compiler
failed to do this automatically. The resulting binary code was even
slower than the cfb_imageblit() helper, which uses the same algorithm,
but operates on I/O memory.
A microbenchmark measures the average number of CPU cycles
for sys_imageblit() after a stabilizing period of a few minutes
(i7-4790, FullHD, simpledrm, kernel with debugging). The value
for CFB is given as a reference.
sys_imageblit(), new: 25934 cycles
sys_imageblit(), old: 35944 cycles
cfb_imageblit(): 30566 cycles
In the optimized case, sys_imageblit() is now ~30% faster than before
and ~20% faster than cfb_imageblit().
v2:
* move switch out of inner loop (Gerd)
* remove test for alignment of dst1 (Sam)
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
---
drivers/video/fbdev/core/sysimgblt.c | 49 +++++++++++++++++++++-------
1 file changed, 38 insertions(+), 11 deletions(-)
diff --git a/drivers/video/fbdev/core/sysimgblt.c b/drivers/video/fbdev/core/sysimgblt.c
index a4d05b1b17d7..722c327a381b 100644
--- a/drivers/video/fbdev/core/sysimgblt.c
+++ b/drivers/video/fbdev/core/sysimgblt.c
@@ -188,23 +188,29 @@ static void fast_imageblit(const struct fb_image *image, struct fb_info *p,
{
u32 fgx = fgcolor, bgx = bgcolor, bpp = p->var.bits_per_pixel;
u32 ppw = 32/bpp, spitch = (image->width + 7)/8;
- u32 bit_mask, end_mask, eorx, shift;
+ u32 bit_mask, eorx;
const char *s = image->data, *src;
u32 *dst;
- const u32 *tab = NULL;
+ const u32 *tab;
+ size_t tablen;
+ u32 colortab[16];
int i, j, k;
switch (bpp) {
case 8:
tab = fb_be_math(p) ? cfb_tab8_be : cfb_tab8_le;
+ tablen = 16;
break;
case 16:
tab = fb_be_math(p) ? cfb_tab16_be : cfb_tab16_le;
+ tablen = 4;
break;
case 32:
- default:
tab = cfb_tab32;
+ tablen = 2;
break;
+ default:
+ return;
}
for (i = ppw-1; i--; ) {
@@ -218,19 +224,40 @@ static void fast_imageblit(const struct fb_image *image, struct fb_info *p,
eorx = fgx ^ bgx;
k = image->width/ppw;
+ for (i = 0; i < tablen; ++i)
+ colortab[i] = (tab[i] & eorx) ^ bgx;
+
for (i = image->height; i--; ) {
dst = dst1;
- shift = 8;
src = s;
- for (j = k; j--; ) {
- shift -= ppw;
- end_mask = tab[(*src >> shift) & bit_mask];
- *dst++ = (end_mask & eorx) ^ bgx;
- if (!shift) {
- shift = 8;
- src++;
+ switch (ppw) {
+ case 4: /* 8 bpp */
+ for (j = k; j; j -= 2, ++src) {
+ *dst++ = colortab[(*src >> 4) & bit_mask];
+ *dst++ = colortab[(*src >> 0) & bit_mask];
+ }
+ break;
+ case 2: /* 16 bpp */
+ for (j = k; j; j -= 4, ++src) {
+ *dst++ = colortab[(*src >> 6) & bit_mask];
+ *dst++ = colortab[(*src >> 4) & bit_mask];
+ *dst++ = colortab[(*src >> 2) & bit_mask];
+ *dst++ = colortab[(*src >> 0) & bit_mask];
+ }
+ break;
+ case 1: /* 32 bpp */
+ for (j = k; j; j -= 8, ++src) {
+ *dst++ = colortab[(*src >> 7) & bit_mask];
+ *dst++ = colortab[(*src >> 6) & bit_mask];
+ *dst++ = colortab[(*src >> 5) & bit_mask];
+ *dst++ = colortab[(*src >> 4) & bit_mask];
+ *dst++ = colortab[(*src >> 3) & bit_mask];
+ *dst++ = colortab[(*src >> 2) & bit_mask];
+ *dst++ = colortab[(*src >> 1) & bit_mask];
+ *dst++ = colortab[(*src >> 0) & bit_mask];
}
+ break;
}
dst1 += p->fix.line_length;
s += spitch;
--
2.35.1
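The change above folds the foreground/background colours into colortab[] once per call and unrolls the inner loop per colour depth. The userspace sketch below shows the 8 bpp case for a single scanline. build_tab8_le(), build_colortab() and blit_row_8bpp() are hypothetical names, and the expansion table is generated under the assumption that it matches the layout of the kernel's cfb_tab8_le (leftmost pixel in the lowest byte). One difference is deliberate: the loop in the diff steps j down by 2 and stops only when j reaches exactly 0, so it appears to assume that each row covers a whole number of source bytes (a width that is a multiple of 8 pixels at 8 bpp). The sketch writes a leftover half byte explicitly instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build a 16-entry expansion table for 8 bpp: bit (3 - p) of the index
 * selects pixel p, stored in byte p of the 32-bit word (assumed to match
 * the little-endian layout of cfb_tab8_le). */
static void build_tab8_le(uint32_t tab[16])
{
	for (unsigned int idx = 0; idx < 16; idx++) {
		uint32_t word = 0;

		for (unsigned int p = 0; p < 4; p++)
			if ((idx >> (3 - p)) & 1)
				word |= UINT32_C(0xff) << (8 * p);
		tab[idx] = word;
	}
}

/* Fold the colours into the table once: set bits become fgx, clear bits
 * become bgx. This is what the old per-pixel (tab[x] & eorx) ^ bgx did. */
static void build_colortab(uint32_t colortab[16], const uint32_t tab[16],
			   uint32_t fgx, uint32_t bgx)
{
	uint32_t eorx = fgx ^ bgx;

	for (unsigned int i = 0; i < 16; i++)
		colortab[i] = (tab[i] & eorx) ^ bgx;
}

/* Expand one row of a 1 bpp glyph into 8 bpp pixels, four per 32-bit
 * word. width is in pixels and must be a multiple of 4. */
static void blit_row_8bpp(uint32_t *dst, const uint8_t *src, size_t width,
			  const uint32_t colortab[16])
{
	size_t k = width / 4;	/* output words left in this row */

	assert(width % 4 == 0);
	/* Unrolled: each source byte yields two output words. */
	for (; k >= 2; k -= 2, ++src) {
		*dst++ = colortab[(*src >> 4) & 0xf];
		*dst++ = colortab[(*src >> 0) & 0xf];
	}
	/* Leftover: only the high nibble of the last byte is used. */
	if (k)
		*dst = colortab[(*src >> 4) & 0xf];
}
```

Handling the remainder after the unrolled loop keeps the hot path free of per-pixel branches while still terminating for any width that passes a multiple-of-4 alignment check.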
* [PATCH v2 3/5] fbdev: Remove trailing whitespaces from cfbimgblt.c
From: Thomas Zimmermann @ 2022-02-21 19:54 UTC
To: daniel, deller, javierm, geert, sam, kraxel
Cc: linux-fbdev, Thomas Zimmermann, dri-devel
Fix coding style. No functional changes.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
---
drivers/video/fbdev/core/cfbimgblt.c | 60 ++++++++++++++--------------
1 file changed, 30 insertions(+), 30 deletions(-)
diff --git a/drivers/video/fbdev/core/cfbimgblt.c b/drivers/video/fbdev/core/cfbimgblt.c
index a2bb276a8b24..01b01a279681 100644
--- a/drivers/video/fbdev/core/cfbimgblt.c
+++ b/drivers/video/fbdev/core/cfbimgblt.c
@@ -16,15 +16,15 @@
* must be laid out exactly in the same format as the framebuffer. Yes I know
* their are cards with hardware that coverts images of various depths to the
* framebuffer depth. But not every card has this. All images must be rounded
- * up to the nearest byte. For example a bitmap 12 bits wide must be two
- * bytes width.
+ * up to the nearest byte. For example a bitmap 12 bits wide must be two
+ * bytes width.
*
- * Tony:
- * Incorporate mask tables similar to fbcon-cfb*.c in 2.4 API. This speeds
+ * Tony:
+ * Incorporate mask tables similar to fbcon-cfb*.c in 2.4 API. This speeds
* up the code significantly.
- *
+ *
* Code for depths not multiples of BITS_PER_LONG is still kludgy, which is
- * still processed a bit at a time.
+ * still processed a bit at a time.
*
* Also need to add code to deal with cards endians that are different than
* the native cpu endians. I also need to deal with MSB position in the word.
@@ -72,8 +72,8 @@ static const u32 cfb_tab32[] = {
#define FB_WRITEL fb_writel
#define FB_READL fb_readl
-static inline void color_imageblit(const struct fb_image *image,
- struct fb_info *p, u8 __iomem *dst1,
+static inline void color_imageblit(const struct fb_image *image,
+ struct fb_info *p, u8 __iomem *dst1,
u32 start_index,
u32 pitch_index)
{
@@ -92,7 +92,7 @@ static inline void color_imageblit(const struct fb_image *image,
dst = (u32 __iomem *) dst1;
shift = 0;
val = 0;
-
+
if (start_index) {
u32 start_mask = ~fb_shifted_pixels_mask_u32(p,
start_index, bswapmask);
@@ -109,8 +109,8 @@ static inline void color_imageblit(const struct fb_image *image,
val |= FB_SHIFT_HIGH(p, color, shift ^ bswapmask);
if (shift >= null_bits) {
FB_WRITEL(val, dst++);
-
- val = (shift == null_bits) ? 0 :
+
+ val = (shift == null_bits) ? 0 :
FB_SHIFT_LOW(p, color, 32 - shift);
}
shift += bpp;
@@ -134,9 +134,9 @@ static inline void color_imageblit(const struct fb_image *image,
}
}
-static inline void slow_imageblit(const struct fb_image *image, struct fb_info *p,
+static inline void slow_imageblit(const struct fb_image *image, struct fb_info *p,
u8 __iomem *dst1, u32 fgcolor,
- u32 bgcolor,
+ u32 bgcolor,
u32 start_index,
u32 pitch_index)
{
@@ -172,7 +172,7 @@ static inline void slow_imageblit(const struct fb_image *image, struct fb_info *
l--;
color = (*s & (1 << l)) ? fgcolor : bgcolor;
val |= FB_SHIFT_HIGH(p, color, shift ^ bswapmask);
-
+
/* Did the bitshift spill bits to the next long? */
if (shift >= null_bits) {
FB_WRITEL(val, dst++);
@@ -191,16 +191,16 @@ static inline void slow_imageblit(const struct fb_image *image, struct fb_info *
FB_WRITEL((FB_READL(dst) & end_mask) | val, dst);
}
-
+
dst1 += pitch;
- src += spitch;
+ src += spitch;
if (pitch_index) {
dst2 += pitch;
dst1 = (u8 __iomem *)((long __force)dst2 & ~(sizeof(u32) - 1));
start_index += pitch_index;
start_index &= 32 - 1;
}
-
+
}
}
@@ -212,9 +212,9 @@ static inline void slow_imageblit(const struct fb_image *image, struct fb_info *
* fix->line_legth is divisible by 4;
* beginning and end of a scanline is dword aligned
*/
-static inline void fast_imageblit(const struct fb_image *image, struct fb_info *p,
- u8 __iomem *dst1, u32 fgcolor,
- u32 bgcolor)
+static inline void fast_imageblit(const struct fb_image *image, struct fb_info *p,
+ u8 __iomem *dst1, u32 fgcolor,
+ u32 bgcolor)
{
u32 fgx = fgcolor, bgx = bgcolor, bpp = p->var.bits_per_pixel;
u32 ppw = 32/bpp, spitch = (image->width + 7)/8;
@@ -243,25 +243,25 @@ static inline void fast_imageblit(const struct fb_image *image, struct fb_info *
fgx |= fgcolor;
bgx |= bgcolor;
}
-
+
bit_mask = (1 << ppw) - 1;
eorx = fgx ^ bgx;
k = image->width/ppw;
for (i = image->height; i--; ) {
dst = (u32 __iomem *) dst1, shift = 8; src = s;
-
+
for (j = k; j--; ) {
shift -= ppw;
end_mask = tab[(*src >> shift) & bit_mask];
FB_WRITEL((end_mask & eorx)^bgx, dst++);
- if (!shift) { shift = 8; src++; }
+ if (!shift) { shift = 8; src++; }
}
dst1 += p->fix.line_length;
s += spitch;
}
-}
-
+}
+
void cfb_imageblit(struct fb_info *p, const struct fb_image *image)
{
u32 fgcolor, bgcolor, start_index, bitstart, pitch_index = 0;
@@ -292,13 +292,13 @@ void cfb_imageblit(struct fb_info *p, const struct fb_image *image)
} else {
fgcolor = image->fg_color;
bgcolor = image->bg_color;
- }
-
- if (32 % bpp == 0 && !start_index && !pitch_index &&
+ }
+
+ if (32 % bpp == 0 && !start_index && !pitch_index &&
((width & (32/bpp-1)) == 0) &&
- bpp >= 8 && bpp <= 32)
+ bpp >= 8 && bpp <= 32)
fast_imageblit(image, p, dst1, fgcolor, bgcolor);
- else
+ else
slow_imageblit(image, p, dst1, fgcolor, bgcolor,
start_index, pitch_index);
} else
--
2.35.1
* [PATCH v2 4/5] fbdev: Improve performance of cfb_imageblit()
From: Thomas Zimmermann @ 2022-02-21 19:54 UTC
To: daniel, deller, javierm, geert, sam, kraxel
Cc: linux-fbdev, Thomas Zimmermann, dri-devel
Improve the performance of cfb_imageblit() by manually unrolling
the inner blitting loop and moving some invariants out. The compiler
failed to do this automatically. This change keeps cfb_imageblit()
in sync with sys_imageblit().
A microbenchmark measures the average number of CPU cycles
for cfb_imageblit() after a stabilizing period of a few minutes
(i7-4790, FullHD, simpledrm, kernel with debugging).
cfb_imageblit(), new: 15724 cycles
cfb_imageblit(), old: 30566 cycles
In the optimized case, cfb_imageblit() is now ~2x faster than before.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
---
drivers/video/fbdev/core/cfbimgblt.c | 51 +++++++++++++++++++++++-----
1 file changed, 42 insertions(+), 9 deletions(-)
diff --git a/drivers/video/fbdev/core/cfbimgblt.c b/drivers/video/fbdev/core/cfbimgblt.c
index 01b01a279681..7361cfabdd85 100644
--- a/drivers/video/fbdev/core/cfbimgblt.c
+++ b/drivers/video/fbdev/core/cfbimgblt.c
@@ -218,23 +218,29 @@ static inline void fast_imageblit(const struct fb_image *image, struct fb_info *
{
u32 fgx = fgcolor, bgx = bgcolor, bpp = p->var.bits_per_pixel;
u32 ppw = 32/bpp, spitch = (image->width + 7)/8;
- u32 bit_mask, end_mask, eorx, shift;
+ u32 bit_mask, eorx;
const char *s = image->data, *src;
u32 __iomem *dst;
const u32 *tab = NULL;
+ size_t tablen;
+ u32 colortab[16];
int i, j, k;
switch (bpp) {
case 8:
tab = fb_be_math(p) ? cfb_tab8_be : cfb_tab8_le;
+ tablen = 16;
break;
case 16:
tab = fb_be_math(p) ? cfb_tab16_be : cfb_tab16_le;
+ tablen = 4;
break;
case 32:
- default:
tab = cfb_tab32;
+ tablen = 2;
break;
+ default:
+ return;
}
for (i = ppw-1; i--; ) {
@@ -248,15 +254,42 @@ static inline void fast_imageblit(const struct fb_image *image, struct fb_info *
eorx = fgx ^ bgx;
k = image->width/ppw;
- for (i = image->height; i--; ) {
- dst = (u32 __iomem *) dst1, shift = 8; src = s;
+ for (i = 0; i < tablen; ++i)
+ colortab[i] = (tab[i] & eorx) ^ bgx;
- for (j = k; j--; ) {
- shift -= ppw;
- end_mask = tab[(*src >> shift) & bit_mask];
- FB_WRITEL((end_mask & eorx)^bgx, dst++);
- if (!shift) { shift = 8; src++; }
+ for (i = image->height; i--; ) {
+ dst = (u32 __iomem *)dst1;
+ src = s;
+
+ switch (ppw) {
+ case 4: /* 8 bpp */
+ for (j = k; j; j -= 2, ++src) {
+ FB_WRITEL(colortab[(*src >> 4) & bit_mask], dst++);
+ FB_WRITEL(colortab[(*src >> 0) & bit_mask], dst++);
+ }
+ break;
+ case 2: /* 16 bpp */
+ for (j = k; j; j -= 4, ++src) {
+ FB_WRITEL(colortab[(*src >> 6) & bit_mask], dst++);
+ FB_WRITEL(colortab[(*src >> 4) & bit_mask], dst++);
+ FB_WRITEL(colortab[(*src >> 2) & bit_mask], dst++);
+ FB_WRITEL(colortab[(*src >> 0) & bit_mask], dst++);
+ }
+ break;
+ case 1: /* 32 bpp */
+ for (j = k; j; j -= 8, ++src) {
+ FB_WRITEL(colortab[(*src >> 7) & bit_mask], dst++);
+ FB_WRITEL(colortab[(*src >> 6) & bit_mask], dst++);
+ FB_WRITEL(colortab[(*src >> 5) & bit_mask], dst++);
+ FB_WRITEL(colortab[(*src >> 4) & bit_mask], dst++);
+ FB_WRITEL(colortab[(*src >> 3) & bit_mask], dst++);
+ FB_WRITEL(colortab[(*src >> 2) & bit_mask], dst++);
+ FB_WRITEL(colortab[(*src >> 1) & bit_mask], dst++);
+ FB_WRITEL(colortab[(*src >> 0) & bit_mask], dst++);
+ }
+ break;
}
+
dst1 += p->fix.line_length;
s += spitch;
}
--
2.35.1
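cfb_imageblit() targets I/O memory, so the unrolled body in the change above routes every store through FB_WRITEL() instead of dereferencing dst. The userspace sketch below mirrors that for 16 bpp (two pixels per 32-bit word, four words per source byte). mmio_write32(), build_colortab16() and blit_row_16bpp_io() are hypothetical names: mmio_write32() is only a stand-in for fb_writel() that performs a volatile 32-bit store, and the table assumes the little-endian layout of cfb_tab16_le (leftmost pixel in the low half-word). The sketch also writes leftover words one at a time, because the diff's loop (j -= 4 until j == 0) appears to require a width that is a multiple of 8 pixels at 16 bpp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for fb_writel(): one 32-bit store that the
 * compiler may not merge, split or elide. */
static inline void mmio_write32(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
}

/* Colour table for 16 bpp: index bit 1 is the left pixel (low half-word),
 * bit 0 the right pixel (high half-word); assumed cfb_tab16_le layout. */
static void build_colortab16(uint32_t colortab[4], uint32_t fgx, uint32_t bgx)
{
	static const uint32_t tab16_le[4] = {
		0x00000000, 0xffff0000, 0x0000ffff, 0xffffffff
	};
	uint32_t eorx = fgx ^ bgx;

	for (unsigned int i = 0; i < 4; i++)
		colortab[i] = (tab16_le[i] & eorx) ^ bgx;
}

/* Expand one row of a 1 bpp glyph to 16 bpp through the MMIO accessor.
 * width is in pixels and must be even (two pixels per word). */
static void blit_row_16bpp_io(volatile uint32_t *dst, const uint8_t *src,
			      size_t width, const uint32_t colortab[4])
{
	size_t k = width / 2;	/* output words left in this row */

	assert(width % 2 == 0);
	/* Unrolled: each source byte yields four output words. */
	for (; k >= 4; k -= 4, ++src) {
		mmio_write32(colortab[(*src >> 6) & 3], dst++);
		mmio_write32(colortab[(*src >> 4) & 3], dst++);
		mmio_write32(colortab[(*src >> 2) & 3], dst++);
		mmio_write32(colortab[(*src >> 0) & 3], dst++);
	}
	/* Leftover words (at most 3) come from the high bits of the last byte. */
	for (unsigned int shift = 6; k; k--, shift -= 2)
		mmio_write32(colortab[(*src >> shift) & 3], dst++);
}
```

Comparing the two diffs, the store accessor is the only thing that differs between the system-memory and I/O-memory versions; the table lookup and unrolling are the same.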
* [PATCH v2 5/5] drm: Add TODO item for optimizing format helpers
From: Thomas Zimmermann @ 2022-02-21 19:54 UTC
To: daniel, deller, javierm, geert, sam, kraxel
Cc: linux-fbdev, Thomas Zimmermann, dri-devel
Add a TODO item for optimizing blitting and format-conversion helpers
in DRM and fbdev. There's always demand for faster graphics output.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
---
Documentation/gpu/todo.rst | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/Documentation/gpu/todo.rst b/Documentation/gpu/todo.rst
index 7bf7f2111696..7f113c6a02dd 100644
--- a/Documentation/gpu/todo.rst
+++ b/Documentation/gpu/todo.rst
@@ -241,6 +241,28 @@ Contact: Thomas Zimmermann <tzimmermann@suse.de>, Daniel Vetter
Level: Advanced
+Benchmark and optimize blitting and format-conversion functions
+---------------------------------------------------------------
+
+Drawing to display memory quickly is crucial for many applications'
+performance.
+
+On at least x86-64, sys_imageblit() is significantly slower than
+cfb_imageblit(), even though both use the same blitting algorithm and
+the latter is written for I/O memory. It turns out that cfb_imageblit()
+uses movl instructions, while sys_imageblit() apparently does not. This
+seems to be a problem with gcc's optimizer. DRM's format-conversion
+helpers might be subject to similar issues.
+
+Benchmark and optimize fbdev's sys_*() helpers and DRM's format-conversion
+helpers. In cases that can be further optimized, maybe implement a different
+algorithm. For micro-optimizations, use movl/movq instructions explicitly.
+That might possibly require architecture-specific helpers (e.g., storel()
+and storeq()).
+
+Contact: Thomas Zimmermann <tzimmermann@suse.de>
+
+Level: Intermediate
drm_framebuffer_funcs and drm_mode_config_funcs.fb_create cleanup
-----------------------------------------------------------------
--
2.35.1
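The TODO item above suggests architecture-specific store helpers such as storel() and storeq(). Those names come only from the TODO text; the following is a hypothetical userspace sketch, not an existing kernel API, of what such helpers could look like with GCC/Clang extended inline assembly on x86, falling back to a plain volatile store elsewhere. Whether an explicit movl/movq actually helps would still need the benchmarking the TODO item asks for.

```c
#include <stdint.h>

/* Hypothetical storeq(): write v to *p with a single 64-bit movq. */
static inline void storeq(uint64_t *p, uint64_t v)
{
#if defined(__x86_64__) && defined(__GNUC__)
	/* AT&T syntax: movq source, destination. The "=m" output operand
	 * tells the compiler that *p is written by the asm. */
	__asm__ volatile("movq %1, %0" : "=m"(*p) : "r"(v));
#else
	*(volatile uint64_t *)p = v;
#endif
}

/* Hypothetical storel(): the same for a single 32-bit movl. */
static inline void storel(uint32_t *p, uint32_t v)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
	__asm__ volatile("movl %1, %0" : "=m"(*p) : "r"(v));
#else
	*(volatile uint32_t *)p = v;
#endif
}
```

Using a memory output operand rather than passing the pointer as a plain input keeps the compiler aware that *p changes, so it does not reuse a stale value after the store.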
* Re: [PATCH v2 4/5] fbdev: Improve performance of cfb_imageblit()
From: Pekka Paalanen @ 2022-02-22 13:01 UTC
To: Thomas Zimmermann
Cc: linux-fbdev, deller, javierm, dri-devel, geert, kraxel, sam
On Mon, 21 Feb 2022 20:54:09 +0100
Thomas Zimmermann <tzimmermann@suse.de> wrote:
> Improve the performance of sys_imageblit() by manually unrolling
sys?
> the inner blitting loop and moving some invariants out. The compiler
> failed to do this automatically. This change keeps cfb_imageblit()
> in sync with sys_imagebit().
This is correct here.
>
> A microbenchmark measures the average number of CPU cycles
> for sys_imageblit() after a stabilizing period of a few minutes
sys?
> (i7-4790, FullHD, simpledrm, kernel with debugging).
>
> sys_imageblit(), new: 15724 cycles
sys?
> cfb_imageblit(): old: 30566 cycles
>
> In the optimized case, cfb_imageblit() is now ~2x faster than before.
>
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> ---
> drivers/video/fbdev/core/cfbimgblt.c | 51 +++++++++++++++++++++++-----
> 1 file changed, 42 insertions(+), 9 deletions(-)
Just noticed some confusion in the commit message.
Thanks,
pq
* Re: [PATCH v2 4/5] fbdev: Improve performance of cfb_imageblit()
From: Thomas Zimmermann @ 2022-02-22 18:48 UTC
To: Pekka Paalanen
Cc: linux-fbdev, deller, javierm, dri-devel, geert, kraxel, sam
Hi
Am 22.02.22 um 14:01 schrieb Pekka Paalanen:
> On Mon, 21 Feb 2022 20:54:09 +0100
> Thomas Zimmermann <tzimmermann@suse.de> wrote:
>
>> Improve the performance of sys_imageblit() by manually unrolling
>
> sys?
>
>> the inner blitting loop and moving some invariants out. The compiler
>> failed to do this automatically. This change keeps cfb_imageblit()
>> in sync with sys_imagebit().
>
> This is correct here.
>
>>
>> A microbenchmark measures the average number of CPU cycles
>> for sys_imageblit() after a stabilizing period of a few minutes
>
> sys?
>
>> (i7-4790, FullHD, simpledrm, kernel with debugging).
>>
>> sys_imageblit(), new: 15724 cycles
>
> sys?
>
>> cfb_imageblit(): old: 30566 cycles
>>
>> In the optimized case, cfb_imageblit() is now ~2x faster than before.
>>
>> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
>> ---
>> drivers/video/fbdev/core/cfbimgblt.c | 51 +++++++++++++++++++++++-----
>> 1 file changed, 42 insertions(+), 9 deletions(-)
>
> Just noticed some confusion in the commit message.
I copied some of the text from the other commit and I could have sworn I
updated it. But apparently not.
Best regards
Thomas
>
>
> Thanks,
> pq
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev