* Re: fbdev: Garbage collect fbdev scrolling acceleration
From: Helge Deller @ 2022-01-13 16:36 UTC
To: Hamza Mahfooz, Thomas Zimmermann, linux-fbdev, dri-devel, Geert Uytterhoeven
Cc: Sven Schnelle

I may have missed some discussions, but I'm objecting against this patch:

b3ec8cdf457e5 ("fbdev: Garbage collect fbdev scrolling acceleration, part 1 (from TODO list)")

Can we please (partly) revert it and restore the scrolling behaviour,
where fbcon uses fb_copyarea() to copy the screen contents instead of
redrawing the whole screen?

I'm fine with dropping the ypan-functionality.

Maybe on fast new x86 boxes the performance difference isn't huge,
but for all old systems, or when emulated in qemu, this makes
a big difference.

Helge
* Re: fbdev: Garbage collect fbdev scrolling acceleration
From: Sven Schnelle @ 2022-01-13 21:46 UTC
To: Helge Deller
Cc: linux-fbdev, Geert Uytterhoeven, dri-devel, Thomas Zimmermann, Hamza Mahfooz

Helge Deller <deller@gmx.de> writes:

> I may have missed some discussions, but I'm objecting against this patch:
>
> b3ec8cdf457e5 ("fbdev: Garbage collect fbdev scrolling acceleration, part 1 (from TODO list)")
>
> Can we please (partly) revert it and restore the scrolling behaviour,
> where fbcon uses fb_copyarea() to copy the screen contents instead of
> redrawing the whole screen?
>
> I'm fine with dropping the ypan-functionality.
>
> Maybe on fast new x86 boxes the performance difference isn't huge,
> but for all old systems, or when emulated in qemu, this makes
> a big difference.
>
> Helge

I second that. For most people, the framebuffer isn't important as
they're mostly interested in getting to X11/wayland as fast as possible.
But for systems like servers without X11 it's nice to have a fast
console.
* Re: fbdev: Garbage collect fbdev scrolling acceleration
From: Daniel Vetter @ 2022-01-19 15:39 UTC
To: Sven Schnelle
Cc: Helge Deller, linux-fbdev, Geert Uytterhoeven, dri-devel, Thomas Zimmermann, Hamza Mahfooz

On Thu, Jan 13, 2022 at 10:46:03PM +0100, Sven Schnelle wrote:
> Helge Deller <deller@gmx.de> writes:
>
> > I may have missed some discussions, but I'm objecting against this patch:
> >
> > b3ec8cdf457e5 ("fbdev: Garbage collect fbdev scrolling acceleration, part 1 (from TODO list)")
> >
> > Can we please (partly) revert it and restore the scrolling behaviour,
> > where fbcon uses fb_copyarea() to copy the screen contents instead of
> > redrawing the whole screen?
> >
> > I'm fine with dropping the ypan-functionality.
> >
> > Maybe on fast new x86 boxes the performance difference isn't huge,
> > but for all old systems, or when emulated in qemu, this makes
> > a big difference.
> >
> > Helge
>
> I second that. For most people, the framebuffer isn't important as
> they're mostly interested in getting to X11/wayland as fast as possible.
> But for systems like servers without X11 it's nice to have a fast
> console.

Fast console howto:
- shadow buffer in cached memory
- timer based upload of changed areas to the real framebuffer

This one is actually fast, instead of trying to use hw bltcopy and having
the most terrible fallback path if that's gone. Yes drm fbdev helpers has
this (but not enabled on most drivers because very, very few people care).
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
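The two-step "fast console howto" above (shadow buffer in cached memory, timer-based upload of changed areas) can be sketched in userspace C roughly as follows. This is only an illustration of the idea, not the DRM fbdev helper code: every name here (`struct shadow_fb`, `shadow_fill_rows`, `shadow_scroll_up`, `shadow_flush`) is hypothetical, the "real" framebuffer is a plain array, and dirty tracking is reduced to a single range of rows.

```c
#include <stdint.h>
#include <string.h>

#define FB_WIDTH  64   /* pixels per row; tiny illustrative size */
#define FB_HEIGHT 32   /* rows */

/* Hypothetical structure, not a kernel API. */
struct shadow_fb {
    uint32_t shadow[FB_HEIGHT][FB_WIDTH]; /* lives in cached system memory */
    uint32_t *real;                       /* stand-in for the slow real framebuffer */
    int dirty_y1, dirty_y2;               /* dirty rows [y1, y2); empty when y1 >= y2 */
};

static void shadow_init(struct shadow_fb *s, uint32_t *real)
{
    memset(s->shadow, 0, sizeof(s->shadow));
    s->real = real;
    s->dirty_y1 = FB_HEIGHT;
    s->dirty_y2 = 0;
}

/* Drawing only touches the shadow buffer and grows the dirty range. */
static void shadow_fill_rows(struct shadow_fb *s, int y, int rows, uint32_t color)
{
    for (int r = y; r < y + rows; r++)
        for (int x = 0; x < FB_WIDTH; x++)
            s->shadow[r][x] = color;
    if (y < s->dirty_y1)
        s->dirty_y1 = y;
    if (y + rows > s->dirty_y2)
        s->dirty_y2 = y + rows;
}

/* Scrolling is a memmove inside cached memory; the real framebuffer is
 * never read. Assumes 0 < rows < FB_HEIGHT. */
static void shadow_scroll_up(struct shadow_fb *s, int rows)
{
    memmove(s->shadow[0], s->shadow[rows],
            (size_t)(FB_HEIGHT - rows) * sizeof(s->shadow[0]));
    memset(s->shadow[FB_HEIGHT - rows], 0, (size_t)rows * sizeof(s->shadow[0]));
    s->dirty_y1 = 0;
    s->dirty_y2 = FB_HEIGHT;
}

/* Meant to be called from a timer: write-only upload of the dirty rows.
 * Returns the number of rows uploaded. */
static int shadow_flush(struct shadow_fb *s)
{
    int rows = s->dirty_y2 - s->dirty_y1;

    if (rows <= 0)
        return 0;
    memcpy(s->real + (size_t)s->dirty_y1 * FB_WIDTH, s->shadow[s->dirty_y1],
           (size_t)rows * sizeof(s->shadow[0]));
    s->dirty_y1 = FB_HEIGHT;
    s->dirty_y2 = 0;
    return rows;
}
```

Several draw or scroll operations between two timer ticks collapse into one upload, which is where the saving comes from when a lot of text is printed at once.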
* Re: fbdev: Garbage collect fbdev scrolling acceleration
From: Sven Schnelle @ 2022-01-19 16:15 UTC
To: Daniel Vetter
Cc: Helge Deller, linux-fbdev, Geert Uytterhoeven, dri-devel, Thomas Zimmermann, Hamza Mahfooz

Hi Daniel,

Daniel Vetter <daniel@ffwll.ch> writes:

> On Thu, Jan 13, 2022 at 10:46:03PM +0100, Sven Schnelle wrote:
>> Helge Deller <deller@gmx.de> writes:
>> > Maybe on fast new x86 boxes the performance difference isn't huge,
>> > but for all old systems, or when emulated in qemu, this makes
>> > a big difference.
>> >
>> > Helge
>>
>> I second that. For most people, the framebuffer isn't important as
>> they're mostly interested in getting to X11/wayland as fast as possible.
>> But for systems like servers without X11 it's nice to have a fast
>> console.
>
> Fast console howto:
> - shadow buffer in cached memory
> - timer based upload of changed areas to the real framebuffer
>
> This one is actually fast, instead of trying to use hw bltcopy and having
> the most terrible fallback path if that's gone. Yes drm fbdev helpers has
> this (but not enabled on most drivers because very, very few people care).

Hmm.... Take my Laptop with a 4k (3840x2160) screen as an example:

Let's say on average the half of every line is filled with text.

So 3840/2*2160 pixels that change = 4147200 pixels. Every pixel takes 4
bytes = 16,588,800 bytes per timer interrupt. In another Mail updating on
vsync was mentioned, so multiply that by 60 and get ~927MB. And even if
you update the screen only 4 times per second, that would be ~64MB
of data. I'm likely missing something here.
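The arithmetic in the message above can be checked with a few lines of C. The helper name `upload_bytes_per_second` is made up for this sketch; the inputs are the ones Sven gives (3840x2160, 4 bytes per pixel, half the screen changing per update). With those inputs the result is 16,588,800 bytes per update, 995,328,000 bytes/s at 60 updates per second and 66,355,200 bytes/s at 4 updates per second. The quoted "~927MB" looks like the 60 Hz figure expressed in GiB (about 0.927 GiB/s), and "~64MB" is close to the 4 Hz figure in MiB (about 63.3 MiB/s); that reading is a guess, not something stated in the thread.

```c
#include <stdint.h>

/* Bytes uploaded per second when num/den of the screen changes on every
 * update. Hypothetical helper for illustration only. */
static uint64_t upload_bytes_per_second(uint32_t width, uint32_t height,
                                        uint32_t bytes_per_pixel,
                                        uint32_t num, uint32_t den,
                                        uint32_t updates_per_second)
{
    uint64_t bytes_per_update =
        (uint64_t)width * height * bytes_per_pixel * num / den;

    return bytes_per_update * updates_per_second;
}
```

For example, `upload_bytes_per_second(3840, 2160, 4, 1, 2, 60)` gives the 60 Hz case and the same call with `4` as the last argument gives the 4 Hz case.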
* Re: fbdev: Garbage collect fbdev scrolling acceleration
From: Daniel Vetter @ 2022-01-19 16:21 UTC
To: Sven Schnelle
Cc: Daniel Vetter, Helge Deller, linux-fbdev, Geert Uytterhoeven, dri-devel, Thomas Zimmermann, Hamza Mahfooz

On Wed, Jan 19, 2022 at 05:15:44PM +0100, Sven Schnelle wrote:
> Hi Daniel,
>
> Daniel Vetter <daniel@ffwll.ch> writes:
>
> > On Thu, Jan 13, 2022 at 10:46:03PM +0100, Sven Schnelle wrote:
> >> Helge Deller <deller@gmx.de> writes:
> >> > Maybe on fast new x86 boxes the performance difference isn't huge,
> >> > but for all old systems, or when emulated in qemu, this makes
> >> > a big difference.
> >> >
> >> > Helge
> >>
> >> I second that. For most people, the framebuffer isn't important as
> >> they're mostly interested in getting to X11/wayland as fast as possible.
> >> But for systems like servers without X11 it's nice to have a fast
> >> console.
> >
> > Fast console howto:
> > - shadow buffer in cached memory
> > - timer based upload of changed areas to the real framebuffer
> >
> > This one is actually fast, instead of trying to use hw bltcopy and having
> > the most terrible fallback path if that's gone. Yes drm fbdev helpers has
> > this (but not enabled on most drivers because very, very few people care).
>
> Hmm.... Take my Laptop with a 4k (3840x2160) screen as an example:
>
> Let's say on average the half of every line is filled with text.
>
> So 3840/2*2160 pixels that change = 4147200 pixels. Every pixel takes 4
> bytes = 16,588,800 bytes per timer interrupt. In another Mail updating on
> vsync was mentioned, so multiply that by 60 and get ~927MB. And even if
> you update the screen only 4 times per second, that would be ~64MB
> of data. I'm likely missing something here.

Since you say 4k it's a modern box, so you have on the order of 10GB/s of
write bandwidth.

And around 100MB/s of read bandwidth. Both from the cpu. It all adds up.
It's that uncached read which kills you and means dmesg takes seconds to
display.

Also since this is 4k looking at sales volume we're talking integrated, so
whether it's the gpu or the cpu that's doing the memcpy, it's the same
memory bw budget you're burning down.

And at that point doing less copying (which the shadow buffer thing will
do compared to fbcon accelerated scrolling for every line) is the win.

And since max&usual resolutions pretty much scales down with pcie or
memory bandwidth for roughly the last 2-3 decades, this all works as well
on old stuff.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
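To put numbers on the read/write asymmetry described above, here is a rough cost model in C. It takes the order-of-magnitude figures from the message at face value (about 10 GB/s for CPU writes, about 100 MB/s for uncached CPU reads) and applies them to one full 3840x2160 XRGB8888 frame (33,177,600 bytes). The function names are invented for this sketch, and the model deliberately ignores the memmove inside the cached shadow buffer as well as any hardware blitter, so it only describes the CPU-copy fallback.

```c
#include <stdint.h>

/* Microseconds needed to move `bytes` at `bytes_per_second`
 * (integer division, rounds down). */
static uint64_t transfer_time_us(uint64_t bytes, uint64_t bytes_per_second)
{
    return bytes * 1000000ULL / bytes_per_second;
}

/* CPU scrolling inside the real framebuffer: every byte is read back
 * over the slow uncached path and then written again. */
static uint64_t scroll_in_place_us(uint64_t bytes, uint64_t read_bw,
                                   uint64_t write_bw)
{
    return transfer_time_us(bytes, read_bw) + transfer_time_us(bytes, write_bw);
}

/* Shadow-buffer upload: the real framebuffer is only written. */
static uint64_t shadow_upload_us(uint64_t bytes, uint64_t write_bw)
{
    return transfer_time_us(bytes, write_bw);
}
```

Under these assumed bandwidths a full-frame CPU scroll costs roughly a third of a second, almost all of it in the uncached read, while a full-frame upload from a shadow buffer costs a little over 3 ms.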
* Re: fbdev: Garbage collect fbdev scrolling acceleration
From: Sven Schnelle @ 2022-01-19 16:33 UTC
To: Daniel Vetter
Cc: linux-fbdev, Hamza Mahfooz, Helge Deller, dri-devel, Geert Uytterhoeven, Thomas Zimmermann

Hi Daniel,

Daniel Vetter <daniel@ffwll.ch> writes:

> On Wed, Jan 19, 2022 at 05:15:44PM +0100, Sven Schnelle wrote:
>> Hi Daniel,
>>
>> Daniel Vetter <daniel@ffwll.ch> writes:
>>
>> > On Thu, Jan 13, 2022 at 10:46:03PM +0100, Sven Schnelle wrote:
>> >> Helge Deller <deller@gmx.de> writes:
>> >> > Maybe on fast new x86 boxes the performance difference isn't huge,
>> >> > but for all old systems, or when emulated in qemu, this makes
>> >> > a big difference.
>> >> >
>> >> > Helge
>> >>
>> >> I second that. For most people, the framebuffer isn't important as
>> >> they're mostly interested in getting to X11/wayland as fast as possible.
>> >> But for systems like servers without X11 it's nice to have a fast
>> >> console.
>> >
>> > Fast console howto:
>> > - shadow buffer in cached memory
>> > - timer based upload of changed areas to the real framebuffer
>> >
>> > This one is actually fast, instead of trying to use hw bltcopy and having
>> > the most terrible fallback path if that's gone. Yes drm fbdev helpers has
>> > this (but not enabled on most drivers because very, very few people care).
>>
>> Hmm.... Take my Laptop with a 4k (3840x2160) screen as an example:
>>
>> Let's say on average the half of every line is filled with text.
>>
>> So 3840/2*2160 pixels that change = 4147200 pixels. Every pixel takes 4
>> bytes = 16,588,800 bytes per timer interrupt. In another Mail updating on
>> vsync was mentioned, so multiply that by 60 and get ~927MB. And even if
>> you update the screen only 4 times per second, that would be ~64MB
>> of data. I'm likely missing something here.
>
> Since you say 4k it's a modern box, so you have on the order of 10GB/s of
> write bandwidth.
>
> And around 100MB/s of read bandwidth. Both from the cpu. It all adds up.
> It's that uncached read which kills you and means dmesg takes seconds to
> display.
>
> Also since this is 4k looking at sales volume we're talking integrated, so
> whether it's the gpu or the cpu that's doing the memcpy, it's the same
> memory bw budget you're burning down.

That might be true for integrated graphics, as said, I don't know the
architecture. But saying it's good just because it's good on one
architecture doesn't mean it's good for everyone.

If you have an external GPU, then the memory/system bus BW would be
different whether it's memcpy or the GPU doing the scrolling. And whether
internal or external graphics - the CPU could do other stuff while the
GPU scrolls stuff.

Quite a lot of discussion for a revert of a patch that was already in the
kernel for more than 20(?) years.

/Sven
* Re: fbdev: Garbage collect fbdev scrolling acceleration
From: Geert Uytterhoeven @ 2022-01-24 18:27 UTC
To: Daniel Vetter
Cc: Sven Schnelle, Helge Deller, Linux Fbdev development list, DRI Development, Thomas Zimmermann, Hamza Mahfooz

Hi Daniel et al,

On Wed, Jan 19, 2022 at 4:39 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> On Thu, Jan 13, 2022 at 10:46:03PM +0100, Sven Schnelle wrote:
> > Helge Deller <deller@gmx.de> writes:
> > > I may have missed some discussions, but I'm objecting against this patch:
> > >
> > > b3ec8cdf457e5 ("fbdev: Garbage collect fbdev scrolling acceleration, part 1 (from TODO list)")
> > >
> > > Can we please (partly) revert it and restore the scrolling behaviour,
> > > where fbcon uses fb_copyarea() to copy the screen contents instead of
> > > redrawing the whole screen?
> > >
> > > I'm fine with dropping the ypan-functionality.
> > >
> > > Maybe on fast new x86 boxes the performance difference isn't huge,
> > > but for all old systems, or when emulated in qemu, this makes
> > > a big difference.
> > >
> > > Helge
> >
> > I second that. For most people, the framebuffer isn't important as
> > they're mostly interested in getting to X11/wayland as fast as possible.
> > But for systems like servers without X11 it's nice to have a fast
> > console.
>
> Fast console howto:
> - shadow buffer in cached memory
> - timer based upload of changed areas to the real framebuffer
>
> This one is actually fast, instead of trying to use hw bltcopy and having
> the most terrible fallback path if that's gone. Yes drm fbdev helpers has
> this (but not enabled on most drivers because very, very few people care).

That depends on the hardware, and the balance between CPU-to-RAM,
CPU-to-VRAM, and GPU-to-VRAM bandwidths, and CPU and GPU performance.

When scrolling, the fastest copy is the copy that doesn't need to copy
much. So that's why fbcon supports (or supported :-( many strategies:
scrolling by wrapping, panning, copying (either by CPU or by (simple)
GPU), re-rendering (useful for a GPU with bitmap expansion). So forcing
everybody to render into a fully cached shadow buffer and upload changed
areas is not the silver bullet.

Whether text output is rendered immediately or not is completely
orthogonal to this. While timer-based updates would speed up printing
of large hunks of text (where no one actually reads what was printed at
the top), that would have almost no impact on actual interactive console
work: it may still take 0.5s to scroll the screen if you press "enter"
when your cursor is positioned on the last line.
BTW, implementing timer-based updates would make measuring real-world
performance more difficult, as we would have to use a different
benchmark than "time dmesg" ;-)

Both Daniel and Thomas said: fbdev is not suitable for modern hardware.
Fine, we do not debate that, and do not want to prevent you from using
DRM for modern hardware. Then please accept us saying that DRM (in its
current form) is not suitable for other types of graphics hardware.
Still, even modern (embedded) hardware may have small low-color
displays.

For the last +5 years, we've been pointed to the tinydrm drivers, to
serve as examples for converting existing fbdev drivers to drm drivers.
All but one of them are drivers for hi-color or better hardware, thus
surpassing the capabilities of lots of hardware driven by fbdev drivers.
The other one is an e-ink driver that exposes an XRGB8888 shadow frame
buffer, and converts that in a two-step process, first to 8-bit
grayscale, second to 1-bit monochrome. If that is considered a good
example, should I be impressed?
Compare that to other subsystems boasting about zero-copy...

Furthermore, for a contemporary e-ink device like[1], the shadow buffer
would consume 10 MiB. Of course this device has 4 GiB of RAM, and quad
Cortex-A55 CPU cores, but not all systems have 10 MiB to spare...

[1] https://linuxgizmos.com/rk3566-based-pinenote-e-ink-tablet-ships-at-399/

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
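The point that "the fastest copy is the copy that doesn't need to copy much" can be illustrated by comparing two of the strategies listed above, scrolling by copying and scrolling by wrapping. The sketch below is a userspace toy, not fbcon code: `struct text_fb`, `scroll_copy`, `scroll_wrap` and `visible_row` are hypothetical names, the "video memory" is an 8x16 byte array, and the wrapping variant assumes hardware that can start scanout at an arbitrary row and wrap around at the end of memory.

```c
#include <stdint.h>
#include <string.h>

#define ROWS 8
#define COLS 16

/* Hypothetical toy framebuffer, for illustration only. */
struct text_fb {
    uint8_t mem[ROWS][COLS]; /* stand-in for video memory */
    int ystart;              /* first displayed row; only changed by wrapping */
};

/* Strategy 1: scroll up by n rows by copying. Returns the bytes moved.
 * Assumes 0 < n < ROWS. */
static size_t scroll_copy(struct text_fb *fb, int n)
{
    size_t moved = (size_t)(ROWS - n) * COLS;

    memmove(fb->mem[0], fb->mem[n], moved);
    memset(fb->mem[ROWS - n], 0, (size_t)n * COLS);
    return moved;
}

/* Strategy 2: scroll up by n rows by wrapping. The rows that scroll out
 * are cleared and become the new bottom rows; only the display start
 * moves, so no existing pixel data is copied. */
static size_t scroll_wrap(struct text_fb *fb, int n)
{
    for (int i = 0; i < n; i++)
        memset(fb->mem[(fb->ystart + i) % ROWS], 0, COLS);
    fb->ystart = (fb->ystart + n) % ROWS;
    return 0;
}

/* Row y as it appears on screen, valid for either strategy. */
static const uint8_t *visible_row(const struct text_fb *fb, int y)
{
    return fb->mem[(fb->ystart + y) % ROWS];
}
```

Both functions leave the same picture on screen, but the copying variant moves `(ROWS - n) * COLS` bytes per scroll while the wrapping variant moves none, which is why the best strategy depends on what the hardware offers.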
* Re: fbdev: Garbage collect fbdev scrolling acceleration 2022-01-24 18:27 ` Geert Uytterhoeven @ 2022-01-24 19:58 ` Daniel Vetter -1 siblings, 0 replies; 15+ messages in thread From: Daniel Vetter @ 2022-01-24 19:58 UTC (permalink / raw) To: Geert Uytterhoeven Cc: Linux Fbdev development list, Hamza Mahfooz, Helge Deller, DRI Development, Thomas Zimmermann, Sven Schnelle On Mon, Jan 24, 2022 at 7:27 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > Hi Daniel et al, > > On Wed, Jan 19, 2022 at 4:39 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > On Thu, Jan 13, 2022 at 10:46:03PM +0100, Sven Schnelle wrote: > > > Helge Deller <deller@gmx.de> writes: > > > > I may have missed some discussions, but I'm objecting against this patch: > > > > > > > > b3ec8cdf457e5 ("fbdev: Garbage collect fbdev scrolling acceleration, part 1 (from TODO list)") > > > > > > > > Can we please (partly) revert it and restore the scrolling behaviour, > > > > where fbcon uses fb_copyarea() to copy the screen contents instead of > > > > redrawing the whole screen? > > > > > > > > I'm fine with dropping the ypan-functionality. > > > > > > > > Maybe on fast new x86 boxes the performance difference isn't huge, > > > > but for all old systems, or when emulated in qemu, this makes > > > > a big difference. > > > > > > > > Helge > > > > > > I second that. For most people, the framebuffer isn't important as > > > they're mostly interested in getting to X11/wayland as fast as possible. > > > But for systems like servers without X11 it's nice to have a fast > > > console. > > > > Fast console howto: > > - shadow buffer in cached memory > > - timer based upload of changed areas to the real framebuffer > > > > This one is actually fast, instead of trying to use hw bltcopy and having > > the most terrible fallback path if that's gone. Yes drm fbdev helpers has > > this (but not enabled on most drivers because very, very few people care). 
> > That depends on the hardware, and the balance between CPU-to-RAM, > CPU-to-VRAM, and GPU-to-VRAM bandwidths, and CPU and GPU performance. > > When scrolling, the fastest copy is the copy that doesn't need to copy > much. So that's why fbcon supports (or supported :-( many strategies: > scrolling by wrapping, panning, copying (either by CPU or by (simple) > GPU), re-rendering (useful for a GPU with bitmap expansion). So forcing > everybody to render into a fully cached shadow buffer and upload changed > areas is not the silver bullet. > > Whether text output is rendered immediately or not is completely > orthogonal to this. While timer-based updates would speed up printing > of large hunks of text (where no one actually reads what was printed at > the top), that would have almost no impact on actual interactive console > work: it may still take 0.5s to scroll the screen if you press "enter" > when your cursor is positioned on the last line. > BTW, implementing timer-based updates would make measuring real-world > performance more difficult, as we would have to use a different > benchmark than "time dmesg" ;-) > > Both Daniel and Thomas said: fbdev is not suitable for modern hardware. > Fine, we do not debate that, and do not want to prevent you from using > DRM for modern hardware. Then please accept us saying that DRM (in its > current form) is not suitable for other types of graphics hardware. > Still, even modern (embedded) hardware may have small low-color > displays. > > For the last +5 years, we've been pointed to the tinydrm drivers, to > serve as examples for converting existing fbdev drivers to drm drivers. > All but one of them are drivers for hi-color or better hardware, thus > surpassing the capabilities of lots of hardware driven by fbdev drivers. > The other one is an e-ink driver that exposes an XRGB8888 shadow frame > buffer, and converts that in a two-step process, first to 8-bit > grayscale, second to 1-bit monochrome. 
If that is considered a good > example, should I be impressed? > Compare that to other subsystems boasting about zero-copy... Tiny drivers are the state of the art for small neat drivers. As you pointed out multiple times now, there's no Rx or Cx support for x < 8 in drm or fbdev yet, so that would need to be added. If someone cares enough for that. Some of the fbtft drivers have gone down substantially in size when ported to tiny, which is really the claim we've put down. Not that you'll find the perfect C4 pixel format example in there; at most you find C8 support in some of the really old drivers like i915/radeon/nouveau for old platforms. But that's very well buried. I guess in practice (as you point out below) the repaper display is so glacially slow anyway, and connected to machines with enough RAM, that generally the only case that mattered was convenience and hence supporting what every drm userspace can cope with minimally. Which is XRGB8888. So yeah, don't look at a driver which updates at roughly 0.5fps for efficient upload code :-) The space wasting is a bit more important and should be trivial to add if someone cares enough to do that. -Daniel > Furthermore, for a contemporary e-ink device like[1], the shadow buffer > would consume 10 MiB. Of course this device has 4 GiB of RAM, and quad > Cortex-A55 CPU cores, but not all systems have 10 MiB to spare... > > [1] https://linuxgizmos.com/rk3566-based-pinenote-e-ink-tablet-ships-at-399/ > > Gr{oetje,eeting}s, > > Geert > > -- > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org > > In personal conversations with technical people, I call myself a hacker. But > when I'm talking to journalists I just say "programmer" or something like that. > -- Linus Torvalds -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 15+ messages in thread
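The "fast console howto" quoted above (shadow buffer in cached memory, timer-based upload of changed areas) can be sketched as a small userspace model. This is only an illustration under stated assumptions: `struct shadow_fb` and every function below are hypothetical names, not kernel or DRM APIs, damage is tracked as a single span of scanlines rather than rectangles, and the "real" frame buffer is just another array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct shadow_fb {
    uint8_t *shadow;    /* cached system memory that all drawing targets */
    uint8_t *real;      /* stands in for the slow real frame buffer */
    size_t pitch;       /* bytes per scanline */
    size_t height;      /* number of scanlines */
    size_t dirty_first; /* first dirty scanline (inclusive) */
    size_t dirty_end;   /* end of dirty span (exclusive); clean if first >= end */
};

/* Grow the dirty span so that it also covers scanlines [first, end). */
static void mark_dirty(struct shadow_fb *fb, size_t first, size_t end)
{
    if (fb->dirty_first >= fb->dirty_end) { /* currently clean */
        fb->dirty_first = first;
        fb->dirty_end = end;
        return;
    }
    if (first < fb->dirty_first)
        fb->dirty_first = first;
    if (end > fb->dirty_end)
        fb->dirty_end = end;
}

/* Draw into the shadow only: fill scanlines [first, end) with one value. */
void shadow_fill(struct shadow_fb *fb, size_t first, size_t end, uint8_t value)
{
    assert(first <= end && end <= fb->height);
    if (first == end)
        return;
    memset(fb->shadow + first * fb->pitch, value, (end - first) * fb->pitch);
    mark_dirty(fb, first, end);
}

/* Scroll up with one memmove in cached memory, then blank the exposed tail. */
void shadow_scroll_up(struct shadow_fb *fb, size_t lines)
{
    assert(lines <= fb->height);
    if (lines == 0)
        return;

    size_t keep = fb->height - lines;

    memmove(fb->shadow, fb->shadow + lines * fb->pitch, keep * fb->pitch);
    memset(fb->shadow + keep * fb->pitch, 0, lines * fb->pitch);
    mark_dirty(fb, 0, fb->height); /* every scanline changed */
}

/* What a periodic timer would call: upload the dirty span, return bytes copied. */
size_t shadow_flush(struct shadow_fb *fb)
{
    if (fb->dirty_first >= fb->dirty_end)
        return 0;

    size_t off = fb->dirty_first * fb->pitch;
    size_t len = (fb->dirty_end - fb->dirty_first) * fb->pitch;

    memcpy(fb->real + off, fb->shadow + off, len);
    fb->dirty_first = fb->dirty_end = 0;
    return len;
}
```

In this model a scroll dirties every scanline, so the upload that follows a scroll is still a full-screen copy. The saving comes from coalescing several updates between two timer ticks, which helps bulk output more than a single interactive "enter" on the last line, the trade-off Geert raises in the quoted text.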
end of thread, other threads:[~2022-01-24 21:12 UTC | newest]
Thread overview: 15+ messages
2022-01-13 16:36 fbdev: Garbage collect fbdev scrolling acceleration Helge Deller
2022-01-13 21:46 ` Sven Schnelle
2022-01-19 15:39 ` Daniel Vetter
2022-01-19 16:15 ` Sven Schnelle
2022-01-19 16:21 ` Daniel Vetter
2022-01-19 16:33 ` Sven Schnelle
2022-01-24 18:27 ` Geert Uytterhoeven
2022-01-24 19:58 ` Daniel Vetter