* Re: fbdev: Garbage collect fbdev scrolling acceleration
@ 2022-01-13 16:36 Helge Deller
  2022-01-13 21:46   ` Sven Schnelle
  0 siblings, 1 reply; 15+ messages in thread
From: Helge Deller @ 2022-01-13 16:36 UTC (permalink / raw)
  To: Hamza Mahfooz, Thomas Zimmermann, linux-fbdev, dri-devel,
	Geert Uytterhoeven
  Cc: Sven Schnelle

I may have missed some discussions, but I'm objecting against this patch:

	b3ec8cdf457e5 ("fbdev: Garbage collect fbdev scrolling acceleration, part 1 (from TODO list)")

Can we please (partly) revert it and restore the scrolling behaviour,
where fbcon uses fb_copyarea() to copy the screen contents instead of
redrawing the whole screen?

I'm fine with dropping the ypan-functionality.

Maybe on fast new x86 boxes the performance difference isn't huge,
but for all old systems, or when emulated in qemu, this makes
a big difference.

Helge

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: fbdev: Garbage collect fbdev scrolling acceleration
  2022-01-13 16:36 fbdev: Garbage collect fbdev scrolling acceleration Helge Deller
@ 2022-01-13 21:46   ` Sven Schnelle
  0 siblings, 0 replies; 15+ messages in thread
From: Sven Schnelle @ 2022-01-13 21:46 UTC (permalink / raw)
  To: Helge Deller
  Cc: linux-fbdev, Geert Uytterhoeven, dri-devel, Thomas Zimmermann,
	Hamza Mahfooz

Helge Deller <deller@gmx.de> writes:

> I may have missed some discussions, but I'm objecting against this patch:
>
> 	b3ec8cdf457e5 ("fbdev: Garbage collect fbdev scrolling acceleration, part 1 (from TODO list)")
>
> Can we please (partly) revert it and restore the scrolling behaviour,
> where fbcon uses fb_copyarea() to copy the screen contents instead of
> redrawing the whole screen?
>
> I'm fine with dropping the ypan-functionality.
>
> Maybe on fast new x86 boxes the performance difference isn't huge,
> but for all old systems, or when emulated in qemu, this makes
> a big difference.
>
> Helge

I second that. For most people, the framebuffer isn't important as
they're mostly interested in getting to X11/wayland as fast as possible.
But for systems like servers without X11 it's nice to have a fast
console.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: fbdev: Garbage collect fbdev scrolling acceleration
  2022-01-13 21:46   ` Sven Schnelle
@ 2022-01-19 15:39     ` Daniel Vetter
  -1 siblings, 0 replies; 15+ messages in thread
From: Daniel Vetter @ 2022-01-19 15:39 UTC (permalink / raw)
  To: Sven Schnelle
  Cc: Helge Deller, linux-fbdev, Geert Uytterhoeven, dri-devel,
	Thomas Zimmermann, Hamza Mahfooz

On Thu, Jan 13, 2022 at 10:46:03PM +0100, Sven Schnelle wrote:
> Helge Deller <deller@gmx.de> writes:
> 
> > I may have missed some discussions, but I'm objecting against this patch:
> >
> > 	b3ec8cdf457e5 ("fbdev: Garbage collect fbdev scrolling acceleration, part 1 (from TODO list)")
> >
> > Can we please (partly) revert it and restore the scrolling behaviour,
> > where fbcon uses fb_copyarea() to copy the screen contents instead of
> > redrawing the whole screen?
> >
> > I'm fine with dropping the ypan-functionality.
> >
> > Maybe on fast new x86 boxes the performance difference isn't huge,
> > but for all old systems, or when emulated in qemu, this makes
> > a big difference.
> >
> > Helge
> 
> I second that. For most people, the framebuffer isn't important as
> they're mostly interested in getting to X11/wayland as fast as possible.
> But for systems like servers without X11 it's nice to have a fast
> console.

Fast console howto:
- shadow buffer in cached memory
- timer based upload of changed areas to the real framebuffer

This one is actually fast, instead of trying to use hw bltcopy and having
the most terrible fallback path if that's gone. Yes, the drm fbdev helpers
have this (but it's not enabled on most drivers because very, very few
people care).
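
A minimal sketch of the two steps listed above, not the actual drm fbdev
helper code: all drawing and scrolling happens in a shadow buffer in
ordinary cached memory, and a periodic flush (standing in for the timer)
writes only the rows that changed to the framebuffer. The class and method
names are hypothetical, and the framebuffer is modelled as a plain Python
list.

```python
class ShadowConsole:
    """Toy model: draw into a cached shadow buffer, upload dirty rows later."""

    def __init__(self, rows, cols):
        self.cols = cols
        self.shadow = [[" "] * cols for _ in range(rows)]  # cached memory
        self.real = [[" "] * cols for _ in range(rows)]    # stand-in for the framebuffer
        self.dirty = set()        # rows changed since the last flush
        self.uploaded_rows = 0    # number of row uploads performed so far

    def put_line(self, row, text):
        # Render text into the shadow buffer only, padded or cut to the width.
        self.shadow[row] = list(text[:self.cols].ljust(self.cols))
        self.dirty.add(row)

    def scroll_up(self):
        # Scrolling touches only the shadow buffer; the framebuffer is never read.
        self.shadow.pop(0)
        self.shadow.append([" "] * self.cols)
        self.dirty.update(range(len(self.shadow)))

    def flush(self):
        # Would be driven by a timer; each dirty row is written exactly once.
        for row in sorted(self.dirty):
            self.real[row] = list(self.shadow[row])
            self.uploaded_rows += 1
        self.dirty.clear()


con = ShadowConsole(rows=4, cols=8)
for i in range(100):      # 100 lines of output arrive between two timer ticks
    con.scroll_up()
    con.put_line(3, f"line {i}")
con.flush()
print(con.uploaded_rows)  # 4
```

In this toy run, 100 scrolled lines between two flushes cost four row
uploads, and the framebuffer is only ever written, never read back.
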
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: fbdev: Garbage collect fbdev scrolling acceleration
  2022-01-19 15:39     ` Daniel Vetter
@ 2022-01-19 16:15       ` Sven Schnelle
  -1 siblings, 0 replies; 15+ messages in thread
From: Sven Schnelle @ 2022-01-19 16:15 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Helge Deller, linux-fbdev, Geert Uytterhoeven, dri-devel,
	Thomas Zimmermann, Hamza Mahfooz

Hi Daniel,

Daniel Vetter <daniel@ffwll.ch> writes:

> On Thu, Jan 13, 2022 at 10:46:03PM +0100, Sven Schnelle wrote:
>> Helge Deller <deller@gmx.de> writes:
>> > Maybe on fast new x86 boxes the performance difference isn't huge,
>> > but for all old systems, or when emulated in qemu, this makes
>> > a big difference.
>> >
>> > Helge
>> 
>> I second that. For most people, the framebuffer isn't important as
>> they're mostly interested in getting to X11/wayland as fast as possible.
>> But for systems like servers without X11 it's nice to have a fast
>> console.
>
> Fast console howto:
> - shadow buffer in cached memory
> - timer based upload of changed areas to the real framebuffer
>
> This one is actually fast, instead of trying to use hw bltcopy and having
> the most terrible fallback path if that's gone. Yes, the drm fbdev helpers
> have this (but it's not enabled on most drivers because very, very few
> people care).

Hmm... Take my laptop with a 4k (3840x2160) screen as an example:

Let's say that on average half of every line is filled with text.

So 3840/2*2160 pixels that change = 4,147,200 pixels. Every pixel takes
4 bytes = 16,588,800 bytes per timer interrupt. In another mail,
updating on vsync was mentioned, so multiply that by 60 and get ~995MB
per second. And even if you update the screen only 4 times per second,
that would still be ~66MB per second. I'm likely missing something here.
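
The estimate above can be reproduced with a few lines of Python. This is
only the arithmetic from the paragraph (half of a 3840x2160 screen
changing, 4 bytes per pixel); the variable and function names are made up
for illustration.

```python
WIDTH, HEIGHT = 3840, 2160
BYTES_PER_PIXEL = 4

# Half of every line changes on each update.
changed_pixels = WIDTH // 2 * HEIGHT
bytes_per_update = changed_pixels * BYTES_PER_PIXEL


def bytes_per_second(updates_per_second):
    """Data uploaded per second at the given update rate."""
    return bytes_per_update * updates_per_second


print(changed_pixels)         # 4147200
print(bytes_per_update)       # 16588800
print(bytes_per_second(60))   # 995328000
print(bytes_per_second(4))    # 66355200
```

At 60 updates per second that is just under 1 GB per second (about
0.93 GiB), and at 4 updates per second about 66 MB per second.
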

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: fbdev: Garbage collect fbdev scrolling acceleration
  2022-01-19 16:15       ` Sven Schnelle
@ 2022-01-19 16:21         ` Daniel Vetter
  -1 siblings, 0 replies; 15+ messages in thread
From: Daniel Vetter @ 2022-01-19 16:21 UTC (permalink / raw)
  To: Sven Schnelle
  Cc: Daniel Vetter, Helge Deller, linux-fbdev, Geert Uytterhoeven,
	dri-devel, Thomas Zimmermann, Hamza Mahfooz

On Wed, Jan 19, 2022 at 05:15:44PM +0100, Sven Schnelle wrote:
> Hi Daniel,
> 
> Daniel Vetter <daniel@ffwll.ch> writes:
> 
> > On Thu, Jan 13, 2022 at 10:46:03PM +0100, Sven Schnelle wrote:
> >> Helge Deller <deller@gmx.de> writes:
> >> > Maybe on fast new x86 boxes the performance difference isn't huge,
> >> > but for all old systems, or when emulated in qemu, this makes
> >> > a big difference.
> >> >
> >> > Helge
> >> 
> >> I second that. For most people, the framebuffer isn't important as
> >> they're mostly interested in getting to X11/wayland as fast as possible.
> >> But for systems like servers without X11 it's nice to have a fast
> >> console.
> >
> > Fast console howto:
> > - shadow buffer in cached memory
> > - timer based upload of changed areas to the real framebuffer
> >
> > This one is actually fast, instead of trying to use hw bltcopy and having
> > the most terrible fallback path if that's gone. Yes, the drm fbdev helpers
> > have this (but it's not enabled on most drivers because very, very few
> > people care).
> 
> Hmm... Take my laptop with a 4k (3840x2160) screen as an example:
> 
> Let's say that on average half of every line is filled with text.
> 
> So 3840/2*2160 pixels that change = 4,147,200 pixels. Every pixel takes
> 4 bytes = 16,588,800 bytes per timer interrupt. In another mail,
> updating on vsync was mentioned, so multiply that by 60 and get ~995MB
> per second. And even if you update the screen only 4 times per second,
> that would still be ~66MB per second. I'm likely missing something here.

Since you say 4k it's a modern box, so you have on the order of 10GB/s of
write bandwidth.

And around 100MB/s of read bandwidth. Both from the cpu. It all adds up.
It's that uncached read which kills you and means dmesg takes seconds to
display.

Also, since this is 4k, going by sales volume we're talking integrated
graphics, so whether it's the gpu or the cpu that's doing the memcpy, it's
the same memory bw budget you're burning down. And at that point doing less
copying (which the shadow buffer thing will do compared to fbcon
accelerated scrolling for every line) is the win.

And since max and usual resolutions have pretty much scaled with pcie or
memory bandwidth for roughly the last 2-3 decades, this all works just as
well on old stuff.
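
A back-of-envelope model of that argument, using the order-of-magnitude
figures quoted above (10GB/s CPU write, 100MB/s uncached CPU read). The
frame size (3840x2160 at 4 bytes per pixel) and the assumption that a
CPU scroll inside the framebuffer reads nearly the whole frame for every
scrolled line are illustrative, not measurements.

```python
FRAME_BYTES = 3840 * 2160 * 4   # one full 4k frame at 4 bytes per pixel
WRITE_BW = 10e9                 # ~10 GB/s CPU write bandwidth
UNCACHED_READ_BW = 100e6        # ~100 MB/s CPU read bandwidth from the framebuffer


def cpu_scroll_seconds(lines):
    """CPU copy inside the framebuffer: each scrolled line reads the
    frame back over the slow uncached path and writes it out again."""
    per_line = FRAME_BYTES / UNCACHED_READ_BW + FRAME_BYTES / WRITE_BW
    return lines * per_line


def shadow_upload_seconds(flushes):
    """Shadow buffer: scrolling happens in cached memory, so only the
    write-only upload of a full frame per flush is counted."""
    return flushes * FRAME_BYTES / WRITE_BW


print(f"one scrolled line via CPU copy: {cpu_scroll_seconds(1):.3f} s")
print(f"one full-frame shadow upload:   {shadow_upload_seconds(1):.4f} s")
```

Under these assumptions the uncached read dominates by roughly two orders
of magnitude. The model says nothing about a hardware blitter doing the
copy, which is the case the other side of this thread argues for.
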
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: fbdev: Garbage collect fbdev scrolling acceleration
  2022-01-19 16:21         ` Daniel Vetter
@ 2022-01-19 16:33           ` Sven Schnelle
  -1 siblings, 0 replies; 15+ messages in thread
From: Sven Schnelle @ 2022-01-19 16:33 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-fbdev, Hamza Mahfooz, Helge Deller, dri-devel,
	Geert Uytterhoeven, Thomas Zimmermann

Hi Daniel,

Daniel Vetter <daniel@ffwll.ch> writes:

> On Wed, Jan 19, 2022 at 05:15:44PM +0100, Sven Schnelle wrote:
>> Hi Daniel,
>> 
>> Daniel Vetter <daniel@ffwll.ch> writes:
>> 
>> > On Thu, Jan 13, 2022 at 10:46:03PM +0100, Sven Schnelle wrote:
>> >> Helge Deller <deller@gmx.de> writes:
>> >> > Maybe on fast new x86 boxes the performance difference isn't huge,
>> >> > but for all old systems, or when emulated in qemu, this makes
>> >> > a big difference.
>> >> >
>> >> > Helge
>> >> 
>> >> I second that. For most people, the framebuffer isn't important as
>> >> they're mostly interested in getting to X11/wayland as fast as possible.
>> >> But for systems like servers without X11 it's nice to have a fast
>> >> console.
>> >
>> > Fast console howto:
>> > - shadow buffer in cached memory
>> > - timer based upload of changed areas to the real framebuffer
>> >
>> > This one is actually fast, instead of trying to use hw bltcopy and having
>> > the most terrible fallback path if that's gone. Yes, the drm fbdev helpers
>> > have this (but it's not enabled on most drivers because very, very few
>> > people care).
>> 
>> Hmm... Take my laptop with a 4k (3840x2160) screen as an example:
>> 
>> Let's say that on average half of every line is filled with text.
>> 
>> So 3840/2*2160 pixels that change = 4,147,200 pixels. Every pixel takes
>> 4 bytes = 16,588,800 bytes per timer interrupt. In another mail,
>> updating on vsync was mentioned, so multiply that by 60 and get ~995MB
>> per second. And even if you update the screen only 4 times per second,
>> that would still be ~66MB per second. I'm likely missing something here.
>
> Since you say 4k it's a modern box, so you have on the order of 10GB/s of
> write bandwidth.
>
> And around 100MB/s of read bandwidth. Both from the cpu. It all adds up.
> It's that uncached read which kills you and means dmesg takes seconds to
> display.
>
> Also, since this is 4k, going by sales volume we're talking integrated
> graphics, so whether it's the gpu or the cpu that's doing the memcpy, it's
> the same memory bw budget you're burning down.

That might be true for integrated graphics; as I said, I don't know the
architecture. But being good on one architecture doesn't mean it's good
for everyone. If you have an external GPU, then the memory/system bus BW
would be different depending on whether it's memcpy or the GPU doing the
scrolling. And whether internal or external graphics - the CPU could do
other stuff while the GPU scrolls stuff.

Quite a lot of discussion about reverting the removal of code that had
been in the kernel for more than 20(?) years.

/Sven

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: fbdev: Garbage collect fbdev scrolling acceleration
  2022-01-19 15:39     ` Daniel Vetter
@ 2022-01-24 18:27       ` Geert Uytterhoeven
  -1 siblings, 0 replies; 15+ messages in thread
From: Geert Uytterhoeven @ 2022-01-24 18:27 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Sven Schnelle, Helge Deller, Linux Fbdev development list,
	DRI Development, Thomas Zimmermann, Hamza Mahfooz

Hi Daniel et al,

On Wed, Jan 19, 2022 at 4:39 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> On Thu, Jan 13, 2022 at 10:46:03PM +0100, Sven Schnelle wrote:
> > Helge Deller <deller@gmx.de> writes:
> > > I may have missed some discussions, but I'm objecting against this patch:
> > >
> > >     b3ec8cdf457e5 ("fbdev: Garbage collect fbdev scrolling acceleration, part 1 (from TODO list)")
> > >
> > > Can we please (partly) revert it and restore the scrolling behaviour,
> > > where fbcon uses fb_copyarea() to copy the screen contents instead of
> > > redrawing the whole screen?
> > >
> > > I'm fine with dropping the ypan-functionality.
> > >
> > > Maybe on fast new x86 boxes the performance difference isn't huge,
> > > but for all old systems, or when emulated in qemu, this makes
> > > a big difference.
> > >
> > > Helge
> >
> > I second that. For most people, the framebuffer isn't important as
> > they're mostly interested in getting to X11/wayland as fast as possible.
> > But for systems like servers without X11 it's nice to have a fast
> > console.
>
> Fast console howto:
> - shadow buffer in cached memory
> - timer based upload of changed areas to the real framebuffer
>
> This one is actually fast, instead of trying to use hw bltcopy and having
> the most terrible fallback path if that's gone. Yes, the drm fbdev helpers
> have this (but it's not enabled on most drivers because very, very few
> people care).

That depends on the hardware, and the balance between CPU-to-RAM,
CPU-to-VRAM, and GPU-to-VRAM bandwidths, and CPU and GPU performance.

When scrolling, the fastest copy is the copy that doesn't need to copy
much.  So that's why fbcon supports (or supported :-( ) many strategies:
scrolling by wrapping, panning, copying (either by CPU or by (simple)
GPU), re-rendering (useful for a GPU with bitmap expansion).  So forcing
everybody to render into a fully cached shadow buffer and upload changed
areas is not a silver bullet.
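
To illustrate why the strategy matters, here is a toy comparison of two of
the strategies named above: scrolling by copying and scrolling by
wrapping. It is a sketch with hypothetical class names, not fbcon code;
rows are modelled as Python strings and "rows_moved" counts how many rows
each approach has to copy.

```python
class CopyScroll:
    """Scroll by moving every row up by one, then drawing the new last row."""

    def __init__(self, rows):
        self.lines = [""] * rows
        self.rows_moved = 0

    def scroll(self, new_line):
        for r in range(1, len(self.lines)):
            self.lines[r - 1] = self.lines[r]
            self.rows_moved += 1
        self.lines[-1] = new_line

    def visible(self):
        return list(self.lines)


class WrapScroll:
    """Scroll by advancing the display start offset; rows stay in place."""

    def __init__(self, rows):
        self.lines = [""] * rows
        self.top = 0          # index of the row shown at the top of the screen
        self.rows_moved = 0

    def scroll(self, new_line):
        # The old top row becomes the new bottom row after the offset moves.
        self.lines[self.top] = new_line
        self.top = (self.top + 1) % len(self.lines)

    def visible(self):
        return self.lines[self.top:] + self.lines[:self.top]


copy, wrap = CopyScroll(rows=25), WrapScroll(rows=25)
for i in range(100):
    copy.scroll(f"line {i}")
    wrap.scroll(f"line {i}")

print(copy.visible() == wrap.visible())  # True
print(copy.rows_moved, wrap.rows_moved)  # 2400 0
```

Both produce the same visible screen, but the wrapping variant only draws
the new row. Which variant is available and fastest on real hardware
depends on what the display controller supports.
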

Whether text output is rendered immediately or not is completely
orthogonal to this.  While timer-based updates would speed up printing
of large hunks of text (where no one actually reads what was printed at
the top), that would have almost no impact on actual interactive console
work: it may still take 0.5s to scroll the screen if you press "enter"
when your cursor is positioned on the last line.
BTW, implementing timer-based updates would make measuring real-world
performance more difficult, as we would have to use a different
benchmark than "time dmesg" ;-)

Both Daniel and Thomas said: fbdev is not suitable for modern hardware.
Fine, we do not debate that, and do not want to prevent you from using
DRM for modern hardware.  Then please accept us saying that DRM (in its
current form) is not suitable for other types of graphics hardware.
Still, even modern (embedded) hardware may have small low-color
displays.

For the last 5+ years, we've been pointed to the tinydrm drivers, to
serve as examples for converting existing fbdev drivers to drm drivers.
All but one of them are drivers for hi-color or better hardware, thus
surpassing the capabilities of lots of hardware driven by fbdev drivers.
The other one is an e-ink driver that exposes an XRGB8888 shadow frame
buffer, and converts that in a two-step process, first to 8-bit
grayscale, second to 1-bit monochrome.  If that is considered a good
example, should I be impressed?
Compare that to other subsystems boasting about zero-copy...

Furthermore, for a contemporary e-ink device like the PineNote [1], the
shadow buffer would consume 10 MiB.  Of course this device has 4 GiB of
RAM, and quad Cortex-A55 CPU cores, but not all systems have 10 MiB to
spare...
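
A sketch of the two-step conversion and the memory cost described above.
Several things here are assumptions for illustration: the 1404x1872 panel
resolution (not stated in this mail), the integer luma weights, the fixed
threshold, and all function names. It is not the driver's actual code.

```python
def xrgb8888_to_gray8(pixel):
    """Step 1: reduce a 32-bit XRGB8888 pixel to 8-bit grayscale."""
    r = (pixel >> 16) & 0xFF
    g = (pixel >> 8) & 0xFF
    b = pixel & 0xFF
    return (3 * r + 6 * g + b) // 10   # simple integer luma approximation


def gray8_to_mono(gray, threshold=128):
    """Step 2: reduce 8-bit grayscale to a single bit."""
    return 1 if gray >= threshold else 0


def pack_mono(bits):
    """Pack 1-bit pixels into bytes, 8 pixels per byte, first pixel in the MSB."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        byte <<= 8 - len(chunk)        # pad a short final chunk with zero bits
        out.append(byte)
    return bytes(out)


WIDTH, HEIGHT = 1404, 1872             # assumed panel resolution
shadow_bytes = WIDTH * HEIGHT * 4      # XRGB8888 shadow buffer
mono_bytes = WIDTH * HEIGHT // 8       # native 1-bit frame buffer

row = [0x00FFFFFF, 0x00000000] * 4     # white, black, white, black, ...
packed = pack_mono([gray8_to_mono(xrgb8888_to_gray8(p)) for p in row])
print(packed.hex())                    # aa
print(shadow_bytes, mono_bytes)        # 10513152 328536
```

With that assumed resolution the XRGB8888 shadow buffer comes to about
10 MiB, 32 times the size of the 1-bit frame the panel needs.
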

[1] https://linuxgizmos.com/rk3566-based-pinenote-e-ink-tablet-ships-at-399/

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: fbdev: Garbage collect fbdev scrolling acceleration
  2022-01-24 18:27       ` Geert Uytterhoeven
@ 2022-01-24 19:58         ` Daniel Vetter
  -1 siblings, 0 replies; 15+ messages in thread
From: Daniel Vetter @ 2022-01-24 19:58 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Linux Fbdev development list, Hamza Mahfooz, Helge Deller,
	DRI Development, Thomas Zimmermann, Sven Schnelle

On Mon, Jan 24, 2022 at 7:27 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Daniel et al,
>
> On Wed, Jan 19, 2022 at 4:39 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > On Thu, Jan 13, 2022 at 10:46:03PM +0100, Sven Schnelle wrote:
> > > Helge Deller <deller@gmx.de> writes:
> > > > I may have missed some discussions, but I'm objecting against this patch:
> > > >
> > > >     b3ec8cdf457e5 ("fbdev: Garbage collect fbdev scrolling acceleration, part 1 (from TODO list)")
> > > >
> > > > Can we please (partly) revert it and restore the scrolling behaviour,
> > > > where fbcon uses fb_copyarea() to copy the screen contents instead of
> > > > redrawing the whole screen?
> > > >
> > > > I'm fine with dropping the ypan-functionality.
> > > >
> > > > Maybe on fast new x86 boxes the performance difference isn't huge,
> > > > but for all old systems, or when emulated in qemu, this makes
> > > > a big difference.
> > > >
> > > > Helge
> > >
> > > I second that. For most people, the framebuffer isn't important as
> > > they're mostly interested in getting to X11/wayland as fast as possible.
> > > But for systems like servers without X11 it's nice to have a fast
> > > console.
> >
> > Fast console howto:
> > - shadow buffer in cached memory
> > - timer based upload of changed areas to the real framebuffer
> >
> > This one is actually fast, instead of trying to use hw bltcopy and having
> > the most terrible fallback path if that's gone. Yes drm fbdev helpers has
> > this (but not enabled on most drivers because very, very few people care).
>
> That depends on the hardware, and the balance between CPU-to-RAM,
> CPU-to-VRAM, and GPU-to-VRAM bandwidths, and CPU and GPU performance.
>
> When scrolling, the fastest copy is the copy that doesn't need to copy
> much.  So that's why fbcon supports (or supported :-( many strategies:
> scrolling by wrapping, panning, copying (either by CPU or by (simple)
> GPU), re-rendering (useful for a GPU with bitmap expansion).  So forcing
> everybody to render into a fully cached shadow buffer and upload changed
> areas is not the silver bullet.
>
> Whether text output is rendered immediately or not is completely
> orthogonal to this.  While timer-based updates would speed up printing
> of large hunks of text (where no one actually reads what was printed at
> the top), that would have almost no impact on actual interactive console
> work: it may still take 0.5s to scroll the screen if you press "enter"
> when your cursor is positioned on the last line.
> BTW, implementing timer-based updates would make measuring real-world
> performance more difficult, as we would have to use a different
> benchmark than "time dmesg" ;-)
>
> Both Daniel and Thomas said: fbdev is not suitable for modern hardware.
> Fine, we do not debate that, and do not want to prevent you from using
> DRM for modern hardware.  Then please accept us saying that DRM (in its
> current form) is not suitable for other types of graphics hardware.
> Still, even modern (embedded) hardware may have small low-color
> displays.
>
> For the last +5 years, we've been pointed to the tinydrm drivers, to
> serve as examples for converting existing fbdev drivers to drm drivers.
> All but one of them are drivers for hi-color or better hardware, thus
> surpassing the capabilities of lots of hardware driven by fbdev drivers.
> The other one is an e-ink driver that exposes an XRGB8888 shadow frame
> buffer, and converts that in a two-step process, first to 8-bit
> grayscale, second to 1-bit monochrome.  If that is considered a good
> example, should I be impressed?
> Compare that to other subsystems boasting about zero-copy...

tiny drivers are the state of the art for small neat drivers. As you
pointed out multiple times now, there's no Rx or Cx support for x < 8
in drm or fbdev yet, so that would need to be added. If someone cares
enough for that. Some of the fbtft drivers have shrunk
substantially when ported to tiny, which is really the claim we've put
down. Not that you'll find the perfect C4 pixel format example in
there, at most you find C8 support in some of the really old drivers
like i915/radeon/nouveau for old platforms. But that's very well
buried.

I guess in practice (as you point out below) the repaper display is so
glacially slow anyway, and connected to machines with enough RAM, that
generally the only case that mattered was convenience and hence
supporting what every drm userspace can cope with minimally. Which is
XRGB8888. So yeah, don't look at a driver which updates at roughly
0.5fps for efficient upload code :-) The space wasting is a bit more
important, and should be trivial to fix if someone cares enough to do
that.
-Daniel

> Furthermore, for a contemporary e-ink device like[1], the shadow buffer
> would consume 10 MiB.  Of course this device has 4 GiB of RAM, and quad
> Cortex-A55 CPU cores, but not all systems have 10 MiB to spare...
>
> [1] https://linuxgizmos.com/rk3566-based-pinenote-e-ink-tablet-ships-at-399/
>
> Gr{oetje,eeting}s,
>
>                         Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
