On Wed, Jun 01, 2022 at 02:35:35PM +0200, Daniel Vetter wrote:
> On Tue, May 31, 2022 at 10:58:35AM +0200, Maxime Ripard wrote:
> > Hi Daniel,
> > 
> > Thanks for your feedback
> > 
> > On Wed, May 25, 2022 at 07:18:07PM +0200, Daniel Vetter wrote:
> > > > > VBLANK Events and Asynchronous Commits
> > > > > ======================================
> > > > > When should the VBLANK event complete? When the pixels have been blitted
> > > > > to the kernel's shadow buffer? When the first frame of the waveform is
> > > > > sent to the panel? When the last frame is sent to the panel?
> > > > > 
> > > > > Currently, the driver is taking the first option, letting
> > > > > drm_atomic_helper_fake_vblank() send the VBLANK event without waiting on
> > > > > the refresh thread. This is the only way I was able to get good
> > > > > performance with existing userspace.
> > > > 
> > > > I've been having the same kind of discussions in private lately, so I'm
> > > > interested in the answer as well :)
> > > > 
> > > > It would be worth looking into the SPI/I2C panels for this, since it's
> > > > basically the same case.
> > > 
> > > So it's maybe a bit misnamed and maybe kerneldocs aren't super clear (pls
> > > help improve them), but there are two modes:
> > > 
> > > - drivers which have vblank, which might be somewhat variable (VRR) or
> > >   become simulated (self-refresh panels), but otherwise is a more-or-less
> > >   regular clock. For this case the atomic commit event must match the
> > >   vblank events exactly (frame count and timestamp)
> > 
> > Part of my question there is whether we have any kind of expectation
> > that, when we commit, the next vblank is going to be the one matching
> > that commit, or whether we're allowed to defer it by an arbitrary
> > number of frames (provided that the frame count and timestamps are
> > correct).
> 
> In general yes, but there's no guarantee. The only guarantee we give for
> drivers with vblank counters is that if you receive a vblank event (flip
> complete or vblank event) for frame #n, then an immediate flip/atomic
> ioctl call will display earliest for frame #n+1.
> 
> Also, usually you should be able to hit #n+1, but even today, with fun
> stuff like self-refresh panels, getting out of self-refresh mode might take
> a bit more than a few frames, so you might end up being late. But otoh if
> you just do a page flip loop, then on average (after the crtc is fully
> resumed) you should be able to update at exactly the vrefresh rate.

I had the next item more in mind there: if we were to write something in
the kernel that would transparently behave like a full-blown KMS driver,
but would pipe the commits through a KMS writeback driver before sending
them to our SPI panel, we would always be at best two vblanks late.

So this would mean that userspace would do a page flip, get a first
vblank, but the actual vblank for that commit would be the next one (at
best), consistently.

> > > - drivers which don't have vblank at all, mostly these are i2c/spi panels
> > >   or virtual hw and stuff like that. In this case the event simply happens
> > >   when the driver is done with refresh/upload, and the frame count should
> > >   be zero (since it's meaningless).
> > > 
> > > Unfortunately the helper to dtrt has fake_vblank in its name; maybe it
> > > should be renamed to no_vblank or so (the various flags that control it
> > > are a bit better named).
> > > 
> > > Again the docs should explain it all, but maybe we should clarify them or
> > > perhaps rename that helper to be more meaningful.
> > > 
> > > > > Blitting/Blending in Software
> > > > > =============================
> > > > > There are multiple layers to this topic (pun slightly intended):
> > > > >  1) Today's userspace does not expect a grayscale framebuffer.
> > > > >     Currently, the driver advertises XRGB8888 and converts to Y4
> > > > >     in software. This seems to match other drivers (e.g. repaper).
> > > > >
> > > > >  2) Ignoring what userspace "wants", the closest existing format is
> > > > >     DRM_FORMAT_R8. Geert sent a series[4] adding DRM_FORMAT_R1 through
> > > > >     DRM_FORMAT_R4 (patch 9), which I believe are the "correct" formats
> > > > >     to use.
> > > > > 
> > > > >  3) The RK356x SoCs have an "RGA" hardware block that can do the
> > > > >     RGB-to-grayscale conversion, and also RGB-to-dithered-monochrome
> > > > >     which is needed for animation/video. Currently this is exposed with
> > > > >     a V4L2 platform driver. Can this be inserted into the pipeline in a
> > > > >     way that is transparent to userspace? Or must some userspace library
> > > > >     be responsible for setting up the RGA => EBC pipeline?
> > > > 
> > > > I'm very interested in this answer as well :)
> > > > 
> > > > I think the current consensus is that it's up to userspace to set this
> > > > up though.
> > > 
> > > Yeah, I think a v4l mem2mem device is the answer for these, and then
> > > userspace gets to set it all up.
> > 
> > I think the question wasn't really about where that driver should be,
> > but more about who gets to set it up, and whether the kernel could have
> > some component that exposes the formats supported by the converter and,
> > whenever a commit is done, pipes it through the v4l2 device before
> > doing a page flip.
> > 
> > We have a similar use-case for the RaspberryPi where the hardware
> > codec will produce a framebuffer format that isn't standard. That
> > format is understood by the display pipeline, and it can do
> > writeback.
> > 
> > However, some people are using a separate display (like a SPI display
> > supported by tinydrm) and we would still like to be able to output the
> > decoded frames there.
> > 
> > Is there some way we could plumb things to "route" that buffer through
> > the writeback engine to perform a format conversion before sending it
> > over to the SPI display automatically?
> 
> Currently not transparently. Or at least no one has done that, and I'm not
> sure that's really a great idea. With big gpus all that stuff is done with
> separate command submission to the render side of things, and you can
> fully pipeline all that with in/out-fences.
> 
> Doing that in the kms driver side in the kernel feels very wrong to me :-/

So I guess what you're saying is that there's a close to 0% chance of it
being accepted if we were to come up with such an architecture?

Thanks!
Maxime