On Wednesday 17 April 2019 at 14:39 +0900, Alexandre Courbot wrote:
> Hi Paul,
>
> On Tue, Apr 16, 2019 at 4:55 PM Paul Kocialkowski wrote:
> > Hi,
> >
> > On Tuesday 16 April 2019 at 16:22 +0900, Alexandre Courbot wrote:
> >
> > [...]
> >
> > > Thanks for this great discussion. Let me try to summarize the status
> > > of this thread + the IRC discussion and add my own thoughts:
> > >
> > > Proper support for multiple decoding units (e.g. H.264 slices) per
> > > frame should not be an afterthought; compliance with encoded formats
> > > depends on it, and the benefit of lower latency is a significant
> > > consideration for vendors.
> > >
> > > m2m, which we use for all stateless codecs, has a strong assumption
> > > that one OUTPUT buffer consumed results in one CAPTURE buffer being
> > > produced. This assumption can however be overruled: at least the venus
> > > driver does it to implement the stateful specification.
> > >
> > > So we need a way to specify frame boundaries when submitting encoded
> > > content to the driver. One request should contain a single OUTPUT
> > > buffer, containing a single decoding unit, but we need a way to
> > > specify whether the driver should directly produce a CAPTURE buffer
> > > from this request, or keep using the same CAPTURE buffer with
> > > subsequent requests.
> > >
> > > I can think of 2 ways this can be expressed:
> > > 1) We keep the current m2m behavior as the default (a CAPTURE buffer
> > > is produced), and add a flag to ask the driver to change that behavior
> > > and hold on to the CAPTURE buffer and reuse it with the next request(s);
> >
> > That would kind of break the stateless idea. I think we need requests
> > to be fully independent of each other and have some entity that
> > coordinates requests for this kind of thing.
>
> Side note: the idea that stateless decoders are entirely stateless is
> not completely accurate anyway. When we specify a resolution on the
> OUTPUT queue, we already store some state. What matters IIUC is that
> the *hardware* behaves in a stateless manner. I don't think we should
> refrain from storing some internal driver state if it makes sense.
>
> Back to the topic: the effect of this flag would just be that the
> first buffer in the CAPTURE queue is not removed, i.e. the next
> request will work on the same buffer. It doesn't really preserve any
> state - if the next request is the beginning of a different frame,
> then the previous work will be discarded and the driver will behave as
> it should, not considering any previous state.
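
For what it is worth, here is roughly how I picture the userspace side of
such a flag: one request per slice, and the same timestamp on every slice
of a frame so that they all end up in the same CAPTURE buffer. This is
only a sketch of my understanding; the flag name and value are made up
(nothing like it exists in the uAPI today) and queue_slice() is purely
illustrative.

#include <stdbool.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/time.h>
#include <linux/media.h>
#include <linux/videodev2.h>

/* Made-up flag: "keep the current CAPTURE buffer for the next request". */
#define V4L2_BUF_FLAG_HOLD_CAPTURE	(1u << 30)

/*
 * Queue one slice, already copied into OUTPUT buffer 'index', as its own
 * request. All slices of a frame carry the same timestamp; only the last
 * slice omits the hold flag, which is what lets the driver mark the
 * CAPTURE buffer done.
 */
static int queue_slice(int video_fd, int media_fd, unsigned int index,
		       unsigned int slice_size, struct timeval frame_ts,
		       bool last_slice_of_frame)
{
	struct v4l2_buffer buf;
	struct v4l2_plane plane;
	int req_fd;

	if (ioctl(media_fd, MEDIA_IOC_REQUEST_ALLOC, &req_fd) < 0)
		return -1;

	/*
	 * The per-slice controls (slice parameters, etc.) would be set
	 * here with VIDIOC_S_EXT_CTRLS and V4L2_CTRL_WHICH_REQUEST_VAL.
	 */

	memset(&buf, 0, sizeof(buf));
	memset(&plane, 0, sizeof(plane));
	plane.bytesused = slice_size;
	buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
	buf.memory = V4L2_MEMORY_MMAP;
	buf.index = index;
	buf.m.planes = &plane;
	buf.length = 1;
	buf.request_fd = req_fd;
	buf.flags = V4L2_BUF_FLAG_REQUEST_FD;
	buf.timestamp = frame_ts;
	if (!last_slice_of_frame)
		buf.flags |= V4L2_BUF_FLAG_HOLD_CAPTURE;

	if (ioctl(video_fd, VIDIOC_QBUF, &buf) < 0)
		return -1;

	return ioctl(req_fd, MEDIA_REQUEST_IOC_QUEUE);
}

The only per-frame knowledge userspace needs here is which slice is the
last one, since that is the request that releases the CAPTURE buffer.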

> > > 2) We specify that no CAPTURE buffer is produced by default, unless a
> > > flag asking for one is specified.
> > >
> > > The flag could be specified in one of two ways:
> > > a) As a new v4l2_buffer flag for the OUTPUT buffer;
> > > b) As a dedicated control, either format-specific or common to all codecs.
> >
> > I think we must aim for a generic solution that would be at least
> > common to all codecs, and if possible common to requests regardless of
> > whether they concern video decoding or not.
> >
> > I really like the idea of introducing a request batch/group/queue,
> > which groups requests together and allows marking them done when the
> > whole group is done being decoded. For that, we explicitly mark one of
> > the requests as the final one, so that we can continue adding requests
> > to the batch even when it's already being processed. When all the
> > requests are done being decoded, we can mark them done.
>
> I'd need to see this idea more developed (maybe with an example of the
> sequence of IOCTLs) to form an opinion about it. I would also need a few
> examples of where this could be used outside of stateless codecs. Then
> we will have to address what this means for requests: your argument
> against using a "release CAPTURE buffer" flag was that requests won't be
> fully independent from each other anymore, but I don't see that
> situation changing with batches. Then, does the end of a batch only mean
> that a CAPTURE buffer should be released, or are other actions required
> for non-codec use cases? There are lots and lots of questions like this
> one lurking.
>
> > With that, we also need some tweaking in the core to look for an
> > available capture buffer that matches the output buffer's timestamp
> > before trying to dequeue the next available capture buffer.
>
> I don't think that would be strictly necessary, unless we want to be
> able to decode slices from different frames before the first one is
> completed?
>
> > This way,
> > the first request of the batch will get any queued capture buffer, but
> > subsequent requests will find the matching capture buffer by timestamp.
> >
> > I think that's basically all we need to handle that, and the two aspects
> > (picking by timestamp and request groups) are rather independent, so the
> > latter could probably be used in situations other than video decoding.
> >
> > What do you think?
>
> At this point I'd like to avoid over-engineering things.
> Introducing a request batch mechanism would mean more months spent
> before we can set the stateless codec API in stone, and at some point
> we need to settle and release something that people can use. We don't
> even have a clear idea of what batches would look like and in which
> cases they would be used. The idea of an extra flag is simple and
> AFAICT would do the job nicely, so why not proceed with this for the
> time being?
>
> Cheers,
> Alex.

I also share the feeling that this might be a bit over-engineered for
what we want to solve. But I also don't fully understand Paul's
proposal.
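
The timestamp matching part, at least, seems fairly contained to me:
something along these lines in the v4l2-mem2mem core, called when picking
the destination buffer for a job. The helper name and its placement are my
own invention, just to illustrate the idea.

#include <media/v4l2-mem2mem.h>
#include <media/videobuf2-v4l2.h>

/*
 * Prefer a queued CAPTURE buffer whose timestamp matches the OUTPUT
 * (slice) buffer being decoded, so that all slices of a frame land in
 * the same destination buffer. Hypothetical helper, not part of the
 * current m2m core, and ignoring the rdy_queue locking for brevity.
 */
static struct vb2_v4l2_buffer *
m2m_find_dst_buf_by_timestamp(struct v4l2_m2m_ctx *m2m_ctx, u64 timestamp)
{
	struct v4l2_m2m_buffer *b;

	v4l2_m2m_for_each_dst_buf(m2m_ctx, b) {
		if (b->vb.vb2_buf.timestamp == timestamp)
			return &b->vb;
	}

	/*
	 * No match: this is the first slice of a new frame, fall back to
	 * the head of the ready queue as the core does today.
	 */
	return v4l2_m2m_next_dst_buf(m2m_ctx);
}

The fallback keeps today's behaviour for the first slice of each frame, so
a driver that never sees the hold flag would not notice any difference.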