On Wed, May 22, 2019 at 09:29:24AM +0200, Boris Brezillon wrote:
> On Wed, 22 May 2019 15:39:37 +0900
> Tomasz Figa <tfiga@chromium.org> wrote:
> 
> > > It would be premature to state that we are excluding. We are just
> > > trying to find one format to get things upstream, and make sure we have
> > > a plan how to extend it. Trying to support everything on the first try
> > > is not going to work so well.
> > >
> > > What would be interesting to know is how your IP achieves multi-slice
> > > decoding per frame. That's what we are studying on the RK/Hantro chip.
> > > Typical questions are:
> > >
> > >   1. Do all slices have to be contiguous in memory?
> > >   2. If 1., do you place a start code or AVC header, or pass a separate index to let the HW locate the start of each NAL?
> > >   3. Does the HW support a single interrupt per frame (RK3288, as an example, does not, but RK3399 does)?
> > 
> > AFAICT, the bit about RK3288 isn't true. At least in our downstream
> > driver that was created mostly by RK themselves, we've been assuming
> > that the interrupt is for the complete frame, without any problems.
> 
> I confirm that's what happens when all slices forming a frame are packed
> in a single output buffer: you only get one interrupt at the end of the
> decoding process (in that case, when the frame is decoded). Of course,
> if you split things up and do per-slice decoding instead (one slice per
> buffer) you get an interrupt per slice, though I didn't manage to make
> that work.
> I get a DEC_BUFFER interrupt (AKA, "buffer is empty but frame is not
> fully decoded") on the first slice and an ASO (Arbitrary Slice Ordering)
> interrupt on the second slice, which makes me think some states are
> reset between the 2 operations leading the engine to think that the
> second slice is part of a new frame.

That sounds a lot like how this works on Tegra. My understanding is that
for slice decoding you'd also get an interrupt every time a full slice
has been decoded, perhaps coupled with another "frame done" interrupt
once the last slice has completed the frame.

In frame-level decode mode you don't get interrupts in between and
instead only get the "frame done" interrupt, unless something goes wrong
during decoding, in which case you get an interrupt with error flags
set and status registers that help determine what exactly happened.
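
As a rough illustration of the interrupt handling described above, a
handler could classify the status word along these lines. This is only a
sketch: the bit names and their layout are made up for the example and do
not correspond to the Tegra or Hantro registers.

```c
#include <stdint.h>

/* Hypothetical status bits; real hardware defines its own layout. */
#define IRQ_SLICE_DONE (1u << 0)
#define IRQ_FRAME_DONE (1u << 1)
#define IRQ_ERROR      (1u << 2)

enum decode_event {
	DECODE_EVENT_NONE,
	DECODE_EVENT_SLICE_DONE,
	DECODE_EVENT_FRAME_DONE,
	DECODE_EVENT_ERROR,
};

/*
 * Map a raw interrupt status word to a single event. Errors take
 * priority, and "frame done" takes priority over "slice done" because
 * the last slice of a frame may raise both at once.
 */
static enum decode_event classify_irq(uint32_t status)
{
	if (status & IRQ_ERROR)
		return DECODE_EVENT_ERROR;
	if (status & IRQ_FRAME_DONE)
		return DECODE_EVENT_FRAME_DONE;
	if (status & IRQ_SLICE_DONE)
		return DECODE_EVENT_SLICE_DONE;
	return DECODE_EVENT_NONE;
}
```

In frame-level mode only the first two branches would ever be taken; in
slice-level mode the driver would see DECODE_EVENT_SLICE_DONE for every
slice but the last.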

> Anyway, it doesn't sound like a crazy idea to support both per-slice
> and per-frame decoding and maybe have a way to expose what a
> specific codec can do (through an extra cap mechanism).

Yeah, I think it makes sense to support both for devices that can do
both. From what Nicolas said it may make sense for an application to
want to do slice-level decoding if receiving a stream from the network
and frame-level decoding if playing back from a local file. If a driver
supports both, the application could detect that and choose the
appropriate format.

It sounds to me like using different input formats for that would be a
very natural way to describe it. Applications can already detect the set
of supported input formats and set the format when they allocate buffers
so that should work very nicely.
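
To make that a bit more concrete, the application-side choice could look
roughly like the sketch below. It assumes the list of formats has already
been collected (for instance by looping over VIDIOC_ENUM_FMT on the
bitstream queue). The two format codes are placeholders invented for this
example, not existing V4L2 pixel formats.

```c
#include <stddef.h>
#include <stdint.h>

/* Same byte packing as v4l2_fourcc() in <linux/videodev2.h>. */
#define FOURCC(a, b, c, d) \
	((uint32_t)(a) | ((uint32_t)(b) << 8) | \
	 ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* Hypothetical format codes, invented for this sketch. */
#define FMT_H264_SLICE FOURCC('X', 'S', 'L', 'C')
#define FMT_H264_FRAME FOURCC('X', 'F', 'R', 'M')

/* Return 1 if fmt appears in the list of enumerated formats. */
static int has_format(const uint32_t *fmts, size_t n, uint32_t fmt)
{
	for (size_t i = 0; i < n; i++)
		if (fmts[i] == fmt)
			return 1;
	return 0;
}

/*
 * Pick the slice-level format for low-latency use (network streams) and
 * the frame-level format otherwise (local files), falling back to the
 * other one if the preferred format is not supported. Returns 0 if the
 * driver supports neither.
 */
static uint32_t choose_input_format(const uint32_t *fmts, size_t n,
				    int low_latency)
{
	uint32_t preferred = low_latency ? FMT_H264_SLICE : FMT_H264_FRAME;
	uint32_t fallback = low_latency ? FMT_H264_FRAME : FMT_H264_SLICE;

	if (has_format(fmts, n, preferred))
		return preferred;
	if (has_format(fmts, n, fallback))
		return fallback;
	return 0;
}
```

The result would then be what the application passes when setting the
format, before it allocates buffers.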

> The other option would be to support only per-slice decoding with a
> mandatory START_FRAME/END_FRAME sequence to let drivers for HW that
> only support per-frame decoding know when they should trigger the
> decoding operation. The downside is that it implies having a bounce
> buffer where the driver can pack slices to be decoded on the END_FRAME
> event.

I vaguely remember that that's what the video codec abstraction does in
Mesa/Gallium. I'm not very familiar with V4L2, but this seems like it
could be problematic to integrate with the way that V4L2 works in
general. Perhaps sending a special buffer (0 length or whatever) to mark
the end of a frame would work. But this is probably something that
others have already thought about: slice-level decoding is what most
people are using, so there must already be a way for userspace to
somehow synchronize input vs. output buffers. Or does this currently
just work by queueing bitstream buffers as fast as possible and then
dequeueing frame buffers as they become available?
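
For what it's worth, the "special buffer" idea combined with the bounce
buffer Boris mentions could be sketched as below. This is purely
illustrative and not how any existing driver works: the structure, the
function names and the 4096 byte size are all made up, and a real driver
would size the bounce buffer from the negotiated format.

```c
#include <stddef.h>
#include <string.h>

#define BOUNCE_SIZE 4096 /* arbitrary size for the sketch */

/* Hypothetical per-context state for packing slices into one frame. */
struct frame_assembler {
	unsigned char data[BOUNCE_SIZE];
	size_t used;         /* bytes of slice data packed so far */
	unsigned int slices; /* number of slices packed so far */
};

static void assembler_reset(struct frame_assembler *fa)
{
	fa->used = 0;
	fa->slices = 0;
}

/*
 * Queue one bitstream buffer. A zero-length buffer marks the end of the
 * frame. Returns 0 if a slice was appended, 1 if the frame is complete
 * and can be handed to the hardware, and -1 on error (overflow, or an
 * end-of-frame marker with no slices queued).
 */
static int assembler_queue(struct frame_assembler *fa, const void *buf,
			   size_t len)
{
	if (len == 0)
		return fa->slices > 0 ? 1 : -1;

	if (len > BOUNCE_SIZE - fa->used)
		return -1;

	memcpy(fa->data + fa->used, buf, len);
	fa->used += len;
	fa->slices++;
	return 0;
}
```

When assembler_queue() returns 1 the driver would start decoding from
fa->data and call assembler_reset() once the hardware is done with it.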

Thierry