Hi, Le vendredi 19 octobre 2018 à 11:40 +0200, Hans Verkuil a écrit : > From Alexandre's '[RFC PATCH v3] media: docs-rst: Document m2m stateless > video decoder interface': > > On 10/19/18 10:09, Alexandre Courbot wrote: > > Two points being currently discussed have not been changed in this > > revision due to lack of better idea. Of course this is open to change: > > > > > * The other hot topic is the use of capture buffer indexes in order to > > reference frames. I understand the concerns, but I doesn't seem like > > we have come with a better proposal so far - and since capture buffers > > are essentially well, frames, using their buffer index to directly > > reference them doesn't sound too inappropriate to me. There is also > > the restriction that drivers must return capture buffers in queue > > order. Do we have any concrete example where this scenario would not > > work? > > I'll stick to decoders in describing the issue. Stateless encoders probably > do not have this issue. > > To recap: the application provides a buffer with compressed data to the > decoder. After the request is finished the application can dequeue the > decompressed frame from the capture queue. > > In order to decompress the decoder needs to access previously decoded > reference frames. The request passed to the decoder contained state > information containing the buffer index (or indices) of capture buffers > that contain the reference frame(s). > > This approach puts restrictions on the framework and the application: > > 1) It assumes that the application can predict the capture indices. > This works as long as there is a simple relationship between the > buffer passed to the decoder and the buffer you get back. > > But that may not be true for future codecs. And what if one buffer > produces multiple capture buffers? (E.g. if you want to get back > decompressed slices instead of full frames to reduce output latency). I don't really understand how there could be multiple capture buffers for the same frame used as reference here. I believe the stateless VPU API should make it a hard requirement that reference frames must be fully held in a single capture buffer. This is because the hardware will generally expect one address for the reference frame (that's definitely the case for Cedrus: we cannot deal with reference frames if each reference is scattered accross multiple buffers). So I don't believe it's a problem to associate a reference frame and a single capture buffer for the stateless VPU case. Rather that it should be specified as a requriement of the API. I don't see any problem with allowing multiple output buffers though (that should then be rendered to the same capture buffer), so the 1:1 relationship between output and capture cannot be assumed either way. > This API should be designed to be future-proof (within reason of course), > and I am not at all convinced that future codecs will be just as easy > to predict. > > 2) It assumes that neither drivers nor applications mess with the buffers. > One case that might happen today is if the DMA fails and a buffer is > returned marked ERROR and the DMA is retried with the next buffer. There > is nothing in the spec that prevents you from doing that, but it will mess > up the capture index numbering. And does the application always know in > what order capture buffers are queued? Perhaps there are two threads: one > queueing buffers with compressed data, and the other dequeueing the > decompressed buffers, and they are running mostly independently. > > I believe that assuming that you can always predict the indices of the > capture queue is dangerous and asking for problems in the future. I agree, assuming that userspace can predict the matching capture buffer index seems very fragile. > I am very much in favor of using a dedicated cookie. The application sets > it for the compressed buffer and the driver copies it to the uncompressed > capture buffer. It keeps track of the association between capture index > and cookie. If a compressed buffer decompresses into multiple capture > buffers, then they will all be associated with the same cookie, so > that simplifies how you refer to reference frames if they are split > over multiple buffers. This seems like a great idea, I'm all for it! > The codec controls refer to reference frames by cookie(s). > > For existing applications that use the capture index all you need to do > is to set the capture index as the cookie value in the output buffer. > > It is my understanding that ChromeOS was using the timestamp as the > cookie value. > > I have thought about that, but I am not in favor of doing that. One > reason is that struct timeval comes in various flavors (32 bit, 64 bit, > and a y2038-compatible 32-bit type in the future). > > The y2038 compat code that we will need concerns me in particular since > it will mean that the timeval is converted from 32 to 64 bit and back > again, and that might well mangle the data. I'm not so sure if you can > stick a 64-bit pointer in the timeval (e.g. the high 32 bits go to into > the tv_sec field, the low 32 bits go to the usecs). The y2038 conversion > might mangle the tv_usec field (e.g. divide by 1000000 and add the seconds > to the tv_sec field). > > I would really prefer an independent 64-bit cookie value that the application > can set instead of abusing something else. > > I propose to make a union with v4l2_timecode (which nobody uses) and a > new V4L2_BUF_FLAG_COOKIE flag. > > struct v4l2_buffer_cookie { > __u32 high; > __u32 low; > }; > > And in v4l2_buffer: > > union { > struct v4l2_timecode timecode; > struct v4l2_buffer_cookie cookie; > }; > > And static inlines: > > void v4l2_buffer_set_cookie(struct v4l2_buffer *buf, __u64 cookie) > { > buf->cookie.high = cookie >> 32; > buf->cookie.low = cookie & 0xffffffffULL; > buf->flags |= V4L2_BUF_FLAG_COOKIE; > } > > void v4l2_buffer_set_cookie_ptr(struct v4l2_buffer *buf, void *cookie) > { > v4l2_buffer_set_cookie(buf, (__u64)cookie); > } > > __u64 v4l2_buffer_get_cookie(struct v4l2_buffer *buf) > { > if (!(buf->flags & V4L2_BUF_FLAG_COOKIE)) > return 0; > return (((__u64)buf->cookie.high) << 32) | (__u64)buf->cookie.low; > } > > void *v4l2_buffer_get_cookie_ptr(struct v4l2_buffer *buf) > { > return (void *)v4l2_buffer_get_cookie(buf); > } > > Why not just use __u64? Because the alignment in v4l2_buffer is a nightmare. > Using __u64 would create holes, made even worse by different struct timeval > sizes depending on the architecture. > > I'm proposing a struct v4l2_ext_buffer together with new streaming ioctls > during the media summit that has a clean layout and there this can be just > a __u64. > > I'm calling it a 'cookie' here, but that's just a suggestion. Better > names are welcome. Cheers, Paul -- Developer of free digital technology and hardware support. Website: https://www.paulk.fr/ Coding blog: https://code.paulk.fr/ Git repositories: https://git.paulk.fr/ https://git.code.paulk.fr/