Message-ID: <03751bb884a443ec1cea7b5c023c9d520ffcc3a0.camel@ndufresne.ca>
Subject: Re: [PATCH v3 2/2] media: docs-rst: Document memory-to-memory video encoder interface
From: Nicolas Dufresne
To: Hans Verkuil, Tomasz Figa
Cc: Linux Media Mailing List, Linux Kernel Mailing List,
 Mauro Carvalho Chehab, Pawel Osciak, Alexandre Courbot, Kamil Debski,
 Andrzej Hajda, Kyungmin Park, Jeongtae Park, Philipp Zabel,
 Tiffany Lin (林慧珊), Andrew-CT Chen (陳智迪), Stanimir Varbanov,
 Todor Tomov, Paul Kocialkowski, Laurent Pinchart,
 dave.stevenson@raspberrypi.org, Ezequiel Garcia, Maxime Jourdan
Date: Wed, 10 Apr 2019 12:05:39 -0400
In-Reply-To: <1ec36515-b6ec-b355-47fb-2fe5ad4b3241@xs4all.nl>
References: <20190124100419.26492-1-tfiga@chromium.org>
 <20190124100419.26492-3-tfiga@chromium.org>
 <4bbe4ce4-615a-b981-0855-cd78c7a002d9@xs4all.nl>
 <471720b7-e304-271b-256d-a3dd394773c9@xs4all.nl>
 <787ddc1f-388d-82be-2702-0d7d256f636c@xs4all.nl>
 <6cb0caf1-61a6-0719-1ade-1dcf8ed8a020@xs4all.nl>
 <1ec36515-b6ec-b355-47fb-2fe5ad4b3241@xs4all.nl>
X-Mailing-List: linux-media@vger.kernel.org

On Wednesday, 10 April 2019 at 10:50 +0200, Hans Verkuil wrote:
> On 4/9/19 11:35 AM, Tomasz Figa wrote:
> > On Mon, Apr 8, 2019 at 8:11 PM Hans Verkuil wrote:
> > > On 4/8/19 11:23 AM, Tomasz Figa wrote:
> > > > On Fri, Apr 5, 2019 at 7:03 PM Hans Verkuil wrote:
> > > > > On 4/5/19 10:12 AM, Tomasz Figa wrote:
> > > > > > On Thu, Mar 14, 2019 at 10:57 PM Hans Verkuil wrote:
> > > > > > > Hi Tomasz,
> > > > > > >
> > > > > > > Some more comments...
> > > > > > >
> > > > > > > On 1/29/19 2:52 PM, Hans Verkuil wrote:
> > > > > > > > Hi Tomasz,
> > > > > > > >
> > > > > > > > Some comments below. Nothing major, so I think a v4 should be ready to be
> > > > > > > > merged.
> > > > > > > >
> > > > > > > > On 1/24/19 11:04 AM, Tomasz Figa wrote:
> > > > > > > > > Due to complexity of the video encoding process, the V4L2 drivers of
> > > > > > > > > stateful encoder hardware require specific sequences of V4L2 API calls
> > > > > > > > > to be followed. These include capability enumeration, initialization,
> > > > > > > > > encoding, encode parameters change, drain and reset.
> > > > > > > > >
> > > > > > > > > Specifics of the above have been discussed during Media Workshops at
> > > > > > > > > LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> > > > > > > > > Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> > > > > > > > > originated at those events was later implemented by the drivers we already
> > > > > > > > > have merged in mainline, such as s5p-mfc or coda.
> > > > > > > > >
> > > > > > > > > The only thing missing was the real specification included as a part of
> > > > > > > > > Linux Media documentation. Fix it now and document the encoder part of
> > > > > > > > > the Codec API.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Tomasz Figa
> > > > > > > > > ---
> > > > > > > > >  Documentation/media/uapi/v4l/dev-encoder.rst  | 586 ++++++++++++++++++
> > > > > > > > >  Documentation/media/uapi/v4l/dev-mem2mem.rst  |   1 +
> > > > > > > > >  Documentation/media/uapi/v4l/pixfmt-v4l2.rst  |   5 +
> > > > > > > > >  Documentation/media/uapi/v4l/v4l2.rst         |   2 +
> > > > > > > > >  .../media/uapi/v4l/vidioc-encoder-cmd.rst     |  38 +-
> > > > > > > > >  5 files changed, 617 insertions(+), 15 deletions(-)
> > > > > > > > >  create mode 100644 Documentation/media/uapi/v4l/dev-encoder.rst
> > > > > > > > >
> > > > > > > > > diff --git a/Documentation/media/uapi/v4l/dev-encoder.rst b/Documentation/media/uapi/v4l/dev-encoder.rst
> > > > > > > > > new file mode 100644
> > > > > > > > > index 000000000000..fb8b05a132ee
> > > > > > > > > --- /dev/null
> > > > > > > > > +++ b/Documentation/media/uapi/v4l/dev-encoder.rst
> > > > > > > > > @@ -0,0 +1,586 @@
> > > > > > > > > +.. -*- coding: utf-8; mode: rst -*-
> > > > > > > > > +
> > > > > > > > > +.. _encoder:
> > > > > > > > > +
> > > > > > > > > +*************************************************
> > > > > > > > > +Memory-to-memory Stateful Video Encoder Interface
> > > > > > > > > +*************************************************
> > > > > > > > > +
> > > > > > > > > +A stateful video encoder takes raw video frames in display order and encodes
> > > > > > > > > +them into a bitstream. It generates complete chunks of the bitstream, including
> > > > > > > > > +all metadata, headers, etc. The resulting bitstream does not require any
> > > > > > > > > +further post-processing by the client.
> > > > > > > > > +
> > > > > > > > > +Performing software stream processing, header generation etc. in the driver
> > > > > > > > > +in order to support this interface is strongly discouraged. In case such
> > > > > > > > > +operations are needed, use of the Stateless Video Encoder Interface (in
> > > > > > > > > +development) is strongly advised.
> > > > > > > > > +
> > > > > > > > > +Conventions and notation used in this document
> > > > > > > > > +==============================================
> > > > > > > > > +
> > > > > > > > > +1. The general V4L2 API rules apply if not specified in this document
> > > > > > > > > +   otherwise.
> > > > > > > > > +
> > > > > > > > > +2. The meaning of words "must", "may", "should", etc. is as per `RFC
> > > > > > > > > +   2119 `_.
> > > > > > > > > +
> > > > > > > > > +3. All steps not marked "optional" are required.
> > > > > > > > > +
> > > > > > > > > +4. :c:func:`VIDIOC_G_EXT_CTRLS` and :c:func:`VIDIOC_S_EXT_CTRLS` may be used
> > > > > > > > > +   interchangeably with :c:func:`VIDIOC_G_CTRL` and :c:func:`VIDIOC_S_CTRL`,
> > > > > > > > > +   unless specified otherwise.
> > > > > > > > > +
> > > > > > > > > +5. Single-planar API (see :ref:`planar-apis`) and applicable structures may be
> > > > > > > > > +   used interchangeably with multi-planar API, unless specified otherwise,
> > > > > > > > > +   depending on decoder capabilities and following the general V4L2 guidelines.
> > > > > > > > > +
> > > > > > > > > +6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i =
> > > > > > > > > +   [0..2]: i = 0, 1, 2.
> > > > > > > > > +
> > > > > > > > > +7. Given an ``OUTPUT`` buffer A, then A’ represents a buffer on the ``CAPTURE``
> > > > > > > > > +   queue containing data that resulted from processing buffer A.
> > > > > > > > > +
> > > > > > > > > +Glossary
> > > > > > > > > +========
> > > > > > > > > +
> > > > > > > > > +Refer to :ref:`decoder-glossary`.
> > > > > > > > > +
> > > > > > > > > +State machine
> > > > > > > > > +=============
> > > > > > > > > +
> > > > > > > > > +.. kernel-render:: DOT
> > > > > > > > > +   :alt: DOT digraph of encoder state machine
> > > > > > > > > +   :caption: Encoder state machine
> > > > > > > > > +
> > > > > > > > > +   digraph encoder_state_machine {
> > > > > > > > > +       node [shape = doublecircle, label="Encoding"] Encoding;
> > > > > > > > > +
> > > > > > > > > +       node [shape = circle, label="Initialization"] Initialization;
> > > > > > > > > +       node [shape = circle, label="Stopped"] Stopped;
> > > > > > > > > +       node [shape = circle, label="Drain"] Drain;
> > > > > > > > > +       node [shape = circle, label="Reset"] Reset;
> > > > > > > > > +
> > > > > > > > > +       node [shape = point]; qi
> > > > > > > > > +       qi -> Initialization [ label = "open()" ];
> > > > > > > > > +
> > > > > > > > > +       Initialization -> Encoding [ label = "Both queues streaming" ];
> > > > > > > > > +
> > > > > > > > > +       Encoding -> Drain [ label = "V4L2_DEC_CMD_STOP" ];
> > > > > > > > > +       Encoding -> Reset [ label = "VIDIOC_STREAMOFF(CAPTURE)" ];
> > > > > > > > > +       Encoding -> Stopped [ label = "VIDIOC_STREAMOFF(OUTPUT)" ];
> > > > > > > > > +       Encoding -> Encoding;
> > > > > > > > > +
> > > > > > > > > +       Drain -> Stopped [ label = "All CAPTURE\nbuffers dequeued\nor\nVIDIOC_STREAMOFF(CAPTURE)" ];
> > > > > > > > > +       Drain -> Reset [ label = "VIDIOC_STREAMOFF(CAPTURE)" ];
> > > > > > > > > +
> > > > > > > > > +       Reset -> Encoding [ label = "VIDIOC_STREAMON(CAPTURE)" ];
> > > > > > > > > +       Reset -> Initialization [ label = "VIDIOC_REQBUFS(OUTPUT, 0)" ];
> > > > > > > > > +
> > > > > > > > > +       Stopped -> Encoding [ label = "V4L2_DEC_CMD_START\nor\nVIDIOC_STREAMON(OUTPUT)" ];
> > > > > > > > > +       Stopped -> Reset [ label = "VIDIOC_STREAMOFF(CAPTURE)" ];
> > > > > > > > > +   }
> > > > > > > > > +
> > > > > > > > > +Querying capabilities
> > > > > > > > > +=====================
> > > > > > > > > +
> > > > > > > > > +1. To enumerate the set of coded formats supported by the encoder, the
> > > > > > > > > +   client may call :c:func:`VIDIOC_ENUM_FMT` on ``CAPTURE``.
> > > > > > > > > +
> > > > > > > > > +   * The full set of supported formats will be returned, regardless of the
> > > > > > > > > +     format set on ``OUTPUT``.
> > > > > > > > > +
> > > > > > > > > +2. To enumerate the set of supported raw formats, the client may call
> > > > > > > > > +   :c:func:`VIDIOC_ENUM_FMT` on ``OUTPUT``.
> > > > > > > > > +
> > > > > > > > > +   * Only the formats supported for the format currently active on ``CAPTURE``
> > > > > > > > > +     will be returned.
> > > > > > > > > +
> > > > > > > > > +   * In order to enumerate raw formats supported by a given coded format,
> > > > > > > > > +     the client must first set that coded format on ``CAPTURE`` and then
> > > > > > > > > +     enumerate the formats on ``OUTPUT``.
> > > > > > > > > +
> > > > > > > > > +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
> > > > > > > > > +   resolutions for a given format, passing desired pixel format in
> > > > > > > > > +   :c:type:`v4l2_frmsizeenum` ``pixel_format``.
> > > > > > > > > +
> > > > > > > > > +   * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` for a coded pixel
> > > > > > > > > +     format will include all possible coded resolutions supported by the
> > > > > > > > > +     encoder for given coded pixel format.
> > > > > > > > > +
> > > > > > > > > +   * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` for a raw pixel format
> > > > > > > > > +     will include all possible frame buffer resolutions supported by the
> > > > > > > > > +     encoder for given raw pixel format and coded format currently set on
> > > > > > > > > +     ``CAPTURE``.
> > > > > > > > > +
> > > > > > > > > +4. Supported profiles and levels for the coded format currently set on
> > > > > > > > > +   ``CAPTURE``, if applicable, may be queried using their respective controls
> > > > > > > > > +   via :c:func:`VIDIOC_QUERYCTRL`.
> > > > > > > > > +
> > > > > > > > > +5. Any additional encoder capabilities may be discovered by querying
> > > > > > > > > +   their respective controls.
> > > > > > > > > +
> > > > > > > > > +Initialization
> > > > > > > > > +==============
> > > > > > > > > +
> > > > > > > > > +1. Set the coded format on the ``CAPTURE`` queue via :c:func:`VIDIOC_S_FMT`
> > > > > > > > > +
> > > > > > > > > +   * **Required fields:**
> > > > > > > > > +
> > > > > > > > > +     ``type``
> > > > > > > > > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> > > > > > > > > +
> > > > > > > > > +     ``pixelformat``
> > > > > > > > > +         the coded format to be produced
> > > > > > > > > +
> > > > > > > > > +     ``sizeimage``
> > > > > > > > > +         desired size of ``CAPTURE`` buffers; the encoder may adjust it to
> > > > > > > > > +         match hardware requirements
> > > > > > > > > +
> > > > > > > > > +     ``width``, ``height``
> > > > > > > > > +         ignored (always zero)
> > > > > > > > > +
> > > > > > > > > +     other fields
> > > > > > > > > +         follow standard semantics
> > > > > > > > > +
> > > > > > > > > +   * **Return fields:**
> > > > > > > > > +
> > > > > > > > > +     ``sizeimage``
> > > > > > > > > +         adjusted size of ``CAPTURE`` buffers
> > > > > > > > > +
> > > > > > > > > +   .. important::
> > > > > > > > > +
> > > > > > > > > +      Changing the ``CAPTURE`` format may change the currently set ``OUTPUT``
> > > > > > > > > +      format. The encoder will derive a new ``OUTPUT`` format from the
> > > > > > > > > +      ``CAPTURE`` format being set, including resolution, colorimetry
> > > > > > > > > +      parameters, etc. If the client needs a specific ``OUTPUT`` format, it
> > > > > > > > > +      must adjust it afterwards.
> > > > > > > >
> > > > > > > > Hmm, "including resolution": if width and height are set to 0, what should the
> > > > > > > > OUTPUT resolution be? Up to the driver? I think this should be clarified since
> > > > > > > > at a first reading of this paragraph it appears to be contradictory.
> > > > > > >
> > > > > > > I think the driver should just return the width and height of the OUTPUT
> > > > > > > format. So the width and height that userspace specifies is just ignored
> > > > > > > and replaced by the width and height of the OUTPUT format. After all, that's
> > > > > > > what the bitstream will encode. Returning 0 for width and height would make
> > > > > > > this a strange exception in V4L2 and I want to avoid that.
> > > > > > >
> > > > > >
> > > > > > Hmm, however, the width and height of the OUTPUT format is not what's
> > > > > > actually encoded in the bitstream. The right selection rectangle
> > > > > > determines that.
> > > > > >
> > > > > > In one of the previous versions I though we could put the codec
> > > >
> > > > s/codec/coded/...
> > > >
> > > > > > resolution as the width and height of the CAPTURE format, which would
> > > > > > be the resolution of the encoded image rounded up to full macroblocks
> > > > > > +/- some encoder-specific constraints. AFAIR there was some concern
> > > > > > about OUTPUT format changes triggering CAPTURE format changes, but to
> > > > > > be honest, I'm not sure if that's really a problem. I just decided to
> > > > > > drop that for the simplicity.
> > > > >
> > > > > I'm not sure what your point is.
> > > > >
> > > > > The OUTPUT format has the coded resolution,
> > > >
> > > > That's not always true. The OUTPUT format is just the format of the
> > > > source frame buffers. In special cases where the source resolution is
> > > > nicely aligned, it would be the same as coded size, but the remaining
> > > > cases are valid as well.
> > > >
> > > > > so when you set the
> > > > > CAPTURE format it can just copy the OUTPUT coded resolution unless the
> > > > > chosen CAPTURE pixelformat can't handle that in which case both the
> > > > > OUTPUT and CAPTURE coded resolutions are clamped to whatever is the maximum
> > > > > or minimum the codec is capable of.
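To make the clamping Hans describes concrete, here is a rough sketch. It is
only a model of the arithmetic, not driver code: `struct res_limits`,
`clamp_uint()` and `clamp_resolution()` are names made up for this
illustration, and the limits would really come from whatever the selected
codec supports.

```c
#include <assert.h>

/* Hypothetical per-codec limits; real values come from the driver. */
struct res_limits {
	unsigned int min_width, max_width;
	unsigned int min_height, max_height;
};

/* Clamp v into the inclusive range [lo, hi]. */
static unsigned int clamp_uint(unsigned int v, unsigned int lo, unsigned int hi)
{
	assert(lo <= hi);
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Bring a requested resolution into the range the selected codec can handle. */
static void clamp_resolution(const struct res_limits *lim,
			     unsigned int *width, unsigned int *height)
{
	*width = clamp_uint(*width, lim->min_width, lim->max_width);
	*height = clamp_uint(*height, lim->min_height, lim->max_height);
}
```

With assumed limits of 16..1920 by 16..1088, a 4096x2160 request would come
back as 1920x1088, so neither queue ever reports an out-of-range size.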
> > > >
> > > > As per my comment above, generally speaking, the encoder will derive
> > > > an appropriate coded format from the OUTPUT format, but also other
> > > > factors, like the crop rectangles and possibly some internal
> > > > constraints.
> > > >
> > > > > That said, I am fine with just leaving it up to the driver as suggested
> > > > > before. Just as long as both the CAPTURE and OUTPUT formats remain valid
> > > > > (i.e. width and height may never be out of range).
> > > > >
> > > >
> > > > Sounds good to me.
> > > >
> > > > > > > > > +
> > > > > > > > > +2. **Optional.** Enumerate supported ``OUTPUT`` formats (raw formats for
> > > > > > > > > +   source) for the selected coded format via :c:func:`VIDIOC_ENUM_FMT`.
> > > > > > > > > +
> > > > > > > > > +   * **Required fields:**
> > > > > > > > > +
> > > > > > > > > +     ``type``
> > > > > > > > > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > > > > > > > > +
> > > > > > > > > +     other fields
> > > > > > > > > +         follow standard semantics
> > > > > > > > > +
> > > > > > > > > +   * **Return fields:**
> > > > > > > > > +
> > > > > > > > > +     ``pixelformat``
> > > > > > > > > +         raw format supported for the coded format currently selected on
> > > > > > > > > +         the ``CAPTURE`` queue.
> > > > > > > > > +
> > > > > > > > > +     other fields
> > > > > > > > > +         follow standard semantics
> > > > > > > > > +
> > > > > > > > > +3. Set the raw source format on the ``OUTPUT`` queue via
> > > > > > > > > +   :c:func:`VIDIOC_S_FMT`.
> > > > > > > > > +
> > > > > > > > > +   * **Required fields:**
> > > > > > > > > +
> > > > > > > > > +     ``type``
> > > > > > > > > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > > > > > > > > +
> > > > > > > > > +     ``pixelformat``
> > > > > > > > > +         raw format of the source
> > > > > > > > > +
> > > > > > > > > +     ``width``, ``height``
> > > > > > > > > +         source resolution
> > > > > > > > > +
> > > > > > > > > +     other fields
> > > > > > > > > +         follow standard semantics
> > > > > > > > > +
> > > > > > > > > +   * **Return fields:**
> > > > > > > > > +
> > > > > > > > > +     ``width``, ``height``
> > > > > > > > > +         may be adjusted by encoder to match alignment requirements, as
> > > > > > > > > +         required by the currently selected formats
> > > > > > > >
> > > > > > > > What if the width x height is larger than the maximum supported by the
> > > > > > > > selected coded format? This should probably mention that in that case the
> > > > > > > > width x height is reduced to the largest allowed value. Also mention that
> > > > > > > > this maximum is reported by VIDIOC_ENUM_FRAMESIZES.
> > > > > > > >
> > > > > > > > > +
> > > > > > > > > +     other fields
> > > > > > > > > +         follow standard semantics
> > > > > > > > > +
> > > > > > > > > +   * Setting the source resolution will reset the selection rectangles to their
> > > > > > > > > +     default values, based on the new resolution, as described in the step 5
> > > > > > > >
> > > > > > > > 5 -> 4
> > > > > > > >
> > > > > > > > Or just say: "as described in the next step."
> > > > > > > >
> > > > > > > > > +     below.
> > > > > > >
> > > > > > > It should also be made explicit that:
> > > > > > >
> > > > > > > 1) the crop rectangle will be set to the given width and height *before*
> > > > > > > it is being adjusted by S_FMT.
> > > > > > >
> > > > > >
> > > > > > I don't think that's what we want here.
> > > > > >
> > > > > > Defining the default rectangle to be exactly the same as the OUTPUT
> > > > > > resolution (after the adjustment) makes the semantics consistent - not
> > > > > > setting the crop rectangle gives you exactly the behavior as if there
> > > > > > was no cropping involved (or supported by the encoder).
> > > > >
> > > > > I think you are right. This seems to be what the coda driver does as well.
> > > > > It is convenient to be able to just set a 1920x1080 format and have that
> > > > > resolution be stored as the crop rectangle, since it avoids having to call
> > > > > s_selection afterwards, but it is not really consistent with the way V4L2
> > > > > works.
> > > > >
> > > > > > > Open question: should we support a compose rectangle for the CAPTURE that
> > > > > > > is the same as the OUTPUT crop rectangle? I.e. the CAPTURE format contains
> > > > > > > the adjusted width and height and the compose rectangle (read-only) contains
> > > > > > > the visible width and height. It's not strictly necessary, but it is
> > > > > > > symmetrical.
> > > > > >
> > > > > > Wouldn't it rather be the CAPTURE crop rectangle that would be of the
> > > > > > same resolution of the OUTPUT compose rectangle? Then you could
> > > > > > actually have the CAPTURE compose rectangle for putting that into the
> > > > > > desired rectangle of the encoded stream, if the encoder supports that.
> > > > > > (I don't know any that does, so probably out of concern for now.)
> > > > >
> > > > > Yes, you are right.
> > > > >
> > > > > But should we support this?
> > > > >
> > > > > I actually think not for this initial version. It can be added later, I guess.
> > > > >
> > > >
> > > > I think it boils down on whether adding it later wouldn't
> > > > significantly complicate the application logic. It also relates to my
> > > > other comment somewhere below.
> > > >
> > > > > > > 2) the CAPTURE format will be updated as well with the new OUTPUT width and
> > > > > > > height. The CAPTURE sizeimage might change as well.
> > > > > > >
> > > > > > > > > +
> > > > > > > > > +4. **Optional.** Set the visible resolution for the stream metadata via
> > > > > > > > > +   :c:func:`VIDIOC_S_SELECTION` on the ``OUTPUT`` queue.
> > > > > > >
> > > > > > > I think you should mention that this is only necessary if the crop rectangle
> > > > > > > that is set when you set the format isn't what you want.
> > > > > > >
> > > > > >
> > > > > > Ack.
> > > > > >
> > > > > > > > > +
> > > > > > > > > +   * **Required fields:**
> > > > > > > > > +
> > > > > > > > > +     ``type``
> > > > > > > > > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > > > > > > > > +
> > > > > > > > > +     ``target``
> > > > > > > > > +         set to ``V4L2_SEL_TGT_CROP``
> > > > > > > > > +
> > > > > > > > > +     ``r.left``, ``r.top``, ``r.width``, ``r.height``
> > > > > > > > > +         visible rectangle; this must fit within the `V4L2_SEL_TGT_CROP_BOUNDS`
> > > > > > > > > +         rectangle and may be subject to adjustment to match codec and
> > > > > > > > > +         hardware constraints
> > > > > > > > > +
> > > > > > > > > +   * **Return fields:**
> > > > > > > > > +
> > > > > > > > > +     ``r.left``, ``r.top``, ``r.width``, ``r.height``
> > > > > > > > > +         visible rectangle adjusted by the encoder
> > > > > > > > > +
> > > > > > > > > +   * The following selection targets are supported on ``OUTPUT``:
> > > > > > > > > +
> > > > > > > > > +     ``V4L2_SEL_TGT_CROP_BOUNDS``
> > > > > > > > > +         equal to the full source frame, matching the active ``OUTPUT``
> > > > > > > > > +         format
> > > > > > > > > +
> > > > > > > > > +     ``V4L2_SEL_TGT_CROP_DEFAULT``
> > > > > > > > > +         equal to ``V4L2_SEL_TGT_CROP_BOUNDS``
> > > > > > > > > +
> > > > > > > > > +     ``V4L2_SEL_TGT_CROP``
> > > > > > > > > +         rectangle within the source buffer to be encoded into the
> > > > > > > > > +         ``CAPTURE`` stream; defaults to ``V4L2_SEL_TGT_CROP_DEFAULT``
> > > > > > > > > +
> > > > > > > > > +         .. note::
> > > > > > > > > +
> > > > > > > > > +            A common use case for this selection target is encoding a source
> > > > > > > > > +            video with a resolution that is not a multiple of a macroblock,
> > > > > > > > > +            e.g. the common 1920x1080 resolution may require the source
> > > > > > > > > +            buffers to be aligned to 1920x1088 for codecs with 16x16 macroblock
> > > > > > > > > +            size. To avoid encoding the padding, the client needs to explicitly
> > > > > > > > > +            configure this selection target to 1920x1080.
> > > > > > >
> > > > > > > This last sentence contradicts the proposed behavior of S_FMT(OUTPUT).
> > > > > > >
> > > > > >
> > > > > > Sorry, which part exactly and what part of the proposal exactly? :)
> > > > > > (My comment above might be related, though.)
> > > > >
> > > > > Ignore my comment. We go back to explicitly requiring userspace to set the OUTPUT
> > > > > crop selection target, so this note remains valid.
> > > > >
> > > >
> > > > Ack.
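For readers following along, the arithmetic behind the 1920x1080 versus
1920x1088 note can be sketched as below. This is just an illustration of the
rounding, assuming the 16x16 macroblock size from the note; `align_up()` and
`padding()` are hypothetical helper names, not V4L2 or driver functions.

```c
#include <assert.h>

/* Round v up to the next multiple of a; a must be non-zero. */
static unsigned int align_up(unsigned int v, unsigned int a)
{
	assert(a != 0);
	return (v + a - 1) / a * a;
}

/*
 * Number of padding rows (or columns) between the visible size and the
 * macroblock-aligned buffer size. This is the area that the OUTPUT crop
 * rectangle keeps out of the encoded stream.
 */
static unsigned int padding(unsigned int visible, unsigned int mb_size)
{
	return align_up(visible, mb_size) - visible;
}
```

Under that assumption, a height of 1080 rounds up to 1088 (8 padding rows)
while a width of 1920 is already aligned, which is why the client sets the
crop rectangle to 1920x1080 on a 1920x1088 buffer.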
> > > >
> > > > > > > > > +
> > > > > > > > > +     ``V4L2_SEL_TGT_COMPOSE_BOUNDS``
> > > > > > > > > +         maximum rectangle within the coded resolution, which the cropped
> > > > > > > > > +         source frame can be composed into; if the hardware does not support
> > > > > > > > > +         composition or scaling, then this is always equal to the rectangle of
> > > > > > > > > +         width and height matching ``V4L2_SEL_TGT_CROP`` and located at (0, 0)
> > > > > > > > > +
> > > > > > > > > +     ``V4L2_SEL_TGT_COMPOSE_DEFAULT``
> > > > > > > > > +         equal to a rectangle of width and height matching
> > > > > > > > > +         ``V4L2_SEL_TGT_CROP`` and located at (0, 0)
> > > > > > > > > +
> > > > > > > > > +     ``V4L2_SEL_TGT_COMPOSE``
> > > > > > > > > +         rectangle within the coded frame, which the cropped source frame
> > > > > > > > > +         is to be composed into; defaults to
> > > > > > > > > +         ``V4L2_SEL_TGT_COMPOSE_DEFAULT``; read-only on hardware without
> > > > > > > > > +         additional compose/scaling capabilities; resulting stream will
> > > > > > > > > +         have this rectangle encoded as the visible rectangle in its
> > > > > > > > > +         metadata
> > > > > > >
> > > > > > > I think the compose targets for OUTPUT are only needed if the hardware can
> > > > > > > actually do scaling and/or composition. Otherwise they can (must?) be
> > > > > > > dropped.
> > > > > > >
> > > > > >
> > > > > > Note that V4L2_SEL_TGT_COMPOSE is defined to be the way for the
> > > > > > userspace to learn the target visible rectangle that's going to be
> > > > > > encoded in the stream metadata. If we omit it, we wouldn't have a way
> > > > > > that would be consistent between encoders that can do
> > > > > > scaling/composition and those that can't.
> > > > >
> > > > > I'm not convinced about this. The standard API behavior is not to expose
> > > > > functionality that the hardware can't do. So if scaling isn't possible on
> > > > > the OUTPUT side, then it shouldn't expose OUTPUT compose rectangles.
> > > > >
> > > > > I also believe it very unlikely that we'll see encoders capable of scaling
> > > > > as it doesn't make much sense.
> > > >
> > > > It does make a lot of sense - WebRTC requires 3 different sizes of the
> > > > stream to be encoded at the same time. However, unfortunately, I
> > > > haven't yet seen an encoder capable of doing so.
> > > >
> > > > > I would prefer to drop this to simplify the
> > > > > spec, and when we get encoders that can scale, then we can add support for
> > > > > compose rectangles (and I'm sure we'll need to think about how that
> > > > > influences the CAPTURE side as well).
> > > > >
> > > > > For encoders without scaling it is the OUTPUT crop rectangle that defines
> > > > > the visible rectangle.
> > > > >
> > > > > > However, with your proposal of actually having selection rectangles
> > > > > > for the CAPTURE queue, it could be solved indeed. The OUTPUT queue
> > > > > > would expose a varying set of rectangles, depending on the hardware
> > > > > > capability, while the CAPTURE queue would always expose its rectangle
> > > > > > with that information.
> > > > >
> > > > > I think we should keep it simple and only define selection rectangles
> > > > > when really needed.
> > > > >
> > > > > So encoders support CROP on the OUTPUT, and decoders support CAPTURE
> > > > > COMPOSE (may be read-only). Nothing else.
> > > > >
> > > > > Once support for scaling is needed (either on the encoder or decoder
> > > > > side), then the spec should be enhanced. But I prefer to postpone that
> > > > > until we actually have hardware that needs this.
> > > > >
> > > >
> > > > Okay, let's do it this way then. Actually, I don't even think there is
> > > > much value in exposing information internal to the bitstream metadata
> > > > like this, similarly to the coded size. My intention was to just
> > > > ensure that we can easily add scaling/composing functionality later.
> > > >
> > > > I just removed the COMPOSE rectangles from my next draft.
> > >
> > > I don't think that supporting scaling will be a problem for the API as
> > > such, since this is supported for standard video capture devices. It
> > > just gets very complicated trying to describe how to configure all this.
> > >
> > > So I prefer to avoid this until we need to.
> > >
> > > > [snip]
> > > > > > > Changing the OUTPUT format will always fail if OUTPUT buffers are already allocated,
> > > > > > > or if changing the OUTPUT format would change the CAPTURE format (sizeimage in
> > > > > > > particular) and CAPTURE buffers were already allocated and are too small.
> > > > > >
> > > > > > The OUTPUT format must not change the CAPTURE format by definition.
> > > > > > Otherwise we end up in a situation where we can't commit, because both
> > > > > > queue formats can affect each other. Any change to the OUTPUT format
> > > > > > that wouldn't work with the current CAPTURE format should be adjusted
> > > > > > by the driver to match the current CAPTURE format.
> > > > >
> > > > > But the CAPTURE format *does* depend on the OUTPUT format: if the output
> > > > > resolution changes, then so does the CAPTURE resolution and esp. the
> > > > > sizeimage value, since that is typically resolution dependent.
> > > > >
> > > > > The coda driver does this as well: changing the output resolution
> > > > > will update the capture resolution and sizeimage. The vicodec driver does the
> > > > > same.
> > > > >
> > > > > Setting the CAPTURE format basically just selects the codec to use, after
> > > > > that you can set the OUTPUT format and read the updated CAPTURE format to
> > > > > get the new sizeimage value. In fact, setting the CAPTURE format shouldn't
> > > > > change the OUTPUT format, unless the OUTPUT format is incompatible with the
> > > > > newly selected codec.
> > > >
> > > > Let me think about it for a while.
> > >
> > > Sleep on it, always works well for me :-)
> >
> > Okay, I think I'm not convinced.
> >
> > I believe we decided to allow sizeimage to be specified by the
> > application, because it knows more about the stream it's going to
> > encode. Only setting the size to 0 would make the encoder fall back to
> > some simple internal heuristic.
>
> Yes, that was the plan, but the patch stalled. I completely forgot
> about this patch :-)
>
> My last reply to "Re: [RFC PATCH] media/doc: Allow sizeimage to be set by
> v4l clients" was March 14th.
>
> Also, sizeimage must be at least the minimum size required for the given
> CAPTURE width and height. So if it is less, then sizeimage will be set to that
> minimum size.
>
> > Another thing is handling resolution changes. I believe that would
> > have to be handled by stopping the OUTPUT queue, changing the OUTPUT
> > format and starting the OUTPUT queue, all that without stopping the
> > CAPTURE queue. With the behavior you described it wouldn't work,
> > because the OUTPUT format couldn't be changed.
> >
> > I'd suggest making OUTPUT format changes not change the CAPTURE sizeimage.
>
> So OUTPUT format changes will still update the CAPTURE width and height?
>
> It's kind of weird if you are encoding e.g. 1920x1080 but the CAPTURE format
> says 1280x720. I'm not sure what is best.
>
> What if the CAPTURE sizeimage is too small for the new OUTPUT resolution?
> Should S_FMT(OUTPUT) fail with some error in that case?
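To pin down the sizeimage rules being discussed, here is a small model of
them. It is a sketch under loud assumptions: the minimum-size rule (one raw
4:2:0 frame's worth of bytes) is invented for the example, and
`min_sizeimage()`, `adjust_sizeimage()` and `capture_buffers_fit()` are
hypothetical names. Real drivers have their own, hardware-specific rules.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical lower bound for a coded buffer: the size of one raw 4:2:0
 * frame. This stands in for the driver's "simple internal heuristic".
 */
static uint32_t min_sizeimage(uint32_t width, uint32_t height)
{
	return (uint32_t)((uint64_t)width * height * 3 / 2);
}

/*
 * Model of the adjustment described above:
 *  - 0 means "no preference", so the heuristic value is used;
 *  - anything below the minimum is raised to the minimum;
 *  - larger requests from the application are kept as-is.
 */
static uint32_t adjust_sizeimage(uint32_t requested, uint32_t width,
				 uint32_t height)
{
	uint32_t min = min_sizeimage(width, height);

	return requested < min ? min : requested;
}

/*
 * Hans's open question: do already-allocated CAPTURE buffers still fit
 * after the OUTPUT resolution changes? What to do when this returns false
 * (fail S_FMT, or signal the client) is not settled in this thread.
 */
static bool capture_buffers_fit(uint32_t allocated_sizeimage,
				uint32_t new_width, uint32_t new_height)
{
	return allocated_sizeimage >= min_sizeimage(new_width, new_height);
}
```

In this model, buffers sized for 1280x720 do not fit once the source becomes
1920x1080, which is exactly the case where the client has to find out that
the CAPTURE queue needs re-allocating.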

Sounds like we need something similar to the SOURCE_CHANGE event mechanism
if we want to allow dynamic bitrate control, which would require
re-allocating the CAPTURE buffer queue. (The same goes for any other
runtime control on our encoders, which users really expect to be supported
these days.)

>
> Regards,
>
> 	Hans
>
> > Best regards,
> > Tomasz
> >