From: Adrian Ratiu
To: Jonas Karlman
Cc: Tomasz Figa, Ezequiel Garcia, Philipp Zabel, Mauro Carvalho Chehab, Fruehberger Peter, kuhanh.murugasen.krishnan@intel.com, Daniel Vetter, kernel@collabora.com, linux-media@vger.kernel.org, linux-rockchip@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 00/18] Add Hantro regmap and VC8000 h264 decode support
Date: Tue, 13 Oct 2020 09:48:29 +0300

Hi Jonas,

On Mon, 12 Oct 2020, Jonas Karlman wrote:
> Hi,
>
> On 2020-10-12 22:59, Adrian Ratiu wrote:
>> Dear all,
>>
>> This series introduces a regmap infrastructure for the Hantro driver
>> which is used to compensate for different HW-revision register
>> layouts. To justify it h264 decoding capability is added for newer
>> VC8000 chips.
>>
>> This is a gradual conversion to the new infra - a complete conversion
>> would have been very big and I do not have all the HW yet to test
>> (I'm expecting a RK3399 shipment next week though ;). I think
>> converting the h264 decoder provides a nice blueprint for how the
>> other codecs can be converted and enabled for different HW revisions.
>>
>> The end goal of this is to make the driver more generic and eliminate
>> entirely custom boilerplate like `struct hantro_reg` or headers with
>> core-specific bit manipulations like `hantro_g1_regs.h` and instead
>> rely on the well-tested albeit more verbose regmap subsystem.
>>
>> To give just two examples of bugs which are easily discovered by
>> using more verbose regmap fields (very easy to compare with the
>> datasheets) instead of relying on bit-magic tricks:
>> G1_REG_DEC_CTRL3_INIT_QP(x) was off-by-1 and the wrong .clk_gate bit
>> was set in hantro_postproc.c.
>>
>> Anyway, this series also extends the MMIO regmap API to allow relaxed
>> writes for the theoretical reason that avoiding unnecessary memory
>> barriers leads to less CPU usage and small improvements to battery
>> life. However, in practice I could not measure differences between
>> relaxed/non-relaxed IO, so I'm on the fence whether to keep or remove
>> the relaxed calls. What I could measure is the performance impact of
>> adding more sub-reg field accesses: a constant ~20 microsecond bump
>> per G1 h264 frame.
>> This is acceptable considering the total time to decode a frame
>> takes three orders of magnitude longer, i.e. millisecond ranges,
>> depending on the frame size and bitstream params, so it is an
>> acceptable trade-off to have a more generic driver.
>
> In the RK3399 variant all fields use completely different positions
> so in order to make the driver fully generic all around 145 sub-reg
> fields used for h264 need to be converted, see [1] for a quick
> generation of field mappings used for h264 decoding.
>
> Any indication on how the performance will be impacted with 145
> fields compared to around 20 fields used in this series?

I'm aware of the bigger RK3399 layout divergence and have some commits
converting more of the reg fields, but not all that is required for
h264 on rk3399.

I haven't seen a huge perf degradation but more measurements are
needed; basically it depends on how often we go from writing a reg once
to writing it multiple times due to splitting. I tried some benchmarks
using regmap caching (both the default backends provided by the regmap
subsystem, and a custom one I wrote) but they were not helping; perhaps
if we had more fields then that would have more of an impact.

(btw some good news is I have an RK3399 SoC in the mail for an
unrelated project and expect to receive it soon :D)

IMO there will always be a trade-off between optimizing the driver to
squeeze the most perf out of the HW, e.g. optimizing reg writes at the
low microsecond level (which I think is unnecessary here), and making
it more generic to support more HW.

In this case a fundamental question we need to ask ourselves is whether
the RK3399 "looks like another/different-enough HW" due to its bigger
reg shuffling to warrant a separate driver or driver-within-a-driver
architecture instead of trying to bring it into the fold with the
others, possibly degrading perf for everyone.

I guess we'll have to see some benchmark numbers and an actual h264
implementation before deciding how to proceed with RK3399.

> Another issue with RK3399 variant is that some fields use different
> position depending on the codec used, e.g. two dec_ref_frames in [2].
> Should we use codec specific field maps? or any other suggestion on
> how we can handle such case?

Yes, codec-specific fields would be one idea, but I'd try to avoid it
if possible so we don't add unnecessary field definitions.

The regmap field API and config we currently use are just flat structs
(see hantro_regmap.[h|c]) but it doesn't have to be like that. Maybe we
could organize it a bit better and in the future have some codec-level
configs, since the regmap subsystem allows de-coupling the API (struct
regmap_field) from the reg defs/configs (struct reg_field).

That is just an idea off the top of my head :) Will have to think a bit
more about how to handle that specific use case in the future.

Thanks!

>
> [1] https://github.com/Kwiboo/rockchip-vpu-regtool/commit/8b88d94d2ed966c7d88d9a735c0c97368eb6c92d
> [2] https://github.com/Kwiboo/rockchip-vpu-regtool/blob/master/rk3399_dec_regs.c#L1065
> [3] https://github.com/Kwiboo/rockchip-vpu-regtool/commit/9498326296445a9ce153b585cc48e0cea05d3c93
>
> Best regards,
> Jonas
>
>>
>> This has been tested on next-20201009 with imx8mq for G1 and an SoC with
>> VC8000 which has not yet been added (hopefully support lands soon).
>>
>> Kind regards,
>> Adrian
>>
>> Adrian Ratiu (18):
>>   media: hantro: document all int reg bits up to vc8000
>>   media: hantro: make consistent use of decimal register notation
>>   media: hantro: make G1_REG_SOFT_RESET Rockchip specific
>>   media: hantro: add reset controller support
>>   media: hantro: prepare clocks before variant inits are run
>>   media: hantro: imx8mq: simplify ctrlblk reset logic
>>   regmap: mmio: add config option to allow relaxed MMIO accesses
>>   media: hantro: add initial MMIO regmap infrastructure
>>   media: hantro: default regmap to relaxed MMIO
>>   media: hantro: convert G1 h264 decoder to regmap fields
>>   media: hantro: convert G1 postproc to regmap
>>   media: hantro: add VC8000D h264 decoding
>>   media: hantro: add VC8000D postproc support
>>   media: hantro: make PP enablement logic a bit smarter
>>   media: hantro: add user-selectable, platform-selectable H264 High10
>>   media: hantro: rename h264_dec as it's not G1 specific anymore
>>   media: hantro: add dump registers debug option before decode start
>>   media: hantro: document encoder reg fields
>>
>>  drivers/base/regmap/regmap-mmio.c             |   34 +-
>>  drivers/staging/media/hantro/Makefile         |    3 +-
>>  drivers/staging/media/hantro/hantro.h         |   79 +-
>>  drivers/staging/media/hantro/hantro_drv.c     |   41 +-
>>  drivers/staging/media/hantro/hantro_g1_regs.h |   92 +-
>>  ...hantro_g1_h264_dec.c => hantro_h264_dec.c} |  237 +++-
>>  drivers/staging/media/hantro/hantro_hw.h      |   23 +-
>>  .../staging/media/hantro/hantro_postproc.c    |  144 ++-
>>  drivers/staging/media/hantro/hantro_regmap.c  | 1015 +++++++++++++++++
>>  drivers/staging/media/hantro/hantro_regmap.h  |  295 +++++
>>  drivers/staging/media/hantro/hantro_v4l2.c    |    3 +-
>>  drivers/staging/media/hantro/imx8m_vpu_hw.c   |   75 +-
>>  drivers/staging/media/hantro/rk3288_vpu_hw.c  |    5 +-
>>  include/linux/regmap.h                        |    5 +
>>  14 files changed, 1795 insertions(+), 256 deletions(-)
>>  rename drivers/staging/media/hantro/{hantro_g1_h264_dec.c => hantro_h264_dec.c} (58%)
>>  create mode 100644 drivers/staging/media/hantro/hantro_regmap.c
>>  create mode 100644 drivers/staging/media/hantro/hantro_regmap.h