From: Daniel Vetter <daniel@ffwll.ch>
To: Boris Brezillon <boris.brezillon@bootlin.com>
Cc: "Hollingworth, Gordon" <gordon@raspberrypi.org>,
dri-devel@lists.freedesktop.org,
Eben Upton <eben@raspberrypi.org>
Subject: Re: [RFC PATCH] drm/vc4: Add a load tracker to prevent HVS underflow errors
Date: Tue, 16 Oct 2018 14:57:43 +0200 [thread overview]
Message-ID: <20181016125743.GZ31561@phenom.ffwll.local> (raw)
In-Reply-To: <20181016094045.23021-1-boris.brezillon@bootlin.com>
On Tue, Oct 16, 2018 at 11:40:45AM +0200, Boris Brezillon wrote:
> The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
> meet the requested framerate. The problem is, the HVS and memory bus
> bandwidths are limited, and if we don't take these limitations into
> account we might end up with HVS underflow errors.
>
> This patch models the per-plane HVS and memory bus bandwidth
> consumption and decides at atomic_check() time whether the estimated
> load will fit in the HVS and membus budget.
>
> Note that we take an extra margin on the memory bus consumption to let
> the system run smoothly when other blocks are making heavy use of the
> memory bus. The same goes for the HVS limit, except the margin is
> smaller there, since the HVS is not used by external components.
>
> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> ---
> This logic has been validated using a simple shell script and
> some instrumentation in the VC4 driver:
>
> - capture underflow errors at the HVS level and expose a debugfs file
> reporting those errors
> - add debugfs files to expose when atomic_check fails because of the
> HVS or membus load limitation or when it fails for other reasons
>
> The branch containing those modifications is available here [1], and the
> script (which is internally using modetest) is here [2] (please note
> that I'm bad at writing shell scripts :-)).
>
> Note that those modifications tend to over-estimate the load, and thus
> reject setups that might have previously worked, so we might want to
> adjust the limits to avoid that.
>
> [1]https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval
> [2]https://github.com/bbrezillon/vc4-hvs-bandwidth-test
Any interest in using igt to test this stuff? We already have a bunch of
tests in there that try all kinds of plane setups, and we use those to
hunt for underruns on i915 hw.
Wrt underrun reporting: On i915 we just dump them into dmesg at the error
level, using DRM_ERROR, plus a tracepoint. See e.g.
intel_pch_fifo_underrun_irq_handler(). If there's interest we could
perhaps extract this into something common, similar to what was done with
crc support already.
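To make the idea concrete, here's a rough userspace sketch of the reporting
policy i915 uses (report once, then disarm, but keep counting). The struct
and function names are my invention for illustration, not an existing DRM
API; in the kernel the report would be a DRM_ERROR() plus a tracepoint.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical underrun reporter, loosely modeled on i915's
 * intel_pch_fifo_underrun_irq_handler(); all names are illustrative. */
struct underrun_reporter {
	bool armed;		/* disarm after the first report ... */
	unsigned long count;	/* ... but keep counting for debugfs */
};

/* Called from the underrun IRQ handler; returns true if reported. */
bool report_fifo_underrun(struct underrun_reporter *r, const char *source)
{
	r->count++;
	if (!r->armed)
		return false;
	/* DRM_ERROR() + trace event in the real thing */
	fprintf(stderr, "FIFO underrun on %s\n", source);
	r->armed = false;	/* avoid flooding the log */
	return true;
}
```

The one-shot disarm matters because a misconfigured plane can underrun on
every frame, which would otherwise drown dmesg.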
> ---
> drivers/gpu/drm/vc4/vc4_drv.h | 11 +++++
> drivers/gpu/drm/vc4/vc4_kms.c | 104 +++++++++++++++++++++++++++++++++++++++-
> drivers/gpu/drm/vc4/vc4_plane.c | 60 +++++++++++++++++++++++
> 3 files changed, 174 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/vc4/vc4_drv.h b/drivers/gpu/drm/vc4/vc4_drv.h
> index bd6ef1f31822..48f6ee5ceda3 100644
> --- a/drivers/gpu/drm/vc4/vc4_drv.h
> +++ b/drivers/gpu/drm/vc4/vc4_drv.h
> @@ -200,6 +200,7 @@ struct vc4_dev {
>
> struct drm_modeset_lock ctm_state_lock;
> struct drm_private_obj ctm_manager;
> + struct drm_private_obj load_tracker;
> };
>
> static inline struct vc4_dev *
> @@ -369,6 +370,16 @@ struct vc4_plane_state {
> * to enable background color fill.
> */
> bool needs_bg_fill;
> +
> + /* Load of this plane on the HVS block. The load is expressed in HVS
> + * cycles/sec.
> + */
> + u64 hvs_load;
> +
> + /* Memory bandwidth needed for this plane. This is expressed in
> + * bytes/sec.
> + */
> + u64 membus_load;
> };
>
> static inline struct vc4_plane_state *
> diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c
> index 127468785f74..4c65e6013bd3 100644
> --- a/drivers/gpu/drm/vc4/vc4_kms.c
> +++ b/drivers/gpu/drm/vc4/vc4_kms.c
> @@ -34,6 +34,18 @@ static struct vc4_ctm_state *to_vc4_ctm_state(struct drm_private_state *priv)
> return container_of(priv, struct vc4_ctm_state, base);
> }
>
> +struct vc4_load_tracker_state {
> + struct drm_private_state base;
> + u64 hvs_load;
> + u64 membus_load;
> +};
> +
> +static struct vc4_load_tracker_state *
> +to_vc4_load_tracker_state(struct drm_private_state *priv)
> +{
> + return container_of(priv, struct vc4_load_tracker_state, base);
> +}
> +
> static struct vc4_ctm_state *vc4_get_ctm_state(struct drm_atomic_state *state,
> struct drm_private_obj *manager)
> {
> @@ -379,6 +391,81 @@ vc4_ctm_atomic_check(struct drm_device *dev, struct drm_atomic_state *state)
> return 0;
> }
>
> +static int vc4_load_tracker_atomic_check(struct drm_atomic_state *state)
> +{
> + struct drm_plane_state *old_plane_state, *new_plane_state;
> + struct vc4_dev *vc4 = to_vc4_dev(state->dev);
> + struct vc4_load_tracker_state *load_state;
> + struct drm_private_state *priv_state;
> + struct drm_plane *plane;
> + int ret, i;
> +
You're missing the modeset locking for vc4->load_tracker. See the
kerneldoc for drm_atomic_get_private_obj_state(). Probably a good time to
implement the locking refactoring idea I have: add a per-private_obj lock
and remove all the ad-hoc locking from the callers? That would definitely
simplify the code and avoid "oops, no locking" issues like the one here.
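Roughly this shape, with toy stand-ins for the DRM types (nothing here is
the real API): the lock lives in the private obj itself and the state
getter takes it, so callers can't forget it.

```c
#include <pthread.h>

/* Toy model of a per-private_obj lock: stand-ins for drm_private_obj
 * and drm_modeset_lock, purely to illustrate the refactoring idea. */
struct priv_obj {
	pthread_mutex_t lock;	/* would be a drm_modeset_lock */
	int state;		/* stand-in for drm_private_state */
};

/* The getter takes the object's own lock; no ad-hoc caller locking. */
int get_priv_obj_state(struct priv_obj *obj)
{
	int duplicated;

	pthread_mutex_lock(&obj->lock);
	duplicated = obj->state;	/* "duplicate" the state */
	pthread_mutex_unlock(&obj->lock);
	return duplicated;
}
```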
Cheers, Daniel
> + priv_state = drm_atomic_get_private_obj_state(state,
> + &vc4->load_tracker);
> + if (IS_ERR(priv_state))
> + return PTR_ERR(priv_state);
> +
> + load_state = to_vc4_load_tracker_state(priv_state);
> + for_each_oldnew_plane_in_state(state, plane, old_plane_state,
> + new_plane_state, i) {
> + struct vc4_plane_state *vc4_plane_state;
> +
> + if (old_plane_state->fb && old_plane_state->crtc) {
> + vc4_plane_state = to_vc4_plane_state(old_plane_state);
> + load_state->membus_load -= vc4_plane_state->membus_load;
> + load_state->hvs_load -= vc4_plane_state->hvs_load;
> + }
> +
> + if (new_plane_state->fb && new_plane_state->crtc) {
> + vc4_plane_state = to_vc4_plane_state(new_plane_state);
> + load_state->membus_load += vc4_plane_state->membus_load;
> + load_state->hvs_load += vc4_plane_state->hvs_load;
> + }
> + }
> +
> +	/* The absolute limit is 2Gbyte/sec, but let's take a margin to let
> + * the system work when other blocks are accessing the memory.
> + */
> + if (load_state->membus_load > SZ_1G + SZ_512M)
> + return -ENOSPC;
> +
> +	/* The HVS clock is supposed to run at 250 MHz, let's take a margin and
> + * consider the maximum number of cycles is 240M.
> + */
> + if (load_state->hvs_load > 240000000ULL)
> + return -ENOSPC;
EINVAL is for atomic_check failures. ENOSPC isn't one of the permitted
errno codes, see the kernel-doc for &drm_mode_config_funcs.atomic_check.
atomic_commit has a different set of permissible errno codes.
We should probably enforce this in drm core ...
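A minimal sketch of what the two checks should return instead (budget
values copied from the patch; the standalone helper and its name are
mine, for illustration only):

```c
#include <errno.h>

/* Budgets from the patch: 1.5 GB/s membus (2 GB/s minus margin) and
 * 240M HVS cycles/s (250 MHz minus margin). */
#define VC4_MEMBUS_BUDGET	(1536ULL * 1024 * 1024)
#define VC4_HVS_BUDGET		240000000ULL

int vc4_check_load(unsigned long long membus_load,
		   unsigned long long hvs_load)
{
	/* atomic_check may only fail with the documented errno codes;
	 * -EINVAL means "this configuration cannot be supported". */
	if (membus_load > VC4_MEMBUS_BUDGET)
		return -EINVAL;	/* not -ENOSPC */
	if (hvs_load > VC4_HVS_BUDGET)
		return -EINVAL;
	return 0;
}
```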
-Daniel
> +
> + return 0;
> +}
> +
> +static struct drm_private_state *
> +vc4_load_tracker_duplicate_state(struct drm_private_obj *obj)
> +{
> + struct vc4_load_tracker_state *state;
> +
> + state = kmemdup(obj->state, sizeof(*state), GFP_KERNEL);
> + if (!state)
> + return NULL;
> +
> + __drm_atomic_helper_private_obj_duplicate_state(obj, &state->base);
> +
> + return &state->base;
> +}
> +
> +static void vc4_load_tracker_destroy_state(struct drm_private_obj *obj,
> + struct drm_private_state *state)
> +{
> + struct vc4_load_tracker_state *load_state;
> +
> + load_state = to_vc4_load_tracker_state(state);
> + kfree(load_state);
> +}
> +
> +static const struct drm_private_state_funcs vc4_load_tracker_state_funcs = {
> + .atomic_duplicate_state = vc4_load_tracker_duplicate_state,
> + .atomic_destroy_state = vc4_load_tracker_destroy_state,
> +};
> +
> static int
> vc4_atomic_check(struct drm_device *dev, struct drm_atomic_state *state)
> {
> @@ -388,7 +475,11 @@ vc4_atomic_check(struct drm_device *dev, struct drm_atomic_state *state)
> if (ret < 0)
> return ret;
>
> - return drm_atomic_helper_check(dev, state);
> + ret = drm_atomic_helper_check(dev, state);
> + if (ret)
> + return ret;
> +
> + return vc4_load_tracker_atomic_check(state);
> }
>
> static const struct drm_mode_config_funcs vc4_mode_funcs = {
> @@ -401,6 +492,7 @@ int vc4_kms_load(struct drm_device *dev)
> {
> struct vc4_dev *vc4 = to_vc4_dev(dev);
> struct vc4_ctm_state *ctm_state;
> + struct vc4_load_tracker_state *load_state;
> int ret;
>
> sema_init(&vc4->async_modeset, 1);
> @@ -426,9 +518,19 @@ int vc4_kms_load(struct drm_device *dev)
> ctm_state = kzalloc(sizeof(*ctm_state), GFP_KERNEL);
> if (!ctm_state)
> return -ENOMEM;
> +
> drm_atomic_private_obj_init(&vc4->ctm_manager, &ctm_state->base,
> &vc4_ctm_state_funcs);
>
> + load_state = kzalloc(sizeof(*load_state), GFP_KERNEL);
> + if (!load_state) {
> + drm_atomic_private_obj_fini(&vc4->ctm_manager);
> + return -ENOMEM;
> + }
> +
> + drm_atomic_private_obj_init(&vc4->load_tracker, &load_state->base,
> + &vc4_load_tracker_state_funcs);
> +
> drm_mode_config_reset(dev);
>
> drm_kms_helper_poll_init(dev);
> diff --git a/drivers/gpu/drm/vc4/vc4_plane.c b/drivers/gpu/drm/vc4/vc4_plane.c
> index 60d5ad19cedd..f47d38383a2f 100644
> --- a/drivers/gpu/drm/vc4/vc4_plane.c
> +++ b/drivers/gpu/drm/vc4/vc4_plane.c
> @@ -455,6 +455,64 @@ static void vc4_write_scaling_parameters(struct drm_plane_state *state,
> }
> }
>
> +static void vc4_plane_calc_load(struct drm_plane_state *state)
> +{
> + unsigned int hvs_load_shift, vrefresh, i;
> + struct drm_framebuffer *fb = state->fb;
> + struct vc4_plane_state *vc4_state;
> + struct drm_crtc_state *crtc_state;
> + unsigned int vscale_factor;
> +
> + vc4_state = to_vc4_plane_state(state);
> + crtc_state = drm_atomic_get_existing_crtc_state(state->state,
> + state->crtc);
> + vrefresh = drm_mode_vrefresh(&crtc_state->adjusted_mode);
> +
> + /* The HVS is able to process 2 pixels/cycle when scaling the source,
> + * 4 pixels/cycle otherwise.
> + * Alpha blending step seems to be pipelined and it's always operating
> + * at 4 pixels/cycle, so the limiting aspect here seems to be the
> + * scaler block.
> + * HVS load is expressed in clk-cycles/sec (AKA Hz).
> + */
> + if (vc4_state->x_scaling[0] != VC4_SCALING_NONE ||
> + vc4_state->x_scaling[1] != VC4_SCALING_NONE ||
> + vc4_state->y_scaling[0] != VC4_SCALING_NONE ||
> + vc4_state->y_scaling[1] != VC4_SCALING_NONE)
> + hvs_load_shift = 1;
> + else
> + hvs_load_shift = 2;
> +
> + vc4_state->membus_load = 0;
> + vc4_state->hvs_load = 0;
> + for (i = 0; i < fb->format->num_planes; i++) {
> + unsigned long pixels_load;
> +
> + /* Even if the bandwidth/plane required for a single frame is
> + *
> + * vc4_state->src_w[i] * vc4_state->src_h[i] * cpp * vrefresh
> + *
> + * when downscaling, we have to read more pixels per line in
> + * the time frame reserved for a single line, so the bandwidth
> +	 * demand can be momentarily higher. To account for that, we
> + * calculate the down-scaling factor and multiply the plane
> + * load by this number. We're likely over-estimating the read
> + * demand, but that's better than under-estimating it.
> + */
> + vscale_factor = DIV_ROUND_UP(vc4_state->src_h[i],
> + vc4_state->crtc_h);
> + pixels_load = vc4_state->src_w[i] * vc4_state->src_h[i] *
> + vscale_factor;
> +
> + vc4_state->membus_load += fb->format->cpp[i] * pixels_load;
> + vc4_state->hvs_load += pixels_load;
> + }
> +
> + vc4_state->hvs_load *= vrefresh;
> + vc4_state->hvs_load >>= hvs_load_shift;
> + vc4_state->membus_load *= vrefresh;
> +}
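To double-check my reading of the math, here's the per-plane model above
reduced to standalone helpers (illustrative only, not driver code; the
`scaling` flag stands in for the x/y_scaling checks):

```c
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/* Per-plane memory bandwidth in bytes/s: pixels per frame, scaled up
 * by the rounded-up vertical downscale factor, times bytes/pixel and
 * refresh rate. */
unsigned long long plane_membus_load(unsigned int src_w, unsigned int src_h,
				     unsigned int crtc_h, unsigned int cpp,
				     unsigned int vrefresh)
{
	unsigned long long vscale = DIV_ROUND_UP(src_h, crtc_h);

	return (unsigned long long)src_w * src_h * vscale * cpp * vrefresh;
}

/* HVS load in cycles/s: 2 pixels/cycle when any scaling is active
 * (shift by 1), 4 pixels/cycle otherwise (shift by 2). */
unsigned long long plane_hvs_load(unsigned int src_w, unsigned int src_h,
				  unsigned int crtc_h, unsigned int vrefresh,
				  int scaling)
{
	unsigned long long vscale = DIV_ROUND_UP(src_h, crtc_h);
	unsigned long long pixels = (unsigned long long)src_w * src_h * vscale;

	return (pixels * vrefresh) >> (scaling ? 1 : 2);
}
```

A fullscreen 1920x1080 ARGB8888 plane at 60 Hz, unscaled, comes out at
about 498 MB/s of membus load and about 31M HVS cycles/s, comfortably
under both budgets.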
> +
> /* Writes out a full display list for an active plane to the plane's
> * private dlist state.
> */
> @@ -722,6 +780,8 @@ static int vc4_plane_mode_set(struct drm_plane *plane,
> vc4_state->needs_bg_fill = fb->format->has_alpha || !covers_screen ||
> state->alpha != DRM_BLEND_ALPHA_OPAQUE;
>
> + vc4_plane_calc_load(state);
> +
> return 0;
> }
>
> --
> 2.14.1
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch