From: Boris Brezillon <boris.brezillon@collabora.com>
To: Steven Price <steven.price@arm.com>
Cc: Neil Armstrong <narmstrong@baylibre.com>,
	Emil Velikov <emil.l.velikov@gmail.com>,
	dri-devel@lists.freedesktop.org, Rob Herring <robh+dt@kernel.org>,
	Mark Janes <mark.a.janes@intel.com>,
	kernel@collabora.com, Alyssa Rosenzweig <alyssa@rosenzweig.io>
Subject: Re: [PATCH 0/3] drm/panfrost: Expose HW counters to userspace
Date: Tue, 30 Apr 2019 14:42:38 +0200
Message-ID: <20190430144238.49963521@collabora.com>
In-Reply-To: <ba54e655-6316-8d36-dfd1-c5df418cee3a@arm.com>

+Rob, Eric, Mark and more

Hi,

On Fri, 5 Apr 2019 16:20:45 +0100
Steven Price <steven.price@arm.com> wrote:

> On 04/04/2019 16:20, Boris Brezillon wrote:
> > Hello,
> > 
> > This patch adds new ioctls to expose GPU counters to userspace.
> > These will be used by the mesa driver (should be posted soon).
> > 
> > A few words about the implementation: I followed the VC4/Etnaviv model
> > where perf counters are retrieved on a per-job basis. This allows one
> > to get accurate results when there are users using the GPU
> > concurrently.
> > AFAICT, the mali kbase is using a different approach where several
> > users can register a performance monitor but with no way to have
> > fine-grained control over what job/GPU-context to track.  
> 
> mali_kbase submits overlapping jobs. The jobs on slot 0 and slot 1 can
> be from different contexts (address spaces), and mali_kbase also fully
> uses the _NEXT registers. So there can be a job from one context
> executing on slot 0 and a job from a different context waiting in the
> _NEXT registers. (And the same for slot 1). This means that there's no
> (visible) gap between the first job finishing and the second job
> starting. Early versions of the driver even had a throttle to avoid
> interrupt storms (see JOB_IRQ_THROTTLE) which would further delay the
> IRQ - but thankfully that's gone.
> 
> The upshot is that it's basically impossible to measure "per-job"
> counters when running at full speed. Because multiple jobs are running
> and the driver doesn't actually know when one ends and the next starts.
> 
> Since one of the primary use cases is to draw pretty graphs of the
> system load [1], this "per-job" information isn't all that relevant (and
> minimal performance overhead is important). And if you want to monitor
> just one application it is usually easiest to ensure that it is the only
> thing running.
> 
> [1]
> https://developer.arm.com/tools-and-software/embedded/arm-development-studio/components/streamline-performance-analyzer
> 
> > This design choice comes at a cost: every time the perfmon context
> > changes (the perfmon context is the list of currently active
> > perfmons), the driver has to add a fence to prevent new jobs from
> > corrupting counters that will be dumped by previous jobs.
> > 
> > Let me know if that's an issue and if you think we should approach
> > things differently.  
> 
> It depends what you expect to do with the counters. Per-job counters are
> certainly useful sometimes. But serialising all jobs can mess up the
> thing you are trying to measure the performance of.

I finally found some time to work on v2 this morning, and it turns out
implementing global perf monitors as done in mali_kbase means rewriting
almost everything (apart from the perfcnt layout stuff). I'm not against
doing that, but I'd like to be sure this is really what we want.

Eric, Rob, any opinion on that? Is it acceptable to expose counters
through the pipe_query/AMD_perfmon interface if we don't have this
job (or at least draw call) granularity? If not, should we keep the
solution I'm proposing here to make sure counter values are accurate,
or should we expose perf counters through a non-standard API?

BTW, I'd like to remind you that serialization (waiting on the perfcnt
fence) only happens if we have a perfmon context change between two
consecutive jobs, which only happens when:
* two applications are running in parallel and at least one of them is
  monitored
* or userspace decides to stop monitoring things and dump counter
  values

That means that, for the usual case (all perfmons disabled), there's
almost zero overhead (just a few more checks in the submit job code).
That also means that, if we ever decide to support global perfmon (perf
monitors that track things globally) on top of the current approach,
and only global perfmons are enabled, things won't be serialized as
with the per-job approach, because everyone will share the same perfmon
ctx (the same set of perfmons).
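
To make the condition above concrete, here is a minimal sketch of the
"serialize only on perfmon context change" rule. It is not the code from
the patch: the struct and function names are hypothetical, and a perfmon
context is modelled as a plain bitmask of active perfmons, which is an
assumption made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: bit i set means perfmon i is attached to the job. */
struct perfmon_ctx {
	uint64_t active;
};

/* Hypothetical job descriptor carrying its perfmon context. */
struct job {
	struct perfmon_ctx ctx;
};

/*
 * Returns true when 'next' would have to wait on the perfcnt fence of
 * 'prev', i.e. when the set of active perfmons differs between the two
 * consecutive jobs. A NULL 'prev' means nothing was submitted before.
 */
static bool job_needs_perfcnt_fence(const struct job *prev,
				    const struct job *next)
{
	assert(next != NULL);

	if (prev == NULL)
		return false;

	return prev->ctx.active != next->ctx.active;
}

/* Counts how many times a submission sequence would be serialized. */
static size_t count_serialization_points(const struct job *jobs, size_t n)
{
	size_t fences = 0;

	for (size_t i = 1; i < n; i++) {
		if (job_needs_perfcnt_fence(&jobs[i - 1], &jobs[i]))
			fences++;
	}

	return fences;
}
```

With this model, a stream of jobs that all have no perfmon attached, or
that all share the same (e.g. global) perfmon set, never hits the fence.
A monitored job sandwiched between unmonitored ones costs two
serialization points: one when monitoring starts and one when it stops.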

I'd appreciate any feedback from people that have used perf counters
(or implemented a way to dump them) on their platform.

Thanks,

Boris
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
