From: Steven Price <steven.price@arm.com>
To: Alexey Sheplyakov <asheplyakov@basealt.ru>,
	Alyssa Rosenzweig <alyssa@collabora.com>
Cc: "Vadim V . Vlasov" <vadim.vlasov@elpitech.ru>,
	Tomeu Vizoso <tomeu.vizoso@collabora.com>,
	dri-devel@lists.freedesktop.org,
	Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Subject: Re: [PATCH 2/2] drm/panfrost: adjusted job affinity for dual core group GPUs
Date: Mon, 10 Jan 2022 14:14:12 +0000	[thread overview]
Message-ID: <fca08e3c-c239-efdd-6ae5-132d84637d1f@arm.com> (raw)
In-Reply-To: <c94bafaa-3029-fea3-b623-1961b4b5e4cf@basealt.ru>

On 24/12/2021 08:56, Alexey Sheplyakov wrote:
> Hi,
> 
> On 23.12.2021 18:11, Alyssa Rosenzweig wrote:
>>> The kernel driver itself can't guess which jobs need such a strict
>>> affinity, so setting proper requirements is the responsibility of
>>> the userspace (Mesa). However the userspace is not smart enough [yet].
>>> Therefore this patch applies the above affinity rule to all jobs on
>>> dual core group GPUs.
>>
>> What does Mesa need to do for this to work "properly"?
> 
> I don't know.
> The blob restricts affinity of jobs with JD_REQ_COHERENT_GROUP requirement.
> In theory jobs without such a requirement can run on any core, but in
> practice all jobs in slots 0, 1 are assigned to core group 0 (with workloads
> I've run - i.e. weston, firefox, glmark2, perhaps it's also SoC dependent).
> So I've forced all jobs in slots 0, 1 to core group 0. Surprisingly this
> (and memory attributes adjustment) appeared to be enough to get panfrost
> working with T628 (on some SoCs). Without these patches GPU locks up in
> a few seconds.

Let me fill in a few details here.

The T628 is pretty unique in that it has two core groups, i.e. more than
one L2 cache. Previous designs (i.e. T604) didn't have enough cores to
require a second core group, and later designs with sufficient cores
have coherent L2 caches and so act as a single core group (although
the hardware has multiple L2s, it reports only one because they behave
as a single cache).

Note that technically the T608, T658 and T678 also exist and have this
problem - but I don't believe any products were produced with these (so
unless you're in ARM with a very unusual FPGA they can be ignored).

The blob/kbase handle this situation with two flags:
JD_REQ_COHERENT_GROUP, which specifies that the affinity of a job must
land on a single (coherent) core group, and
JD_REQ_SPECIFIC_COHERENT_GROUP, which allows user space to target a
specific group.
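
To make that concrete, here is a minimal sketch of what such a
restriction boils down to on the kernel side for the approach in this
patch (pin slots 0 and 1 to core group 0). shader_present is a real
panfrost feature field; nr_core_groups and core_group_0_mask are
made-up names used purely for illustration, not actual driver code:

static u64 job_affinity_sketch(struct panfrost_device *pfdev, int js)
{
	/* Start from the full set of shader cores */
	u64 affinity = pfdev->features.shader_present;

	/*
	 * On a dual core group GPU, restrict slots 0 and 1 (fragment,
	 * vertex+tiler) to the cores behind L2 #0, i.e. core group 0.
	 */
	if (pfdev->features.nr_core_groups > 1 && js < 2)
		affinity &= pfdev->features.core_group_0_mask;

	/* This mask is what ends up in the slot's affinity registers */
	return affinity;
}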

In theory fragment shading can be performed over all cores (because a
fragment shader job doesn't need coherency between threads), so it
doesn't need the JD_REQ_COHERENT_GROUP flag. Vertex shading, however,
must run on the same core group as the tiler (which always lives in
core group 0).

Of course there are various 'tricks' that can happen even within a
fragment shader which might require coherency.

So the expected sequence is that vertex+tiling is restricted to core
group 0, and fragment shading can be run over all cores. There can,
however, be performance issues when doing this naïvely, because the Job
Manager doesn't necessarily share the GPU's cores fairly between vertex
and fragment jobs. Also note that a cache flush is needed between
running the vertex+tiling job and the fragment job to ensure that the
extra core group is coherent - this can be expensive, so it may not be
worth using the second core group in some situations. I'm not sure what
logic the Blob uses for that.
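
For illustration only, the per-frame flow described above might look
roughly like this from userspace. submit_job() and flush_gpu_caches()
are hypothetical helpers; PANFROST_JD_REQ_FS is the existing panfrost
UABI flag, and the coherency opt-out flag is the one proposed further
down this mail, not something that exists today:

/* vertex + tiling: needs the tiler, so stays on core group 0 */
submit_job(vertex_tiler_job, 0);

/* make the second core group's L2 coherent with the tiler output */
flush_gpu_caches();

/* fragment shading: no cross-thread coherency, can use all cores */
submit_job(fragment_job,
	   PANFROST_JD_REQ_FS | PANFROST_JD_DOESNT_NEED_COHERENCY_ON_GPU);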

Finally there's GPU compute (i.e. OpenCL): here coherency is usually
required, but there's often more information about the amount of
coherency needed. In this case it is possible to run different job
chains on each core group. This is the only situation where slot 2 is
used, and is the reason for the JD_REQ_SPECIFIC_COHERENT_GROUP flag.
It's also a nightmare for scheduling as the hardware gets upset if the
affinity masks for slot 1 and slot 2 overlap.
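
The overlap constraint itself is simple to state - a minimal,
illustrative check (not actual driver code):

/* The hardware gets upset if slot 1 and slot 2 share any core */
static bool slot2_affinity_ok(u64 slot1_affinity, u64 slot2_affinity)
{
	return !(slot1_affinity & slot2_affinity);
}

i.e. a slot 2 submission has to wait (or be given the other core group)
whenever this returns false, which is what makes the scheduling awkward.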

>> What are the limitations of the approach implemented here?
> 
> Suboptimal performance.
> 
> 1) There might be job chains which don't care about affinity
>    (I haven't seen any of these yet on systems I've got).

You are effectively throwing away half the cores if everything is pinned
to core group 0, so I'm pretty sure the Blob manages to run (some)
fragment jobs without the COHERENT_GROUP flag.

But equally this is a reasonable first step for the kernel driver - we
can make the GPU look like every other GPU by pretending the second core
group doesn't exist.

> 2) There might be dual core group GPUs which don't need such a strict affinity.
>    (I haven't seen any dual core group T[78]xx GPUs yet. This doesn't mean such
>     GPUs don't exist).

They should all be a single core group (fully coherent L2s).

>> If we need to extend it down the line with a UABI change, what would that look like?
> 
> I have no idea. And I'm not sure if it's worth the effort (since most jobs
> end up on core group 0 anyway).

Whether it's worth the effort depends on whether anyone really cares
about getting the full performance out of this particular GPU.

At this stage I think the main UABI change would be to add the opposite
of kbase's flag (e.g. "PANFROST_JD_DOESNT_NEED_COHERENCY_ON_GPU"[1]) to
opt in to allowing the job to run across all cores.

The second change would be to allow compute jobs to be run on the second
core group, so another flag: PANFROST_RUN_ON_SECOND_CORE_GROUP.
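
Purely as a sketch of what those opt-in flags could look like in
include/uapi/drm/panfrost_drm.h (names still subject to [1], and the
bit values just assume PANFROST_JD_REQ_FS keeps bit 0):

/* existing */
#define PANFROST_JD_REQ_FS				(1 << 0)
/* hypothetical additions discussed above */
#define PANFROST_JD_DOESNT_NEED_COHERENCY_ON_GPU	(1 << 1)
#define PANFROST_RUN_ON_SECOND_CORE_GROUP		(1 << 2)

Both would be passed in the existing 'requirements' field of
struct drm_panfrost_submit.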

But clearly there's little point adding such flags until someone steps
up to do the Mesa work.

Steve

[1] Bike-shedding the name might be required ;)
