All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/panfrost: Add "compute shader only" hint
@ 2019-08-06 19:52 Alyssa Rosenzweig
  2019-08-06 20:01 ` Rob Herring
  0 siblings, 1 reply; 7+ messages in thread
From: Alyssa Rosenzweig @ 2019-08-06 19:52 UTC (permalink / raw)
  To: dri-devel; +Cc: Alyssa Rosenzweig, tomeu.vizoso, steven.price

Midgard contains two job slots capable of compute jobs, JS1 and JS2. As
an optimization, it is preferable to schedule compute-only workloads to
JS2, although compute jobs are functionally able to be scheduled to JS1
(barring an obscure errata).

Accordingly, we reserve a compute-only hint in the UABI to allow future
compute-equipped userspace and future compute-optimized kernelspace to
hint towards JS2. At the moment, the hint is ignored, but this is
harmless.

I have verified compute jobs can be successfully submitted and executed
with an appropriate userspace against a 5.1 kernel without this hint.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
---
 drivers/gpu/drm/panfrost/TODO           | 3 ---
 drivers/gpu/drm/panfrost/panfrost_job.c | 4 +++-
 include/uapi/drm/panfrost_drm.h         | 8 ++++++++
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/TODO b/drivers/gpu/drm/panfrost/TODO
index c2e44add3..8c367a5a6 100644
--- a/drivers/gpu/drm/panfrost/TODO
+++ b/drivers/gpu/drm/panfrost/TODO
@@ -20,8 +20,5 @@
 
 - Support for madvise and a shrinker.
 
-- Compute job support. So called 'compute only' jobs need to be plumbed up to
-  userspace.
-
 - Performance counter support. (Boris)
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 9bb9260d9..3e1385a3b 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -108,7 +108,9 @@ static int panfrost_job_get_slot(struct panfrost_job *job)
 	if (job->requirements & PANFROST_JD_REQ_FS)
 		return 0;
 
-/* Not exposed to userspace yet */
+        /* Compute jobs can run on JS1, so compute-only jobs can run with this
+         * simple implementations (useful for backwards compatibility). As an
+         * optimization, we will eventually want to schedule to JS2. */
 #if 0
 	if (job->requirements & PANFROST_JD_REQ_ONLY_COMPUTE) {
 		if ((job->requirements & PANFROST_JD_REQ_CORE_GRP_MASK) &&
diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
index b5d370638..25acde34b 100644
--- a/include/uapi/drm/panfrost_drm.h
+++ b/include/uapi/drm/panfrost_drm.h
@@ -38,6 +38,14 @@ extern "C" {
 #define DRM_IOCTL_PANFROST_PERFCNT_DUMP		DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_PERFCNT_DUMP, struct drm_panfrost_perfcnt_dump)
 
 #define PANFROST_JD_REQ_FS (1 << 0)
+
+/* Optional (mandatory for T600 r0p0 15dev0 due to errata #8987) hint to the
+ * kernel that the commands only contain JOB_TYPE_COMPUTE jobs, without
+ * vertex/tiler/fragment jobs. If present, the kernel may use this as an
+ * optimization, but it is not strictly necessary. */
+
+#define PANFROST_JD_REQ_ONLY_COMPUTE (1 << 1)
+
 /**
  * struct drm_panfrost_submit - ioctl argument for submitting commands to the 3D
  * engine.
-- 
2.20.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/panfrost: Add "compute shader only" hint
  2019-08-06 19:52 [PATCH] drm/panfrost: Add "compute shader only" hint Alyssa Rosenzweig
@ 2019-08-06 20:01 ` Rob Herring
  2019-08-06 20:11   ` Alyssa Rosenzweig
  0 siblings, 1 reply; 7+ messages in thread
From: Rob Herring @ 2019-08-06 20:01 UTC (permalink / raw)
  To: Alyssa Rosenzweig; +Cc: Tomeu Vizoso, dri-devel, Steven Price

On Tue, Aug 6, 2019 at 1:53 PM Alyssa Rosenzweig
<alyssa.rosenzweig@collabora.com> wrote:
>
> Midgard contains two job slots capable of compute jobs, JS1 and JS2. As
> an optimization, it is preferable to schedule compute-only workloads to
> JS2, although compute jobs are functionally able to be scheduled to JS1
> (barring an obscure errata).
>
> Accordingly, we reserve a compute-only hint in the UABI to allow future
> compute-equipped userspace and future compute-optimized kernelspace to
> hint towards JS2. At the moment, the hint is ignored, but this is
> harmless.

Why don't we just go ahead and enable JS2?

> I have verified compute jobs can be successfully submitted and executed
> with an appropriate userspace against a 5.1 kernel without this hint.
>
> Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
> ---
>  drivers/gpu/drm/panfrost/TODO           | 3 ---
>  drivers/gpu/drm/panfrost/panfrost_job.c | 4 +++-
>  include/uapi/drm/panfrost_drm.h         | 8 ++++++++
>  3 files changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/panfrost/TODO b/drivers/gpu/drm/panfrost/TODO
> index c2e44add3..8c367a5a6 100644
> --- a/drivers/gpu/drm/panfrost/TODO
> +++ b/drivers/gpu/drm/panfrost/TODO
> @@ -20,8 +20,5 @@
>
>  - Support for madvise and a shrinker.
>
> -- Compute job support. So called 'compute only' jobs need to be plumbed up to
> -  userspace.
> -
>  - Performance counter support. (Boris)
>
> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> index 9bb9260d9..3e1385a3b 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> @@ -108,7 +108,9 @@ static int panfrost_job_get_slot(struct panfrost_job *job)
>         if (job->requirements & PANFROST_JD_REQ_FS)
>                 return 0;
>
> -/* Not exposed to userspace yet */
> +        /* Compute jobs can run on JS1, so compute-only jobs can run with this
> +         * simple implementations (useful for backwards compatibility). As an
> +         * optimization, we will eventually want to schedule to JS2. */
>  #if 0
>         if (job->requirements & PANFROST_JD_REQ_ONLY_COMPUTE) {
>                 if ((job->requirements & PANFROST_JD_REQ_CORE_GRP_MASK) &&
> diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
> index b5d370638..25acde34b 100644
> --- a/include/uapi/drm/panfrost_drm.h
> +++ b/include/uapi/drm/panfrost_drm.h
> @@ -38,6 +38,14 @@ extern "C" {
>  #define DRM_IOCTL_PANFROST_PERFCNT_DUMP                DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_PERFCNT_DUMP, struct drm_panfrost_perfcnt_dump)
>
>  #define PANFROST_JD_REQ_FS (1 << 0)
> +
> +/* Optional (mandatory for T600 r0p0 15dev0 due to errata #8987) hint to the
> + * kernel that the commands only contain JOB_TYPE_COMPUTE jobs, without
> + * vertex/tiler/fragment jobs. If present, the kernel may use this as an
> + * optimization, but it is not strictly necessary. */
> +
> +#define PANFROST_JD_REQ_ONLY_COMPUTE (1 << 1)
> +
>  /**
>   * struct drm_panfrost_submit - ioctl argument for submitting commands to the 3D
>   * engine.
> --
> 2.20.1
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/panfrost: Add "compute shader only" hint
  2019-08-06 20:01 ` Rob Herring
@ 2019-08-06 20:11   ` Alyssa Rosenzweig
  2019-08-06 20:21     ` Rob Herring
  0 siblings, 1 reply; 7+ messages in thread
From: Alyssa Rosenzweig @ 2019-08-06 20:11 UTC (permalink / raw)
  To: Rob Herring; +Cc: Tomeu Vizoso, dri-devel, Steven Price


[-- Attachment #1.1: Type: text/plain, Size: 282 bytes --]

> Why don't we just go ahead and enable JS2?

It's not obvious to me when it actually needs to be enabled. Besides the
errata, it's only when... device_nr=1 for a compute-only job in kbase?

I'm afraid I don't know nearly enough about how kbase plumbs CL to grok
the signifiance...

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/panfrost: Add "compute shader only" hint
  2019-08-06 20:11   ` Alyssa Rosenzweig
@ 2019-08-06 20:21     ` Rob Herring
  2019-08-06 20:25       ` Alyssa Rosenzweig
  0 siblings, 1 reply; 7+ messages in thread
From: Rob Herring @ 2019-08-06 20:21 UTC (permalink / raw)
  To: Alyssa Rosenzweig; +Cc: Tomeu Vizoso, dri-devel, Steven Price

On Tue, Aug 6, 2019 at 2:11 PM Alyssa Rosenzweig
<alyssa.rosenzweig@collabora.com> wrote:
>
> > Why don't we just go ahead and enable JS2?
>
> It's not obvious to me when it actually needs to be enabled. Besides the
> errata, it's only when... device_nr=1 for a compute-only job in kbase?
>
> I'm afraid I don't know nearly enough about how kbase plumbs CL to grok
> the signifiance...

Figuring out the nr_core_groups was the complicated part of this as I
recall. Seems like we should at least figure out if we (or will need)
PANFROST_JD_REQ_CORE_GRP_MASK added to the UAPI as well.

Rob
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/panfrost: Add "compute shader only" hint
  2019-08-06 20:21     ` Rob Herring
@ 2019-08-06 20:25       ` Alyssa Rosenzweig
  2019-08-07 11:11         ` Steven Price
  0 siblings, 1 reply; 7+ messages in thread
From: Alyssa Rosenzweig @ 2019-08-06 20:25 UTC (permalink / raw)
  To: Rob Herring; +Cc: Tomeu Vizoso, dri-devel, Steven Price


[-- Attachment #1.1: Type: text/plain, Size: 573 bytes --]

> > It's not obvious to me when it actually needs to be enabled. Besides the
> > errata, it's only when... device_nr=1 for a compute-only job in kbase?
> >
> > I'm afraid I don't know nearly enough about how kbase plumbs CL to grok
> > the signifiance...
> 
> Figuring out the nr_core_groups was the complicated part of this as I
> recall. Seems like we should at least figure out if we (or will need)
> PANFROST_JD_REQ_CORE_GRP_MASK added to the UAPI as well.

I suspect this is something OpenCL/Vulkan specific. Hopefully Stephen
can shine some light here :)

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/panfrost: Add "compute shader only" hint
  2019-08-06 20:25       ` Alyssa Rosenzweig
@ 2019-08-07 11:11         ` Steven Price
  2019-08-07 14:19           ` Alyssa Rosenzweig
  0 siblings, 1 reply; 7+ messages in thread
From: Steven Price @ 2019-08-07 11:11 UTC (permalink / raw)
  To: Alyssa Rosenzweig, Rob Herring; +Cc: dri-devel, Tomeu Vizoso

On 06/08/2019 21:25, Alyssa Rosenzweig wrote:
>>> It's not obvious to me when it actually needs to be enabled. Besides the
>>> errata, it's only when... device_nr=1 for a compute-only job in kbase?
>>>
>>> I'm afraid I don't know nearly enough about how kbase plumbs CL to grok
>>> the signifiance...
>>
>> Figuring out the nr_core_groups was the complicated part of this as I
>> recall. Seems like we should at least figure out if we (or will need)
>> PANFROST_JD_REQ_CORE_GRP_MASK added to the UAPI as well.
> 
> I suspect this is something OpenCL/Vulkan specific. Hopefully Stephen
> can shine some light here :)

*switches torch on*...

Ok, this is actually a lot more complex than it first appears, so I'll
have to start with a bit of background:

Mali Midgard GPUs have 2 "thread creators" per core. There is one for
fragment threads and one for 'compute' threads (vertex work is
considered compute in this context).

The cores have a set number of threads (e.g. 256 for the early GPUs) and
they effectively get divided between fragment threads and compute
threads - it's effectively round-robin between the thread creators, but
I think there's some extra 'magic' in the hardware.

The idea is that for graphics you can run fragment and vertex workloads
at the same time on the core and make better use of the hardware (i.e.
fragment units are using the texturing hardware, which vertex is using
the ALU).

However two things stand in the way of this working nicely:

1. Core groups - this is a lovely design feature for hardware engineers,
but a pain for software. Basically you can have multiple sets of cores.
The cores in a set are coherent with each other, but they are not
coherent between sets. This is because each core group has it's own L2
cache.

To complicate things even further the tiler notationally exists within
core group 0, so is only coherent with that core group. This means that
if you have a vertex/tiler job chain it has to be run entirely within
core group 0 - or you will need to insert appropriate cache flushes. For
fragment work you generally don't need coherency between threads so this
isn't a problem and you can run over all the cores in all groups.

For compute (i.e. OpenCL) you probably care about coherency in a work
group, but you may have several independent jobs that can run in
parallel. In this case you can run some (coherent) work on core group 0,
and some other (independent but coherent) work on core group 1.

2. Starvation. For compute work it's common to insert barriers requiring
all threads to reach the same point in the shader before any thread can
progress. If your workgroup size (i.e. the number of threads which
synchronise on the barrier) is the same as the number of threads in the
core this means that all threads have to be allocated to compute before
the barrier can complete.

However if the compute thread creator is competing with the fragment
thread creator this can lead to the situation where compute threads are
idle waiting for fragment threads to complete.

This implies that running compute workloads with barriers at the same
time as fragment work on the same cores isn't very optimal.

</end of background>

kbase has several flags:

 * BASE_JD_REQ_COHERENT_GROUP - the job chain must be run on a coherent
set of cores. I.e. must be restricted to a single core group.

 * BASE_JD_REQ_ONLY_COMPUTE - the job chain is compute jobs and may
contain barriers.

 * BASE_JD_REQ_SPECIFIC_COHERENT_GROUP - we care about being on a
particular core group. device_nr is used to select which (and device_nr
is otherwise ignored)

In practice all this only really matters on the T62x GPU. All other GPUs
have only one core group[1]. So it only really makes sense to use JS2 on
the T62x where you want to use both JS1 and JS2 to run two independent
jobs: one on each core group.

Of course kbase makes all this into a maze of twisty little passages,
all alike! :)

Oh, and there is one hardware workaround (BASE_HW_ISSUE_8987) that uses
JS2. This is to avoid vertex and compute jobs landing on the same slot.
This affects T604 "dev15" only and is because some state was not
properly cleared between jobs.

Steve

[1] There might be multiple L2 caches in hardware, but they are coherent
and are logically a single L2 (only 1 bit set in L2_PRESENT).
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/panfrost: Add "compute shader only" hint
  2019-08-07 11:11         ` Steven Price
@ 2019-08-07 14:19           ` Alyssa Rosenzweig
  0 siblings, 0 replies; 7+ messages in thread
From: Alyssa Rosenzweig @ 2019-08-07 14:19 UTC (permalink / raw)
  To: Steven Price; +Cc: dri-devel, Tomeu Vizoso


[-- Attachment #1.1: Type: text/plain, Size: 1207 bytes --]

Ah-ha! Thank you for the detailed explanation; this makes a lot more
sense now :)

> In practice all this only really matters on the T62x GPU. All other GPUs
> have only one core group[1]. So it only really makes sense to use JS2 on
> the T62x where you want to use both JS1 and JS2 to run two independent
> jobs: one on each core group.
> 
> Oh, and there is one hardware workaround (BASE_HW_ISSUE_8987) that uses
> JS2. This is to avoid vertex and compute jobs landing on the same slot.
> This affects T604 "dev15" only and is because some state was not
> properly cleared between jobs.

On my end at least, our work with compute has been focused on
RK3399/T860. In userspace, it's tempting to limit T6xx/T720 to e.g.
OpenGL ES 2.0 (between this/other errata, and of course fake-MRT), at
least for the interim.

For completeness, we'll maybe need to deal with this Eventually, but it
doesn't sound like this needs to be handled right now? (It sounds like
there's no changes needed for compute on T700+, since #8987 doesn't
apply and there's only one core group.). In that case, maybe it's not
worth it to start mucking with the UABI for a feature that may or may
not ever be used.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-08-07 14:20 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-06 19:52 [PATCH] drm/panfrost: Add "compute shader only" hint Alyssa Rosenzweig
2019-08-06 20:01 ` Rob Herring
2019-08-06 20:11   ` Alyssa Rosenzweig
2019-08-06 20:21     ` Rob Herring
2019-08-06 20:25       ` Alyssa Rosenzweig
2019-08-07 11:11         ` Steven Price
2019-08-07 14:19           ` Alyssa Rosenzweig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.