* [PATCH 0/2] drm/panfrost: partial support of T628 GPUs
@ 2021-12-23 11:06 asheplyakov
  2021-12-23 11:06 ` [PATCH 1/2] drm/panfrost: mmu: improved memory attributes asheplyakov
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: asheplyakov @ 2021-12-23 11:06 UTC (permalink / raw)
  To: dri-devel
  Cc: Tomeu Vizoso, Vadim V . Vlasov, Steven Price, Alexey Sheplyakov,
	Alyssa Rosenzweig

From: Alexey Sheplyakov <asheplyakov@basealt.ru>

Hello,

these patches address some problems which prevent panfrost from
working with Mali T628 GPUs:

- incomplete/incorrect mmu memory attributes
- wrong jobs affinity on dual core group GPUs

With these patches panfrost is able to drive mali T628 (r1p0) GPU
on some armv8 SoCs (in particular BE-M1000).
r0 GPUs are still not supported [yet] (tested with Exynos 5422).

Alexey Sheplyakov (2):
  drm/panfrost: mmu: improved memory attributes
  drm/panfrost: adjusted job affinity for dual core group GPUs

 drivers/gpu/drm/panfrost/panfrost_device.h |  7 ++++
 drivers/gpu/drm/panfrost/panfrost_gpu.c    | 45 ++++++++++++++++++++++
 drivers/gpu/drm/panfrost/panfrost_job.c    | 14 ++++++-
 drivers/gpu/drm/panfrost/panfrost_mmu.c    |  3 --
 drivers/iommu/io-pgtable-arm.c             | 16 ++++++--
 5 files changed, 76 insertions(+), 9 deletions(-)

-- 
2.30.2



* [PATCH 1/2] drm/panfrost: mmu: improved memory attributes
  2021-12-23 11:06 [PATCH 0/2] drm/panfrost: partial support of T628 GPUs asheplyakov
@ 2021-12-23 11:06 ` asheplyakov
  2021-12-24 12:56   ` Robin Murphy
  2021-12-23 11:06 ` [PATCH 2/2] drm/panfrost: adjusted job affinity for dual core group GPUs asheplyakov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: asheplyakov @ 2021-12-23 11:06 UTC (permalink / raw)
  To: dri-devel
  Cc: Tomeu Vizoso, Vadim V . Vlasov, Steven Price, Alexey Sheplyakov,
	Alyssa Rosenzweig

From: Alexey Sheplyakov <asheplyakov@basealt.ru>

T62x/T60x GPUs are known to not work with panfrost as of now.
One of the reasons is wrong/incomplete memory attributes which
the panfrost driver sets in the page tables:

- MEMATTR_IMP_DEF should be 0x48ULL, not 0x88ULL.
  0x88ULL is MEMATTR_OUTER_IMP_DEF
- MEMATTR_FORCE_CACHE_ALL and MEMATTR_OUTER_WA are missing.

T72x and newer GPUs work just fine with such incomplete/wrong memory
attributes. However T62x are quite picky and quickly lock up.

To avoid the problem set the same memory attributes (in LPAE page
tables) as mali_kbase.

The patch has been tested (for regressions) with T860 GPU (rk3399 SoC).
At first glance (using GNOME desktop, running glmark) it does
not cause any changes for this GPU.

Note: this patch is necessary, but *not* enough to get panfrost
working with T62x

Signed-off-by: Alexey Sheplyakov <asheplyakov@basealt.ru>
Signed-off-by: Vadim V. Vlasov <vadim.vlasov@elpitech.ru>

Cc: Rob Herring <robh@kernel.org>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: Vadim V. Vlasov <vadim.vlasov@elpitech.ru>
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c |  3 ---
 drivers/iommu/io-pgtable-arm.c          | 16 ++++++++++++----
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 39562f2d11a4..2f4f8a17bc82 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -133,9 +133,6 @@ static void panfrost_mmu_enable(struct panfrost_device *pfdev, struct panfrost_m
 	mmu_write(pfdev, AS_TRANSTAB_LO(as_nr), lower_32_bits(transtab));
 	mmu_write(pfdev, AS_TRANSTAB_HI(as_nr), upper_32_bits(transtab));
 
-	/* Need to revisit mem attrs.
-	 * NC is the default, Mali driver is inner WT.
-	 */
 	mmu_write(pfdev, AS_MEMATTR_LO(as_nr), lower_32_bits(memattr));
 	mmu_write(pfdev, AS_MEMATTR_HI(as_nr), upper_32_bits(memattr));
 
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index dd9e47189d0d..15b39c337e20 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -122,13 +122,17 @@
 #define ARM_LPAE_MAIR_ATTR_IDX_CACHE	1
 #define ARM_LPAE_MAIR_ATTR_IDX_DEV	2
 #define ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE	3
+#define ARM_LPAE_MAIR_ATTR_IDX_OUTER_WA 4
 
 #define ARM_MALI_LPAE_TTBR_ADRMODE_TABLE (3u << 0)
 #define ARM_MALI_LPAE_TTBR_READ_INNER	BIT(2)
 #define ARM_MALI_LPAE_TTBR_SHARE_OUTER	BIT(4)
 
-#define ARM_MALI_LPAE_MEMATTR_IMP_DEF	0x88ULL
-#define ARM_MALI_LPAE_MEMATTR_WRITE_ALLOC 0x8DULL
+#define ARM_MALI_LPAE_MEMATTR_IMP_DEF	0x48ULL
+#define ARM_MALI_LPAE_MEMATTR_FORCE_CACHE_ALL 0x4FULL
+#define ARM_MALI_LPAE_MEMATTR_WRITE_ALLOC 0x4DULL
+#define ARM_MALI_LPAE_MEMATTR_OUTER_IMP_DEF 0x88ULL
+#define ARM_MALI_LPAE_MEMATTR_OUTER_WA 0x8DULL
 
 #define APPLE_DART_PTE_PROT_NO_WRITE (1<<7)
 #define APPLE_DART_PTE_PROT_NO_READ (1<<8)
@@ -1080,10 +1084,14 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
 	cfg->arm_mali_lpae_cfg.memattr =
 		(ARM_MALI_LPAE_MEMATTR_IMP_DEF
 		 << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_NC)) |
+		(ARM_MALI_LPAE_MEMATTR_FORCE_CACHE_ALL
+		 << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_CACHE)) |
 		(ARM_MALI_LPAE_MEMATTR_WRITE_ALLOC
 		 << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_CACHE)) |
-		(ARM_MALI_LPAE_MEMATTR_IMP_DEF
-		 << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV));
+		(ARM_MALI_LPAE_MEMATTR_OUTER_IMP_DEF
+		 << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV)) |
+		(ARM_MALI_LPAE_MEMATTR_OUTER_WA
+		 << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_OUTER_WA));
 
 	data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data), GFP_KERNEL,
 					   cfg);
-- 
2.30.2



* [PATCH 2/2] drm/panfrost: adjusted job affinity for dual core group GPUs
  2021-12-23 11:06 [PATCH 0/2] drm/panfrost: partial support of T628 GPUs asheplyakov
  2021-12-23 11:06 ` [PATCH 1/2] drm/panfrost: mmu: improved memory attributes asheplyakov
@ 2021-12-23 11:06 ` asheplyakov
  2021-12-23 14:11   ` Alyssa Rosenzweig
  2021-12-23 14:14 ` [PATCH 0/2] drm/panfrost: partial support of T628 GPUs Alyssa Rosenzweig
  2022-01-15 16:06 ` [PATCH v2] drm/panfrost: initial dual core group GPUs support Alexey Sheplyakov
  3 siblings, 1 reply; 16+ messages in thread
From: asheplyakov @ 2021-12-23 11:06 UTC (permalink / raw)
  To: dri-devel
  Cc: Tomeu Vizoso, Vadim V . Vlasov, Steven Price, Alexey Sheplyakov,
	Alyssa Rosenzweig

From: Alexey Sheplyakov <asheplyakov@basealt.ru>

On a dual core group GPU it's not OK to run some jobs on any core.
For such jobs the following affinity rule must be satisfied: job
slot 2 must target core group 1, and slots 0 and 1 must target
core group 0.

The kernel driver itself can't guess which jobs need such a strict
affinity, so setting proper requirements is the responsibility of
the userspace (Mesa). However the userspace is not smart enough [yet].
Therefore this patch applies the above affinity rule to all jobs on
dual core group GPUs.

With this patch panfrost is able to drive T628 (r1p0) GPU on some
armv8 SoCs (in particular BE-M1000).

The patch has also been tested with T860 (rk3399 SoC). At first
glance (using GNOME desktop, running glmark) it does not cause
any changes with the T860 GPU.

Signed-off-by: Alexey Sheplyakov <asheplyakov@basealt.ru>
Signed-off-by: Vadim V. Vlasov <vadim.vlasov@elpitech.ru>

Cc: Rob Herring <robh@kernel.org>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: Vadim V. Vlasov <vadim.vlasov@elpitech.ru>
---
 drivers/gpu/drm/panfrost/panfrost_device.h |  7 ++++
 drivers/gpu/drm/panfrost/panfrost_gpu.c    | 45 ++++++++++++++++++++++
 drivers/gpu/drm/panfrost/panfrost_job.c    | 14 ++++++-
 3 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
index 8b25278f34c8..aa43b1413c8a 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -23,6 +23,12 @@ struct panfrost_perfcnt;
 
 #define NUM_JOB_SLOTS 3
 #define MAX_PM_DOMAINS 3
+#define MAX_COHERENT_GROUPS 2
+
+struct panfrost_coherent_group {
+	u64 core_mask;
+	unsigned int nr_cores;
+};
 
 struct panfrost_features {
 	u16 id;
@@ -54,6 +60,7 @@ struct panfrost_features {
 
 	unsigned long hw_features[64 / BITS_PER_LONG];
 	unsigned long hw_issues[64 / BITS_PER_LONG];
+	struct panfrost_coherent_group core_groups[MAX_COHERENT_GROUPS];
 };
 
 /*
diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c
index bbe628b306ee..a02cb160cb4f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
@@ -209,6 +209,41 @@ static const struct panfrost_model gpu_models[] = {
 		GPU_REV(g31, 1, 0)),
 };
 
+static void panfrost_decode_coherent_groups(struct panfrost_features *features)
+{
+	struct panfrost_coherent_group *current_group;
+	u64 group_present;
+	u64 group_mask;
+	u64 first_set, first_set_prev;
+	u32 nr_group = 0;
+
+	if (features->mem_features & GROUPS_L2_COHERENT)
+		group_present = features->l2_present;
+	else
+		group_present = features->shader_present;
+
+	current_group = features->core_groups;
+	first_set = group_present & ~(group_present - 1);
+
+	while (group_present != 0 && nr_group < MAX_COHERENT_GROUPS) {
+		group_present -= first_set;
+		first_set_prev = first_set;
+
+		first_set = group_present & ~(group_present - 1);
+		group_mask = (first_set - 1) & ~(first_set_prev - 1);
+		current_group->core_mask = group_mask & features->shader_present;
+		current_group->nr_cores = hweight64(current_group->core_mask);
+
+		nr_group++;
+		current_group++;
+	}
+
+	if (group_present) {
+		pr_warn("%s: too many coherent groups, expected <= %d",
+			__func__, MAX_COHERENT_GROUPS);
+	}
+}
+
 static void panfrost_gpu_init_features(struct panfrost_device *pfdev)
 {
 	u32 gpu_id, num_js, major, minor, status, rev;
@@ -217,6 +252,7 @@ static void panfrost_gpu_init_features(struct panfrost_device *pfdev)
 	u64 hw_issues = hw_issues_all;
 	const struct panfrost_model *model;
 	int i;
+	unsigned long core_mask[64/BITS_PER_LONG];
 
 	pfdev->features.l2_features = gpu_read(pfdev, GPU_L2_FEATURES);
 	pfdev->features.core_features = gpu_read(pfdev, GPU_CORE_FEATURES);
@@ -296,6 +332,7 @@ static void panfrost_gpu_init_features(struct panfrost_device *pfdev)
 
 	bitmap_from_u64(pfdev->features.hw_features, hw_feat);
 	bitmap_from_u64(pfdev->features.hw_issues, hw_issues);
+	panfrost_decode_coherent_groups(&pfdev->features);
 
 	dev_info(pfdev->dev, "mali-%s id 0x%x major 0x%x minor 0x%x status 0x%x",
 		 name, gpu_id, major, minor, status);
@@ -314,6 +351,14 @@ static void panfrost_gpu_init_features(struct panfrost_device *pfdev)
 
 	dev_info(pfdev->dev, "shader_present=0x%0llx l2_present=0x%0llx",
 		 pfdev->features.shader_present, pfdev->features.l2_present);
+	dev_info(pfdev->dev, "%u core groups\n", pfdev->features.nr_core_groups);
+	for (i = 0; i < (int)pfdev->features.nr_core_groups; i++) {
+		bitmap_from_u64(core_mask, pfdev->features.core_groups[i].core_mask);
+		dev_info(pfdev->dev, "core group %u: cores: %64pbl (total %u)\n",
+			 i,
+			 core_mask,
+			 pfdev->features.core_groups[i].nr_cores);
+	}
 }
 
 void panfrost_gpu_power_on(struct panfrost_device *pfdev)
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 908d79520853..454515c1e2f1 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -132,11 +132,21 @@ static void panfrost_job_write_affinity(struct panfrost_device *pfdev,
 
 	/*
 	 * Use all cores for now.
-	 * Eventually we may need to support tiler only jobs and h/w with
-	 * multiple (2) coherent core groups
+	 * Eventually we may need to support tiler only jobs
 	 */
 	affinity = pfdev->features.shader_present;
 
+	/* Userspace does not set the requirements properly yet.
+	 * Adjust affinity of all jobs on dual core group GPUs
+	 */
+	if (pfdev->features.nr_core_groups > 1) {
+		if (js == 2)
+			affinity &= pfdev->features.core_groups[1].core_mask;
+		else
+			affinity &= pfdev->features.core_groups[0].core_mask;
+		dev_dbg(pfdev->dev, "js: %d, affinity: %llxu\n", js, affinity);
+	}
+
 	job_write(pfdev, JS_AFFINITY_NEXT_LO(js), lower_32_bits(affinity));
 	job_write(pfdev, JS_AFFINITY_NEXT_HI(js), upper_32_bits(affinity));
 }
-- 
2.30.2



* Re: [PATCH 2/2] drm/panfrost: adjusted job affinity for dual core group GPUs
  2021-12-23 11:06 ` [PATCH 2/2] drm/panfrost: adjusted job affinity for dual core group GPUs asheplyakov
@ 2021-12-23 14:11   ` Alyssa Rosenzweig
  2021-12-24  8:56     ` Alexey Sheplyakov
  0 siblings, 1 reply; 16+ messages in thread
From: Alyssa Rosenzweig @ 2021-12-23 14:11 UTC (permalink / raw)
  To: asheplyakov
  Cc: Tomeu Vizoso, Vadim V . Vlasov, dri-devel, Steven Price,
	Alyssa Rosenzweig

> The kernel driver itself can't guess which jobs need such a strict
> affinity, so setting proper requirements is the responsibility of
> the userspace (Mesa). However the userspace is not smart enough [yet].
> Therefore this patch applies the above affinity rule to all jobs on
> dual core group GPUs.

What does Mesa need to do for this to work "properly"? What are the
limitations of the approach implemented here? If we need to extend it
down the line with a UABI change, what would that look like?

Thanks,

Alyssa


* Re: [PATCH 0/2] drm/panfrost: partial support of T628 GPUs
  2021-12-23 11:06 [PATCH 0/2] drm/panfrost: partial support of T628 GPUs asheplyakov
  2021-12-23 11:06 ` [PATCH 1/2] drm/panfrost: mmu: improved memory attributes asheplyakov
  2021-12-23 11:06 ` [PATCH 2/2] drm/panfrost: adjusted job affinity for dual core group GPUs asheplyakov
@ 2021-12-23 14:14 ` Alyssa Rosenzweig
  2022-01-15 16:06 ` [PATCH v2] drm/panfrost: initial dual core group GPUs support Alexey Sheplyakov
  3 siblings, 0 replies; 16+ messages in thread
From: Alyssa Rosenzweig @ 2021-12-23 14:14 UTC (permalink / raw)
  To: asheplyakov
  Cc: Tomeu Vizoso, Vadim V . Vlasov, dri-devel, Steven Price,
	Alyssa Rosenzweig

> With these patches panfrost is able to drive mali T628 (r1p0) GPU
> on some armv8 SoCs (in particular BE-M1000).
> r0 GPUs are still not supported [yet] (tested with Exynos 5422).

What's needed for r0?


* Re: [PATCH 2/2] drm/panfrost: adjusted job affinity for dual core group GPUs
  2021-12-23 14:11   ` Alyssa Rosenzweig
@ 2021-12-24  8:56     ` Alexey Sheplyakov
  2022-01-10 14:14       ` Steven Price
  0 siblings, 1 reply; 16+ messages in thread
From: Alexey Sheplyakov @ 2021-12-24  8:56 UTC (permalink / raw)
  To: Alyssa Rosenzweig
  Cc: Tomeu Vizoso, Vadim V . Vlasov, dri-devel, Steven Price,
	Alyssa Rosenzweig

Hi,

On 23.12.2021 18:11, Alyssa Rosenzweig wrote:
>> The kernel driver itself can't guess which jobs need such a strict
>> affinity, so setting proper requirements is the responsibility of
>> the userspace (Mesa). However the userspace is not smart enough [yet].
>> Therefore this patch applies the above affinity rule to all jobs on
>> dual core group GPUs.
> 
> What does Mesa need to do for this to work "properly"?

I don't know.
The blob restricts affinity of jobs with JD_REQ_COHERENT_GROUP requirement.
In theory jobs without such a requirement can run on any core, but in
practice all jobs in slots 0, 1 are assigned to core group 0 (with workloads
I've run - i.e. weston, firefox, glmark2, perhaps it's also SoC dependent).
So I've forced all jobs in slots 0, 1 to core group 0. Surprisingly this
(and memory attributes adjustment) appeared to be enough to get panfrost
working with T628 (on some SoCs). Without these patches GPU locks up in
a few seconds.

> What are the limitations of the approach implemented here?

Suboptimal performance.

1) There might be job chains which don't care about affinity
   (I haven't seen any of these yet on systems I've got).
2) There might be dual core group GPUs which don't need such a strict affinity.
   (I haven't seen any dual core group T[78]xx GPUs yet. This doesn't mean such
    GPUs don't exist).

> If we need to extend it down the line with a UABI change, what would that look like?

I have no idea. And I'm not sure if it's worth the effort (since most jobs
end up on core group 0 anyway).

Best regards,
   Alexey


* Re: [PATCH 1/2] drm/panfrost: mmu: improved memory attributes
  2021-12-23 11:06 ` [PATCH 1/2] drm/panfrost: mmu: improved memory attributes asheplyakov
@ 2021-12-24 12:56   ` Robin Murphy
  2022-01-12 18:18     ` Alexey Sheplyakov
  0 siblings, 1 reply; 16+ messages in thread
From: Robin Murphy @ 2021-12-24 12:56 UTC (permalink / raw)
  To: asheplyakov, dri-devel
  Cc: Vadim V . Vlasov, Alyssa Rosenzweig, Tomeu Vizoso, Steven Price

On 2021-12-23 11:06, asheplyakov@basealt.ru wrote:
> From: Alexey Sheplyakov <asheplyakov@basealt.ru>
> 
> T62x/T60x GPUs are known to not work with panfrost as of now.
> One of the reasons is wrong/incomplete memory attributes which
> the panfrost driver sets in the page tables:
> 
> - MEMATTR_IMP_DEF should be 0x48ULL, not 0x88ULL.
>    0x88ULL is MEMATTR_OUTER_IMP_DEF

I guess the macro could be renamed if anyone's particularly bothered, 
but using the outer-cacheable attribute is deliberate because it is 
necessary for I/O-coherent GPUs to work properly (and should be 
irrelevant for non-coherent integrations). I'm away from my Juno board 
for the next couple of weeks but I'm fairly confident that this change 
as-is will break cache snooping.

> - MEMATTR_FORCE_CACHE_ALL and MEMATTR_OUTER_WA are missing.

They're "missing" because they're not used, and there's currently no 
mechanism by which they *could* be used. Also note that the indices in 
MEMATTR have to line up with the equivalent LPAE indices for the closest 
match to what the IOMMU API's IOMMU_{CACHE,MMIO} flags represent, so 
moving those around is yet more subtle breakage.

> T72x and newer GPUs work just fine with such incomplete/wrong memory
> attributes. However T62x are quite picky and quickly lock up.
> 
> To avoid the problem set the same memory attributes (in LPAE page
> tables) as mali_kbase.
> 
> The patch has been tested (for regressions) with T860 GPU (rk3399 SoC).
> At first glance (using GNOME desktop, running glmark) it does
> not cause any changes for this GPU.
> 
> Note: this patch is necessary, but *not* enough to get panfrost
> working with T62x

I'd note that panfrost has been working OK - to the extent that Mesa 
supports its older ISA - on the T624 (single core group) in Arm's Juno 
SoC for over a year now since commit 268af50f38b1.

If you have to force outer non-cacheable to avoid getting translation 
faults and other errors that look like the GPU is inexplicably seeing 
the wrong data, I'd check whether you have the same thing where your 
integration is actually I/O-coherent and you're missing the 
"dma-coherent" property in your DT.

Thanks,
Robin.
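
For reference, a minimal sketch of how the "dma-coherent" property Robin
mentions typically surfaces to a driver at probe time (this uses the
generic device-property API; it is not necessarily panfrost's exact code):

#include <linux/property.h>

/*
 * Sketch: treat the GPU integration as I/O-coherent only when the
 * firmware says so, i.e. the device node (or its bus) carries the
 * "dma-coherent" DT property.
 */
static bool gpu_is_io_coherent(struct device *dev)
{
	return device_get_dma_attr(dev) == DEV_DMA_COHERENT;
}

If the property is missing on a genuinely coherent integration, a check
like this fails and the driver falls back to non-coherent behaviour,
which can produce exactly the "GPU inexplicably sees the wrong data"
symptoms described above.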



* Re: [PATCH 2/2] drm/panfrost: adjusted job affinity for dual core group GPUs
  2021-12-24  8:56     ` Alexey Sheplyakov
@ 2022-01-10 14:14       ` Steven Price
  2022-01-10 17:42         ` Alyssa Rosenzweig
  0 siblings, 1 reply; 16+ messages in thread
From: Steven Price @ 2022-01-10 14:14 UTC (permalink / raw)
  To: Alexey Sheplyakov, Alyssa Rosenzweig
  Cc: Vadim V . Vlasov, Tomeu Vizoso, dri-devel, Alyssa Rosenzweig

On 24/12/2021 08:56, Alexey Sheplyakov wrote:
> Hi,
> 
> On 23.12.2021 18:11, Alyssa Rosenzweig wrote:
>>> The kernel driver itself can't guess which jobs need such a strict
>>> affinity, so setting proper requirements is the responsibility of
>>> the userspace (Mesa). However the userspace is not smart enough [yet].
>>> Therefore this patch applies the above affinity rule to all jobs on
>>> dual core group GPUs.
>>
>> What does Mesa need to do for this to work "properly"?
> 
> I don't know.
> The blob restricts affinity of jobs with JD_REQ_COHERENT_GROUP requirement.
> In theory jobs without such a requirement can run on any core, but in
> practice all jobs in slots 0, 1 are assigned to core group 0 (with workloads
> I've run - i.e. weston, firefox, glmark2, perhaps it's also SoC dependent).
> So I've forced all jobs in slots 0, 1 to core group 0. Surprisingly this
> (and memory attributes adjustment) appeared to be enough to get panfrost
> working with T628 (on some SoCs). Without these patches GPU locks up in
> a few seconds.

Let me fill in a few details here.

The T628 is pretty unique in that it has two core groups, i.e. more than
one L2 cache. Previous designs (i.e. T604) didn't have enough cores to
require a second core group, and later designs with sufficient cores
have coherent L2 caches so act as a single core group (although the
hardware has multiple L2s it only reports a single one, as they act as
if they were a single cache).

Note that technically the T608, T658 and T678 also exist and have this
problem - but I don't believe any products were produced with these (so
unless you're in ARM with a very unusual FPGA they can be ignored).

The blob/kbase handle this situation with a new flag
JD_REQ_COHERENT_GROUP which specifies that the affinity of a job must
land on a single (coherent) core group, and
JD_REQ_SPECIFIC_COHERENT_GROUP which allows user space to target a
specific group.

In theory fragment shading can be performed over all cores (because a
fragment shader job doesn't need coherency between threads), so it
doesn't need the JD_REQ_COHERENT_GROUP flag; vertex shading, however,
needs to run on the same core group as the tiler (which always lives in
core group 0).

Of course there are various 'tricks' that can happen even within a
fragment shader which might require coherency.

So the expected sequence is that vertex+tiling is restricted to core
group 0, and fragment shading can be run over all cores. Although there
can be issues with performance doing this naïvely because the Job
Manager doesn't necessarily share the GPU's cores fairly between vertex
and fragment jobs. Also note that a cache flush is needed between
running the vertex+tiling and the fragment job to ensure that the extra
core group is coherent - this can be expensive, so it may not be worth
using the second core group in some situations. I'm not sure what logic
the Blob uses for that.

Finally there's GPU compute (i.e. OpenCL): here coherency is usually
required, but there's often more information about the amount of
coherency needed. In this case it is possible to run different job
chains on each core group. This is the only situation where slot 2 is
used, and is the reason for the JD_REQ_SPECIFIC_COHERENT_GROUP flag.
It's also a nightmare for scheduling as the hardware gets upset if the
affinity masks for slot 1 and slot 2 overlap.
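
Putting those rules together, a rough illustration of how the affinity
choice could look (illustration only, not kbase or panfrost code; it
reuses the core_groups/nr_core_groups fields proposed in this series,
and needs_coherent_group is a stand-in for a userspace-supplied
JD_REQ_COHERENT_GROUP-style requirement):

/*
 * Illustration only: choose an affinity mask for a job following the
 * rules described above.
 */
static u64 pick_job_affinity(struct panfrost_device *pfdev, int js,
			     bool needs_coherent_group)
{
	u64 affinity = pfdev->features.shader_present;

	/* A single core group: every core may be used, nothing to do. */
	if (pfdev->features.nr_core_groups < 2)
		return affinity;

	/* Fragment-style jobs that don't need coherency may span all cores. */
	if (!needs_coherent_group)
		return affinity;

	/*
	 * Slot 2 (compute targeting the second group) -> core group 1;
	 * slots 0 and 1 (vertex+tiling etc.) -> core group 0, where the
	 * tiler lives.
	 */
	if (js == 2)
		return affinity & pfdev->features.core_groups[1].core_mask;

	return affinity & pfdev->features.core_groups[0].core_mask;
}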

>> What are the limitations of the approach implemented here?
> 
> Suboptimal performance.
> 
> 1) There might be job chains which don't care about affinity
>    (I haven't seen any of these yet on systems I've got).

You are effectively throwing away half the cores if everything is pinned
to core group 0, so I'm pretty sure the Blob manages to run (some)
fragment jobs without the COHERENT_GROUP flag.

But equally this is a reasonable first step for the kernel driver - we
can make the GPU look like every other GPU by pretending the second core
group doesn't exist.

> 2) There might be dual core group GPUs which don't need such a strict affinity.
>    (I haven't seen any dual core group T[78]xx GPUs yet. This doesn't mean such
>     GPUs don't exist).

They should all be a single core group (fully coherent L2s).

>> If we need to extend it down the line with a UABI change, what would that look like?
> 
> I have no idea. And I'm not sure if it's worth the effort (since most jobs
> end up on core group 0 anyway).

Whether it's worth the effort depends on whether anyone really cares
about getting the full performance out of this particular GPU.

At this stage I think the main UABI change would be to add the opposite
flag to kbase, (e.g. "PANFROST_JD_DOESNT_NEED_COHERENCY_ON_GPU"[1]) to
opt-in to allowing the job to run across all cores.

The second change would be to allow compute jobs to be run on the second
core group, so another flag: PANFROST_RUN_ON_SECOND_CORE_GROUP.

But clearly there's little point adding such flags until someone steps
up to do the Mesa work.

Steve

[1] Bike-shedding the name might be required ;)
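
As a purely hypothetical sketch of what such an opt-in could look like
(neither flag exists in panfrost_drm.h today; only PANFROST_JD_REQ_FS is
real, and the values below are made up):

/* Hypothetical UABI sketch -- NOT part of the real panfrost_drm.h. */
#define PANFROST_JD_REQ_FS                        (1 << 0)  /* existing */
#define PANFROST_JD_DOESNT_NEED_COHERENCY_ON_GPU  (1 << 1)  /* hypothetical */
#define PANFROST_RUN_ON_SECOND_CORE_GROUP         (1 << 2)  /* hypothetical */

The kernel would then consult these bits in drm_panfrost_submit's
requirements field instead of guessing, feeding them into an affinity
choice like the sketch earlier in the thread, while Mesa would set the
first flag on fragment-only job chains.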


* Re: [PATCH 2/2] drm/panfrost: adjusted job affinity for dual core group GPUs
  2022-01-10 14:14       ` Steven Price
@ 2022-01-10 17:42         ` Alyssa Rosenzweig
  2022-01-12 17:03           ` Steven Price
  2022-01-13 16:06           ` Alexey Sheplyakov
  0 siblings, 2 replies; 16+ messages in thread
From: Alyssa Rosenzweig @ 2022-01-10 17:42 UTC (permalink / raw)
  To: Steven Price
  Cc: Tomeu Vizoso, Vadim V . Vlasov, dri-devel, Alexey Sheplyakov,
	Alyssa Rosenzweig

> Whether it's worth the effort depends on whether anyone really cares
> about getting the full performance out of this particular GPU.
> 
> At this stage I think the main UABI change would be to add the opposite
> flag to kbase, (e.g. "PANFROST_JD_DOESNT_NEED_COHERENCY_ON_GPU"[1]) to
> opt-in to allowing the job to run across all cores.
> 
> The second change would be to allow compute jobs to be run on the second
> core group, so another flag: PANFROST_RUN_ON_SECOND_CORE_GROUP.
> 
> But clearly there's little point adding such flags until someone steps
> up to do the Mesa work.

I worry about the maintenance burden (both Mesa and kernel) of adding
UABI only used by a piece of hardware none of us own, and only useful
"sometimes" for that hardware. Doubly so for the second core group
support; currently Mesa doesn't advertise any compute support on
anything older than Mali T760 ... to the best of my knowledge, nobody
has missed that support either...

To be clear I am in favour of merging the patches needed for GLES2 to
work on all Malis, possibly at a performance cost on these dual-core
systems. That's a far cry from the level of support the DDK gave these
chips back in the day ... of course, the DDK doesn't support them at all
anymore, so Panfrost wins there by default! ;)


* Re: [PATCH 2/2] drm/panfrost: adjusted job affinity for dual core group GPUs
  2022-01-10 17:42         ` Alyssa Rosenzweig
@ 2022-01-12 17:03           ` Steven Price
  2022-01-13 10:01             ` Alexey Sheplyakov
  2022-01-13 16:06           ` Alexey Sheplyakov
  1 sibling, 1 reply; 16+ messages in thread
From: Steven Price @ 2022-01-12 17:03 UTC (permalink / raw)
  To: Alyssa Rosenzweig
  Cc: Tomeu Vizoso, Vadim V . Vlasov, dri-devel, Alexey Sheplyakov,
	Alyssa Rosenzweig

On 10/01/2022 17:42, Alyssa Rosenzweig wrote:
>> Whether it's worth the effort depends on whether anyone really cares
>> about getting the full performance out of this particular GPU.
>>
>> At this stage I think the main UABI change would be to add the opposite
>> flag to kbase, (e.g. "PANFROST_JD_DOESNT_NEED_COHERENCY_ON_GPU"[1]) to
>> opt-in to allowing the job to run across all cores.
>>
>> The second change would be to allow compute jobs to be run on the second
>> core group, so another flag: PANFROST_RUN_ON_SECOND_CORE_GROUP.
>>
>> But clearly there's little point adding such flags until someone steps
>> up to do the Mesa work.
> 
> I worry about the maintenance burden (both Mesa and kernel) of adding
> UABI only used by a piece of hardware none of us own, and only useful
> "sometimes" for that hardware. Doubly so for the second core group
> support; currently Mesa doesn't advertise any compute support on
> anything older than Mali T760 ... to the best of my knowledge, nobody
> has missed that support either...

I agree there's no point adding the UABI support unless someone is
willing to step up and be a maintainer for that hardware. And I suspect no
one cares enough about that hardware to do that.

> To be clear I am in favour of merging the patches needed for GLES2 to
> work on all Malis, possibly at a performance cost on these dual-core
> systems. That's a far cry from the level of support the DDK gave these
> chips back in the day ... of course, the DDK doesn't support them at all
> anymore, so Panfrost wins there by default! ;)
> 

Agreed - I'm happy to merge a kernel series similar to this. I think the
remaining problems are:

1. Addressing Robin's concerns about the first patch. That looks like
it's probably just wrong.

2. I think this patch is too complex for the basic support. There are some
parts like checking GROUPS_L2_COHERENT which also don't feature in kbase
so I don't believe they are correct.

3. I don't think this blocks the change. But if we're not using the
second core group we could actually power it down. Indeed simply not
turning on the L2/shader cores should in theory work (jobs are not
scheduled to cores which are turned off even if they are included in the
affinity).

Thanks,

Steve


* Re: [PATCH 1/2] drm/panfrost: mmu: improved memory attributes
  2021-12-24 12:56   ` Robin Murphy
@ 2022-01-12 18:18     ` Alexey Sheplyakov
  0 siblings, 0 replies; 16+ messages in thread
From: Alexey Sheplyakov @ 2022-01-12 18:18 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Vadim V . Vlasov, Alyssa Rosenzweig, Tomeu Vizoso, dri-devel,
	Steven Price

Hi, Robin,

On Fri, Dec 24, 2021 at 12:56:57PM +0000, Robin Murphy wrote:

> I'd note that panfrost has been working OK - to the extent that Mesa
> supports its older ISA - on the T624 (single core group) in Arm's Juno SoC
> for over a year now since commit 268af50f38b1.
> 
> If you have to force outer non-cacheable to avoid getting translation faults
> and other errors that look like the GPU is inexplicably seeing the wrong
> data, I'd check whether you have the same thing where your integration is
> actually I/O-coherent and you're missing the "dma-coherent" property in your
> DT.

Thanks for a detailed explanation. Indeed adding the "dma-coherent" property
(and the 2nd patch which effectively disables the 2nd core group) is enough
to get panfrost working with T628 on BE-M1000 SoC. Unfortunately the same
trick does NOT work for Exynos 5422. Here I get an Oops as soon as the driver
tries to reset the GPU [1]. My patch does not work for Exynos 5422 either
(but in a different way: GPU locks up in a few seconds).

So the first patch seems to be wrong and I'll drop it.

Best regards,
   Alexey

[1]

[   73.061941] panfrost 11800000.gpu: AS_ACTIVE bit stuck
[   73.065671] 8<--- cut here ---
[   73.067015] Power domain G3D disable failed
[   73.068637] Unhandled fault: external abort on non-linefetch (0x1008) at 0xf09d0020
[   73.080424] pgd = f347788b
[   73.083108] [f09d0020] *pgd=414eb811, *pte=11800653, *ppte=11800453
[   73.089352] Internal error: : 1008 [#1] PREEMPT SMP ARM
[   73.094549] Modules linked in: xfrm_user l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppox ppp_generic slhc rfkill loop zstd input_leds panfrost lzo_rle gpu_sched evdev uio_pdrv_genirq uio exynos_gpiomem zram sch_fq_codel ip_tables ipv6 btrfs blake2b_generic xor xor_neon zstd_compress lzo_compress zlib_deflate raid6_pq libcrc32c usbhid gpio_keys
[   73.126264] CPU: 3 PID: 130 Comm: kworker/u16:3 Not tainted 5.15.8-00057-g5ecb31848317 #1
[   73.134407] Hardware name: Hardkernel ODROID-XU4
[   73.139001] Workqueue: panfrost-reset panfrost_reset_work [panfrost]
[   73.145325] PC is at panfrost_gpu_soft_reset+0xa0/0x104 [panfrost]
[   73.151477] LR is at schedule_hrtimeout_range_clock+0x114/0x240
[   73.157370] pc : [<bf2c092c>]    lr : [<c0bca590>]    psr: 600e0013
[   73.163609] sp : c16f3e88  ip : 00004710  fp : c16f3ea4
[   73.168809] r10: c1255220  r9 : c5982a88  r8 : c5982a88
[   73.174007] r7 : c5982a7c  r6 : 00000011  r5 : 0364a751  r4 : c5982840
[   73.180506] r3 : f09d0000  r2 : 00000000  r1 : 00000000  r0 : 00000000
[   73.187006] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[   73.194112] Control: 10c5387d  Table: 45d5c06a  DAC: 00000051
[   73.199830] Register r0 information: NULL pointer
[   73.204509] Register r1 information: NULL pointer
[   73.209189] Register r2 information: NULL pointer
[   73.213868] Register r3 information: 0-page vmalloc region starting at 0xf09d0000 allocated at __devm_ioremap_resource+0x170/0x1e8
[   73.225566] Register r4 information: slab kmalloc-1k start c5982800 pointer offset 64 size 1024
[   73.234232] Register r5 information: non-paged memory
[   73.239258] Register r6 information: non-paged memory
[   73.244284] Register r7 information: slab kmalloc-1k start c5982800 pointer offset 636 size 1024
[   73.253036] Register r8 information: slab kmalloc-1k start c5982800 pointer offset 648 size 1024
[   73.261788] Register r9 information: slab kmalloc-1k start c5982800 pointer offset 648 size 1024
[   73.270540] Register r10 information: non-slab/vmalloc memory
[   73.276259] Register r11 information: non-slab/vmalloc memory
[   73.281978] Register r12 information: non-paged memory
[   73.287091] Process kworker/u16:3 (pid: 130, stack limit = 0x6da5c776)
[   73.293591] Stack: (0xc16f3e88 to 0xc16f4000)
[   73.297926] 3e80:                   c5982840 00000000 00000000 c5982a7c c16f3ebc c16f3ea8
[   73.306072] 3ea0: bf2bf65c bf2c0898 c5982840 00000000 c16f3ef4 c16f3ec0 bf2c278c bf2bf64c
[   73.314218] 3ec0: c16f3ef4 c16f3ed0 c015cf28 c5982aa4 c44ba900 c1410a00 c1e9c400 00000000
[   73.322363] 3ee0: c1e9c405 c1255220 c16f3f04 c16f3ef8 bf2c29cc bf2c2538 c16f3f44 c16f3f08
[   73.330509] 3f00: c014b3ac bf2c29b8 c1410a00 c1410a00 c1410a00 c1410a00 c1410a00 c44ba900
[   73.338654] 3f20: c1410a00 c44ba918 c1410a18 c1103d00 00000088 c1410a00 c16f3f7c c16f3f48
[   73.346800] 3f40: c014bc18 c014b1c0 c127b138 c16f2000 c0151004 c44bb880 c3747500 c014bbb4
[   73.354945] 3f60: c44ba900 c16f2000 c35e5e8c c3747520 c16f3fac c16f3f80 c0152ee8 c014bbc0
[   73.363091] 3f80: 00000000 c44bb880 c0152d7c 00000000 00000000 00000000 00000000 00000000
[   73.371236] 3fa0: 00000000 c16f3fb0 c01000fc c0152d88 00000000 00000000 00000000 00000000
[   73.379381] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   73.387526] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[   73.395669] Backtrace: 
[   73.398096] [<bf2c088c>] (panfrost_gpu_soft_reset [panfrost]) from [<bf2bf65c>] (panfrost_device_reset+0x1c/0x38 [panfrost])
[   73.409278]  r7:c5982a7c r6:00000000 r5:00000000 r4:c5982840
[   73.414906] [<bf2bf640>] (panfrost_device_reset [panfrost]) from [<bf2c278c>] (panfrost_reset+0x260/0x378 [panfrost])
[   73.425479]  r5:00000000 r4:c5982840
[   73.429031] [<bf2c252c>] (panfrost_reset [panfrost]) from [<bf2c29cc>] (panfrost_reset_work+0x20/0x24 [panfrost])
[   73.439260]  r10:c1255220 r9:c1e9c405 r8:00000000 r7:c1e9c400 r6:c1410a00 r5:c44ba900
[   73.447055]  r4:c5982aa4
[   73.449568] [<bf2c29ac>] (panfrost_reset_work [panfrost]) from [<c014b3ac>] (process_one_work+0x1f8/0x5c0)
[   73.459186] [<c014b1b4>] (process_one_work) from [<c014bc18>] (worker_thread+0x64/0x580)
[   73.467250]  r10:c1410a00 r9:00000088 r8:c1103d00 r7:c1410a18 r6:c44ba918 r5:c1410a00
[   73.475045]  r4:c44ba900
[   73.477557] [<c014bbb4>] (worker_thread) from [<c0152ee8>] (kthread+0x16c/0x1a0)
[   73.484927]  r10:c3747520 r9:c35e5e8c r8:c16f2000 r7:c44ba900 r6:c014bbb4 r5:c3747500
[   73.492722]  r4:c44bb880
[   73.495234] [<c0152d7c>] (kthread) from [<c01000fc>] (ret_from_fork+0x14/0x38)
[   73.502427] Exception stack(0xc16f3fb0 to 0xc16f3ff8)
[   73.507454] 3fa0:                                     00000000 00000000 00000000 00000000
[   73.515601] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   73.523746] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[   73.530334]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c0152d7c
[   73.538129]  r4:c44bb880 r3:00000000
[   73.541685] Code: e3a0001a ba000010 eb64253c e594300c (e5933020) 
[   73.547753] ---[ end trace 1af2dce52aebcc96 ]---

However my patch does not work for Exynos
5422 either, so I'll drop the first patch.



* Re: [PATCH 2/2] drm/panfrost: adjusted job affinity for dual core group GPUs
  2022-01-12 17:03           ` Steven Price
@ 2022-01-13 10:01             ` Alexey Sheplyakov
  2022-01-13 11:22               ` Steven Price
  0 siblings, 1 reply; 16+ messages in thread
From: Alexey Sheplyakov @ 2022-01-13 10:01 UTC (permalink / raw)
  To: Steven Price
  Cc: Tomeu Vizoso, Vadim V . Vlasov, dri-devel, Alyssa Rosenzweig,
	Alyssa Rosenzweig

Hi, Steven!

Thanks for such a detailed explanation of T628 peculiarities.

On Wed, Jan 12, 2022 at 05:03:15PM +0000, Steven Price wrote:
> On 10/01/2022 17:42, Alyssa Rosenzweig wrote:
> >> Whether it's worth the effort depends on whether anyone really cares
> >> about getting the full performance out of this particular GPU.
> >>
> >> At this stage I think the main UABI change would be to add the opposite
> >> flag to kbase, (e.g. "PANFROST_JD_DOESNT_NEED_COHERENCY_ON_GPU"[1]) to
> >> opt-in to allowing the job to run across all cores.
> >>
> >> The second change would be to allow compute jobs to be run on the second
> >> core group, so another flag: PANFROST_RUN_ON_SECOND_CORE_GROUP.
> >>
> >> But clearly there's little point adding such flags until someone steps
> >> up to do the Mesa work.
> > 
> > I worry about the maintenance burden (both Mesa and kernel) of adding
> > UABI only used by a piece of hardware none of us own, and only useful
> > "sometimes" for that hardware. Doubly so for the second core group
> > support; currently Mesa doesn't advertise any compute support on
> > anything older than Mali T760 ... to the best of my knowledge, nobody
> > has missed that support either...
> 
> I agree there's no point adding the UABI support unless someone is
> willing to step up and be a maintainer for that hardware. And I suspect no
> one cares enough about that hardware to do that.
> 
> > To be clear I am in favour of merging the patches needed for GLES2 to
> > work on all Malis, possibly at a performance cost on these dual-core
> > systems. That's a far cry from the level of support the DDK gave these
> > chips back in the day ... of course, the DDK doesn't support them at all
> > anymore, so Panfrost wins there by default! ;)
> > 
> 
> Agreed - I'm happy to merge a kernel series similar to this. I think the
> remaining problems are:
> 
> 1. Addressing Robin's concerns about the first patch. That looks like
> it's probably just wrong.

The first patch is wrong and I'll drop it.

> 2. I think this patch is too complex for the basic support. There are some
> parts like checking GROUPS_L2_COHERENT which also don't feature in kbase

That part has been adapted from kbase_gpuprops_construct_coherent_groups, see
https://github.com/hardkernel/linux/blob/2f0f4268209ddacc2cdea158104b87cedacbd0e3/drivers/gpu/arm/midgard/mali_kbase_gpuprops.c#L94

> so I don't believe they are correct.

I'm not sure if it's correct or not, however
- it does not change anything for GPUs with coherent L2 caches
- it appears to correctly figure out core groups for several SoCs
  with T628 GPU (BE-M1000, Exynos 5422).

> 3. I don't think this blocks the change. But if we're not using the
> second core group we could actually power it down. Indeed simply not
> turning on the L2/shader cores should in theory work (jobs are not
> scheduled to cores which are turned off even if they are included in the
> affinity).

Powering off unused GPU cores would be nice, however

1) the userspace might power on those cores again (via sysfs or something),
   so I prefer to explicitly schedule jobs to the core group 0.

2) on BE-M1000 GPU seems to lock up in a few seconds after powering off
   some (GPU) cores. In fact I had to disable GPU devfreq to prevent GPU
   lockups.

Therefore I consider powering off unused cores as a later optimization. 
(frankly speaking I'd rather put the effort into *making use* of those cores
instead of figuring out why they fail to power down properly).

Best regards,
   Alexey


* Re: [PATCH 2/2] drm/panfrost: adjusted job affinity for dual core group GPUs
  2022-01-13 10:01             ` Alexey Sheplyakov
@ 2022-01-13 11:22               ` Steven Price
  0 siblings, 0 replies; 16+ messages in thread
From: Steven Price @ 2022-01-13 11:22 UTC (permalink / raw)
  To: Alexey Sheplyakov
  Cc: Tomeu Vizoso, Vadim V . Vlasov, dri-devel, Alyssa Rosenzweig,
	Alyssa Rosenzweig

On 13/01/2022 10:01, Alexey Sheplyakov wrote:
> Hi, Steven!
> 
> Thanks for such a detailed explanation of T628 peculiarities.
> 
> On Wed, Jan 12, 2022 at 05:03:15PM +0000, Steven Price wrote:
>> On 10/01/2022 17:42, Alyssa Rosenzweig wrote:
>>>> Whether it's worth the effort depends on whether anyone really cares
>>>> about getting the full performance out of this particular GPU.
>>>>
>>>> At this stage I think the main UABI change would be to add the opposite
>>>> flag to kbase, (e.g. "PANFROST_JD_DOESNT_NEED_COHERENCY_ON_GPU"[1]) to
>>>> opt-in to allowing the job to run across all cores.
>>>>
>>>> The second change would be to allow compute jobs to be run on the second
>>>> core group, so another flag: PANFROST_RUN_ON_SECOND_CORE_GROUP.
>>>>
>>>> But clearly there's little point adding such flags until someone steps
>>>> up to do the Mesa work.
>>>
>>> I worry about the maintenance burden (both Mesa and kernel) of adding
>>> UABI only used by a piece of hardware none of us own, and only useful
>>> "sometimes" for that hardware. Doubly so for the second core group
>>> support; currently Mesa doesn't advertise any compute support on
>>> anything older than Mali T760 ... to the best of my knowledge, nobody
>>> has missed that support either...
>>
>> I agree there's no point adding the UABI support unless someone is
>> willing to step up and be a maintainer for that hardware. And I suspect no
>> one cares enough about that hardware to do that.
>>
>>> To be clear I am in favour of merging the patches needed for GLES2 to
>>> work on all Malis, possibly at a performance cost on these dual-core
>>> systems. That's a far cry from the level of support the DDK gave these
>>> chips back in the day ... of course, the DDK doesn't support them at all
>>> anymore, so Panfrost wins there by default! ;)
>>>
>>
>> Agreed - I'm happy to merge a kernel series similar to this. I think the
>> remaining problems are:
>>
>> 1. Addressing Robin's concerns about the first patch. That looks like
>> it's probably just wrong.
> 
> The first patch is wrong and I'll drop it.
> 
>> 2. I think this patch is too complex for the basic support. There are some
>> parts like checking GROUPS_L2_COHERENT which also don't feature in kbase
> 
> That part has been adapted from kbase_gpuprops_construct_coherent_groups, see
> https://github.com/hardkernel/linux/blob/2f0f4268209ddacc2cdea158104b87cedacbd0e3/drivers/gpu/arm/midgard/mali_kbase_gpuprops.c#L94

Gah! I was looking at the wrong version of kbase... ;)

>> so I don't believe they are correct.
> 
> I'm not sure if it's correct or not, however
> - it does not change anything for GPUs with coherent L2 caches
> - it appears to correctly figure out core groups for several SoCs
>   with T628 GPU (BE-M1000, Exynos 5422).

Yeah, sorry about that - you are right this matches the (earlier) kbase
versions. I don't believe there ever was a GPU released without
GROUPS_L2_COHERENT set (which means that shader cores are coherent
within an L2, *NOT* that the L2 is coherent).

I think I was having an off day yesterday... ;) Ignore this - it's
arguably cargo-culting from kbase, but there are advantages to that.

>> 3. I don't think this blocks the change. But if we're not using the
>> second core group we could actually power it down. Indeed simply not
>> turning on the L2/shader cores should in theory work (jobs are not
>> scheduled to cores which are turned off even if they are included in the
>> affinity).
> 
> Powering off unused GPU cores would be nice, however
> 
> 1) the userspace might power on those cores again (via sysfs or something),
>    so I prefer to explicitly schedule jobs to the core group 0.

We don't expose that level of control to user space. Currently panfrost
either powers on all cores or none.

> 2) on BE-M1000 GPU seems to lock up in a few seconds after powering off
>    some (GPU) cores. In fact I had to disable GPU devfreq to prevent GPU
>    lockups.

I'm not sure how you tested powering off some GPU cores - I'm surprised
that that causes problems. Having to disable GPU devfreq points to an
issue with the DVFS performance points in your DT (perhaps the voltages
are wrong?) or potentially a bug in the driving of the clocks/regulators.

> Therefore I consider powering off unused cores as a later optimization. 
> (frankly speaking I'd rather put the effort into *making use* of those cores
> instead of figuring out why they fail to power down properly).

Making use is much more work - it depends if you(/anyone) cares about
power consumption on these devices. Although whether the hardware
actually implements any meaningful power gating for the second core
group is another matter so perhaps it wouldn't make any difference anyway.

My thought was something along the lines of the below which just turns
the cores off and otherwise doesn't require playing with affinities. If
we actually want to use the second core group then much more thought
will have to go into how the job slots are used.

---8<---
diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c
index bbe628b306ee..2e9f9d1ee830 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
@@ -320,19 +320,34 @@ void panfrost_gpu_power_on(struct panfrost_device *pfdev)
 {
 	int ret;
 	u32 val;
+	u32 core_mask = U32_MAX;
 
 	panfrost_gpu_init_quirks(pfdev);
 
-	/* Just turn on everything for now */
-	gpu_write(pfdev, L2_PWRON_LO, pfdev->features.l2_present);
+	if (pfdev->features.l2_present != 1) {
+		/*
+		 * Only support one core group for now.
+		 * ~(l2_present - 1) unsets all bits in l2_present except the
+		 * bottom bit. (l2_present - 2) has all the bits in the first
+		 * core group set. AND them together to generate a mask of
+		 * cores in the first core group.
+		 */
+		core_mask = ~(pfdev->features.l2_present - 1) &
+			     (pfdev->features.l2_present - 2);
+	}
+
+	gpu_write(pfdev, L2_PWRON_LO, pfdev->features.l2_present & core_mask);
 	ret = readl_relaxed_poll_timeout(pfdev->iomem + L2_READY_LO,
-		val, val == pfdev->features.l2_present, 100, 20000);
+		val, val == (pfdev->features.l2_present & core_mask),
+		100, 20000);
 	if (ret)
 		dev_err(pfdev->dev, "error powering up gpu L2");
 
-	gpu_write(pfdev, SHADER_PWRON_LO, pfdev->features.shader_present);
+	gpu_write(pfdev, SHADER_PWRON_LO,
+		  pfdev->features.shader_present & core_mask);
 	ret = readl_relaxed_poll_timeout(pfdev->iomem + SHADER_READY_LO,
-		val, val == pfdev->features.shader_present, 100, 20000);
+		val, val == (pfdev->features.shader_present & core_mask),
+		100, 20000);
 	if (ret)
 		dev_err(pfdev->dev, "error powering up gpu shader");
 
---8<---
I don't have a dual-core group system to hand so this is mostly untested.
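
As a worked example of the mask arithmetic in the comment above (the
value is made up, not taken from any particular SoC -- assume the second
core group's cores start at bit 4):

/* Worked example of the core_mask derivation, with a hypothetical value. */
u64 l2_present = 0x11;         /* two L2 caches: bits 0 and 4              */
u64 low  = ~(l2_present - 1);  /* ~0x10: every bit set except bit 4        */
u64 high = l2_present - 2;     /* 0x0f: the bits below the second L2       */
u64 core_mask = low & high;    /* 0x0f: cores of the first core group      */

The result is then ANDed with l2_present and shader_present in the patch
above, so only core group 0's L2 and shader cores are powered on.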

Thanks,

Steve


* Re: [PATCH 2/2] drm/panfrost: adjusted job affinity for dual core group GPUs
  2022-01-10 17:42         ` Alyssa Rosenzweig
  2022-01-12 17:03           ` Steven Price
@ 2022-01-13 16:06           ` Alexey Sheplyakov
  1 sibling, 0 replies; 16+ messages in thread
From: Alexey Sheplyakov @ 2022-01-13 16:06 UTC (permalink / raw)
  To: Alyssa Rosenzweig
  Cc: Tomeu Vizoso, Vadim V . Vlasov, dri-devel, Steven Price,
	Alyssa Rosenzweig

Hi, Alyssa,

On Mon, Jan 10, 2022 at 12:42:44PM -0500, Alyssa Rosenzweig wrote:
> > Whether it's worth the effort depends on whether anyone really cares
> > about getting the full performance out of this particular GPU.
> > 
> > At this stage I think the main UABI change would be to add the opposite
> > flag to kbase, (e.g. "PANFROST_JD_DOESNT_NEED_COHERENCY_ON_GPU"[1]) to
> > opt-in to allowing the job to run across all cores.
> > 
> > The second change would be to allow compute jobs to be run on the second
> > core group, so another flag: PANFROST_RUN_ON_SECOND_CORE_GROUP.
> > 
> > But clearly there's little point adding such flags until someone steps
> > up to do the Mesa work.
> 
> I worry about the maintenance burden (both Mesa and kernel) of adding
> UABI only used by a piece of hardware none of us own, and only useful

To solve the "no hardware" problem we can send you (or any interested
panfrost hacker) a BE-M1000 based board (for free). BE-M1000 is an 8-core
armv8 (Cortex A53) SoC with a Mali T628 GPU. Please email me for further
details (preferably off the list) if you are interested.

Best regards,
   Alexey


* [PATCH v2] drm/panfrost: initial dual core group GPUs support
  2021-12-23 11:06 [PATCH 0/2] drm/panfrost: partial support of T628 GPUs asheplyakov
                   ` (2 preceding siblings ...)
  2021-12-23 14:14 ` [PATCH 0/2] drm/panfrost: partial support of T628 GPUs Alyssa Rosenzweig
@ 2022-01-15 16:06 ` Alexey Sheplyakov
  2022-01-17  9:42   ` Steven Price
  3 siblings, 1 reply; 16+ messages in thread
From: Alexey Sheplyakov @ 2022-01-15 16:06 UTC (permalink / raw)
  To: dri-devel
  Cc: Tomeu Vizoso, Vadim V . Vlasov, Steven Price, Alexey Sheplyakov,
	Alyssa Rosenzweig, Robin Murphy

On dual core group GPUs (such as the T628) fragment shading can be
performed across all cores (because a fragment shader job doesn't
need coherency between threads), however vertex shading must run
on the same core group as the tiler (which always lives in core
group 0).

As a first step towards supporting the T628, power on only the
first core group (so no jobs are scheduled on the second one).
This makes the T628 look like every other Midgard GPU (and throws
away up to half the cores).

With this patch panfrost is able to drive the T628 (r1p0) GPU on
some armv8 SoCs (in particular the BE-M1000). Without the patch
rendering is horribly broken (the desktop is completely unusable)
and eventually the GPU locks up (it takes from a few seconds to a
couple of minutes).

Using the second core group requires support in Mesa (and a UABI
change): userspace should

1) set the PANFROST_JD_DOESNT_NEED_COHERENCY_ON_GPU flag to opt in
   to allowing the job to run across all cores.
2) set the PANFROST_RUN_ON_SECOND_CORE_GROUP flag to allow compute
   jobs to run on the second core group (at the moment Mesa does
   not advertise compute support on anything older than the Mali
   T760).

But there's little point adding such flags until someone (myself)
steps up to do the Mesa work.

Signed-off-by: Alexey Sheplyakov <asheplyakov@basealt.ru>
Signed-off-by: Vadim V. Vlasov <vadim.vlasov@elpitech.ru>
Tested-by: Alexey Sheplyakov <asheplyakov@basealt.ru>
---
 drivers/gpu/drm/panfrost/panfrost_gpu.c | 27 ++++++++++++++++++++-----
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c
index f8355de6e335..15cec831a99a 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
@@ -320,19 +320,36 @@ void panfrost_gpu_power_on(struct panfrost_device *pfdev)
 {
 	int ret;
 	u32 val;
+	u64 core_mask = U64_MAX;
 
 	panfrost_gpu_init_quirks(pfdev);
 
-	/* Just turn on everything for now */
-	gpu_write(pfdev, L2_PWRON_LO, pfdev->features.l2_present);
+	if (pfdev->features.l2_present != 1) {
+		/*
+		 * Only support one core group now.
+		 * ~(l2_present - 1) unsets all bits in l2_present except
+		 * the bottom bit. (l2_present - 2) has all the bits in
+		 * the first core group set. AND them together to generate
+		 * a mask of cores in the first core group.
+		 */
+		core_mask = ~(pfdev->features.l2_present - 1) &
+			     (pfdev->features.l2_present - 2);
+		dev_info_once(pfdev->dev, "using only 1st core group (%lu cores from %lu)\n",
+			      hweight64(core_mask),
+			      hweight64(pfdev->features.shader_present));
+	}
+	gpu_write(pfdev, L2_PWRON_LO, pfdev->features.l2_present & core_mask);
 	ret = readl_relaxed_poll_timeout(pfdev->iomem + L2_READY_LO,
-		val, val == pfdev->features.l2_present, 100, 20000);
+		val, val == (pfdev->features.l2_present & core_mask),
+		100, 20000);
 	if (ret)
 		dev_err(pfdev->dev, "error powering up gpu L2");
 
-	gpu_write(pfdev, SHADER_PWRON_LO, pfdev->features.shader_present);
+	gpu_write(pfdev, SHADER_PWRON_LO,
+		  pfdev->features.shader_present & core_mask);
 	ret = readl_relaxed_poll_timeout(pfdev->iomem + SHADER_READY_LO,
-		val, val == pfdev->features.shader_present, 100, 20000);
+		val, val == (pfdev->features.shader_present & core_mask),
+		100, 20000);
 	if (ret)
 		dev_err(pfdev->dev, "error powering up gpu shader");
 
-- 
2.32.0
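
For context, the register polls above use readl_relaxed_poll_timeout()
from <linux/iopoll.h>: it re-reads the register into 'val' until the
given condition holds or the timeout (in microseconds) expires, returning
0 on success and -ETIMEDOUT otherwise. A rough open-coded sketch of what
the L2 poll boils down to (simplified, and not the real <linux/iopoll.h>
implementation):

#include <linux/delay.h>
#include <linux/errno.h>
#include <linux/io.h>
#include <linux/ktime.h>
#include <linux/types.h>

/* Sketch only: poll a 32-bit register every ~100 us for up to 20 ms. */
static int panfrost_poll_ready(void __iomem *reg, u32 expected)
{
	ktime_t timeout = ktime_add_us(ktime_get(), 20000);
	u32 val;

	for (;;) {
		val = readl_relaxed(reg);
		if (val == expected)
			return 0;
		if (ktime_after(ktime_get(), timeout))
			return -ETIMEDOUT;
		usleep_range(100, 200);
	}
}

In that spirit, the L2 poll above is roughly
panfrost_poll_ready(pfdev->iomem + L2_READY_LO,
		    pfdev->features.l2_present & core_mask).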


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] drm/panfrost: initial dual core group GPUs support
  2022-01-15 16:06 ` [PATCH v2] drm/panfrost: initial dual core group GPUs support Alexey Sheplyakov
@ 2022-01-17  9:42   ` Steven Price
  0 siblings, 0 replies; 16+ messages in thread
From: Steven Price @ 2022-01-17  9:42 UTC (permalink / raw)
  To: Alexey Sheplyakov, dri-devel
  Cc: Robin Murphy, Alyssa Rosenzweig, Tomeu Vizoso, Vadim V . Vlasov

On 15/01/2022 16:06, Alexey Sheplyakov wrote:
> On dual core group GPUs (such as the T628) fragment shading can be
> performed across all cores (because a fragment shader job doesn't
> need coherency between threads), however vertex shading must run
> on the same core group as the tiler (which always lives in core
> group 0).
> 
> As a first step towards supporting the T628, power on only the
> first core group (so no jobs are scheduled on the second one).
> This makes the T628 look like every other Midgard GPU (and throws
> away up to half the cores).
> 
> With this patch panfrost is able to drive the T628 (r1p0) GPU on
> some armv8 SoCs (in particular the BE-M1000). Without the patch
> rendering is horribly broken (the desktop is completely unusable)
> and eventually the GPU locks up (it takes from a few seconds to a
> couple of minutes).
> 
> Using the second core group requires support in Mesa (and a UABI
> change): userspace should
> 
> 1) set the PANFROST_JD_DOESNT_NEED_COHERENCY_ON_GPU flag to opt in
>    to allowing the job to run across all cores.
> 2) set the PANFROST_RUN_ON_SECOND_CORE_GROUP flag to allow compute
>    jobs to run on the second core group (at the moment Mesa does
>    not advertise compute support on anything older than the Mali
>    T760).

Minor comment: these flags obviously don't exist yet, and I would prefer
some more thought went into the names before we merge a new UABI. I
picked overly verbose names (for the purposes of discussion) in the hope
that it was 'obvious' I hadn't thought about the names, but in hindsight
realise this perhaps wasn't obvious. As well as being too verbose, the
second name is missing the "_JD_" part of the prefix.
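
Purely for illustration, here is a hedged sketch of what the userspace
side might eventually look like, using the placeholder name from this
discussion. The define below is hypothetical and not part of any UAPI
today (only PANFROST_JD_REQ_FS exists), and both the name and the bit
value are placeholders:

#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include "panfrost_drm.h"	/* kernel UAPI header: include/uapi/drm/panfrost_drm.h */

/* Hypothetical placeholder from this thread - NOT a real UAPI flag. */
#define PANFROST_JD_DOESNT_NEED_COHERENCY_ON_GPU	(1 << 1)

static int submit_fragment_job(int fd, uint64_t jc)
{
	struct drm_panfrost_submit submit;

	memset(&submit, 0, sizeof(submit));
	submit.jc = jc;
	/* A fragment job allowed to spread across both core groups. */
	submit.requirements = PANFROST_JD_REQ_FS |
			      PANFROST_JD_DOESNT_NEED_COHERENCY_ON_GPU;

	return drmIoctl(fd, DRM_IOCTL_PANFROST_SUBMIT, &submit);
}

Whatever names are finally chosen, the point is that the coherency
opt-out would be a new bit in drm_panfrost_submit.requirements, with the
safe single core group behaviour remaining the default.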

> 
> But there's little point adding such flags until someone (myself)
> steps up to do the Mesa work.
> 
> Signed-off-by: Alexey Sheplyakov <asheplyakov@basealt.ru>
> Signed-off-by: Vadim V. Vlasov <vadim.vlasov@elpitech.ru>
> Tested-by: Alexey Sheplyakov <asheplyakov@basealt.ru>

Since I basically wrote the patch, you should have included a credit
for me here (see the Developer Certificate of Origin [1]). But thank
you for writing up a detailed commit message.

I'll push to drm-misc-next with a Co-developed-by for me. Thanks for
testing the patch and good luck with the Mesa changes!

Thanks,

Steve

[1] https://developercertificate.org/

> ---
>  drivers/gpu/drm/panfrost/panfrost_gpu.c | 27 ++++++++++++++++++++-----
>  1 file changed, 22 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> index f8355de6e335..15cec831a99a 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> @@ -320,19 +320,36 @@ void panfrost_gpu_power_on(struct panfrost_device *pfdev)
>  {
>  	int ret;
>  	u32 val;
> +	u64 core_mask = U64_MAX;
>  
>  	panfrost_gpu_init_quirks(pfdev);
>  
> -	/* Just turn on everything for now */
> -	gpu_write(pfdev, L2_PWRON_LO, pfdev->features.l2_present);
> +	if (pfdev->features.l2_present != 1) {
> +		/*
> +		 * Only support one core group now.
> +		 * ~(l2_present - 1) unsets all bits in l2_present except
> +		 * the bottom bit. (l2_present - 2) has all the bits in
> +		 * the first core group set. AND them together to generate
> +		 * a mask of cores in the first core group.
> +		 */
> +		core_mask = ~(pfdev->features.l2_present - 1) &
> +			     (pfdev->features.l2_present - 2);
> +		dev_info_once(pfdev->dev, "using only 1st core group (%lu cores from %lu)\n",
> +			      hweight64(core_mask),
> +			      hweight64(pfdev->features.shader_present));
> +	}
> +	gpu_write(pfdev, L2_PWRON_LO, pfdev->features.l2_present & core_mask);
>  	ret = readl_relaxed_poll_timeout(pfdev->iomem + L2_READY_LO,
> -		val, val == pfdev->features.l2_present, 100, 20000);
> +		val, val == (pfdev->features.l2_present & core_mask),
> +		100, 20000);
>  	if (ret)
>  		dev_err(pfdev->dev, "error powering up gpu L2");
>  
> -	gpu_write(pfdev, SHADER_PWRON_LO, pfdev->features.shader_present);
> +	gpu_write(pfdev, SHADER_PWRON_LO,
> +		  pfdev->features.shader_present & core_mask);
>  	ret = readl_relaxed_poll_timeout(pfdev->iomem + SHADER_READY_LO,
> -		val, val == pfdev->features.shader_present, 100, 20000);
> +		val, val == (pfdev->features.shader_present & core_mask),
> +		100, 20000);
>  	if (ret)
>  		dev_err(pfdev->dev, "error powering up gpu shader");
>  
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread (newest message: 2022-01-17  9:42 UTC)

Thread overview: 16+ messages
2021-12-23 11:06 [PATCH 0/2] drm/panfrost: partial support of T628 GPUs asheplyakov
2021-12-23 11:06 ` [PATCH 1/2] drm/panfrost: mmu: improved memory attributes asheplyakov
2021-12-24 12:56   ` Robin Murphy
2022-01-12 18:18     ` Alexey Sheplyakov
2021-12-23 11:06 ` [PATCH 2/2] drm/panfrost: adjusted job affinity for dual core group GPUs asheplyakov
2021-12-23 14:11   ` Alyssa Rosenzweig
2021-12-24  8:56     ` Alexey Sheplyakov
2022-01-10 14:14       ` Steven Price
2022-01-10 17:42         ` Alyssa Rosenzweig
2022-01-12 17:03           ` Steven Price
2022-01-13 10:01             ` Alexey Sheplyakov
2022-01-13 11:22               ` Steven Price
2022-01-13 16:06           ` Alexey Sheplyakov
2021-12-23 14:14 ` [PATCH 0/2] drm/panfrost: partial support of T628 GPUs Alyssa Rosenzweig
2022-01-15 16:06 ` [PATCH v2] drm/panfrost: initial dual core group GPUs support Alexey Sheplyakov
2022-01-17  9:42   ` Steven Price
