linux-kernel.vger.kernel.org archive mirror
* [PATCH v6 00/15] GMU-less A6xx support (A610, A619_holi)
@ 2023-04-01 11:54 Konrad Dybcio
  2023-04-01 11:54 ` [PATCH v6 01/15] drm/msm/adreno: adreno_gpu: Don't set OPP scaling clock w/ GMU Konrad Dybcio
                   ` (14 more replies)
  0 siblings, 15 replies; 30+ messages in thread
From: Konrad Dybcio @ 2023-04-01 11:54 UTC
  To: Rob Clark, Abhinav Kumar, Dmitry Baryshkov, Sean Paul,
	David Airlie, Daniel Vetter, Rob Herring, Krzysztof Kozlowski,
	Bjorn Andersson, Konrad Dybcio, Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, devicetree, linux-kernel,
	Rob Clark, Marijn Suijten, Konrad Dybcio, Krzysztof Kozlowski

v5 -> v6:
- Rebase on 8ead96783163 ("drm/msm/gpu: Move BO allocation out of hw_init")
  (Add .ucode_load to funcs_gmuwrapper)
- Drop A6[45]0 speedbin deps, merged into msm-next

Dependencies:
- https://lore.kernel.org/linux-arm-msm/20230330231517.2747024-1-konrad.dybcio@linaro.org/ (to work properly)

v5: https://lore.kernel.org/linux-arm-msm/20230223-topic-gmuwrapper-v5-0-bf774b9a902a@linaro.org/

v4 -> v5:
- Add a newline before the new allOf:if: [3/15]
- Enforce 6 clocks on A619_holi/A610 [2/15]
- Pick up tags
- Improve error handling in a6xx_pm_resume [6/15]
- Add patch [1/15] (fix an existing issue) which can be picked
  separately and account for it in [6/15]
- Rebase atop Akhil's CX shutdown patches and incorporate analogous logic
- Fix a regression introduced in v3 that made the fw loader expect
  GMU fw on GMU wrapper GPUs

Dependencies:
- https://lore.kernel.org/linux-arm-msm/20230120172233.1905761-1-konrad.dybcio@linaro.org/ (to apply)
- https://lore.kernel.org/linux-arm-msm/20230330231517.2747024-1-konrad.dybcio@linaro.org/ (to work properly)

v4: https://lore.kernel.org/r/20230223-topic-gmuwrapper-v4-0-e987eb79d03f@linaro.org

v3 -> v4:
- Drop the mistakenly-included and wrong A3xx-A5xx bindings changes
- Improve bindings commit messages to better explain what GMU Wrapper is
- Drop the A680 highest bank bit value adjustment patch
- Sort UBWC config variables in a reverse-Christmas-tree fashion [4/14]
- Don't alter any UBWC config values in [4/14]
  - Do so for a619_holi in [8/14]
- Rebase on next-20230314 (shouldn't matter at all)

v3: https://lore.kernel.org/r/20230223-topic-gmuwrapper-v3-0-5be55a336819@linaro.org

v2 -> v3:
New dependencies:
- https://lore.kernel.org/linux-arm-msm/20230223-topic-opp-v3-0-5f22163cd1df@linaro.org/T/#t
- https://lore.kernel.org/linux-arm-msm/20230120172233.1905761-1-konrad.dybcio@linaro.org/

Sidenote: A speedbin rework is in progress, the of_machine_is_compatible
calls in A619_holi are ugly (but well, necessary..) but they'll be
replaced with socid matching in this or the next kernel cycle.

Due to the new way of identifying GMU wrapper GPUs, configuring 6350
to use wrapper would cause the wrong fuse values to be checked, but that
will be solved by the conversion + the ultimate goal is to use the GMU
whenever possible with the wrapper left for GMU-less Adrenos and early
bringup debugging of GMU-equipped ones.

- Ship dt-bindings in this series as we're referencing the compatible now

- "De-staticize" -> "remove static keyword" [3/15]

- Track down all the values in [4/15]

- Add many comments and explanations in [4/15]

- Fix possible return-before-mutex-unlock [5/15]

- Explain the GMU wrapper a bit more in the commit msg [5/15]

- Separate out pm_resume/suspend for GMU-wrapper GPUs to make things
  cleaner [5/15]

- Don't check if `info` exists, it has to at this point [5/15]

- Assign gpu->info early and clean up following if statements in
  a6xx_gpu_init [5/15]

- Determine whether we use GMU wrapper based on the GMU compatible
  instead of a quirk [5/15]

- Use a struct field to annotate whether we're using gmu wrapper so
  that it can be assigned at runtime (turns out a619 holi-ness cannot
  be determined by patchid + that will make it easier to test out GMU
  GPUs without actually turning on the GMU if anybody wants to do so)
  [5/15]

- Unconditionally hook up gx to the gmu wrapper (otherwise our gpu
  will not get power) [5/15]

- Don't check for gx domain presence in gmu_wrapper paths, it's
  guaranteed [5/15]

- Use opp set rate in the gmuwrapper suspend path [5/15]

- Call opp functions on the GPU device and not on the DRM device of
  mdp4/5/DPU1 half the time (WHOOOOPS!) [5/15]

- Disable the memory clock in a6xx_pm_suspend instead of enabling it
  (moderate oops) [5/15]

- Call the forgotten clk_bulk_disable_unprepare in a6xx_pm_suspend [5/15]

- Set rate to FMIN (a6xx really doesn't like rate=0 + that's what
  msm-5.x does anyway) before disabling core clock [5/15]

- pm_runtime_get_sync -> pm_runtime_resume_and_get [5/15]

- Don't annotate no cached BO support with a quirk, as A619_holi is
  merged into the A619 entry in the big const struct - this means
  that all GPUs operating in gmu wrapper configuration will be
  implicitly treated as if they didn't have this feature [7/15]

- Drop OPP rate & icc related patches, they're a part of a separate
  series now; rebase on it

- Clean up extra parentheses [8/15]

- Identify A619_holi by checking the compatible of its GMU instead
  of patchlevel [8/15]

- Drop "Fix up A6XX protected registers" - unnecessary, Rob will add
  a comment explaining why

- Fix existing UBWC values for A680, new patch [10/15]

- Use adreno_is_aXYZ macros in speedbin matching [13/15] - new patch

v2: https://lore.kernel.org/linux-arm-msm/20230214173145.2482651-1-konrad.dybcio@linaro.org/

v1 -> v2:
- Fix A630 values in [2/14]
- Fix [6/14] for GMU-equipped GPUs

Link to v1: https://lore.kernel.org/linux-arm-msm/20230126151618.225127-1-konrad.dybcio@linaro.org/

This series concludes my couple-weeks-long suffering of figuring out
the ins and outs of the "non-standard" A6xx GPUs which feature no GMU.

The GMU functionality is essentially emulated by carving out a
"GMU wrapper" region, which is just a register space within the GPU.
It's modeled to be as similar to the actual GMU as possible while
staying as unnecessary as we can make it - there are no IRQs, no
communication with a microcontroller, no RPMh communication, etc.
I tried to reuse as much code as possible without making a mess where
every even line is used for GMU and every odd line is used for GMU
wrapper.

This series contains:
- plumbing for non-GMU operation, if-ing out GMU calls based on
  GMU presence
- GMU wrapper support
- A610 support (w/ speedbin)
- A619_holi support (w/ speedbin)
- couple of minor fixes and improvements
- VDDCX/VDDGX scaling fix for non-GMU GPUs (concerns more than just
  A6xx)
- Enablement of opp interconnect properties

A619_holi works perfectly fine using the already-present A619 support
in mesa. A610 needs more work on that front, but can already replay
command traces captured on downstream.

NOTE: the "drm/msm/a6xx: Add support for A619_holi" patch contains
two occurrences of 0x18 used in place of a register #define, as it's
supposed to be RBBM_GPR0_CNTL, but that will only be present after
mesa-side changes are merged and headers are synced from there.

Speedbin patches depend on:
https://lore.kernel.org/linux-arm-msm/20230120172233.1905761-1-konrad.dybcio@linaro.org/

Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
---
Konrad Dybcio (15):
      drm/msm/adreno: adreno_gpu: Don't set OPP scaling clock w/ GMU
      dt-bindings: display/msm: gpu: Document GMU wrapper-equipped A6xx
      dt-bindings: display/msm/gmu: Add GMU wrapper
      drm/msm/a6xx: Remove static keyword from sptprac en/disable functions
      drm/msm/a6xx: Extend and explain UBWC config
      drm/msm/a6xx: Introduce GMU wrapper support
      drm/msm/a6xx: Remove both GBIF and RBBM GBIF halt on hw init
      drm/msm/adreno: Disable has_cached_coherent in GMU wrapper configurations
      drm/msm/a6xx: Add support for A619_holi
      drm/msm/a6xx: Add A610 support
      drm/msm/a6xx: Fix some A619 tunables
      drm/msm/a6xx: Use "else if" in GPU speedbin rev matching
      drm/msm/a6xx: Use adreno_is_aXYZ macros in speedbin matching
      drm/msm/a6xx: Add A619_holi speedbin support
      drm/msm/a6xx: Add A610 speedbin support

 .../devicetree/bindings/display/msm/gmu.yaml       |  50 +-
 .../devicetree/bindings/display/msm/gpu.yaml       |  61 ++-
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c              |  76 ++-
 drivers/gpu/drm/msm/adreno/a6xx_gmu.h              |   2 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c              | 508 ++++++++++++++++++---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h              |   1 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c        |  14 +-
 drivers/gpu/drm/msm/adreno/adreno_device.c         |  17 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.c            |  28 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h            |  33 +-
 10 files changed, 688 insertions(+), 102 deletions(-)
---
base-commit: b5595a717f5b26d99ec94d038cb1cfaae319bd6e
change-id: 20230223-topic-gmuwrapper-b4fff5fd7789

Best regards,
-- 
Konrad Dybcio <konrad.dybcio@linaro.org>



* [PATCH v6 01/15] drm/msm/adreno: adreno_gpu: Don't set OPP scaling clock w/ GMU
  2023-04-01 11:54 [PATCH v6 00/15] GMU-less A6xx support (A610, A619_holi) Konrad Dybcio
@ 2023-04-01 11:54 ` Konrad Dybcio
  2023-04-02 15:43   ` Dmitry Baryshkov
  2023-04-01 11:54 ` [PATCH v6 02/15] dt-bindings: display/msm: gpu: Document GMU wrapper-equipped A6xx Konrad Dybcio
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 30+ messages in thread
From: Konrad Dybcio @ 2023-04-01 11:54 UTC
  To: Rob Clark, Abhinav Kumar, Dmitry Baryshkov, Sean Paul,
	David Airlie, Daniel Vetter, Rob Herring, Krzysztof Kozlowski,
	Bjorn Andersson, Konrad Dybcio, Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, devicetree, linux-kernel,
	Rob Clark, Marijn Suijten, Konrad Dybcio

Recently I contributed the switch to OPP API for all Adreno generations.
I did however also skip over the fact that GPUs with a GMU don't specify
a core clock of any kind in the GPU node. While that didn't break
anything, it did introduce unwanted spam in the dmesg:

adreno 5000000.gpu: error -ENOENT: _opp_set_clknames: Couldn't find clock with name: core_clk

Guard the entire logic so that it's not used with GMU-equipped GPUs.
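
As a rough illustration (not part of the patch), the guarded decision can
be sketched as a standalone C function. The name opp_clk_name() and the
have_core_clk flag are hypothetical stand-ins for the devm_clk_get() check;
only the "core" / "core_clk" fallback and the rev.core < 6 guard come from
the diff below.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not driver code: return the clock name that would
 * be handed to the OPP core, or NULL when the core clock is left alone.
 */
static const char *opp_clk_name(unsigned int core_rev, bool have_core_clk)
{
	/* A6xx and newer are skipped entirely by the guard in this patch */
	if (core_rev >= 6)
		return NULL;

	/* If "core" is absent, fall back to the legacy clock name */
	return have_core_clk ? "core" : "core_clk";
}
```

Under these assumptions, an A5xx part without a "core" clock would get
"core_clk", while any A6xx part would get no clock name at all, which is
what silences the dmesg error quoted above.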

Fixes: 9f251f934012 ("drm/msm/adreno: Use OPP for every GPU generation")
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index bb38e728864d..6934cee07d42 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -1074,18 +1074,22 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	u32 speedbin;
 	int ret;
 
-	/*
-	 * This can only be done before devm_pm_opp_of_add_table(), or
-	 * dev_pm_opp_set_config() will WARN_ON()
-	 */
-	if (IS_ERR(devm_clk_get(dev, "core"))) {
+	/* Only handle the core clock when GMU is not in use */
+	if (config->rev.core < 6) {
 		/*
-		 * If "core" is absent, go for the legacy clock name.
-		 * If we got this far in probing, it's a given one of them exists.
+		 * This can only be done before devm_pm_opp_of_add_table(), or
+		 * dev_pm_opp_set_config() will WARN_ON()
 		 */
-		devm_pm_opp_set_clkname(dev, "core_clk");
-	} else
-		devm_pm_opp_set_clkname(dev, "core");
+		if (IS_ERR(devm_clk_get(dev, "core"))) {
+			/*
+			 * If "core" is absent, go for the legacy clock name.
+			 * If we got this far in probing, it's a given one of
+			 * them exists.
+			 */
+			devm_pm_opp_set_clkname(dev, "core_clk");
+		} else
+			devm_pm_opp_set_clkname(dev, "core");
+	}
 
 	adreno_gpu->funcs = funcs;
 	adreno_gpu->info = adreno_info(config->rev);

-- 
2.40.0



* [PATCH v6 02/15] dt-bindings: display/msm: gpu: Document GMU wrapper-equipped A6xx
  2023-04-01 11:54 [PATCH v6 00/15] GMU-less A6xx support (A610, A619_holi) Konrad Dybcio
  2023-04-01 11:54 ` [PATCH v6 01/15] drm/msm/adreno: adreno_gpu: Don't set OPP scaling clock w/ GMU Konrad Dybcio
@ 2023-04-01 11:54 ` Konrad Dybcio
  2023-04-05  6:36   ` Krzysztof Kozlowski
  2023-04-01 11:54 ` [PATCH v6 03/15] dt-bindings: display/msm/gmu: Add GMU wrapper Konrad Dybcio
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 30+ messages in thread
From: Konrad Dybcio @ 2023-04-01 11:54 UTC
  To: Rob Clark, Abhinav Kumar, Dmitry Baryshkov, Sean Paul,
	David Airlie, Daniel Vetter, Rob Herring, Krzysztof Kozlowski,
	Bjorn Andersson, Konrad Dybcio, Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, devicetree, linux-kernel,
	Rob Clark, Marijn Suijten, Konrad Dybcio

The "GMU Wrapper" is Qualcomm's name for "let's treat the GPU blocks
we'd normally assign to the GMU as if they were a part of the GMU, even
though they are not". It's a (good) software representation of the GMU_CX
and GMU_GX register spaces within the GPUSS that helps us programmatically
treat these de-facto GMU-less parts in a way that's very similar to their
GMU-equipped cousins, massively cutting down on code duplication.

The "wrapper" register space was specifically designed to mimic the layout
of a real GMU, though it rather obviously does not have the M3 core et al.

GMU wrapper-equipped A6xx GPUs require clocks and clock-names to be
specified under the GPU node, just like their older cousins. Account
for that.

Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
---
 .../devicetree/bindings/display/msm/gpu.yaml       | 61 ++++++++++++++++++----
 1 file changed, 52 insertions(+), 9 deletions(-)

diff --git a/Documentation/devicetree/bindings/display/msm/gpu.yaml b/Documentation/devicetree/bindings/display/msm/gpu.yaml
index 5dabe7b6794b..58ca8912a8c3 100644
--- a/Documentation/devicetree/bindings/display/msm/gpu.yaml
+++ b/Documentation/devicetree/bindings/display/msm/gpu.yaml
@@ -36,10 +36,7 @@ properties:
 
   reg-names:
     minItems: 1
-    items:
-      - const: kgsl_3d0_reg_memory
-      - const: cx_mem
-      - const: cx_dbgc
+    maxItems: 3
 
   interrupts:
     maxItems: 1
@@ -157,16 +154,62 @@ allOf:
       required:
         - clocks
         - clock-names
+
   - if:
       properties:
         compatible:
           contains:
-            pattern: '^qcom,adreno-6[0-9][0-9]\.[0-9]$'
-
-    then: # Since Adreno 6xx series clocks should be defined in GMU
+            enum:
+              - qcom,adreno-610.0
+              - qcom,adreno-619.1
+    then:
       properties:
-        clocks: false
-        clock-names: false
+        clocks:
+          minItems: 6
+          maxItems: 6
+
+        clock-names:
+          items:
+            - const: core
+              description: GPU Core clock
+            - const: iface
+              description: GPU Interface clock
+            - const: mem_iface
+              description: GPU Memory Interface clock
+            - const: alt_mem_iface
+              description: GPU Alternative Memory Interface clock
+            - const: gmu
+              description: CX GMU clock
+            - const: xo
+              description: GPUCC clocksource clock
+
+        reg-names:
+          minItems: 1
+          items:
+            - const: kgsl_3d0_reg_memory
+            - const: cx_dbgc
+
+      required:
+        - clocks
+        - clock-names
+    else:
+      if:
+        properties:
+          compatible:
+            contains:
+              pattern: '^qcom,adreno-6[0-9][0-9]\.[0-9]$'
+
+      then: # Starting with A6xx, the clocks are usually defined in the GMU node
+        properties:
+          clocks: false
+          clock-names: false
+
+          reg-names:
+            minItems: 1
+            items:
+              - const: kgsl_3d0_reg_memory
+              - const: cx_mem
+              - const: cx_dbgc
 
 examples:
   - |

-- 
2.40.0



* [PATCH v6 03/15] dt-bindings: display/msm/gmu: Add GMU wrapper
  2023-04-01 11:54 [PATCH v6 00/15] GMU-less A6xx support (A610, A619_holi) Konrad Dybcio
  2023-04-01 11:54 ` [PATCH v6 01/15] drm/msm/adreno: adreno_gpu: Don't set OPP scaling clock w/ GMU Konrad Dybcio
  2023-04-01 11:54 ` [PATCH v6 02/15] dt-bindings: display/msm: gpu: Document GMU wrapper-equipped A6xx Konrad Dybcio
@ 2023-04-01 11:54 ` Konrad Dybcio
  2023-04-01 11:54 ` [PATCH v6 04/15] drm/msm/a6xx: Remove static keyword from sptprac en/disable functions Konrad Dybcio
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 30+ messages in thread
From: Konrad Dybcio @ 2023-04-01 11:54 UTC
  To: Rob Clark, Abhinav Kumar, Dmitry Baryshkov, Sean Paul,
	David Airlie, Daniel Vetter, Rob Herring, Krzysztof Kozlowski,
	Bjorn Andersson, Konrad Dybcio, Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, devicetree, linux-kernel,
	Rob Clark, Marijn Suijten, Konrad Dybcio, Krzysztof Kozlowski

The "GMU Wrapper" is Qualcomm's name for "let's treat the GPU blocks
we'd normally assign to the GMU as if they were a part of the GMU, even
though they are not". It's a (good) software representation of the GMU_CX
and GMU_GX register spaces within the GPUSS that helps us programmatically
treat these de-facto GMU-less parts in a way that's very similar to their
GMU-equipped cousins, massively cutting down on code duplication.

The "wrapper" register space was specifically designed to mimic the layout
of a real GMU, though it rather obviously does not have the M3 core et al.

To sum it all up, the GMU wrapper is essentially a register space within
the GPU, which Linux sees as a dumbed-down regular GMU: there are no clocks,
interrupts, multiple reg spaces, iommus or OPP. Document it.

Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
---
 .../devicetree/bindings/display/msm/gmu.yaml       | 50 ++++++++++++++++------
 1 file changed, 38 insertions(+), 12 deletions(-)

diff --git a/Documentation/devicetree/bindings/display/msm/gmu.yaml b/Documentation/devicetree/bindings/display/msm/gmu.yaml
index 029d72822d8b..e36c40b935de 100644
--- a/Documentation/devicetree/bindings/display/msm/gmu.yaml
+++ b/Documentation/devicetree/bindings/display/msm/gmu.yaml
@@ -19,16 +19,18 @@ description: |
 
 properties:
   compatible:
-    items:
-      - pattern: '^qcom,adreno-gmu-6[0-9][0-9]\.[0-9]$'
-      - const: qcom,adreno-gmu
+    oneOf:
+      - items:
+          - pattern: '^qcom,adreno-gmu-6[0-9][0-9]\.[0-9]$'
+          - const: qcom,adreno-gmu
+      - const: qcom,adreno-gmu-wrapper
 
   reg:
-    minItems: 3
+    minItems: 1
     maxItems: 4
 
   reg-names:
-    minItems: 3
+    minItems: 1
     maxItems: 4
 
   clocks:
@@ -44,7 +46,6 @@ properties:
       - description: GMU HFI interrupt
       - description: GMU interrupt
 
-
   interrupt-names:
     items:
       - const: hfi
@@ -72,14 +73,8 @@ required:
   - compatible
   - reg
   - reg-names
-  - clocks
-  - clock-names
-  - interrupts
-  - interrupt-names
   - power-domains
   - power-domain-names
-  - iommus
-  - operating-points-v2
 
 additionalProperties: false
 
@@ -217,6 +212,28 @@ allOf:
             - const: axi
             - const: memnoc
 
+  - if:
+      properties:
+        compatible:
+          contains:
+            const: qcom,adreno-gmu-wrapper
+    then:
+      properties:
+        reg:
+          items:
+            - description: GMU wrapper register space
+        reg-names:
+          items:
+            - const: gmu
+    else:
+      required:
+        - clocks
+        - clock-names
+        - interrupts
+        - interrupt-names
+        - iommus
+        - operating-points-v2
+
 examples:
   - |
     #include <dt-bindings/clock/qcom,gpucc-sdm845.h>
@@ -249,3 +266,12 @@ examples:
         iommus = <&adreno_smmu 5>;
         operating-points-v2 = <&gmu_opp_table>;
     };
+
+    gmu_wrapper: gmu@596a000 {
+        compatible = "qcom,adreno-gmu-wrapper";
+        reg = <0x0596a000 0x30000>;
+        reg-names = "gmu";
+        power-domains = <&gpucc GPU_CX_GDSC>,
+                        <&gpucc GPU_GX_GDSC>;
+        power-domain-names = "cx", "gx";
+    };

-- 
2.40.0



* [PATCH v6 04/15] drm/msm/a6xx: Remove static keyword from sptprac en/disable functions
  2023-04-01 11:54 [PATCH v6 00/15] GMU-less A6xx support (A610, A619_holi) Konrad Dybcio
                   ` (2 preceding siblings ...)
  2023-04-01 11:54 ` [PATCH v6 03/15] dt-bindings: display/msm/gmu: Add GMU wrapper Konrad Dybcio
@ 2023-04-01 11:54 ` Konrad Dybcio
  2023-04-01 11:54 ` [PATCH v6 05/15] drm/msm/a6xx: Extend and explain UBWC config Konrad Dybcio
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 30+ messages in thread
From: Konrad Dybcio @ 2023-04-01 11:54 UTC
  To: Rob Clark, Abhinav Kumar, Dmitry Baryshkov, Sean Paul,
	David Airlie, Daniel Vetter, Rob Herring, Krzysztof Kozlowski,
	Bjorn Andersson, Konrad Dybcio, Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, devicetree, linux-kernel,
	Rob Clark, Marijn Suijten, Konrad Dybcio

These two will be reused by at least A619_holi in the non-GMU
paths. Make them non-static to make that possible.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 4 ++--
 drivers/gpu/drm/msm/adreno/a6xx_gmu.h | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index e16b4b3f8535..87babbb2a19f 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -354,7 +354,7 @@ void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state)
 }
 
 /* Enable CPU control of SPTP power power collapse */
-static int a6xx_sptprac_enable(struct a6xx_gmu *gmu)
+int a6xx_sptprac_enable(struct a6xx_gmu *gmu)
 {
 	int ret;
 	u32 val;
@@ -376,7 +376,7 @@ static int a6xx_sptprac_enable(struct a6xx_gmu *gmu)
 }
 
 /* Disable CPU control of SPTP power power collapse */
-static void a6xx_sptprac_disable(struct a6xx_gmu *gmu)
+void a6xx_sptprac_disable(struct a6xx_gmu *gmu)
 {
 	u32 val;
 	int ret;
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
index 0bc3eb443fec..7ee5b606bc47 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
@@ -193,5 +193,7 @@ int a6xx_hfi_set_freq(struct a6xx_gmu *gmu, int index);
 
 bool a6xx_gmu_gx_is_on(struct a6xx_gmu *gmu);
 bool a6xx_gmu_sptprac_is_on(struct a6xx_gmu *gmu);
+void a6xx_sptprac_disable(struct a6xx_gmu *gmu);
+int a6xx_sptprac_enable(struct a6xx_gmu *gmu);
 
 #endif

-- 
2.40.0



* [PATCH v6 05/15] drm/msm/a6xx: Extend and explain UBWC config
  2023-04-01 11:54 [PATCH v6 00/15] GMU-less A6xx support (A610, A619_holi) Konrad Dybcio
                   ` (3 preceding siblings ...)
  2023-04-01 11:54 ` [PATCH v6 04/15] drm/msm/a6xx: Remove static keyword from sptprac en/disable functions Konrad Dybcio
@ 2023-04-01 11:54 ` Konrad Dybcio
  2023-04-01 11:54 ` [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support Konrad Dybcio
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 30+ messages in thread
From: Konrad Dybcio @ 2023-04-01 11:54 UTC
  To: Rob Clark, Abhinav Kumar, Dmitry Baryshkov, Sean Paul,
	David Airlie, Daniel Vetter, Rob Herring, Krzysztof Kozlowski,
	Bjorn Andersson, Konrad Dybcio, Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, devicetree, linux-kernel,
	Rob Clark, Marijn Suijten, Konrad Dybcio

Rename lower_bit to hbb_lo and explain what it signifies.
Add explanations (wherever possible) to other tunables.

Port setting min_access_length, ubwc_mode and hbb_hi from downstream.
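
As a rough illustration (not part of the patch), the Highest Bank Bit
encoding explained in the new comment can be sketched as standalone C.
The struct hbb_fields type and the hbb_split() and rb_nc_mode_cntl()
helpers are hypothetical names; only the subtract-13 rule, the lo/hi split
and the bit positions for REG_A6XX_RB_NC_MODE_CNTL are taken from the diff
below.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two fields; not a driver type. */
struct hbb_fields {
	uint32_t hbb_lo;	/* lowest two bits of (hbb - 13) */
	uint32_t hbb_hi;	/* the single bit above hbb_lo */
};

/* Split a Highest Bank Bit value the way the patch comment describes. */
static struct hbb_fields hbb_split(uint32_t hbb)
{
	struct hbb_fields f;
	uint32_t rel;

	/* 13 is the minimum value allowed by hw, per the patch comment */
	assert(hbb >= 13);
	rel = hbb - 13;

	f.hbb_lo = rel & 0x3;
	f.hbb_hi = (rel >> 2) & 0x1;

	return f;
}

/* Pack the fields using the shifts the patch applies to RB_NC_MODE_CNTL. */
static uint32_t rb_nc_mode_cntl(struct hbb_fields f, uint32_t rgb565_predicator,
				uint32_t amsbc, uint32_t min_acc_len,
				uint32_t ubwc_mode)
{
	return rgb565_predicator << 11 | f.hbb_hi << 10 | amsbc << 4 |
	       min_acc_len << 3 | f.hbb_lo << 1 | ubwc_mode;
}
```

Under these assumptions, an HBB of 15 would correspond to the default
hbb_lo = 2 / hbb_hi = 0 pair in the patch, and an HBB of 16 to the
hbb_lo = 3 used for A650/A660.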

Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 39 +++++++++++++++++++++++++++--------
 1 file changed, 30 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index a501654fd8bd..931f9f3b3a85 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -786,10 +786,25 @@ static void a6xx_set_cp_protect(struct msm_gpu *gpu)
 static void a6xx_set_ubwc_config(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
-	u32 lower_bit = 2;
-	u32 amsbc = 0;
+	/* Unknown, introduced with A650 family, related to UBWC mode/ver 4 */
 	u32 rgb565_predicator = 0;
+	/* Unknown, introduced with A650 family */
 	u32 uavflagprd_inv = 0;
+	/* Whether the minimum access length is 64 bits */
+	u32 min_acc_len = 0;
+	/* Entirely magic, per-GPU-gen value */
+	u32 ubwc_mode = 0;
+	/*
+	 * The Highest Bank Bit value represents the bit of the highest DDR bank.
+	 * We then subtract 13 from it (13 is the minimum value allowed by hw) and
+	 * write the lowest two bits of the remaining value as hbb_lo and the
+	 * one above it as hbb_hi to the hardware. This should ideally use DRAM
+	 * type detection.
+	 */
+	u32 hbb_hi = 0;
+	u32 hbb_lo = 2;
+	/* Unknown, introduced with A640/680 */
+	u32 amsbc = 0;
 
 	/* a618 is using the hw default values */
 	if (adreno_is_a618(adreno_gpu))
@@ -800,25 +815,31 @@ static void a6xx_set_ubwc_config(struct msm_gpu *gpu)
 
 	if (adreno_is_a650(adreno_gpu) || adreno_is_a660(adreno_gpu)) {
 		/* TODO: get ddr type from bootloader and use 2 for LPDDR4 */
-		lower_bit = 3;
+		hbb_lo = 3;
 		amsbc = 1;
 		rgb565_predicator = 1;
 		uavflagprd_inv = 2;
 	}
 
 	if (adreno_is_7c3(adreno_gpu)) {
-		lower_bit = 1;
+		hbb_lo = 1;
 		amsbc = 1;
 		rgb565_predicator = 1;
 		uavflagprd_inv = 2;
 	}
 
 	gpu_write(gpu, REG_A6XX_RB_NC_MODE_CNTL,
-		rgb565_predicator << 11 | amsbc << 4 | lower_bit << 1);
-	gpu_write(gpu, REG_A6XX_TPL1_NC_MODE_CNTL, lower_bit << 1);
-	gpu_write(gpu, REG_A6XX_SP_NC_MODE_CNTL,
-		uavflagprd_inv << 4 | lower_bit << 1);
-	gpu_write(gpu, REG_A6XX_UCHE_MODE_CNTL, lower_bit << 21);
+		  rgb565_predicator << 11 | hbb_hi << 10 | amsbc << 4 |
+		  min_acc_len << 3 | hbb_lo << 1 | ubwc_mode);
+
+	gpu_write(gpu, REG_A6XX_TPL1_NC_MODE_CNTL, hbb_hi << 4 |
+		  min_acc_len << 3 | hbb_lo << 1 | ubwc_mode);
+
+	gpu_write(gpu, REG_A6XX_SP_NC_MODE_CNTL, hbb_hi << 10 |
+		  uavflagprd_inv << 4 | min_acc_len << 3 |
+		  hbb_lo << 1 | ubwc_mode);
+
+	gpu_write(gpu, REG_A6XX_UCHE_MODE_CNTL, min_acc_len << 23 | hbb_lo << 21);
 }
 
 static int a6xx_cp_init(struct msm_gpu *gpu)

-- 
2.40.0



* [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support
  2023-04-01 11:54 [PATCH v6 00/15] GMU-less A6xx support (A610, A619_holi) Konrad Dybcio
                   ` (4 preceding siblings ...)
  2023-04-01 11:54 ` [PATCH v6 05/15] drm/msm/a6xx: Extend and explain UBWC config Konrad Dybcio
@ 2023-04-01 11:54 ` Konrad Dybcio
  2023-05-02  7:49   ` Akhil P Oommen
  2023-04-01 11:54 ` [PATCH v6 07/15] drm/msm/a6xx: Remove both GBIF and RBBM GBIF halt on hw init Konrad Dybcio
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 30+ messages in thread
From: Konrad Dybcio @ 2023-04-01 11:54 UTC
  To: Rob Clark, Abhinav Kumar, Dmitry Baryshkov, Sean Paul,
	David Airlie, Daniel Vetter, Rob Herring, Krzysztof Kozlowski,
	Bjorn Andersson, Konrad Dybcio, Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, devicetree, linux-kernel,
	Rob Clark, Marijn Suijten, Konrad Dybcio

Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
but don't implement the associated GMUs. This is due to the fact that
the GMU directly pokes at RPMh. Sadly, this means we have to take care
of enabling & scaling power rails, clocks and bandwidth ourselves.

Reuse existing Adreno-common code and modify the deeply-GMU-infused
A6XX code to facilitate these GPUs. This involves if-ing out lots
of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
the actual name that Qualcomm uses in their downstream kernels).

This is essentially a register region which is convenient to model
as a device. We'll use it for managing the GDSCs. The register
layout matches the actual GMU_CX/GX regions on the "real GMU" devices
and lets us reuse quite a bit of gmu_read/write/rmw calls.
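
As a rough illustration (not part of the patch), the "if-ing out" pattern
can be sketched in standalone C. struct fake_gpu and its fields are
hypothetical stand-ins for the driver state; the shape of the check
mirrors the _a6xx_check_idle() hunk in the diff below, where the GMU idle
test is skipped on GMU wrapper GPUs.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver state; not a real driver type. */
struct fake_gpu {
	bool gmu_is_wrapper;	/* set at runtime from the GMU compatible */
	bool gmu_idle;		/* stand-in for a6xx_gmu_isidle() */
	bool cx_idle;		/* stand-in for the RBBM status checks */
};

static bool has_gmu_wrapper(const struct fake_gpu *gpu)
{
	return gpu->gmu_is_wrapper;
}

static bool check_idle(const struct fake_gpu *gpu)
{
	/* Only a real GMU has an idle state worth checking */
	if (!has_gmu_wrapper(gpu)) {
		if (!gpu->gmu_idle)
			return false;
	}

	/* The GPU-side check applies in both configurations */
	return gpu->cx_idle;
}
```

The same guard shape recurs around the other GMU-only calls in this patch,
so the common path stays shared and only the GMU-specific steps are
skipped.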

Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c       |  72 +++++++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c       | 255 +++++++++++++++++++++++++---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h       |   1 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  14 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.c     |   8 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h     |   6 +
 6 files changed, 318 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index 87babbb2a19f..b1acdb027205 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -1469,6 +1469,7 @@ static int a6xx_gmu_get_irq(struct a6xx_gmu *gmu, struct platform_device *pdev,
 
 void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
 {
+	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
 	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
 	struct platform_device *pdev = to_platform_device(gmu->dev);
 
@@ -1494,10 +1495,12 @@ void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
 	gmu->mmio = NULL;
 	gmu->rscc = NULL;
 
-	a6xx_gmu_memory_free(gmu);
+	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
+		a6xx_gmu_memory_free(gmu);
 
-	free_irq(gmu->gmu_irq, gmu);
-	free_irq(gmu->hfi_irq, gmu);
+		free_irq(gmu->gmu_irq, gmu);
+		free_irq(gmu->hfi_irq, gmu);
+	}
 
 	/* Drop reference taken in of_find_device_by_node */
 	put_device(gmu->dev);
@@ -1516,6 +1519,69 @@ static int cxpd_notifier_cb(struct notifier_block *nb,
 	return 0;
 }
 
+int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
+{
+	struct platform_device *pdev = of_find_device_by_node(node);
+	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
+	int ret;
+
+	if (!pdev)
+		return -ENODEV;
+
+	gmu->dev = &pdev->dev;
+
+	of_dma_configure(gmu->dev, node, true);
+
+	pm_runtime_enable(gmu->dev);
+
+	/* Mark legacy for manual SPTPRAC control */
+	gmu->legacy = true;
+
+	/* Map the GMU registers */
+	gmu->mmio = a6xx_gmu_get_mmio(pdev, "gmu");
+	if (IS_ERR(gmu->mmio)) {
+		ret = PTR_ERR(gmu->mmio);
+		goto err_mmio;
+	}
+
+	gmu->cxpd = dev_pm_domain_attach_by_name(gmu->dev, "cx");
+	if (IS_ERR(gmu->cxpd)) {
+		ret = PTR_ERR(gmu->cxpd);
+		goto err_mmio;
+	}
+
+	if (!device_link_add(gmu->dev, gmu->cxpd, DL_FLAG_PM_RUNTIME)) {
+		ret = -ENODEV;
+		goto detach_cxpd;
+	}
+
+	init_completion(&gmu->pd_gate);
+	complete_all(&gmu->pd_gate);
+	gmu->pd_nb.notifier_call = cxpd_notifier_cb;
+
+	/* Get a link to the GX power domain to reset the GPU */
+	gmu->gxpd = dev_pm_domain_attach_by_name(gmu->dev, "gx");
+	if (IS_ERR(gmu->gxpd)) {
+		ret = PTR_ERR(gmu->gxpd);
+		goto err_mmio;
+	}
+
+	gmu->initialized = true;
+
+	return 0;
+
+detach_cxpd:
+	dev_pm_domain_detach(gmu->cxpd, false);
+
+err_mmio:
+	iounmap(gmu->mmio);
+
+	/* Drop reference taken in of_find_device_by_node */
+	put_device(gmu->dev);
+
+	return ret;
+}
+
 int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
 {
 	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 931f9f3b3a85..8e0345ffab81 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -20,9 +20,11 @@ static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
 
-	/* Check that the GMU is idle */
-	if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
-		return false;
+	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
+		/* Check that the GMU is idle */
+		if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
+			return false;
+	}
 
 	/* Check tha the CX master is idle */
 	if (gpu_read(gpu, REG_A6XX_RBBM_STATUS) &
@@ -612,13 +614,15 @@ static void a6xx_set_hwcg(struct msm_gpu *gpu, bool state)
 		return;
 
 	/* Disable SP clock before programming HWCG registers */
-	gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
+	if (!adreno_has_gmu_wrapper(adreno_gpu))
+		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
 
 	for (i = 0; (reg = &adreno_gpu->info->hwcg[i], reg->offset); i++)
 		gpu_write(gpu, reg->offset, state ? reg->value : 0);
 
 	/* Enable SP clock */
-	gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
+	if (!adreno_has_gmu_wrapper(adreno_gpu))
+		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
 
 	gpu_write(gpu, REG_A6XX_RBBM_CLOCK_CNTL, state ? clock_cntl_on : 0);
 }
@@ -1018,10 +1022,13 @@ static int hw_init(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
+	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
 	int ret;
 
-	/* Make sure the GMU keeps the GPU on while we set it up */
-	a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
+	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
+		/* Make sure the GMU keeps the GPU on while we set it up */
+		a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
+	}
 
 	/* Clear GBIF halt in case GX domain was not collapsed */
 	if (a6xx_has_gbif(adreno_gpu))
@@ -1144,6 +1151,17 @@ static int hw_init(struct msm_gpu *gpu)
 			0x3f0243f0);
 	}
 
+	if (adreno_has_gmu_wrapper(adreno_gpu)) {
+		/* Do it here, as GMU wrapper only inits the GMU for memory reservation etc. */
+
+		/* Set up the CX GMU counter 0 to count busy ticks */
+		gmu_write(gmu, REG_A6XX_GPU_GMU_AO_GPU_CX_BUSY_MASK, 0xff000000);
+
+		/* Enable power counter 0 */
+		gmu_rmw(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_SELECT_0, 0xff, BIT(5));
+		gmu_write(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_ENABLE, 1);
+	}
+
 	/* Protect registers from the CP */
 	a6xx_set_cp_protect(gpu);
 
@@ -1233,6 +1251,8 @@ static int hw_init(struct msm_gpu *gpu)
 	}
 
 out:
+	if (adreno_has_gmu_wrapper(adreno_gpu))
+		return ret;
 	/*
 	 * Tell the GMU that we are done touching the GPU and it can start power
 	 * management
@@ -1267,6 +1287,9 @@ static void a6xx_dump(struct msm_gpu *gpu)
 	adreno_dump(gpu);
 }
 
+#define GBIF_GX_HALT_MASK	BIT(0)
+#define GBIF_CLIENT_HALT_MASK	BIT(0)
+#define GBIF_ARB_HALT_MASK	BIT(1)
 #define VBIF_RESET_ACK_TIMEOUT	100
 #define VBIF_RESET_ACK_MASK	0x00f0
 
@@ -1299,7 +1322,8 @@ static void a6xx_recover(struct msm_gpu *gpu)
 	 * Turn off keep alive that might have been enabled by the hang
 	 * interrupt
 	 */
-	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
+	if (!adreno_has_gmu_wrapper(adreno_gpu))
+		gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
 
 	pm_runtime_dont_use_autosuspend(&gpu->pdev->dev);
 
@@ -1329,6 +1353,32 @@ static void a6xx_recover(struct msm_gpu *gpu)
 
 	dev_pm_genpd_remove_notifier(gmu->cxpd);
 
+	/* Software-reset the GPU */
+	if (adreno_has_gmu_wrapper(adreno_gpu)) {
+		/* Halt the GX side of GBIF */
+		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, GBIF_GX_HALT_MASK);
+		spin_until(gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT_ACK) &
+			   GBIF_GX_HALT_MASK);
+
+		/* Halt new client requests on GBIF */
+		gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_CLIENT_HALT_MASK);
+		spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
+			   (GBIF_CLIENT_HALT_MASK)) == GBIF_CLIENT_HALT_MASK);
+
+		/* Halt all AXI requests on GBIF */
+		gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_ARB_HALT_MASK);
+		spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
+			   (GBIF_ARB_HALT_MASK)) == GBIF_ARB_HALT_MASK);
+
+		/* Clear the halts */
+		gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
+
+		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
+
+		/* This *really* needs to go through before we do anything else! */
+		mb();
+	}
+
 	pm_runtime_use_autosuspend(&gpu->pdev->dev);
 
 	if (active_submits)
@@ -1463,7 +1513,8 @@ static void a6xx_fault_detect_irq(struct msm_gpu *gpu)
 	 * Force the GPU to stay on until after we finish
 	 * collecting information
 	 */
-	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
+	if (!adreno_has_gmu_wrapper(adreno_gpu))
+		gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
 
 	DRM_DEV_ERROR(&gpu->pdev->dev,
 		"gpu fault ring %d fence %x status %8.8X rb %4.4x/%4.4x ib1 %16.16llX/%4.4x ib2 %16.16llX/%4.4x\n",
@@ -1624,7 +1675,7 @@ static void a6xx_llc_slices_init(struct platform_device *pdev,
 		a6xx_gpu->llc_mmio = ERR_PTR(-EINVAL);
 }
 
-static int a6xx_pm_resume(struct msm_gpu *gpu)
+static int a6xx_gmu_pm_resume(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
@@ -1644,10 +1695,61 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
 
 	a6xx_llc_activate(a6xx_gpu);
 
-	return 0;
+	return ret;
 }
 
-static int a6xx_pm_suspend(struct msm_gpu *gpu)
+static int a6xx_pm_resume(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
+	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
+	unsigned long freq = 0;
+	struct dev_pm_opp *opp;
+	int ret;
+
+	gpu->needs_hw_init = true;
+
+	trace_msm_gpu_resume(0);
+
+	mutex_lock(&a6xx_gpu->gmu.lock);
+
+	pm_runtime_resume_and_get(gmu->dev);
+	pm_runtime_resume_and_get(gmu->gxpd);
+
+	/* Set the core clock, having VDD scaling in mind */
+	ret = dev_pm_opp_set_rate(&gpu->pdev->dev, gpu->fast_rate);
+	if (ret)
+		goto err_core_clk;
+
+	ret = clk_bulk_prepare_enable(gpu->nr_clocks, gpu->grp_clks);
+	if (ret)
+		goto err_bulk_clk;
+
+	ret = clk_prepare_enable(gpu->ebi1_clk);
+	if (ret)
+		goto err_mem_clk;
+
+	/* If anything goes south, tear the GPU down piece by piece.. */
+	if (ret) {
+err_mem_clk:
+		clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
+err_bulk_clk:
+		opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
+		dev_pm_opp_put(opp);
+		dev_pm_opp_set_rate(&gpu->pdev->dev, 0);
+err_core_clk:
+		pm_runtime_put(gmu->gxpd);
+		pm_runtime_put(gmu->dev);
+	}
+	mutex_unlock(&a6xx_gpu->gmu.lock);
+
+	if (!ret)
+		msm_devfreq_resume(gpu);
+
+	return ret;
+}
+
+static int a6xx_gmu_pm_suspend(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
@@ -1674,11 +1776,62 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
 	return 0;
 }
 
+static int a6xx_pm_suspend(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
+	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
+	unsigned long freq = 0;
+	struct dev_pm_opp *opp;
+	int i, ret;
+
+	trace_msm_gpu_suspend(0);
+
+	opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
+	dev_pm_opp_put(opp);
+
+	msm_devfreq_suspend(gpu);
+
+	mutex_lock(&a6xx_gpu->gmu.lock);
+
+	clk_disable_unprepare(gpu->ebi1_clk);
+
+	clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
+
+	/* Set frequency to the minimum supported level (no 27MHz on A6xx!) */
+	ret = dev_pm_opp_set_rate(&gpu->pdev->dev, freq);
+	if (ret)
+		goto err;
+
+	pm_runtime_put_sync(gmu->gxpd);
+	pm_runtime_put_sync(gmu->dev);
+
+	mutex_unlock(&a6xx_gpu->gmu.lock);
+
+	if (a6xx_gpu->shadow_bo)
+		for (i = 0; i < gpu->nr_rings; i++)
+			a6xx_gpu->shadow[i] = 0;
+
+	gpu->suspend_count++;
+
+	return 0;
+
+err:
+	mutex_unlock(&a6xx_gpu->gmu.lock);
+
+	return ret;
+}
+
 static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
 
+	if (adreno_has_gmu_wrapper(adreno_gpu)) {
+		*value = gpu_read64(gpu, REG_A6XX_CP_ALWAYS_ON_COUNTER);
+		return 0;
+	}
+
 	mutex_lock(&a6xx_gpu->gmu.lock);
 
 	/* Force the GPU power on so we can read this register */
@@ -1716,7 +1869,8 @@ static void a6xx_destroy(struct msm_gpu *gpu)
 		drm_gem_object_put(a6xx_gpu->shadow_bo);
 	}
 
-	a6xx_llc_slices_destroy(a6xx_gpu);
+	if (!adreno_has_gmu_wrapper(adreno_gpu))
+		a6xx_llc_slices_destroy(a6xx_gpu);
 
 	mutex_lock(&a6xx_gpu->gmu.lock);
 	a6xx_gmu_remove(a6xx_gpu);
@@ -1957,8 +2111,8 @@ static const struct adreno_gpu_funcs funcs = {
 		.set_param = adreno_set_param,
 		.hw_init = a6xx_hw_init,
 		.ucode_load = a6xx_ucode_load,
-		.pm_suspend = a6xx_pm_suspend,
-		.pm_resume = a6xx_pm_resume,
+		.pm_suspend = a6xx_gmu_pm_suspend,
+		.pm_resume = a6xx_gmu_pm_resume,
 		.recover = a6xx_recover,
 		.submit = a6xx_submit,
 		.active_ring = a6xx_active_ring,
@@ -1982,6 +2136,35 @@ static const struct adreno_gpu_funcs funcs = {
 	.get_timestamp = a6xx_get_timestamp,
 };
 
+static const struct adreno_gpu_funcs funcs_gmuwrapper = {
+	.base = {
+		.get_param = adreno_get_param,
+		.set_param = adreno_set_param,
+		.hw_init = a6xx_hw_init,
+		.ucode_load = a6xx_ucode_load,
+		.pm_suspend = a6xx_pm_suspend,
+		.pm_resume = a6xx_pm_resume,
+		.recover = a6xx_recover,
+		.submit = a6xx_submit,
+		.active_ring = a6xx_active_ring,
+		.irq = a6xx_irq,
+		.destroy = a6xx_destroy,
+#if defined(CONFIG_DRM_MSM_GPU_STATE)
+		.show = a6xx_show,
+#endif
+		.gpu_busy = a6xx_gpu_busy,
+#if defined(CONFIG_DRM_MSM_GPU_STATE)
+		.gpu_state_get = a6xx_gpu_state_get,
+		.gpu_state_put = a6xx_gpu_state_put,
+#endif
+		.create_address_space = a6xx_create_address_space,
+		.create_private_address_space = a6xx_create_private_address_space,
+		.get_rptr = a6xx_get_rptr,
+		.progress = a6xx_progress,
+	},
+	.get_timestamp = a6xx_get_timestamp,
+};
+
 struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
 {
 	struct msm_drm_private *priv = dev->dev_private;
@@ -2003,18 +2186,36 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
 
 	adreno_gpu->registers = NULL;
 
+	/* Check if there is a GMU phandle and set it up */
+	node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
+	/* FIXME: How do we gracefully handle this? */
+	BUG_ON(!node);
+
+	adreno_gpu->gmu_is_wrapper = of_device_is_compatible(node, "qcom,adreno-gmu-wrapper");
+
 	/*
 	 * We need to know the platform type before calling into adreno_gpu_init
 	 * so that the hw_apriv flag can be correctly set. Snoop into the info
 	 * and grab the revision number
 	 */
 	info = adreno_info(config->rev);
-
-	if (info && (info->revn == 650 || info->revn == 660 ||
-			adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), info->rev)))
+	if (!info)
+		return ERR_PTR(-EINVAL);
+
+	/* Assign these early so that we can use the is_aXYZ helpers */
+	/* Numeric revision IDs (e.g. 630) */
+	adreno_gpu->revn = info->revn;
+	/* New-style ADRENO_REV()-only */
+	adreno_gpu->rev = info->rev;
+	/* Quirk data */
+	adreno_gpu->info = info;
+
+	if (adreno_is_a650(adreno_gpu) || adreno_is_a660_family(adreno_gpu))
 		adreno_gpu->base.hw_apriv = true;
 
-	a6xx_llc_slices_init(pdev, a6xx_gpu);
+	/* No LLCC on non-RPMh (and by extension, non-GMU) SoCs */
+	if (!adreno_has_gmu_wrapper(adreno_gpu))
+		a6xx_llc_slices_init(pdev, a6xx_gpu);
 
 	ret = a6xx_set_supported_hw(&pdev->dev, config->rev);
 	if (ret) {
@@ -2022,7 +2223,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
 		return ERR_PTR(ret);
 	}
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
+	if (adreno_has_gmu_wrapper(adreno_gpu))
+		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_gmuwrapper, 1);
+	else
+		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
 	if (ret) {
 		a6xx_destroy(&(a6xx_gpu->base.base));
 		return ERR_PTR(ret);
@@ -2035,13 +2239,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
 	if (adreno_is_a618(adreno_gpu) || adreno_is_7c3(adreno_gpu))
 		priv->gpu_clamp_to_idle = true;
 
-	/* Check if there is a GMU phandle and set it up */
-	node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
-
-	/* FIXME: How do we gracefully handle this? */
-	BUG_ON(!node);
-
-	ret = a6xx_gmu_init(a6xx_gpu, node);
+	if (adreno_has_gmu_wrapper(adreno_gpu))
+		ret = a6xx_gmu_wrapper_init(a6xx_gpu, node);
+	else
+		ret = a6xx_gmu_init(a6xx_gpu, node);
 	of_node_put(node);
 	if (ret) {
 		a6xx_destroy(&(a6xx_gpu->base.base));
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
index eea2e60ce3b7..51a7656072fa 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
@@ -76,6 +76,7 @@ int a6xx_gmu_set_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
 void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
 
 int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
+int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
 void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
 
 void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
index 30ecdff363e7..4e5d650578c6 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
@@ -1041,16 +1041,18 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu)
 	/* Get the generic state from the adreno core */
 	adreno_gpu_state_get(gpu, &a6xx_state->base);
 
-	a6xx_get_gmu_registers(gpu, a6xx_state);
+	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
+		a6xx_get_gmu_registers(gpu, a6xx_state);
 
-	a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
-	a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
-	a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
+		a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
+		a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
+		a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
 
-	a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
+		a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
+	}
 
 	/* If GX isn't on the rest of the data isn't going to be accessible */
-	if (!a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
+	if (!adreno_has_gmu_wrapper(adreno_gpu) && !a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
 		return &a6xx_state->base;
 
 	/* Get the banks of indexed registers */
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 6934cee07d42..5c5901d65950 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -528,6 +528,10 @@ int adreno_load_fw(struct adreno_gpu *adreno_gpu)
 		if (!adreno_gpu->info->fw[i])
 			continue;
 
+		/* Skip loading GMU firmware with GMU Wrapper */
+		if (adreno_has_gmu_wrapper(adreno_gpu) && i == ADRENO_FW_GMU)
+			continue;
+
 		/* Skip if the firmware has already been loaded */
 		if (adreno_gpu->fw[i])
 			continue;
@@ -1074,8 +1078,8 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	u32 speedbin;
 	int ret;
 
-	/* Only handle the core clock when GMU is not in use */
-	if (config->rev.core < 6) {
+	/* Only handle the core clock when GMU is not in use (or is absent). */
+	if (adreno_has_gmu_wrapper(adreno_gpu) || config->rev.core < 6) {
 		/*
 		 * This can only be done before devm_pm_opp_of_add_table(), or
 		 * dev_pm_opp_set_config() will WARN_ON()
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index f62612a5c70f..ee5352bc5329 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -115,6 +115,7 @@ struct adreno_gpu {
 	 * code (a3xx_gpu.c) and stored in this common location.
 	 */
 	const unsigned int *reg_offsets;
+	bool gmu_is_wrapper;
 };
 #define to_adreno_gpu(x) container_of(x, struct adreno_gpu, base)
 
@@ -145,6 +146,11 @@ struct adreno_platform_config {
 
 bool adreno_cmp_rev(struct adreno_rev rev1, struct adreno_rev rev2);
 
+static inline bool adreno_has_gmu_wrapper(struct adreno_gpu *gpu)
+{
+	return gpu->gmu_is_wrapper;
+}
+
 static inline bool adreno_is_a2xx(struct adreno_gpu *gpu)
 {
 	return (gpu->revn < 300);

-- 
2.40.0



* [PATCH v6 07/15] drm/msm/a6xx: Remove both GBIF and RBBM GBIF halt on hw init
  2023-04-01 11:54 [PATCH v6 00/15] GMU-less A6xx support (A610, A619_holi) Konrad Dybcio
                   ` (5 preceding siblings ...)
  2023-04-01 11:54 ` [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support Konrad Dybcio
@ 2023-04-01 11:54 ` Konrad Dybcio
  2023-04-01 11:54 ` [PATCH v6 08/15] drm/msm/adreno: Disable has_cached_coherent in GMU wrapper configurations Konrad Dybcio
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 30+ messages in thread
From: Konrad Dybcio @ 2023-04-01 11:54 UTC (permalink / raw)
  To: Rob Clark, Abhinav Kumar, Dmitry Baryshkov, Sean Paul,
	David Airlie, Daniel Vetter, Rob Herring, Krzysztof Kozlowski,
	Bjorn Andersson, Konrad Dybcio, Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, devicetree, linux-kernel,
	Rob Clark, Marijn Suijten, Konrad Dybcio

Currently we're only deasserting REG_A6XX_RBBM_GBIF_HALT, but we also
need REG_A6XX_GBIF_HALT to be set to 0. For GMU-equipped GPUs this is
done in a6xx_bus_clear_pending_transactions(), but for the GMU-less
ones we have to do it *somewhere*. Unhalting both side by side sounds
like a good plan and it won't cause any issues if it's unnecessary.

Also, add a memory barrier to ensure both writes have gone through.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
---
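The unhalt sequence described in the commit message can be sketched in
userspace roughly as below. This is only an illustration: the register
indices, struct fake_gpu and the fake_* helpers are hypothetical stand-ins
for the driver's register file and gpu_write(), and the C11 fence merely
stands in for the kernel's mb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical register indices, NOT the real A6xx offsets. */
enum { FAKE_GBIF_HALT, FAKE_RBBM_GBIF_HALT, FAKE_NUM_REGS };

/* Mock register file standing in for the GPU MMIO space. */
struct fake_gpu {
	uint32_t regs[FAKE_NUM_REGS];
};

/* Stand-in for gpu_write(): stores into the mock register file. */
static void fake_gpu_write(struct fake_gpu *gpu, unsigned int reg, uint32_t val)
{
	assert(reg < FAKE_NUM_REGS);
	gpu->regs[reg] = val;
}

/* Deassert both halts side by side, then order the writes. */
static void fake_clear_gbif_halts(struct fake_gpu *gpu)
{
	fake_gpu_write(gpu, FAKE_GBIF_HALT, 0);
	fake_gpu_write(gpu, FAKE_RBBM_GBIF_HALT, 0);
	/* Userspace stand-in for the kernel's mb() */
	atomic_thread_fence(memory_order_seq_cst);
}
```

Clearing a register that is already zero is a no-op in this model, which
matches the claim that the extra write is harmless when unnecessary.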
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 8e0345ffab81..17e314a745c3 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1031,8 +1031,12 @@ static int hw_init(struct msm_gpu *gpu)
 	}
 
 	/* Clear GBIF halt in case GX domain was not collapsed */
-	if (a6xx_has_gbif(adreno_gpu))
+	if (a6xx_has_gbif(adreno_gpu)) {
+		gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
 		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
+		/* Let's make extra sure that the GPU can access the memory.. */
+		mb();
+	}
 
 	gpu_write(gpu, REG_A6XX_RBBM_SECVID_TSB_CNTL, 0);
 

-- 
2.40.0



* [PATCH v6 08/15] drm/msm/adreno: Disable has_cached_coherent in GMU wrapper configurations
  2023-04-01 11:54 [PATCH v6 00/15] GMU-less A6xx support (A610, A619_holi) Konrad Dybcio
                   ` (6 preceding siblings ...)
  2023-04-01 11:54 ` [PATCH v6 07/15] drm/msm/a6xx: Remove both GBIF and RBBM GBIF halt on hw init Konrad Dybcio
@ 2023-04-01 11:54 ` Konrad Dybcio
  2023-04-01 11:54 ` [PATCH v6 09/15] drm/msm/a6xx: Add support for A619_holi Konrad Dybcio
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 30+ messages in thread
From: Konrad Dybcio @ 2023-04-01 11:54 UTC (permalink / raw)
  To: Rob Clark, Abhinav Kumar, Dmitry Baryshkov, Sean Paul,
	David Airlie, Daniel Vetter, Rob Herring, Krzysztof Kozlowski,
	Bjorn Andersson, Konrad Dybcio, Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, devicetree, linux-kernel,
	Rob Clark, Marijn Suijten, Konrad Dybcio

A610 and A619_holi don't support has_cached_coherent. Disable it to make
the GPU stop crashing after almost each and every submission - the
received data on the GPU end was simply incomplete or garbled, resulting
in almost nothing being executed properly. Extend the disablement to
adreno_has_gmu_wrapper, as none of the GMU wrapper Adrenos that we don't
support yet seem to feature it.

Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
---
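The resulting decision can be written down as a small predicate. The sketch
below is a userspace model only: struct fake_config and
fake_has_cached_coherent() are hypothetical names, standing in for
config.rev.core and adreno_has_gmu_wrapper() as used in this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two inputs the patch looks at. */
struct fake_config {
	unsigned int core;	/* models config.rev.core */
	bool gmu_is_wrapper;	/* models adreno_has_gmu_wrapper() */
};

/* Enabled only on A6xx and newer cores that have a real GMU. */
static bool fake_has_cached_coherent(const struct fake_config *cfg)
{
	assert(cfg);
	return cfg->core >= 6 && !cfg->gmu_is_wrapper;
}
```

Under this model a GMU-equipped A6xx keeps the feature, while any GMU
wrapper configuration and any pre-A6xx core leaves it disabled.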
 drivers/gpu/drm/msm/adreno/adreno_device.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
index 4d1448714285..4705ce3eb95e 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -551,7 +551,6 @@ static int adreno_bind(struct device *dev, struct device *master, void *data)
 		config.rev.minor, config.rev.patchid);
 
 	priv->is_a2xx = config.rev.core == 2;
-	priv->has_cached_coherent = config.rev.core >= 6;
 
 	gpu = info->init(drm);
 	if (IS_ERR(gpu)) {
@@ -563,6 +562,10 @@ static int adreno_bind(struct device *dev, struct device *master, void *data)
 	if (ret)
 		return ret;
 
+	if (config.rev.core >= 6)
+		if (!adreno_has_gmu_wrapper(to_adreno_gpu(gpu)))
+			priv->has_cached_coherent = true;
+
 	return 0;
 }
 

-- 
2.40.0



* [PATCH v6 09/15] drm/msm/a6xx: Add support for A619_holi
  2023-04-01 11:54 [PATCH v6 00/15] GMU-less A6xx support (A610, A619_holi) Konrad Dybcio
                   ` (7 preceding siblings ...)
  2023-04-01 11:54 ` [PATCH v6 08/15] drm/msm/adreno: Disable has_cached_coherent in GMU wrapper configurations Konrad Dybcio
@ 2023-04-01 11:54 ` Konrad Dybcio
  2023-04-01 11:54 ` [PATCH v6 10/15] drm/msm/a6xx: Add A610 support Konrad Dybcio
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 30+ messages in thread
From: Konrad Dybcio @ 2023-04-01 11:54 UTC (permalink / raw)
  To: Rob Clark, Abhinav Kumar, Dmitry Baryshkov, Sean Paul,
	David Airlie, Daniel Vetter, Rob Herring, Krzysztof Kozlowski,
	Bjorn Andersson, Konrad Dybcio, Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, devicetree, linux-kernel,
	Rob Clark, Marijn Suijten, Konrad Dybcio

A619_holi is a GMU-less variant of the already-supported A619 GPU.
It's present on at least SM4350 (holi) and SM6375 (blair). No mesa
changes are required. Add the required kernel-side support for it.

Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
---
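The A619_holi recovery path in this patch writes GPR0_GBIF_HALT_REQUEST to
register 0x18 and polls REG_A6XX_RBBM_VBIF_GX_RESET_STATUS until all bits
of VBIF_RESET_ACK_MASK are set. A rough userspace sketch of that
request/ack pattern follows. The mock device that acks immediately, the
bounded poll count and all fake_* names are assumptions made for
illustration; this is not how the driver's spin_until() is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FAKE_HALT_REQUEST	0x1E0u	/* same value as GPR0_GBIF_HALT_REQUEST */
#define FAKE_ACK_MASK		0xF0u	/* same value as VBIF_RESET_ACK_MASK */

/* Hypothetical mock of the request and status registers. */
struct fake_dev {
	uint32_t request;
	uint32_t status;
};

/* Mock hardware: every non-zero request is acked at once. */
static void fake_write_request(struct fake_dev *dev, uint32_t val)
{
	assert(dev);
	dev->request = val;
	dev->status = val ? FAKE_ACK_MASK : 0;
}

/* Request a halt, then poll a bounded number of times for the full ack. */
static bool fake_halt_and_wait(struct fake_dev *dev, unsigned int max_polls)
{
	fake_write_request(dev, FAKE_HALT_REQUEST);
	while (max_polls--) {
		if ((dev->status & FAKE_ACK_MASK) == FAKE_ACK_MASK)
			return true;
	}
	return false;
}
```

Comparing against the whole mask, rather than testing for any set bit,
mirrors the `== VBIF_RESET_ACK_MASK` check in the hunk below.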
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 47 ++++++++++++++++++++++++++-------
 drivers/gpu/drm/msm/adreno/adreno_gpu.h |  5 ++++
 2 files changed, 43 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 17e314a745c3..2d68b7488b1b 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -614,14 +614,16 @@ static void a6xx_set_hwcg(struct msm_gpu *gpu, bool state)
 		return;
 
 	/* Disable SP clock before programming HWCG registers */
-	if (!adreno_has_gmu_wrapper(adreno_gpu))
+	if (!adreno_has_gmu_wrapper(adreno_gpu) ||
+	     adreno_is_a619_holi(adreno_gpu))
 		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
 
 	for (i = 0; (reg = &adreno_gpu->info->hwcg[i], reg->offset); i++)
 		gpu_write(gpu, reg->offset, state ? reg->value : 0);
 
 	/* Enable SP clock */
-	if (!adreno_has_gmu_wrapper(adreno_gpu))
+	if (!adreno_has_gmu_wrapper(adreno_gpu) ||
+	     adreno_is_a619_holi(adreno_gpu))
 		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
 
 	gpu_write(gpu, REG_A6XX_RBBM_CLOCK_CNTL, state ? clock_cntl_on : 0);
@@ -814,6 +816,9 @@ static void a6xx_set_ubwc_config(struct msm_gpu *gpu)
 	if (adreno_is_a618(adreno_gpu))
 		return;
 
+	if (adreno_is_a619_holi(adreno_gpu))
+		hbb_lo = 0;
+
 	if (adreno_is_a640_family(adreno_gpu))
 		amsbc = 1;
 
@@ -1031,7 +1036,12 @@ static int hw_init(struct msm_gpu *gpu)
 	}
 
 	/* Clear GBIF halt in case GX domain was not collapsed */
-	if (a6xx_has_gbif(adreno_gpu)) {
+	if (adreno_is_a619_holi(adreno_gpu)) {
+		gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
+		gpu_write(gpu, 0x18, 0);
+		/* Let's make extra sure that the GPU can access the memory.. */
+		mb();
+	} else if (a6xx_has_gbif(adreno_gpu)) {
 		gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
 		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
 		/* Let's make extra sure that the GPU can access the memory.. */
@@ -1040,6 +1050,9 @@ static int hw_init(struct msm_gpu *gpu)
 
 	gpu_write(gpu, REG_A6XX_RBBM_SECVID_TSB_CNTL, 0);
 
+	if (adreno_is_a619_holi(adreno_gpu))
+		a6xx_sptprac_enable(gmu);
+
 	/*
 	 * Disable the trusted memory range - we don't actually supported secure
 	 * memory rendering at this point in time and we don't want to block off
@@ -1295,7 +1308,8 @@ static void a6xx_dump(struct msm_gpu *gpu)
 #define GBIF_CLIENT_HALT_MASK	BIT(0)
 #define GBIF_ARB_HALT_MASK	BIT(1)
 #define VBIF_RESET_ACK_TIMEOUT	100
-#define VBIF_RESET_ACK_MASK	0x00f0
+#define VBIF_RESET_ACK_MASK	0xF0
+#define GPR0_GBIF_HALT_REQUEST	0x1E0
 
 static void a6xx_recover(struct msm_gpu *gpu)
 {
@@ -1359,10 +1373,16 @@ static void a6xx_recover(struct msm_gpu *gpu)
 
 	/* Software-reset the GPU */
 	if (adreno_has_gmu_wrapper(adreno_gpu)) {
-		/* Halt the GX side of GBIF */
-		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, GBIF_GX_HALT_MASK);
-		spin_until(gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT_ACK) &
-			   GBIF_GX_HALT_MASK);
+		if (adreno_is_a619_holi(adreno_gpu)) {
+			gpu_write(gpu, 0x18, GPR0_GBIF_HALT_REQUEST);
+			spin_until((gpu_read(gpu, REG_A6XX_RBBM_VBIF_GX_RESET_STATUS) &
+				   (VBIF_RESET_ACK_MASK)) == VBIF_RESET_ACK_MASK);
+		} else {
+			/* Halt the GX side of GBIF */
+			gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, GBIF_GX_HALT_MASK);
+			spin_until(gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT_ACK) &
+				   GBIF_GX_HALT_MASK);
+		}
 
 		/* Halt new client requests on GBIF */
 		gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_CLIENT_HALT_MASK);
@@ -1377,7 +1397,10 @@ static void a6xx_recover(struct msm_gpu *gpu)
 		/* Clear the halts */
 		gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
 
-		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
+		if (adreno_is_a619_holi(adreno_gpu))
+			gpu_write(gpu, 0x18, 0);
+		else
+			gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
 
 		/* This *really* needs to go through before we do anything else! */
 		mb();
@@ -1733,6 +1756,9 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
 	if (ret)
 		goto err_mem_clk;
 
+	if (adreno_is_a619_holi(adreno_gpu))
+		a6xx_sptprac_enable(gmu);
+
 	/* If anything goes south, tear the GPU down piece by piece.. */
 	if (ret) {
 err_mem_clk:
@@ -1798,6 +1824,9 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
 
 	mutex_lock(&a6xx_gpu->gmu.lock);
 
+	if (adreno_is_a619_holi(adreno_gpu))
+		a6xx_sptprac_disable(gmu);
+
 	clk_disable_unprepare(gpu->ebi1_clk);
 
 	clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index ee5352bc5329..432fee5c1516 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -252,6 +252,11 @@ static inline int adreno_is_a619(struct adreno_gpu *gpu)
 	return gpu->revn == 619;
 }
 
+static inline int adreno_is_a619_holi(struct adreno_gpu *gpu)
+{
+	return adreno_is_a619(gpu) && adreno_has_gmu_wrapper(gpu);
+}
+
 static inline int adreno_is_a630(struct adreno_gpu *gpu)
 {
 	return gpu->revn == 630;

-- 
2.40.0



* [PATCH v6 10/15] drm/msm/a6xx: Add A610 support
  2023-04-01 11:54 [PATCH v6 00/15] GMU-less A6xx support (A610, A619_holi) Konrad Dybcio
                   ` (8 preceding siblings ...)
  2023-04-01 11:54 ` [PATCH v6 09/15] drm/msm/a6xx: Add support for A619_holi Konrad Dybcio
@ 2023-04-01 11:54 ` Konrad Dybcio
  2023-04-01 11:54 ` [PATCH v6 11/15] drm/msm/a6xx: Fix some A619 tunables Konrad Dybcio
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 30+ messages in thread
From: Konrad Dybcio @ 2023-04-01 11:54 UTC (permalink / raw)
  To: Rob Clark, Abhinav Kumar, Dmitry Baryshkov, Sean Paul,
	David Airlie, Daniel Vetter, Rob Herring, Krzysztof Kozlowski,
	Bjorn Andersson, Konrad Dybcio, Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, devicetree, linux-kernel,
	Rob Clark, Marijn Suijten, Konrad Dybcio

A610 is one of (if not the) lowest-tier SKUs in the A6XX family. It
features no GMU, as it's implemented solely on SoCs with SMD_RPM.
What's more interesting is that it does not feature a VDDGX line
either, being powered solely by VDDCX, and it has an unfortunate hardware
quirk that makes its reset line broken - after a couple of assert/
deassert cycles, it will hang for good and will not wake up again.

This GPU requires mesa changes for proper rendering, and lots of them
at that. The command streams are quite far away from any other A6XX
GPU and hence it needs special care. This patch was validated both
by running an (incomplete) downstream mesa with some hacks (frames
rendered correctly, though some instructions made the GPU hangcheck
which is expected - garbage in, garbage out) and by replaying RD
traces captured with the downstream KGSL driver - no crashes there,
ever.

Add support for this GPU on the kernel side, which comes down to
adding the A612 HWCG tables, altering a few values and adding a
special case for handling the reset line.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
---
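The a6xx_set_ubwc_config() hunk in this patch pairs the comment
`/* HBB = 14 */` with `hbb_lo = 1`. One reading that is consistent with
that pair is that the field stores the highest bank bit minus 13, kept in
two bits. The sketch below encodes that reading; the base of 13, the
two-bit width and the fake_* names are assumptions, not something this
patch states.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed base: the field is taken to hold (HBB - 13). */
#define FAKE_HBB_BASE	13u

/* Return the assumed low two bits of the encoded highest bank bit. */
static uint32_t fake_hbb_lo(uint32_t hbb)
{
	assert(hbb >= FAKE_HBB_BASE);
	return (hbb - FAKE_HBB_BASE) & 3u;
}
```

If the assumption holds, the `hbb_lo = 0` used for A619_holi earlier in
the series would correspond to a highest bank bit of 13.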
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c      | 97 +++++++++++++++++++++++++++---
 drivers/gpu/drm/msm/adreno/adreno_device.c | 12 ++++
 drivers/gpu/drm/msm/adreno/adreno_gpu.h    |  8 ++-
 3 files changed, 107 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 2d68b7488b1b..b2c604a66007 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -254,6 +254,56 @@ static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 	a6xx_flush(gpu, ring);
 }
 
+const struct adreno_reglist a612_hwcg[] = {
+	{REG_A6XX_RBBM_CLOCK_CNTL_SP0, 0x22222222},
+	{REG_A6XX_RBBM_CLOCK_CNTL2_SP0, 0x02222220},
+	{REG_A6XX_RBBM_CLOCK_DELAY_SP0, 0x00000081},
+	{REG_A6XX_RBBM_CLOCK_HYST_SP0, 0x0000f3cf},
+	{REG_A6XX_RBBM_CLOCK_CNTL_TP0, 0x22222222},
+	{REG_A6XX_RBBM_CLOCK_CNTL2_TP0, 0x22222222},
+	{REG_A6XX_RBBM_CLOCK_CNTL3_TP0, 0x22222222},
+	{REG_A6XX_RBBM_CLOCK_CNTL4_TP0, 0x00022222},
+	{REG_A6XX_RBBM_CLOCK_DELAY_TP0, 0x11111111},
+	{REG_A6XX_RBBM_CLOCK_DELAY2_TP0, 0x11111111},
+	{REG_A6XX_RBBM_CLOCK_DELAY3_TP0, 0x11111111},
+	{REG_A6XX_RBBM_CLOCK_DELAY4_TP0, 0x00011111},
+	{REG_A6XX_RBBM_CLOCK_HYST_TP0, 0x77777777},
+	{REG_A6XX_RBBM_CLOCK_HYST2_TP0, 0x77777777},
+	{REG_A6XX_RBBM_CLOCK_HYST3_TP0, 0x77777777},
+	{REG_A6XX_RBBM_CLOCK_HYST4_TP0, 0x00077777},
+	{REG_A6XX_RBBM_CLOCK_CNTL_RB0, 0x22222222},
+	{REG_A6XX_RBBM_CLOCK_CNTL2_RB0, 0x01202222},
+	{REG_A6XX_RBBM_CLOCK_CNTL_CCU0, 0x00002220},
+	{REG_A6XX_RBBM_CLOCK_HYST_RB_CCU0, 0x00040f00},
+	{REG_A6XX_RBBM_CLOCK_CNTL_RAC, 0x05522022},
+	{REG_A6XX_RBBM_CLOCK_CNTL2_RAC, 0x00005555},
+	{REG_A6XX_RBBM_CLOCK_DELAY_RAC, 0x00000011},
+	{REG_A6XX_RBBM_CLOCK_HYST_RAC, 0x00445044},
+	{REG_A6XX_RBBM_CLOCK_CNTL_TSE_RAS_RBBM, 0x04222222},
+	{REG_A6XX_RBBM_CLOCK_MODE_VFD, 0x00002222},
+	{REG_A6XX_RBBM_CLOCK_MODE_GPC, 0x02222222},
+	{REG_A6XX_RBBM_CLOCK_DELAY_HLSQ_2, 0x00000002},
+	{REG_A6XX_RBBM_CLOCK_MODE_HLSQ, 0x00002222},
+	{REG_A6XX_RBBM_CLOCK_DELAY_TSE_RAS_RBBM, 0x00004000},
+	{REG_A6XX_RBBM_CLOCK_DELAY_VFD, 0x00002222},
+	{REG_A6XX_RBBM_CLOCK_DELAY_GPC, 0x00000200},
+	{REG_A6XX_RBBM_CLOCK_DELAY_HLSQ, 0x00000000},
+	{REG_A6XX_RBBM_CLOCK_HYST_TSE_RAS_RBBM, 0x00000000},
+	{REG_A6XX_RBBM_CLOCK_HYST_VFD, 0x00000000},
+	{REG_A6XX_RBBM_CLOCK_HYST_GPC, 0x04104004},
+	{REG_A6XX_RBBM_CLOCK_HYST_HLSQ, 0x00000000},
+	{REG_A6XX_RBBM_CLOCK_CNTL_UCHE, 0x22222222},
+	{REG_A6XX_RBBM_CLOCK_HYST_UCHE, 0x00000004},
+	{REG_A6XX_RBBM_CLOCK_DELAY_UCHE, 0x00000002},
+	{REG_A6XX_RBBM_ISDB_CNT, 0x00000182},
+	{REG_A6XX_RBBM_RAC_THRESHOLD_CNT, 0x00000000},
+	{REG_A6XX_RBBM_SP_HYST_CNT, 0x00000000},
+	{REG_A6XX_RBBM_CLOCK_CNTL_GMU_GX, 0x00000222},
+	{REG_A6XX_RBBM_CLOCK_DELAY_GMU_GX, 0x00000111},
+	{REG_A6XX_RBBM_CLOCK_HYST_GMU_GX, 0x00000555},
+	{},
+};
+
 /* For a615 family (a615, a616, a618 and a619) */
 const struct adreno_reglist a615_hwcg[] = {
 	{REG_A6XX_RBBM_CLOCK_CNTL_SP0,  0x02222222},
@@ -604,6 +654,8 @@ static void a6xx_set_hwcg(struct msm_gpu *gpu, bool state)
 
 	if (adreno_is_a630(adreno_gpu))
 		clock_cntl_on = 0x8aa8aa02;
+	else if (adreno_is_a610(adreno_gpu))
+		clock_cntl_on = 0xaaa8aa82;
 	else
 		clock_cntl_on = 0x8aa8aa82;
 
@@ -812,6 +864,13 @@ static void a6xx_set_ubwc_config(struct msm_gpu *gpu)
 	/* Unknown, introduced with A640/680 */
 	u32 amsbc = 0;
 
+	if (adreno_is_a610(adreno_gpu)) {
+		/* HBB = 14 */
+		hbb_lo = 1;
+		min_acc_len = 1;
+		ubwc_mode = 1;
+	}
+
 	/* a618 is using the hw default values */
 	if (adreno_is_a618(adreno_gpu))
 		return;
@@ -1079,13 +1138,13 @@ static int hw_init(struct msm_gpu *gpu)
 	a6xx_set_hwcg(gpu, true);
 
 	/* VBIF/GBIF start*/
-	if (adreno_is_a640_family(adreno_gpu) ||
+	if (adreno_is_a610(adreno_gpu) ||
+	    adreno_is_a640_family(adreno_gpu) ||
 	    adreno_is_a650_family(adreno_gpu)) {
 		gpu_write(gpu, REG_A6XX_GBIF_QSB_SIDE0, 0x00071620);
 		gpu_write(gpu, REG_A6XX_GBIF_QSB_SIDE1, 0x00071620);
 		gpu_write(gpu, REG_A6XX_GBIF_QSB_SIDE2, 0x00071620);
 		gpu_write(gpu, REG_A6XX_GBIF_QSB_SIDE3, 0x00071620);
-		gpu_write(gpu, REG_A6XX_GBIF_QSB_SIDE3, 0x00071620);
 		gpu_write(gpu, REG_A6XX_RBBM_GBIF_CLIENT_QOS_CNTL, 0x3);
 	} else {
 		gpu_write(gpu, REG_A6XX_RBBM_VBIF_CLIENT_QOS_CNTL, 0x3);
@@ -1113,18 +1172,26 @@ static int hw_init(struct msm_gpu *gpu)
 	gpu_write(gpu, REG_A6XX_UCHE_FILTER_CNTL, 0x804);
 	gpu_write(gpu, REG_A6XX_UCHE_CACHE_WAYS, 0x4);
 
-	if (adreno_is_a640_family(adreno_gpu) ||
-	    adreno_is_a650_family(adreno_gpu))
+	if (adreno_is_a640_family(adreno_gpu) || adreno_is_a650_family(adreno_gpu)) {
 		gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_2, 0x02000140);
-	else
+		gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_1, 0x8040362c);
+	} else if (adreno_is_a610(adreno_gpu)) {
+		gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_2, 0x00800060);
+		gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_1, 0x40201b16);
+	} else {
 		gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_2, 0x010000c0);
-	gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_1, 0x8040362c);
+		gpu_write(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_1, 0x8040362c);
+	}
 
 	if (adreno_is_a660_family(adreno_gpu))
 		gpu_write(gpu, REG_A6XX_CP_LPAC_PROG_FIFO_SIZE, 0x00000020);
 
 	/* Setting the mem pool size */
-	gpu_write(gpu, REG_A6XX_CP_MEM_POOL_SIZE, 128);
+	if (adreno_is_a610(adreno_gpu)) {
+		gpu_write(gpu, REG_A6XX_CP_MEM_POOL_SIZE, 48);
+		gpu_write(gpu, REG_A6XX_CP_MEM_POOL_DBG_ADDR, 47);
+	} else
+		gpu_write(gpu, REG_A6XX_CP_MEM_POOL_SIZE, 128);
 
 	/* Setting the primFifo thresholds default values,
 	 * and vccCacheSkipDis=1 bit (0x200) for A640 and newer
@@ -1135,6 +1202,8 @@ static int hw_init(struct msm_gpu *gpu)
 		gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00200200);
 	else if (adreno_is_a650(adreno_gpu) || adreno_is_a660(adreno_gpu))
 		gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00300200);
+	else if (adreno_is_a610(adreno_gpu))
+		gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00080000);
 	else
 		gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00180000);
 
@@ -1150,8 +1219,10 @@ static int hw_init(struct msm_gpu *gpu)
 	a6xx_set_ubwc_config(gpu);
 
 	/* Enable fault detection */
-	gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL,
-		(1 << 30) | 0x1fffff);
+	if (adreno_is_a610(adreno_gpu))
+		gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL, (1 << 30) | 0x3ffff);
+	else
+		gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL, (1 << 30) | 0x1fffff);
 
 	gpu_write(gpu, REG_A6XX_UCHE_CLIENT_PF, 1);
 
@@ -1373,6 +1444,14 @@ static void a6xx_recover(struct msm_gpu *gpu)
 
 	/* Software-reset the GPU */
 	if (adreno_has_gmu_wrapper(adreno_gpu)) {
+		/* 11nm chips (i.e. A610-hosting ones) have HW issues with the reset line */
+		if (!adreno_is_a610(adreno_gpu)) {
+			gpu_write(gpu, REG_A6XX_RBBM_SW_RESET_CMD, 1);
+			gpu_read(gpu, REG_A6XX_RBBM_SW_RESET_CMD);
+			udelay(100);
+			gpu_write(gpu, REG_A6XX_RBBM_SW_RESET_CMD, 0);
+		}
+
 		if (adreno_is_a619_holi(adreno_gpu)) {
 			gpu_write(gpu, 0x18, GPR0_GBIF_HALT_REQUEST);
 			spin_until((gpu_read(gpu, REG_A6XX_RBBM_VBIF_GX_RESET_STATUS) &
diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
index 4705ce3eb95e..bc536d658aa6 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -253,6 +253,18 @@ static const struct adreno_info gpulist[] = {
 		.quirks = ADRENO_QUIRK_LMLOADKILL_DISABLE,
 		.init = a5xx_gpu_init,
 		.zapfw = "a540_zap.mdt",
+	}, {
+		.rev = ADRENO_REV(6, 1, 0, ANY_ID),
+		.revn = 610,
+		.name = "A610",
+		.fw = {
+			[ADRENO_FW_SQE] = "a630_sqe.fw",
+		},
+		.gmem = (SZ_128K + SZ_4K),
+		.inactive_period = 500,
+		.init = a6xx_gpu_init,
+		.zapfw = "a610_zap.mdt",
+		.hwcg = a612_hwcg,
 	}, {
 		.rev = ADRENO_REV(6, 1, 8, ANY_ID),
 		.revn = 618,
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index 432fee5c1516..7a5d595d4b99 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -55,7 +55,8 @@ struct adreno_reglist {
 	u32 value;
 };
 
-extern const struct adreno_reglist a615_hwcg[], a630_hwcg[], a640_hwcg[], a650_hwcg[], a660_hwcg[];
+extern const struct adreno_reglist a612_hwcg[], a615_hwcg[], a630_hwcg[], a640_hwcg[], a650_hwcg[];
+extern const struct adreno_reglist a660_hwcg[];
 
 struct adreno_info {
 	struct adreno_rev rev;
@@ -242,6 +243,11 @@ static inline int adreno_is_a540(struct adreno_gpu *gpu)
 	return gpu->revn == 540;
 }
 
+static inline int adreno_is_a610(struct adreno_gpu *gpu)
+{
+	return gpu->revn == 610;
+}
+
 static inline int adreno_is_a618(struct adreno_gpu *gpu)
 {
 	return gpu->revn == 618;

-- 
2.40.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread
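
The a612_hwcg table above ends with an empty entry, and a6xx_set_hwcg() walks such tables until it reaches an entry whose offset is zero. A minimal userspace sketch of that pattern follows. The struct name, the offsets and the fake register array are assumptions made for illustration; the real REG_A6XX_* offsets and gpu_write() live in the driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct adreno_reglist: offset/value pairs. */
struct reglist_entry {
	uint32_t offset;
	uint32_t value;
};

/* Made-up offsets, chosen only so they fit the fake register file below. */
static const struct reglist_entry example_hwcg[] = {
	{0x0010, 0x22222222},
	{0x0011, 0x02222220},
	{0x0012, 0x00000081},
	{0, 0},	/* sentinel: offset == 0 ends the list */
};

/*
 * Walk the list up to the sentinel and "write" each value into a fake
 * register file. With state == 0 every listed register is cleared
 * instead, like the "state ? reg->value : 0" write in a6xx_set_hwcg().
 * Returns the number of registers written.
 */
static size_t apply_reglist(const struct reglist_entry *list, int state,
			    uint32_t *regs, size_t nregs)
{
	size_t i, written = 0;

	for (i = 0; list[i].offset; i++) {
		if (list[i].offset >= nregs)
			continue;	/* does not fit the fake register file */
		regs[list[i].offset] = state ? list[i].value : 0;
		written++;
	}

	return written;
}
```

One consequence of the sentinel convention is that a register at offset 0 can never appear in such a table, which is presumably acceptable for the clock gating registers listed here.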

* [PATCH v6 11/15] drm/msm/a6xx: Fix some A619 tunables
  2023-04-01 11:54 [PATCH v6 00/15] GMU-less A6xx support (A610, A619_holi) Konrad Dybcio
                   ` (9 preceding siblings ...)
  2023-04-01 11:54 ` [PATCH v6 10/15] drm/msm/a6xx: Add A610 support Konrad Dybcio
@ 2023-04-01 11:54 ` Konrad Dybcio
  2023-04-01 11:54 ` [PATCH v6 12/15] drm/msm/a6xx: Use "else if" in GPU speedbin rev matching Konrad Dybcio
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 30+ messages in thread
From: Konrad Dybcio @ 2023-04-01 11:54 UTC (permalink / raw)
  To: Rob Clark, Abhinav Kumar, Dmitry Baryshkov, Sean Paul,
	David Airlie, Daniel Vetter, Rob Herring, Krzysztof Kozlowski,
	Bjorn Andersson, Konrad Dybcio, Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, devicetree, linux-kernel,
	Rob Clark, Marijn Suijten, Konrad Dybcio

Adreno 619 expects some tunables to be set differently. Make up for it.

Fixes: b7616b5c69e6 ("drm/msm/adreno: Add A619 support")
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index b2c604a66007..389a1f7251fe 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1202,6 +1202,8 @@ static int hw_init(struct msm_gpu *gpu)
 		gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00200200);
 	else if (adreno_is_a650(adreno_gpu) || adreno_is_a660(adreno_gpu))
 		gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00300200);
+	else if (adreno_is_a619(adreno_gpu))
+		gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00018000);
 	else if (adreno_is_a610(adreno_gpu))
 		gpu_write(gpu, REG_A6XX_PC_DBG_ECO_CNTL, 0x00080000);
 	else
@@ -1219,7 +1221,9 @@ static int hw_init(struct msm_gpu *gpu)
 	a6xx_set_ubwc_config(gpu);
 
 	/* Enable fault detection */
-	if (adreno_is_a610(adreno_gpu))
+	if (adreno_is_a619(adreno_gpu))
+		gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL, (1 << 30) | 0x3fffff);
+	else if (adreno_is_a610(adreno_gpu))
 		gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL, (1 << 30) | 0x3ffff);
 	else
 		gpu_write(gpu, REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL, (1 << 30) | 0x1fffff);

-- 
2.40.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread
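
For reference, the three values written to REG_A6XX_RBBM_INTERFACE_HANG_INT_CNTL after this patch can be worked out with a small standalone sketch. The enum and function names are hypothetical, and the patch does not describe the bit layout beyond the "Enable fault detection" comment, so bit 30 and the low field are treated as opaque values here.

```c
#include <stdint.h>

/* Hypothetical GPU selector; the driver uses adreno_is_aXYZ() helpers. */
enum example_gpu { EX_A610, EX_A619, EX_OTHER };

/* Bit 30 is set in all three branches of the patch. */
#define EXAMPLE_HANG_BIT30	(UINT32_C(1) << 30)

/* Returns the value each branch of the patched hw_init() would write. */
static uint32_t example_hang_int_cntl(enum example_gpu gpu)
{
	switch (gpu) {
	case EX_A619:
		return EXAMPLE_HANG_BIT30 | 0x3fffff;
	case EX_A610:
		return EXAMPLE_HANG_BIT30 | 0x3ffff;
	default:
		return EXAMPLE_HANG_BIT30 | 0x1fffff;
	}
}
```

Note that the A619 and A610 constants differ by one hex digit (0x3fffff vs 0x3ffff), which is easy to misread in the diff.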

* [PATCH v6 12/15] drm/msm/a6xx: Use "else if" in GPU speedbin rev matching
  2023-04-01 11:54 [PATCH v6 00/15] GMU-less A6xx support (A610, A619_holi) Konrad Dybcio
                   ` (10 preceding siblings ...)
  2023-04-01 11:54 ` [PATCH v6 11/15] drm/msm/a6xx: Fix some A619 tunables Konrad Dybcio
@ 2023-04-01 11:54 ` Konrad Dybcio
  2023-04-01 11:54 ` [PATCH v6 13/15] drm/msm/a6xx: Use adreno_is_aXYZ macros in speedbin matching Konrad Dybcio
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 30+ messages in thread
From: Konrad Dybcio @ 2023-04-01 11:54 UTC (permalink / raw)
  To: Rob Clark, Abhinav Kumar, Dmitry Baryshkov, Sean Paul,
	David Airlie, Daniel Vetter, Rob Herring, Krzysztof Kozlowski,
	Bjorn Andersson, Konrad Dybcio, Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, devicetree, linux-kernel,
	Rob Clark, Marijn Suijten, Konrad Dybcio

The GPU can only be one model at a time. Turn the series of ifs into an
if + else-if chain to save some CPU cycles.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 389a1f7251fe..a802a29f8173 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -2171,16 +2171,16 @@ static u32 fuse_to_supp_hw(struct device *dev, struct adreno_rev rev, u32 fuse)
 	if (adreno_cmp_rev(ADRENO_REV(6, 1, 8, ANY_ID), rev))
 		val = a618_get_speed_bin(fuse);
 
-	if (adreno_cmp_rev(ADRENO_REV(6, 1, 9, ANY_ID), rev))
+	else if (adreno_cmp_rev(ADRENO_REV(6, 1, 9, ANY_ID), rev))
 		val = a619_get_speed_bin(fuse);
 
-	if (adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), rev))
+	else if (adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), rev))
 		val = adreno_7c3_get_speed_bin(fuse);
 
-	if (adreno_cmp_rev(ADRENO_REV(6, 4, 0, ANY_ID), rev))
+	else if (adreno_cmp_rev(ADRENO_REV(6, 4, 0, ANY_ID), rev))
 		val = a640_get_speed_bin(fuse);
 
-	if (adreno_cmp_rev(ADRENO_REV(6, 5, 0, ANY_ID), rev))
+	else if (adreno_cmp_rev(ADRENO_REV(6, 5, 0, ANY_ID), rev))
 		val = a650_get_speed_bin(fuse);
 
 	if (val == UINT_MAX) {

-- 
2.40.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread
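
The effect of the change can be illustrated with a rough userspace sketch that counts how many comparisons run in each form. The helper names and the returned values are placeholders, not the driver's speed bins; only the control flow mirrors the patch.

```c
#include <stdint.h>

#define EXAMPLE_NO_MATCH	UINT32_MAX

/* Counts how many revision comparisons were evaluated. */
static unsigned int checks;

static int example_is_rev(unsigned int revn, unsigned int want)
{
	checks++;
	return revn == want;
}

/* Before: every condition is evaluated, even after a match. */
static uint32_t example_match_ifs(unsigned int revn)
{
	uint32_t val = EXAMPLE_NO_MATCH;

	if (example_is_rev(revn, 618))
		val = 0;
	if (example_is_rev(revn, 619))
		val = 1;
	if (example_is_rev(revn, 640))
		val = 2;
	if (example_is_rev(revn, 650))
		val = 3;

	return val;
}

/* After: evaluation stops at the first matching condition. */
static uint32_t example_match_else_ifs(unsigned int revn)
{
	uint32_t val = EXAMPLE_NO_MATCH;

	if (example_is_rev(revn, 618))
		val = 0;
	else if (example_is_rev(revn, 619))
		val = 1;
	else if (example_is_rev(revn, 640))
		val = 2;
	else if (example_is_rev(revn, 650))
		val = 3;

	return val;
}
```

Because the conditions are mutually exclusive, both forms return the same value; the chain only avoids the comparisons after the first hit.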

* [PATCH v6 13/15] drm/msm/a6xx: Use adreno_is_aXYZ macros in speedbin matching
  2023-04-01 11:54 [PATCH v6 00/15] GMU-less A6xx support (A610, A619_holi) Konrad Dybcio
                   ` (11 preceding siblings ...)
  2023-04-01 11:54 ` [PATCH v6 12/15] drm/msm/a6xx: Use "else if" in GPU speedbin rev matching Konrad Dybcio
@ 2023-04-01 11:54 ` Konrad Dybcio
  2023-04-01 11:54 ` [PATCH v6 14/15] drm/msm/a6xx: Add A619_holi speedbin support Konrad Dybcio
  2023-04-01 11:54 ` [PATCH v6 15/15] drm/msm/a6xx: Add A610 " Konrad Dybcio
  14 siblings, 0 replies; 30+ messages in thread
From: Konrad Dybcio @ 2023-04-01 11:54 UTC (permalink / raw)
  To: Rob Clark, Abhinav Kumar, Dmitry Baryshkov, Sean Paul,
	David Airlie, Daniel Vetter, Rob Herring, Krzysztof Kozlowski,
	Bjorn Andersson, Konrad Dybcio, Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, devicetree, linux-kernel,
	Rob Clark, Marijn Suijten, Konrad Dybcio

Before transitioning to using per-SoC and not per-Adreno speedbin
fuse values (need another patchset to land elsewhere), a good
improvement/stopgap solution is to use adreno_is_aXYZ macros in
place of explicit revision matching. Do so to allow differentiating
between A619 and A619_holi.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 18 +++++++++---------
 drivers/gpu/drm/msm/adreno/adreno_gpu.h | 14 ++++++++++++--
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index a802a29f8173..6c84ef82e504 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -2164,23 +2164,23 @@ static u32 adreno_7c3_get_speed_bin(u32 fuse)
 	return UINT_MAX;
 }
 
-static u32 fuse_to_supp_hw(struct device *dev, struct adreno_rev rev, u32 fuse)
+static u32 fuse_to_supp_hw(struct device *dev, struct adreno_gpu *adreno_gpu, u32 fuse)
 {
 	u32 val = UINT_MAX;
 
-	if (adreno_cmp_rev(ADRENO_REV(6, 1, 8, ANY_ID), rev))
+	if (adreno_is_a618(adreno_gpu))
 		val = a618_get_speed_bin(fuse);
 
-	else if (adreno_cmp_rev(ADRENO_REV(6, 1, 9, ANY_ID), rev))
+	else if (adreno_is_a619(adreno_gpu))
 		val = a619_get_speed_bin(fuse);
 
-	else if (adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), rev))
+	else if (adreno_is_7c3(adreno_gpu))
 		val = adreno_7c3_get_speed_bin(fuse);
 
-	else if (adreno_cmp_rev(ADRENO_REV(6, 4, 0, ANY_ID), rev))
+	else if (adreno_is_a640(adreno_gpu))
 		val = a640_get_speed_bin(fuse);
 
-	else if (adreno_cmp_rev(ADRENO_REV(6, 5, 0, ANY_ID), rev))
+	else if (adreno_is_a650(adreno_gpu))
 		val = a650_get_speed_bin(fuse);
 
 	if (val == UINT_MAX) {
@@ -2193,7 +2193,7 @@ static u32 fuse_to_supp_hw(struct device *dev, struct adreno_rev rev, u32 fuse)
 	return (1 << val);
 }
 
-static int a6xx_set_supported_hw(struct device *dev, struct adreno_rev rev)
+static int a6xx_set_supported_hw(struct device *dev, struct adreno_gpu *adreno_gpu)
 {
 	u32 supp_hw;
 	u32 speedbin;
@@ -2212,7 +2212,7 @@ static int a6xx_set_supported_hw(struct device *dev, struct adreno_rev rev)
 		return ret;
 	}
 
-	supp_hw = fuse_to_supp_hw(dev, rev, speedbin);
+	supp_hw = fuse_to_supp_hw(dev, adreno_gpu, speedbin);
 
 	ret = devm_pm_opp_set_supported_hw(dev, &supp_hw, 1);
 	if (ret)
@@ -2333,7 +2333,7 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
 	if (!adreno_has_gmu_wrapper(adreno_gpu))
 		a6xx_llc_slices_init(pdev, a6xx_gpu);
 
-	ret = a6xx_set_supported_hw(&pdev->dev, config->rev);
+	ret = a6xx_set_supported_hw(&pdev->dev, adreno_gpu);
 	if (ret) {
 		a6xx_destroy(&(a6xx_gpu->base.base));
 		return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index 7a5d595d4b99..21513cec038f 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -268,9 +268,9 @@ static inline int adreno_is_a630(struct adreno_gpu *gpu)
 	return gpu->revn == 630;
 }
 
-static inline int adreno_is_a640_family(struct adreno_gpu *gpu)
+static inline int adreno_is_a640(struct adreno_gpu *gpu)
 {
-	return (gpu->revn == 640) || (gpu->revn == 680);
+	return gpu->revn == 640;
 }
 
 static inline int adreno_is_a650(struct adreno_gpu *gpu)
@@ -289,6 +289,11 @@ static inline int adreno_is_a660(struct adreno_gpu *gpu)
 	return gpu->revn == 660;
 }
 
+static inline int adreno_is_a680(struct adreno_gpu *gpu)
+{
+	return gpu->revn == 680;
+}
+
 /* check for a615, a616, a618, a619 or any derivatives */
 static inline int adreno_is_a615_family(struct adreno_gpu *gpu)
 {
@@ -306,6 +311,11 @@ static inline int adreno_is_a650_family(struct adreno_gpu *gpu)
 	return gpu->revn == 650 || gpu->revn == 620 || adreno_is_a660_family(gpu);
 }
 
+static inline int adreno_is_a640_family(struct adreno_gpu *gpu)
+{
+	return adreno_is_a640(gpu) || adreno_is_a680(gpu);
+}
+
 u64 adreno_private_address_space_size(struct msm_gpu *gpu);
 int adreno_get_param(struct msm_gpu *gpu, struct msm_file_private *ctx,
 		     uint32_t param, uint64_t *value, uint32_t *len);

-- 
2.40.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread
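
The adreno_gpu.h part of this patch splits the old adreno_is_a640_family() check into per-model helpers and rebuilds the family check on top of them. A simplified standalone sketch is below, assuming a hypothetical struct that carries only the revn field; the real struct adreno_gpu has many more members.

```c
#include <stdbool.h>

/* Hypothetical, stripped-down stand-in for struct adreno_gpu. */
struct example_gpu {
	unsigned int revn;
};

static inline bool example_is_a640(const struct example_gpu *gpu)
{
	return gpu->revn == 640;
}

static inline bool example_is_a680(const struct example_gpu *gpu)
{
	return gpu->revn == 680;
}

/* Family check composed from the per-model helpers, as in the patch. */
static inline bool example_is_a640_family(const struct example_gpu *gpu)
{
	return example_is_a640(gpu) || example_is_a680(gpu);
}
```

With this split, speedbin matching can test for A640 alone, while callers such as hw_init() that use the family check keep matching both A640 and A680.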

* [PATCH v6 14/15] drm/msm/a6xx: Add A619_holi speedbin support
  2023-04-01 11:54 [PATCH v6 00/15] GMU-less A6xx support (A610, A619_holi) Konrad Dybcio
                   ` (12 preceding siblings ...)
  2023-04-01 11:54 ` [PATCH v6 13/15] drm/msm/a6xx: Use adreno_is_aXYZ macros in speedbin matching Konrad Dybcio
@ 2023-04-01 11:54 ` Konrad Dybcio
  2023-04-01 11:54 ` [PATCH v6 15/15] drm/msm/a6xx: Add A610 " Konrad Dybcio
  14 siblings, 0 replies; 30+ messages in thread
From: Konrad Dybcio @ 2023-04-01 11:54 UTC (permalink / raw)
  To: Rob Clark, Abhinav Kumar, Dmitry Baryshkov, Sean Paul,
	David Airlie, Daniel Vetter, Rob Herring, Krzysztof Kozlowski,
	Bjorn Andersson, Konrad Dybcio, Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, devicetree, linux-kernel,
	Rob Clark, Marijn Suijten, Konrad Dybcio

A619_holi is implemented on at least two SoCs: SM4350 (holi) and SM6375
(blair). This seems to be the first time this has happened, but it's
easy to overcome by guarding the SoC-specific fuse values with
of_machine_is_compatible(). Do just that to enable frequency limiting
on these SoCs.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 6c84ef82e504..f692f540c13c 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -2112,6 +2112,34 @@ static u32 a618_get_speed_bin(u32 fuse)
 	return UINT_MAX;
 }
 
+static u32 a619_holi_get_speed_bin(u32 fuse)
+{
+	/*
+	 * There are (at least) two SoCs implementing A619_holi: SM4350 (holi)
+	 * and SM6375 (blair). Limit the fuse matching to the corresponding
+	 * SoC to prevent bogus frequency setting (as improbable as it may be,
+	 * given unexpected fuse values are.. unexpected! But still possible.)
+	 */
+
+	if (fuse == 0)
+		return 0;
+
+	if (of_machine_is_compatible("qcom,sm4350")) {
+		if (fuse == 138)
+			return 1;
+		else if (fuse == 92)
+			return 2;
+	} else if (of_machine_is_compatible("qcom,sm6375")) {
+		if (fuse == 190)
+			return 1;
+		else if (fuse == 177)
+			return 2;
+	} else
+		pr_warn("Unknown SoC implementing A619_holi!\n");
+
+	return UINT_MAX;
+}
+
 static u32 a619_get_speed_bin(u32 fuse)
 {
 	if (fuse == 0)
@@ -2171,6 +2199,9 @@ static u32 fuse_to_supp_hw(struct device *dev, struct adreno_gpu *adreno_gpu, u3
 	if (adreno_is_a618(adreno_gpu))
 		val = a618_get_speed_bin(fuse);
 
+	else if (adreno_is_a619_holi(adreno_gpu))
+		val = a619_holi_get_speed_bin(fuse);
+
 	else if (adreno_is_a619(adreno_gpu))
 		val = a619_get_speed_bin(fuse);
 

-- 
2.40.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread
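
The per-SoC guarding can be tried outside the kernel with a small sketch that takes the machine compatible as a string argument instead of calling of_machine_is_compatible(). The function name and the string parameter are assumptions for illustration; the fuse values are the ones from the patch.

```c
#include <stdint.h>
#include <string.h>

#define EXAMPLE_NO_BIN	UINT32_MAX

/*
 * Map a fuse value to a speed bin, limited to the SoC it belongs to.
 * "soc" stands in for the machine compatible that the driver checks
 * with of_machine_is_compatible().
 */
static uint32_t example_a619_holi_bin(const char *soc, uint32_t fuse)
{
	if (fuse == 0)
		return 0;

	if (!strcmp(soc, "qcom,sm4350")) {
		if (fuse == 138)
			return 1;
		if (fuse == 92)
			return 2;
	} else if (!strcmp(soc, "qcom,sm6375")) {
		if (fuse == 190)
			return 1;
		if (fuse == 177)
			return 2;
	}

	return EXAMPLE_NO_BIN;
}
```

A fuse value that is valid on one SoC is rejected on the other, which is the point of the guard: example_a619_holi_bin("qcom,sm6375", 138) yields no bin even though 138 maps to bin 1 on SM4350.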

* [PATCH v6 15/15] drm/msm/a6xx: Add A610 speedbin support
  2023-04-01 11:54 [PATCH v6 00/15] GMU-less A6xx support (A610, A619_holi) Konrad Dybcio
                   ` (13 preceding siblings ...)
  2023-04-01 11:54 ` [PATCH v6 14/15] drm/msm/a6xx: Add A619_holi speedbin support Konrad Dybcio
@ 2023-04-01 11:54 ` Konrad Dybcio
  14 siblings, 0 replies; 30+ messages in thread
From: Konrad Dybcio @ 2023-04-01 11:54 UTC (permalink / raw)
  To: Rob Clark, Abhinav Kumar, Dmitry Baryshkov, Sean Paul,
	David Airlie, Daniel Vetter, Rob Herring, Krzysztof Kozlowski,
	Bjorn Andersson, Konrad Dybcio, Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, devicetree, linux-kernel,
	Rob Clark, Marijn Suijten, Konrad Dybcio

A610 is implemented on at least three SoCs: SM6115 (bengal), SM6125
(trinket) and SM6225 (khaje). Trinket does not support speed binning
(only a single SKU exists) and we don't yet support khaje upstream.
Hence, add a fuse mapping table for bengal to allow for per-chip
frequency limiting.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index f692f540c13c..e3be878afbb0 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -2100,6 +2100,30 @@ static bool a6xx_progress(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 	return progress;
 }
 
+static u32 a610_get_speed_bin(u32 fuse)
+{
+	/*
+	 * There are (at least) three SoCs implementing A610: SM6125 (trinket),
+	 * SM6115 (bengal) and SM6225 (khaje). Trinket does not have speedbinning,
+	 * as only a single SKU exists and we don't support khaje upstream yet.
+	 * Hence, this matching table is only valid for bengal and can be easily
+	 * expanded if need be.
+	 */
+
+	if (fuse == 0)
+		return 0;
+	else if (fuse == 206)
+		return 1;
+	else if (fuse == 200)
+		return 2;
+	else if (fuse == 157)
+		return 3;
+	else if (fuse == 127)
+		return 4;
+
+	return UINT_MAX;
+}
+
 static u32 a618_get_speed_bin(u32 fuse)
 {
 	if (fuse == 0)
@@ -2196,6 +2220,9 @@ static u32 fuse_to_supp_hw(struct device *dev, struct adreno_gpu *adreno_gpu, u3
 {
 	u32 val = UINT_MAX;
 
+	if (adreno_is_a610(adreno_gpu))
+		val = a610_get_speed_bin(fuse);
+
 	if (adreno_is_a618(adreno_gpu))
 		val = a618_get_speed_bin(fuse);
 

-- 
2.40.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread
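
As a possible alternative shape for the same mapping, the bengal fuse values could live in a lookup table, with the matched bin turned into the one-hot mask that fuse_to_supp_hw() returns via (1 << val). This is a sketch only: the struct, the table name and the bool return convention are assumptions, and the driver's handling of an unmatched fuse is not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry pairing a fuse value with its speed bin. */
struct example_fuse_bin {
	uint32_t fuse;
	uint32_t bin;
};

/* Fuse values for A610 on SM6115 (bengal), taken from the patch. */
static const struct example_fuse_bin example_a610_bins[] = {
	{0, 0}, {206, 1}, {200, 2}, {157, 3}, {127, 4},
};

/*
 * Look up the fuse and, on a match, store the one-hot "supported
 * hardware" mask (1 << bin) in *supp_hw. Returns false and leaves
 * *supp_hw untouched if the fuse value is unknown.
 */
static bool example_fuse_to_supp_hw(uint32_t fuse, uint32_t *supp_hw)
{
	size_t i;

	for (i = 0; i < sizeof(example_a610_bins) / sizeof(example_a610_bins[0]); i++) {
		if (example_a610_bins[i].fuse == fuse) {
			*supp_hw = UINT32_C(1) << example_a610_bins[i].bin;
			return true;
		}
	}

	return false;
}
```

The if/else-if form in the patch matches the style of the neighbouring a618/a619 helpers, so the table variant is mainly of interest if more SoCs get added later.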

* Re: [PATCH v6 01/15] drm/msm/adreno: adreno_gpu: Don't set OPP scaling clock w/ GMU
  2023-04-01 11:54 ` [PATCH v6 01/15] drm/msm/adreno: adreno_gpu: Don't set OPP scaling clock w/ GMU Konrad Dybcio
@ 2023-04-02 15:43   ` Dmitry Baryshkov
  0 siblings, 0 replies; 30+ messages in thread
From: Dmitry Baryshkov @ 2023-04-02 15:43 UTC (permalink / raw)
  To: Konrad Dybcio, Rob Clark, Abhinav Kumar, Sean Paul, David Airlie,
	Daniel Vetter, Rob Herring, Krzysztof Kozlowski, Bjorn Andersson,
	Konrad Dybcio, Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, devicetree, linux-kernel,
	Rob Clark, Marijn Suijten

On 01/04/2023 14:54, Konrad Dybcio wrote:
> Recently I contributed the switch to OPP API for all Adreno generations.
> I did however also skip over the fact that GPUs with a GMU don't specify
> a core clock of any kind in the GPU node. While that didn't break
> anything, it did introduce unwanted spam in the dmesg:
> 
> adreno 5000000.gpu: error -ENOENT: _opp_set_clknames: Couldn't find clock with name: core_clk
> 
> Guard the entire logic so that it's not used with GMU-equipped GPUs.
> 
> Fixes: 9f251f934012 ("drm/msm/adreno: Use OPP for every GPU generation")
> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
> ---
>   drivers/gpu/drm/msm/adreno/adreno_gpu.c | 24 ++++++++++++++----------
>   1 file changed, 14 insertions(+), 10 deletions(-)

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

-- 
With best wishes
Dmitry


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6 02/15] dt-bindings: display/msm: gpu: Document GMU wrapper-equipped A6xx
  2023-04-01 11:54 ` [PATCH v6 02/15] dt-bindings: display/msm: gpu: Document GMU wrapper-equipped A6xx Konrad Dybcio
@ 2023-04-05  6:36   ` Krzysztof Kozlowski
  0 siblings, 0 replies; 30+ messages in thread
From: Krzysztof Kozlowski @ 2023-04-05  6:36 UTC (permalink / raw)
  To: Konrad Dybcio, Rob Clark, Abhinav Kumar, Dmitry Baryshkov,
	Sean Paul, David Airlie, Daniel Vetter, Rob Herring,
	Krzysztof Kozlowski, Bjorn Andersson, Konrad Dybcio,
	Akhil P Oommen
  Cc: linux-arm-msm, dri-devel, freedreno, devicetree, linux-kernel,
	Rob Clark, Marijn Suijten

On 01/04/2023 13:54, Konrad Dybcio wrote:
> The "GMU Wrapper" is Qualcomm's name for "let's treat the GPU blocks
> we'd normally assign to the GMU as if they were a part of the GMU, even
> though they are not". It's a (good) software representation of the GMU_CX
> and GMU_GX register spaces within the GPUSS that helps us programmatically
> treat these de-facto GMU-less parts in a way that's very similar to their
> GMU-equipped cousins, massively saving up on code duplication.
> 
> The "wrapper" register space was specifically designed to mimic the layout
> of a real GMU, though it rather obviously does not have the M3 core et al.
> 
> GMU wrapper-equipped A6xx GPUs require clocks and clock-names to be
> specified under the GPU node, just like their older cousins. Account
> for that.
> 
> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>


Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>

Best regards,
Krzysztof


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support
  2023-04-01 11:54 ` [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support Konrad Dybcio
@ 2023-05-02  7:49   ` Akhil P Oommen
  2023-05-02  9:40     ` Konrad Dybcio
  0 siblings, 1 reply; 30+ messages in thread
From: Akhil P Oommen @ 2023-05-02  7:49 UTC (permalink / raw)
  To: Konrad Dybcio
  Cc: Rob Clark, Abhinav Kumar, Dmitry Baryshkov, Sean Paul,
	David Airlie, Daniel Vetter, Rob Herring, Krzysztof Kozlowski,
	Bjorn Andersson, Konrad Dybcio, linux-arm-msm, dri-devel,
	freedreno, devicetree, linux-kernel, Rob Clark, Marijn Suijten

On Sat, Apr 01, 2023 at 01:54:43PM +0200, Konrad Dybcio wrote:
> Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
> but don't implement the associated GMUs. This is due to the fact that
> the GMU directly pokes at RPMh. Sadly, this means we have to take care
> of enabling & scaling power rails, clocks and bandwidth ourselves.
> 
> Reuse existing Adreno-common code and modify the deeply-GMU-infused
> A6XX code to facilitate these GPUs. This involves if-ing out lots
> of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
> the actual name that Qualcomm uses in their downstream kernels).
> 
> This is essentially a register region which is convenient to model
> as a device. We'll use it for managing the GDSCs. The register
> layout matches the actual GMU_CX/GX regions on the "real GMU" devices
> and lets us reuse quite a bit of gmu_read/write/rmw calls.
<< I sent a reply to this patch earlier, but not sure where it went.
Still figuring out Mutt... >>

The only convenience I found is that we can reuse the gmu register ops in
a few places (< 10 I think). If we just model this as another gpu memory
region, I think it will help to keep the gmu vs gmu-wrapper/no-gmu
architecture code cleanly separated. Also, it looks like we need to
keep a dummy gmu platform device in the devicetree with the current
approach. That doesn't sound right.
> 
> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c       |  72 +++++++-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c       | 255 +++++++++++++++++++++++++---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h       |   1 +
>  drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  14 +-
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c     |   8 +-
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h     |   6 +
>  6 files changed, 318 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> index 87babbb2a19f..b1acdb027205 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> @@ -1469,6 +1469,7 @@ static int a6xx_gmu_get_irq(struct a6xx_gmu *gmu, struct platform_device *pdev,
>  
>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
>  {
> +	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
>  	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>  	struct platform_device *pdev = to_platform_device(gmu->dev);
>  
> @@ -1494,10 +1495,12 @@ void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
>  	gmu->mmio = NULL;
>  	gmu->rscc = NULL;
>  
> -	a6xx_gmu_memory_free(gmu);
> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> +		a6xx_gmu_memory_free(gmu);
>  
> -	free_irq(gmu->gmu_irq, gmu);
> -	free_irq(gmu->hfi_irq, gmu);
> +		free_irq(gmu->gmu_irq, gmu);
> +		free_irq(gmu->hfi_irq, gmu);
> +	}
>  
>  	/* Drop reference taken in of_find_device_by_node */
>  	put_device(gmu->dev);
> @@ -1516,6 +1519,69 @@ static int cxpd_notifier_cb(struct notifier_block *nb,
>  	return 0;
>  }
>  
> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
> +{
> +	struct platform_device *pdev = of_find_device_by_node(node);
> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> +	int ret;
> +
> +	if (!pdev)
> +		return -ENODEV;
> +
> +	gmu->dev = &pdev->dev;
> +
> +	of_dma_configure(gmu->dev, node, true);
Why set up DMA for a device that is not actually present?
> +
> +	pm_runtime_enable(gmu->dev);
> +
> +	/* Mark legacy for manual SPTPRAC control */
> +	gmu->legacy = true;
> +
> +	/* Map the GMU registers */
> +	gmu->mmio = a6xx_gmu_get_mmio(pdev, "gmu");
> +	if (IS_ERR(gmu->mmio)) {
> +		ret = PTR_ERR(gmu->mmio);
> +		goto err_mmio;
> +	}
> +
> +	gmu->cxpd = dev_pm_domain_attach_by_name(gmu->dev, "cx");
> +	if (IS_ERR(gmu->cxpd)) {
> +		ret = PTR_ERR(gmu->cxpd);
> +		goto err_mmio;
> +	}
> +
> +	if (!device_link_add(gmu->dev, gmu->cxpd, DL_FLAG_PM_RUNTIME)) {
> +		ret = -ENODEV;
> +		goto detach_cxpd;
> +	}
> +
> +	init_completion(&gmu->pd_gate);
> +	complete_all(&gmu->pd_gate);
> +	gmu->pd_nb.notifier_call = cxpd_notifier_cb;
> +
> +	/* Get a link to the GX power domain to reset the GPU */
> +	gmu->gxpd = dev_pm_domain_attach_by_name(gmu->dev, "gx");
> +	if (IS_ERR(gmu->gxpd)) {
> +		ret = PTR_ERR(gmu->gxpd);
> +		goto err_mmio;
> +	}
> +
> +	gmu->initialized = true;
> +
> +	return 0;
> +
> +detach_cxpd:
> +	dev_pm_domain_detach(gmu->cxpd, false);
> +
> +err_mmio:
> +	iounmap(gmu->mmio);
> +
> +	/* Drop reference taken in of_find_device_by_node */
> +	put_device(gmu->dev);
> +
> +	return ret;
> +}
> +
>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
>  {
>  	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 931f9f3b3a85..8e0345ffab81 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -20,9 +20,11 @@ static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>  
> -	/* Check that the GMU is idle */
> -	if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
> -		return false;
> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> +		/* Check that the GMU is idle */
> +		if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
> +			return false;
> +	}
>  
>  	/* Check tha the CX master is idle */
>  	if (gpu_read(gpu, REG_A6XX_RBBM_STATUS) &
> @@ -612,13 +614,15 @@ static void a6xx_set_hwcg(struct msm_gpu *gpu, bool state)
>  		return;
>  
>  	/* Disable SP clock before programming HWCG registers */
> -	gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> +		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
>  
>  	for (i = 0; (reg = &adreno_gpu->info->hwcg[i], reg->offset); i++)
>  		gpu_write(gpu, reg->offset, state ? reg->value : 0);
>  
>  	/* Enable SP clock */
> -	gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> +		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
>  
>  	gpu_write(gpu, REG_A6XX_RBBM_CLOCK_CNTL, state ? clock_cntl_on : 0);
>  }
> @@ -1018,10 +1022,13 @@ static int hw_init(struct msm_gpu *gpu)
>  {
>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>  	int ret;
>  
> -	/* Make sure the GMU keeps the GPU on while we set it up */
> -	a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> +		/* Make sure the GMU keeps the GPU on while we set it up */
> +		a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
> +	}
>  
>  	/* Clear GBIF halt in case GX domain was not collapsed */
>  	if (a6xx_has_gbif(adreno_gpu))
> @@ -1144,6 +1151,17 @@ static int hw_init(struct msm_gpu *gpu)
>  			0x3f0243f0);
>  	}
>  
> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
> +		/* Do it here, as GMU wrapper only inits the GMU for memory reservation etc. */
> +
> +		/* Set up the CX GMU counter 0 to count busy ticks */
> +		gmu_write(gmu, REG_A6XX_GPU_GMU_AO_GPU_CX_BUSY_MASK, 0xff000000);
> +
> +		/* Enable power counter 0 */
> +		gmu_rmw(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_SELECT_0, 0xff, BIT(5));
> +		gmu_write(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_ENABLE, 1);
> +	}
> +
>  	/* Protect registers from the CP */
>  	a6xx_set_cp_protect(gpu);
>  
> @@ -1233,6 +1251,8 @@ static int hw_init(struct msm_gpu *gpu)
>  	}
>  
>  out:
> +	if (adreno_has_gmu_wrapper(adreno_gpu))
> +		return ret;
>  	/*
>  	 * Tell the GMU that we are done touching the GPU and it can start power
>  	 * management
> @@ -1267,6 +1287,9 @@ static void a6xx_dump(struct msm_gpu *gpu)
>  	adreno_dump(gpu);
>  }
>  
> +#define GBIF_GX_HALT_MASK	BIT(0)
> +#define GBIF_CLIENT_HALT_MASK	BIT(0)
> +#define GBIF_ARB_HALT_MASK	BIT(1)
>  #define VBIF_RESET_ACK_TIMEOUT	100
>  #define VBIF_RESET_ACK_MASK	0x00f0
>  
> @@ -1299,7 +1322,8 @@ static void a6xx_recover(struct msm_gpu *gpu)
>  	 * Turn off keep alive that might have been enabled by the hang
>  	 * interrupt
>  	 */
> -	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> +		gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);

Maybe it is better to move this to a6xx_gmu_force_power_off.

>  
>  	pm_runtime_dont_use_autosuspend(&gpu->pdev->dev);
>  
> @@ -1329,6 +1353,32 @@ static void a6xx_recover(struct msm_gpu *gpu)
>  
>  	dev_pm_genpd_remove_notifier(gmu->cxpd);
>  
> +	/* Software-reset the GPU */

This is not a soft reset sequence. With this sequence we are trying to
quiesce GPU <-> DDR traffic.

> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
> +		/* Halt the GX side of GBIF */
> +		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, GBIF_GX_HALT_MASK);
> +		spin_until(gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT_ACK) &
> +			   GBIF_GX_HALT_MASK);
> +
> +		/* Halt new client requests on GBIF */
> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_CLIENT_HALT_MASK);
> +		spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
> +			   (GBIF_CLIENT_HALT_MASK)) == GBIF_CLIENT_HALT_MASK);
> +
> +		/* Halt all AXI requests on GBIF */
> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_ARB_HALT_MASK);
> +		spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
> +			   (GBIF_ARB_HALT_MASK)) == GBIF_ARB_HALT_MASK);
> +
> +		/* Clear the halts */
> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
> +
> +		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
> +
> +		/* This *really* needs to go through before we do anything else! */
> +		mb();
> +	}
> +

This sequence should run before we collapse the CX GDSC. Also, please see
if we can create a subroutine to avoid code duplication.
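
For illustration only, a helper along these lines might look roughly like
the sketch below. It runs against a mock register file so it can be built
outside the kernel: mock_gpu_write(), mock_gpu_read(), the enum indices and
the poll bound are all hypothetical stand-ins (the driver itself would use
gpu_write()/gpu_read() and the REG_A6XX_* offsets), and the "ACK mirrors
HALT" behaviour is an assumed hardware model, not something taken from the
patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register indices standing in for the REG_A6XX_* offsets. */
enum {
	RBBM_GBIF_HALT,
	RBBM_GBIF_HALT_ACK,
	GBIF_HALT,
	GBIF_HALT_ACK,
	NUM_REGS,
};

#define GBIF_GX_HALT_MASK	(1u << 0)
#define GBIF_CLIENT_HALT_MASK	(1u << 0)
#define GBIF_ARB_HALT_MASK	(1u << 1)
#define HALT_ACK_MAX_POLLS	100	/* hypothetical bound, not from the patch */

/* Mock register file so the sketch can run outside the kernel. */
uint32_t mock_regs[NUM_REGS];

void mock_gpu_write(unsigned int reg, uint32_t val)
{
	mock_regs[reg] = val;
	/* Assumed hardware model: each ACK register mirrors its HALT register. */
	if (reg == RBBM_GBIF_HALT)
		mock_regs[RBBM_GBIF_HALT_ACK] = val;
	else if (reg == GBIF_HALT)
		mock_regs[GBIF_HALT_ACK] = val;
}

uint32_t mock_gpu_read(unsigned int reg)
{
	return mock_regs[reg];
}

/* Request a halt and poll (bounded) until every bit in mask is acked. */
bool gbif_halt_and_wait(unsigned int halt_reg, unsigned int ack_reg,
			uint32_t mask)
{
	unsigned int i;

	mock_gpu_write(halt_reg, mask);
	for (i = 0; i < HALT_ACK_MAX_POLLS; i++) {
		if ((mock_gpu_read(ack_reg) & mask) == mask)
			return true;
	}
	return false;
}

/* Same three halt steps, in the same order as the patch, then release. */
bool gbif_quiesce(void)
{
	bool ok;

	ok = gbif_halt_and_wait(RBBM_GBIF_HALT, RBBM_GBIF_HALT_ACK,
				GBIF_GX_HALT_MASK) &&
	     gbif_halt_and_wait(GBIF_HALT, GBIF_HALT_ACK,
				GBIF_CLIENT_HALT_MASK) &&
	     gbif_halt_and_wait(GBIF_HALT, GBIF_HALT_ACK,
				GBIF_ARB_HALT_MASK);

	/* Clear the halts regardless of the outcome. */
	mock_gpu_write(GBIF_HALT, 0);
	mock_gpu_write(RBBM_GBIF_HALT, 0);

	return ok;
}
```

Factoring the "write mask, wait for ack" step out means each halt stage is
one call, and a bounded poll gives the caller something to report instead
of spinning forever if the ack never arrives.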

>  	pm_runtime_use_autosuspend(&gpu->pdev->dev);
>  
>  	if (active_submits)
> @@ -1463,7 +1513,8 @@ static void a6xx_fault_detect_irq(struct msm_gpu *gpu)
>  	 * Force the GPU to stay on until after we finish
>  	 * collecting information
>  	 */
> -	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> +		gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
>  
>  	DRM_DEV_ERROR(&gpu->pdev->dev,
>  		"gpu fault ring %d fence %x status %8.8X rb %4.4x/%4.4x ib1 %16.16llX/%4.4x ib2 %16.16llX/%4.4x\n",
> @@ -1624,7 +1675,7 @@ static void a6xx_llc_slices_init(struct platform_device *pdev,
>  		a6xx_gpu->llc_mmio = ERR_PTR(-EINVAL);
>  }
>  
> -static int a6xx_pm_resume(struct msm_gpu *gpu)
> +static int a6xx_gmu_pm_resume(struct msm_gpu *gpu)
>  {
>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> @@ -1644,10 +1695,61 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
>  
>  	a6xx_llc_activate(a6xx_gpu);
>  
> -	return 0;
> +	return ret;
>  }
>  
> -static int a6xx_pm_suspend(struct msm_gpu *gpu)
> +static int a6xx_pm_resume(struct msm_gpu *gpu)
> +{
> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> +	unsigned long freq = 0;
> +	struct dev_pm_opp *opp;
> +	int ret;
> +
> +	gpu->needs_hw_init = true;
> +
> +	trace_msm_gpu_resume(0);
> +
> +	mutex_lock(&a6xx_gpu->gmu.lock);
I think we can skip the GMU lock here, as there is no real GMU device.

> +
> +	pm_runtime_resume_and_get(gmu->dev);
> +	pm_runtime_resume_and_get(gmu->gxpd);
> +
> +	/* Set the core clock, having VDD scaling in mind */
> +	ret = dev_pm_opp_set_rate(&gpu->pdev->dev, gpu->fast_rate);
> +	if (ret)
> +		goto err_core_clk;
> +
> +	ret = clk_bulk_prepare_enable(gpu->nr_clocks, gpu->grp_clks);
> +	if (ret)
> +		goto err_bulk_clk;
> +
> +	ret = clk_prepare_enable(gpu->ebi1_clk);
> +	if (ret)
> +		goto err_mem_clk;
> +
> +	/* If anything goes south, tear the GPU down piece by piece.. */
> +	if (ret) {
> +err_mem_clk:
> +		clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
> +err_bulk_clk:
> +		opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
> +		dev_pm_opp_put(opp);
> +		dev_pm_opp_set_rate(&gpu->pdev->dev, 0);
> +err_core_clk:
> +		pm_runtime_put(gmu->gxpd);
> +		pm_runtime_put(gmu->dev);
> +	}
> +	mutex_unlock(&a6xx_gpu->gmu.lock);
> +
> +	if (!ret)
> +		msm_devfreq_resume(gpu);
> +
> +	return ret;
> +}
> +
> +static int a6xx_gmu_pm_suspend(struct msm_gpu *gpu)
>  {
>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> @@ -1674,11 +1776,62 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
>  	return 0;
>  }
>  
> +static int a6xx_pm_suspend(struct msm_gpu *gpu)
> +{
> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> +	unsigned long freq = 0;
> +	struct dev_pm_opp *opp;
> +	int i, ret;
> +
> +	trace_msm_gpu_suspend(0);
> +
> +	opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
> +	dev_pm_opp_put(opp);
> +
> +	msm_devfreq_suspend(gpu);
> +
> +	mutex_lock(&a6xx_gpu->gmu.lock);
> +
> +	clk_disable_unprepare(gpu->ebi1_clk);
> +
> +	clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
> +
> +	/* Set frequency to the minimum supported level (no 27MHz on A6xx!) */
> +	ret = dev_pm_opp_set_rate(&gpu->pdev->dev, freq);
> +	if (ret)
> +		goto err;
> +
> +	pm_runtime_put_sync(gmu->gxpd);
> +	pm_runtime_put_sync(gmu->dev);
> +
> +	mutex_unlock(&a6xx_gpu->gmu.lock);
> +
> +	if (a6xx_gpu->shadow_bo)
> +		for (i = 0; i < gpu->nr_rings; i++)
> +			a6xx_gpu->shadow[i] = 0;
> +
> +	gpu->suspend_count++;
> +
> +	return 0;
> +
> +err:
> +	mutex_unlock(&a6xx_gpu->gmu.lock);
> +
> +	return ret;
> +}
> +
>  static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
>  {
>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>  
> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
> +		*value = gpu_read64(gpu, REG_A6XX_CP_ALWAYS_ON_COUNTER);
> +		return 0;
> +	}
> +
Instead of wrapper check here, we can just create a separate op. I don't
see any benefit in reusing the same function here.
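
As a rough illustration of the separate-op idea, the sketch below uses
trimmed-down mock types rather than the real struct msm_gpu and
struct adreno_gpu_funcs. Every mock_* name is hypothetical, and the
gmu_oob_requests counter only models the "force the GPU power on" step that
the GMU path performs around the register read.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-ins for the driver structures. */
struct mock_gpu {
	uint64_t always_on_counter;	/* models REG_A6XX_CP_ALWAYS_ON_COUNTER */
	int gmu_oob_requests;		/* counts modelled GMU power-on requests */
};

struct mock_gpu_funcs {
	int (*get_timestamp)(struct mock_gpu *gpu, uint64_t *value);
};

/* GMU path: models asking the GMU to keep the GPU on around the read. */
int mock_gmu_get_timestamp(struct mock_gpu *gpu, uint64_t *value)
{
	gpu->gmu_oob_requests++;
	*value = gpu->always_on_counter;
	return 0;
}

/* GMU wrapper path: the counter is read directly, no GMU involved. */
int mock_gmuwrapper_get_timestamp(struct mock_gpu *gpu, uint64_t *value)
{
	*value = gpu->always_on_counter;
	return 0;
}

/* One table per flavour, so neither callback needs a wrapper check. */
const struct mock_gpu_funcs mock_funcs = {
	.get_timestamp = mock_gmu_get_timestamp,
};

const struct mock_gpu_funcs mock_funcs_gmuwrapper = {
	.get_timestamp = mock_gmuwrapper_get_timestamp,
};
```

Since the patch already introduces a second funcs table for the wrapper
case, pointing its .get_timestamp at a dedicated function would keep the
GMU variant free of wrapper checks.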


>  	mutex_lock(&a6xx_gpu->gmu.lock);
>  
>  	/* Force the GPU power on so we can read this register */
> @@ -1716,7 +1869,8 @@ static void a6xx_destroy(struct msm_gpu *gpu)
>  		drm_gem_object_put(a6xx_gpu->shadow_bo);
>  	}
>  
> -	a6xx_llc_slices_destroy(a6xx_gpu);
> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> +		a6xx_llc_slices_destroy(a6xx_gpu);
>  
>  	mutex_lock(&a6xx_gpu->gmu.lock);
>  	a6xx_gmu_remove(a6xx_gpu);
> @@ -1957,8 +2111,8 @@ static const struct adreno_gpu_funcs funcs = {
>  		.set_param = adreno_set_param,
>  		.hw_init = a6xx_hw_init,
>  		.ucode_load = a6xx_ucode_load,
> -		.pm_suspend = a6xx_pm_suspend,
> -		.pm_resume = a6xx_pm_resume,
> +		.pm_suspend = a6xx_gmu_pm_suspend,
> +		.pm_resume = a6xx_gmu_pm_resume,
>  		.recover = a6xx_recover,
>  		.submit = a6xx_submit,
>  		.active_ring = a6xx_active_ring,
> @@ -1982,6 +2136,35 @@ static const struct adreno_gpu_funcs funcs = {
>  	.get_timestamp = a6xx_get_timestamp,
>  };
>  
> +static const struct adreno_gpu_funcs funcs_gmuwrapper = {
> +	.base = {
> +		.get_param = adreno_get_param,
> +		.set_param = adreno_set_param,
> +		.hw_init = a6xx_hw_init,
> +		.ucode_load = a6xx_ucode_load,
> +		.pm_suspend = a6xx_pm_suspend,
> +		.pm_resume = a6xx_pm_resume,
> +		.recover = a6xx_recover,
> +		.submit = a6xx_submit,
> +		.active_ring = a6xx_active_ring,
> +		.irq = a6xx_irq,
> +		.destroy = a6xx_destroy,
> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
> +		.show = a6xx_show,
> +#endif
> +		.gpu_busy = a6xx_gpu_busy,
> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
> +		.gpu_state_get = a6xx_gpu_state_get,
> +		.gpu_state_put = a6xx_gpu_state_put,
> +#endif
> +		.create_address_space = a6xx_create_address_space,
> +		.create_private_address_space = a6xx_create_private_address_space,
> +		.get_rptr = a6xx_get_rptr,
> +		.progress = a6xx_progress,
> +	},
> +	.get_timestamp = a6xx_get_timestamp,
> +};
> +
>  struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>  {
>  	struct msm_drm_private *priv = dev->dev_private;
> @@ -2003,18 +2186,36 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>  
>  	adreno_gpu->registers = NULL;
>  
> +	/* Check if there is a GMU phandle and set it up */
> +	node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
> +	/* FIXME: How do we gracefully handle this? */
> +	BUG_ON(!node);
How will you handle this BUG() when there is no GMU (a610 gpu)?

> +
> +	adreno_gpu->gmu_is_wrapper = of_device_is_compatible(node, "qcom,adreno-gmu-wrapper");
> +
>  	/*
>  	 * We need to know the platform type before calling into adreno_gpu_init
>  	 * so that the hw_apriv flag can be correctly set. Snoop into the info
>  	 * and grab the revision number
>  	 */
>  	info = adreno_info(config->rev);
> -
> -	if (info && (info->revn == 650 || info->revn == 660 ||
> -			adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), info->rev)))
> +	if (!info)
> +		return ERR_PTR(-EINVAL);
> +
> +	/* Assign these early so that we can use the is_aXYZ helpers */
> +	/* Numeric revision IDs (e.g. 630) */
> +	adreno_gpu->revn = info->revn;
> +	/* New-style ADRENO_REV()-only */
> +	adreno_gpu->rev = info->rev;
> +	/* Quirk data */
> +	adreno_gpu->info = info;
> +
> +	if (adreno_is_a650(adreno_gpu) || adreno_is_a660_family(adreno_gpu))
>  		adreno_gpu->base.hw_apriv = true;
>  
> -	a6xx_llc_slices_init(pdev, a6xx_gpu);
> +	/* No LLCC on non-RPMh (and by extension, non-GMU) SoCs */
> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> +		a6xx_llc_slices_init(pdev, a6xx_gpu);
>  
>  	ret = a6xx_set_supported_hw(&pdev->dev, config->rev);
>  	if (ret) {
> @@ -2022,7 +2223,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>  		return ERR_PTR(ret);
>  	}
>  
> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> +	if (adreno_has_gmu_wrapper(adreno_gpu))
> +		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_gmuwrapper, 1);
> +	else
> +		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
>  	if (ret) {
>  		a6xx_destroy(&(a6xx_gpu->base.base));
>  		return ERR_PTR(ret);
> @@ -2035,13 +2239,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>  	if (adreno_is_a618(adreno_gpu) || adreno_is_7c3(adreno_gpu))
>  		priv->gpu_clamp_to_idle = true;
>  
> -	/* Check if there is a GMU phandle and set it up */
> -	node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
> -
> -	/* FIXME: How do we gracefully handle this? */
> -	BUG_ON(!node);
> -
> -	ret = a6xx_gmu_init(a6xx_gpu, node);
> +	if (adreno_has_gmu_wrapper(adreno_gpu))
> +		ret = a6xx_gmu_wrapper_init(a6xx_gpu, node);
> +	else
> +		ret = a6xx_gmu_init(a6xx_gpu, node);
>  	of_node_put(node);
>  	if (ret) {
>  		a6xx_destroy(&(a6xx_gpu->base.base));
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> index eea2e60ce3b7..51a7656072fa 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> @@ -76,6 +76,7 @@ int a6xx_gmu_set_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
>  void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
>  
>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
>  
>  void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> index 30ecdff363e7..4e5d650578c6 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> @@ -1041,16 +1041,18 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu)
>  	/* Get the generic state from the adreno core */
>  	adreno_gpu_state_get(gpu, &a6xx_state->base);
>  
> -	a6xx_get_gmu_registers(gpu, a6xx_state);
> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
nit: Kinda misleading function name to a layman. Should we invert the
function to "adreno_has_gmu"?

-Akhil
> +		a6xx_get_gmu_registers(gpu, a6xx_state);
>  
> -	a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
> -	a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
> -	a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
> +		a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
> +		a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
> +		a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
>  
> -	a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
> +		a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
> +	}
>  
>  	/* If GX isn't on the rest of the data isn't going to be accessible */
> -	if (!a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
> +	if (!adreno_has_gmu_wrapper(adreno_gpu) && !a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
>  		return &a6xx_state->base;
>  
>  	/* Get the banks of indexed registers */
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> index 6934cee07d42..5c5901d65950 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> @@ -528,6 +528,10 @@ int adreno_load_fw(struct adreno_gpu *adreno_gpu)
>  		if (!adreno_gpu->info->fw[i])
>  			continue;
>  
> +		/* Skip loading GMU firmware with GMU Wrapper */
> +		if (adreno_has_gmu_wrapper(adreno_gpu) && i == ADRENO_FW_GMU)
> +			continue;
> +
>  		/* Skip if the firmware has already been loaded */
>  		if (adreno_gpu->fw[i])
>  			continue;
> @@ -1074,8 +1078,8 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
>  	u32 speedbin;
>  	int ret;
>  
> -	/* Only handle the core clock when GMU is not in use */
> -	if (config->rev.core < 6) {
> +	/* Only handle the core clock when GMU is not in use (or is absent). */
> +	if (adreno_has_gmu_wrapper(adreno_gpu) || config->rev.core < 6) {
>  		/*
>  		 * This can only be done before devm_pm_opp_of_add_table(), or
>  		 * dev_pm_opp_set_config() will WARN_ON()
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> index f62612a5c70f..ee5352bc5329 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> @@ -115,6 +115,7 @@ struct adreno_gpu {
>  	 * code (a3xx_gpu.c) and stored in this common location.
>  	 */
>  	const unsigned int *reg_offsets;
> +	bool gmu_is_wrapper;
>  };
>  #define to_adreno_gpu(x) container_of(x, struct adreno_gpu, base)
>  
> @@ -145,6 +146,11 @@ struct adreno_platform_config {
>  
>  bool adreno_cmp_rev(struct adreno_rev rev1, struct adreno_rev rev2);
>  
> +static inline bool adreno_has_gmu_wrapper(struct adreno_gpu *gpu)
> +{
> +	return gpu->gmu_is_wrapper;
> +}
> +
>  static inline bool adreno_is_a2xx(struct adreno_gpu *gpu)
>  {
>  	return (gpu->revn < 300);
> 
> -- 
> 2.40.0
> 


* Re: [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support
  2023-05-02  7:49   ` Akhil P Oommen
@ 2023-05-02  9:40     ` Konrad Dybcio
  2023-05-03 20:32       ` Akhil P Oommen
  0 siblings, 1 reply; 30+ messages in thread
From: Konrad Dybcio @ 2023-05-02  9:40 UTC (permalink / raw)
  To: Akhil P Oommen
  Cc: Rob Clark, Abhinav Kumar, Dmitry Baryshkov, Sean Paul,
	David Airlie, Daniel Vetter, Rob Herring, Krzysztof Kozlowski,
	Bjorn Andersson, Konrad Dybcio, linux-arm-msm, dri-devel,
	freedreno, devicetree, linux-kernel, Rob Clark, Marijn Suijten



On 2.05.2023 09:49, Akhil P Oommen wrote:
> On Sat, Apr 01, 2023 at 01:54:43PM +0200, Konrad Dybcio wrote:
>> Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
>> but don't implement the associated GMUs. This is due to the fact that
>> the GMU directly pokes at RPMh. Sadly, this means we have to take care
>> of enabling & scaling power rails, clocks and bandwidth ourselves.
>>
>> Reuse existing Adreno-common code and modify the deeply-GMU-infused
>> A6XX code to facilitate these GPUs. This involves if-ing out lots
>> of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
>> the actual name that Qualcomm uses in their downstream kernels).
>>
>> This is essentially a register region which is convenient to model
>> as a device. We'll use it for managing the GDSCs. The register
>> layout matches the actual GMU_CX/GX regions on the "real GMU" devices
>> and lets us reuse quite a bit of gmu_read/write/rmw calls.
> << I sent a reply to this patch earlier, but not sure where it went.
> Still figuring out Mutt... >>
Answered it here:

https://lore.kernel.org/linux-arm-msm/4d3000c1-c3f9-0bfd-3eb3-23393f9a8f77@linaro.org/

I don't think I see any new comments in this "reply revision" (heh), so please
check that one out.

> 
> Only convenience I found is that we can reuse gmu register ops in a few
> places (< 10 I think). If we just model this as another gpu memory
> region, I think it will help to keep gmu vs gmu-wrapper/no-gmu
> architecture code with clean separation. Also, it looks like we need to
> keep a dummy gmu platform device in the devicetree with the current
> approach. That doesn't sound right.
That's correct, but.. if we switch away from that, VDD_GX/VDD_CX will
need additional, gmuwrapper-configuration specific code anyway, as
OPP & genpd will no longer make use of the default behavior which
only gets triggered if there's a single power-domains=<> entry, afaicu.

If nothing else, this is a very convenient way to model a part of the
GPU (as that's essentially what GMU_CX is, to my understanding) and
the bindings people didn't shoot me in the head for proposing this, so
I assume it'd be cool to pursue this..

Konrad
>>
>> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
>> ---
>>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c       |  72 +++++++-
>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c       | 255 +++++++++++++++++++++++++---
>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h       |   1 +
>>  drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  14 +-
>>  drivers/gpu/drm/msm/adreno/adreno_gpu.c     |   8 +-
>>  drivers/gpu/drm/msm/adreno/adreno_gpu.h     |   6 +
>>  6 files changed, 318 insertions(+), 38 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> index 87babbb2a19f..b1acdb027205 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> @@ -1469,6 +1469,7 @@ static int a6xx_gmu_get_irq(struct a6xx_gmu *gmu, struct platform_device *pdev,
>>  
>>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
>>  {
>> +	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
>>  	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>>  	struct platform_device *pdev = to_platform_device(gmu->dev);
>>  
>> @@ -1494,10 +1495,12 @@ void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
>>  	gmu->mmio = NULL;
>>  	gmu->rscc = NULL;
>>  
>> -	a6xx_gmu_memory_free(gmu);
>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
>> +		a6xx_gmu_memory_free(gmu);
>>  
>> -	free_irq(gmu->gmu_irq, gmu);
>> -	free_irq(gmu->hfi_irq, gmu);
>> +		free_irq(gmu->gmu_irq, gmu);
>> +		free_irq(gmu->hfi_irq, gmu);
>> +	}
>>  
>>  	/* Drop reference taken in of_find_device_by_node */
>>  	put_device(gmu->dev);
>> @@ -1516,6 +1519,69 @@ static int cxpd_notifier_cb(struct notifier_block *nb,
>>  	return 0;
>>  }
>>  
>> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
>> +{
>> +	struct platform_device *pdev = of_find_device_by_node(node);
>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>> +	int ret;
>> +
>> +	if (!pdev)
>> +		return -ENODEV;
>> +
>> +	gmu->dev = &pdev->dev;
>> +
>> +	of_dma_configure(gmu->dev, node, true);
> why setup dma for a device that is not actually present?
>> +
>> +	pm_runtime_enable(gmu->dev);
>> +
>> +	/* Mark legacy for manual SPTPRAC control */
>> +	gmu->legacy = true;
>> +
>> +	/* Map the GMU registers */
>> +	gmu->mmio = a6xx_gmu_get_mmio(pdev, "gmu");
>> +	if (IS_ERR(gmu->mmio)) {
>> +		ret = PTR_ERR(gmu->mmio);
>> +		goto err_mmio;
>> +	}
>> +
>> +	gmu->cxpd = dev_pm_domain_attach_by_name(gmu->dev, "cx");
>> +	if (IS_ERR(gmu->cxpd)) {
>> +		ret = PTR_ERR(gmu->cxpd);
>> +		goto err_mmio;
>> +	}
>> +
>> +	if (!device_link_add(gmu->dev, gmu->cxpd, DL_FLAG_PM_RUNTIME)) {
>> +		ret = -ENODEV;
>> +		goto detach_cxpd;
>> +	}
>> +
>> +	init_completion(&gmu->pd_gate);
>> +	complete_all(&gmu->pd_gate);
>> +	gmu->pd_nb.notifier_call = cxpd_notifier_cb;
>> +
>> +	/* Get a link to the GX power domain to reset the GPU */
>> +	gmu->gxpd = dev_pm_domain_attach_by_name(gmu->dev, "gx");
>> +	if (IS_ERR(gmu->gxpd)) {
>> +		ret = PTR_ERR(gmu->gxpd);
>> +		goto err_mmio;
>> +	}
>> +
>> +	gmu->initialized = true;
>> +
>> +	return 0;
>> +
>> +detach_cxpd:
>> +	dev_pm_domain_detach(gmu->cxpd, false);
>> +
>> +err_mmio:
>> +	iounmap(gmu->mmio);
>> +
>> +	/* Drop reference taken in of_find_device_by_node */
>> +	put_device(gmu->dev);
>> +
>> +	return ret;
>> +}
>> +
>>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
>>  {
>>  	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> index 931f9f3b3a85..8e0345ffab81 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> @@ -20,9 +20,11 @@ static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>  
>> -	/* Check that the GMU is idle */
>> -	if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
>> -		return false;
>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
>> +		/* Check that the GMU is idle */
>> +		if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
>> +			return false;
>> +	}
>>  
>>  	/* Check tha the CX master is idle */
>>  	if (gpu_read(gpu, REG_A6XX_RBBM_STATUS) &
>> @@ -612,13 +614,15 @@ static void a6xx_set_hwcg(struct msm_gpu *gpu, bool state)
>>  		return;
>>  
>>  	/* Disable SP clock before programming HWCG registers */
>> -	gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>> +		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
>>  
>>  	for (i = 0; (reg = &adreno_gpu->info->hwcg[i], reg->offset); i++)
>>  		gpu_write(gpu, reg->offset, state ? reg->value : 0);
>>  
>>  	/* Enable SP clock */
>> -	gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>> +		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
>>  
>>  	gpu_write(gpu, REG_A6XX_RBBM_CLOCK_CNTL, state ? clock_cntl_on : 0);
>>  }
>> @@ -1018,10 +1022,13 @@ static int hw_init(struct msm_gpu *gpu)
>>  {
>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>>  	int ret;
>>  
>> -	/* Make sure the GMU keeps the GPU on while we set it up */
>> -	a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
>> +		/* Make sure the GMU keeps the GPU on while we set it up */
>> +		a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
>> +	}
>>  
>>  	/* Clear GBIF halt in case GX domain was not collapsed */
>>  	if (a6xx_has_gbif(adreno_gpu))
>> @@ -1144,6 +1151,17 @@ static int hw_init(struct msm_gpu *gpu)
>>  			0x3f0243f0);
>>  	}
>>  
>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
>> +		/* Do it here, as GMU wrapper only inits the GMU for memory reservation etc. */
>> +
>> +		/* Set up the CX GMU counter 0 to count busy ticks */
>> +		gmu_write(gmu, REG_A6XX_GPU_GMU_AO_GPU_CX_BUSY_MASK, 0xff000000);
>> +
>> +		/* Enable power counter 0 */
>> +		gmu_rmw(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_SELECT_0, 0xff, BIT(5));
>> +		gmu_write(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_ENABLE, 1);
>> +	}
>> +
>>  	/* Protect registers from the CP */
>>  	a6xx_set_cp_protect(gpu);
>>  
>> @@ -1233,6 +1251,8 @@ static int hw_init(struct msm_gpu *gpu)
>>  	}
>>  
>>  out:
>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
>> +		return ret;
>>  	/*
>>  	 * Tell the GMU that we are done touching the GPU and it can start power
>>  	 * management
>> @@ -1267,6 +1287,9 @@ static void a6xx_dump(struct msm_gpu *gpu)
>>  	adreno_dump(gpu);
>>  }
>>  
>> +#define GBIF_GX_HALT_MASK	BIT(0)
>> +#define GBIF_CLIENT_HALT_MASK	BIT(0)
>> +#define GBIF_ARB_HALT_MASK	BIT(1)
>>  #define VBIF_RESET_ACK_TIMEOUT	100
>>  #define VBIF_RESET_ACK_MASK	0x00f0
>>  
>> @@ -1299,7 +1322,8 @@ static void a6xx_recover(struct msm_gpu *gpu)
>>  	 * Turn off keep alive that might have been enabled by the hang
>>  	 * interrupt
>>  	 */
>> -	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>> +		gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
> 
> Maybe it is better to move this to a6xx_gmu_force_power_off.
> 
>>  
>>  	pm_runtime_dont_use_autosuspend(&gpu->pdev->dev);
>>  
>> @@ -1329,6 +1353,32 @@ static void a6xx_recover(struct msm_gpu *gpu)
>>  
>>  	dev_pm_genpd_remove_notifier(gmu->cxpd);
>>  
>> +	/* Software-reset the GPU */
> 
> This is not a soft reset sequence. With this sequence we are trying to
> quiesce GPU <-> DDR traffic.
> 
>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
>> +		/* Halt the GX side of GBIF */
>> +		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, GBIF_GX_HALT_MASK);
>> +		spin_until(gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT_ACK) &
>> +			   GBIF_GX_HALT_MASK);
>> +
>> +		/* Halt new client requests on GBIF */
>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_CLIENT_HALT_MASK);
>> +		spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
>> +			   (GBIF_CLIENT_HALT_MASK)) == GBIF_CLIENT_HALT_MASK);
>> +
>> +		/* Halt all AXI requests on GBIF */
>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_ARB_HALT_MASK);
>> +		spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
>> +			   (GBIF_ARB_HALT_MASK)) == GBIF_ARB_HALT_MASK);
>> +
>> +		/* Clear the halts */
>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
>> +
>> +		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
>> +
>> +		/* This *really* needs to go through before we do anything else! */
>> +		mb();
>> +	}
>> +
> 
> This sequence should run before we collapse the CX GDSC. Also, please see
> if we can create a subroutine to avoid code duplication.
> 
>>  	pm_runtime_use_autosuspend(&gpu->pdev->dev);
>>  
>>  	if (active_submits)
>> @@ -1463,7 +1513,8 @@ static void a6xx_fault_detect_irq(struct msm_gpu *gpu)
>>  	 * Force the GPU to stay on until after we finish
>>  	 * collecting information
>>  	 */
>> -	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>> +		gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
>>  
>>  	DRM_DEV_ERROR(&gpu->pdev->dev,
>>  		"gpu fault ring %d fence %x status %8.8X rb %4.4x/%4.4x ib1 %16.16llX/%4.4x ib2 %16.16llX/%4.4x\n",
>> @@ -1624,7 +1675,7 @@ static void a6xx_llc_slices_init(struct platform_device *pdev,
>>  		a6xx_gpu->llc_mmio = ERR_PTR(-EINVAL);
>>  }
>>  
>> -static int a6xx_pm_resume(struct msm_gpu *gpu)
>> +static int a6xx_gmu_pm_resume(struct msm_gpu *gpu)
>>  {
>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>> @@ -1644,10 +1695,61 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
>>  
>>  	a6xx_llc_activate(a6xx_gpu);
>>  
>> -	return 0;
>> +	return ret;
>>  }
>>  
>> -static int a6xx_pm_suspend(struct msm_gpu *gpu)
>> +static int a6xx_pm_resume(struct msm_gpu *gpu)
>> +{
>> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>> +	unsigned long freq = 0;
>> +	struct dev_pm_opp *opp;
>> +	int ret;
>> +
>> +	gpu->needs_hw_init = true;
>> +
>> +	trace_msm_gpu_resume(0);
>> +
>> +	mutex_lock(&a6xx_gpu->gmu.lock);
> I think we can skip the GMU lock here, as there is no real GMU device.
> 
>> +
>> +	pm_runtime_resume_and_get(gmu->dev);
>> +	pm_runtime_resume_and_get(gmu->gxpd);
>> +
>> +	/* Set the core clock, having VDD scaling in mind */
>> +	ret = dev_pm_opp_set_rate(&gpu->pdev->dev, gpu->fast_rate);
>> +	if (ret)
>> +		goto err_core_clk;
>> +
>> +	ret = clk_bulk_prepare_enable(gpu->nr_clocks, gpu->grp_clks);
>> +	if (ret)
>> +		goto err_bulk_clk;
>> +
>> +	ret = clk_prepare_enable(gpu->ebi1_clk);
>> +	if (ret)
>> +		goto err_mem_clk;
>> +
>> +	/* If anything goes south, tear the GPU down piece by piece.. */
>> +	if (ret) {
>> +err_mem_clk:
>> +		clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
>> +err_bulk_clk:
>> +		opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
>> +		dev_pm_opp_put(opp);
>> +		dev_pm_opp_set_rate(&gpu->pdev->dev, 0);
>> +err_core_clk:
>> +		pm_runtime_put(gmu->gxpd);
>> +		pm_runtime_put(gmu->dev);
>> +	}
>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
>> +
>> +	if (!ret)
>> +		msm_devfreq_resume(gpu);
>> +
>> +	return ret;
>> +}
>> +
>> +static int a6xx_gmu_pm_suspend(struct msm_gpu *gpu)
>>  {
>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>> @@ -1674,11 +1776,62 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
>>  	return 0;
>>  }
>>  
>> +static int a6xx_pm_suspend(struct msm_gpu *gpu)
>> +{
>> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>> +	unsigned long freq = 0;
>> +	struct dev_pm_opp *opp;
>> +	int i, ret;
>> +
>> +	trace_msm_gpu_suspend(0);
>> +
>> +	opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
>> +	dev_pm_opp_put(opp);
>> +
>> +	msm_devfreq_suspend(gpu);
>> +
>> +	mutex_lock(&a6xx_gpu->gmu.lock);
>> +
>> +	clk_disable_unprepare(gpu->ebi1_clk);
>> +
>> +	clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
>> +
>> +	/* Set frequency to the minimum supported level (no 27MHz on A6xx!) */
>> +	ret = dev_pm_opp_set_rate(&gpu->pdev->dev, freq);
>> +	if (ret)
>> +		goto err;
>> +
>> +	pm_runtime_put_sync(gmu->gxpd);
>> +	pm_runtime_put_sync(gmu->dev);
>> +
>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
>> +
>> +	if (a6xx_gpu->shadow_bo)
>> +		for (i = 0; i < gpu->nr_rings; i++)
>> +			a6xx_gpu->shadow[i] = 0;
>> +
>> +	gpu->suspend_count++;
>> +
>> +	return 0;
>> +
>> +err:
>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
>> +
>> +	return ret;
>> +}
>> +
>>  static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
>>  {
>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>  
>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
>> +		*value = gpu_read64(gpu, REG_A6XX_CP_ALWAYS_ON_COUNTER);
>> +		return 0;
>> +	}
>> +
> Instead of wrapper check here, we can just create a separate op. I don't
> see any benefit in reusing the same function here.
> 
> 
>>  	mutex_lock(&a6xx_gpu->gmu.lock);
>>  
>>  	/* Force the GPU power on so we can read this register */
>> @@ -1716,7 +1869,8 @@ static void a6xx_destroy(struct msm_gpu *gpu)
>>  		drm_gem_object_put(a6xx_gpu->shadow_bo);
>>  	}
>>  
>> -	a6xx_llc_slices_destroy(a6xx_gpu);
>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>> +		a6xx_llc_slices_destroy(a6xx_gpu);
>>  
>>  	mutex_lock(&a6xx_gpu->gmu.lock);
>>  	a6xx_gmu_remove(a6xx_gpu);
>> @@ -1957,8 +2111,8 @@ static const struct adreno_gpu_funcs funcs = {
>>  		.set_param = adreno_set_param,
>>  		.hw_init = a6xx_hw_init,
>>  		.ucode_load = a6xx_ucode_load,
>> -		.pm_suspend = a6xx_pm_suspend,
>> -		.pm_resume = a6xx_pm_resume,
>> +		.pm_suspend = a6xx_gmu_pm_suspend,
>> +		.pm_resume = a6xx_gmu_pm_resume,
>>  		.recover = a6xx_recover,
>>  		.submit = a6xx_submit,
>>  		.active_ring = a6xx_active_ring,
>> @@ -1982,6 +2136,35 @@ static const struct adreno_gpu_funcs funcs = {
>>  	.get_timestamp = a6xx_get_timestamp,
>>  };
>>  
>> +static const struct adreno_gpu_funcs funcs_gmuwrapper = {
>> +	.base = {
>> +		.get_param = adreno_get_param,
>> +		.set_param = adreno_set_param,
>> +		.hw_init = a6xx_hw_init,
>> +		.ucode_load = a6xx_ucode_load,
>> +		.pm_suspend = a6xx_pm_suspend,
>> +		.pm_resume = a6xx_pm_resume,
>> +		.recover = a6xx_recover,
>> +		.submit = a6xx_submit,
>> +		.active_ring = a6xx_active_ring,
>> +		.irq = a6xx_irq,
>> +		.destroy = a6xx_destroy,
>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
>> +		.show = a6xx_show,
>> +#endif
>> +		.gpu_busy = a6xx_gpu_busy,
>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
>> +		.gpu_state_get = a6xx_gpu_state_get,
>> +		.gpu_state_put = a6xx_gpu_state_put,
>> +#endif
>> +		.create_address_space = a6xx_create_address_space,
>> +		.create_private_address_space = a6xx_create_private_address_space,
>> +		.get_rptr = a6xx_get_rptr,
>> +		.progress = a6xx_progress,
>> +	},
>> +	.get_timestamp = a6xx_get_timestamp,
>> +};
>> +
>>  struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>  {
>>  	struct msm_drm_private *priv = dev->dev_private;
>> @@ -2003,18 +2186,36 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>  
>>  	adreno_gpu->registers = NULL;
>>  
>> +	/* Check if there is a GMU phandle and set it up */
>> +	node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
>> +	/* FIXME: How do we gracefully handle this? */
>> +	BUG_ON(!node);
> How will you handle this BUG() when there is no GMU (a610 gpu)?
> 
>> +
>> +	adreno_gpu->gmu_is_wrapper = of_device_is_compatible(node, "qcom,adreno-gmu-wrapper");
>> +
>>  	/*
>>  	 * We need to know the platform type before calling into adreno_gpu_init
>>  	 * so that the hw_apriv flag can be correctly set. Snoop into the info
>>  	 * and grab the revision number
>>  	 */
>>  	info = adreno_info(config->rev);
>> -
>> -	if (info && (info->revn == 650 || info->revn == 660 ||
>> -			adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), info->rev)))
>> +	if (!info)
>> +		return ERR_PTR(-EINVAL);
>> +
>> +	/* Assign these early so that we can use the is_aXYZ helpers */
>> +	/* Numeric revision IDs (e.g. 630) */
>> +	adreno_gpu->revn = info->revn;
>> +	/* New-style ADRENO_REV()-only */
>> +	adreno_gpu->rev = info->rev;
>> +	/* Quirk data */
>> +	adreno_gpu->info = info;
>> +
>> +	if (adreno_is_a650(adreno_gpu) || adreno_is_a660_family(adreno_gpu))
>>  		adreno_gpu->base.hw_apriv = true;
>>  
>> -	a6xx_llc_slices_init(pdev, a6xx_gpu);
>> +	/* No LLCC on non-RPMh (and by extension, non-GMU) SoCs */
>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>> +		a6xx_llc_slices_init(pdev, a6xx_gpu);
>>  
>>  	ret = a6xx_set_supported_hw(&pdev->dev, config->rev);
>>  	if (ret) {
>> @@ -2022,7 +2223,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>  		return ERR_PTR(ret);
>>  	}
>>  
>> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
>> +		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_gmuwrapper, 1);
>> +	else
>> +		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
>>  	if (ret) {
>>  		a6xx_destroy(&(a6xx_gpu->base.base));
>>  		return ERR_PTR(ret);
>> @@ -2035,13 +2239,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>  	if (adreno_is_a618(adreno_gpu) || adreno_is_7c3(adreno_gpu))
>>  		priv->gpu_clamp_to_idle = true;
>>  
>> -	/* Check if there is a GMU phandle and set it up */
>> -	node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
>> -
>> -	/* FIXME: How do we gracefully handle this? */
>> -	BUG_ON(!node);
>> -
>> -	ret = a6xx_gmu_init(a6xx_gpu, node);
>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
>> +		ret = a6xx_gmu_wrapper_init(a6xx_gpu, node);
>> +	else
>> +		ret = a6xx_gmu_init(a6xx_gpu, node);
>>  	of_node_put(node);
>>  	if (ret) {
>>  		a6xx_destroy(&(a6xx_gpu->base.base));
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>> index eea2e60ce3b7..51a7656072fa 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>> @@ -76,6 +76,7 @@ int a6xx_gmu_set_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
>>  void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
>>  
>>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
>> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
>>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
>>  
>>  void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
>> index 30ecdff363e7..4e5d650578c6 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
>> @@ -1041,16 +1041,18 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu)
>>  	/* Get the generic state from the adreno core */
>>  	adreno_gpu_state_get(gpu, &a6xx_state->base);
>>  
>> -	a6xx_get_gmu_registers(gpu, a6xx_state);
>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> nit: Kinda misleading function name to a layman. Should we invert the
> function to "adreno_has_gmu"?
> 
> -Akhil
>> +		a6xx_get_gmu_registers(gpu, a6xx_state);
>>  
>> -	a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
>> -	a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
>> -	a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
>> +		a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
>> +		a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
>> +		a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
>>  
>> -	a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
>> +		a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
>> +	}
>>  
>>  	/* If GX isn't on the rest of the data isn't going to be accessible */
>> -	if (!a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
>> +	if (!adreno_has_gmu_wrapper(adreno_gpu) && !a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
>>  		return &a6xx_state->base;
>>  
>>  	/* Get the banks of indexed registers */
>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>> index 6934cee07d42..5c5901d65950 100644
>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>> @@ -528,6 +528,10 @@ int adreno_load_fw(struct adreno_gpu *adreno_gpu)
>>  		if (!adreno_gpu->info->fw[i])
>>  			continue;
>>  
>> +		/* Skip loading GMU firmware with GMU Wrapper */
>> +		if (adreno_has_gmu_wrapper(adreno_gpu) && i == ADRENO_FW_GMU)
>> +			continue;
>> +
>>  		/* Skip if the firmware has already been loaded */
>>  		if (adreno_gpu->fw[i])
>>  			continue;
>> @@ -1074,8 +1078,8 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
>>  	u32 speedbin;
>>  	int ret;
>>  
>> -	/* Only handle the core clock when GMU is not in use */
>> -	if (config->rev.core < 6) {
>> +	/* Only handle the core clock when GMU is not in use (or is absent). */
>> +	if (adreno_has_gmu_wrapper(adreno_gpu) || config->rev.core < 6) {
>>  		/*
>>  		 * This can only be done before devm_pm_opp_of_add_table(), or
>>  		 * dev_pm_opp_set_config() will WARN_ON()
>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>> index f62612a5c70f..ee5352bc5329 100644
>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>> @@ -115,6 +115,7 @@ struct adreno_gpu {
>>  	 * code (a3xx_gpu.c) and stored in this common location.
>>  	 */
>>  	const unsigned int *reg_offsets;
>> +	bool gmu_is_wrapper;
>>  };
>>  #define to_adreno_gpu(x) container_of(x, struct adreno_gpu, base)
>>  
>> @@ -145,6 +146,11 @@ struct adreno_platform_config {
>>  
>>  bool adreno_cmp_rev(struct adreno_rev rev1, struct adreno_rev rev2);
>>  
>> +static inline bool adreno_has_gmu_wrapper(struct adreno_gpu *gpu)
>> +{
>> +	return gpu->gmu_is_wrapper;
>> +}
>> +
>>  static inline bool adreno_is_a2xx(struct adreno_gpu *gpu)
>>  {
>>  	return (gpu->revn < 300);
>>
>> -- 
>> 2.40.0
>>
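
Editorial sketch (not part of the patch): the review comment quoted above
suggests giving GMU-wrapper GPUs their own get_timestamp op instead of
branching inside a6xx_get_timestamp(). A minimal userspace model of that idea
follows. Every name in it (mock_gpu, mock_funcs, mock_pick_funcs, and so on) is
hypothetical and only stands in for the driver's structures; the has_gmu flag
is likewise an assumption, borrowing the "adreno_has_gmu" naming floated in the
same review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the GPU object; not the real struct msm_gpu. */
struct mock_gpu {
	bool has_gmu;         /* false on GMU-wrapper parts */
	uint64_t always_on;   /* models the always-on counter register */
	int gmu_lock_taken;   /* how often the GMU path took its lock */
};

/* Hypothetical ops table with a single callback. */
struct mock_funcs {
	int (*get_timestamp)(struct mock_gpu *gpu, uint64_t *value);
};

/* GMU path: models locking and forcing power on before the read. */
static int mock_gmu_get_timestamp(struct mock_gpu *gpu, uint64_t *value)
{
	gpu->gmu_lock_taken++;
	*value = gpu->always_on;
	return 0;
}

/* GMU-wrapper path: plain register read, no GMU lock involved. */
static int mock_wrapper_get_timestamp(struct mock_gpu *gpu, uint64_t *value)
{
	*value = gpu->always_on;
	return 0;
}

static const struct mock_funcs mock_funcs_gmu = {
	.get_timestamp = mock_gmu_get_timestamp,
};

static const struct mock_funcs mock_funcs_gmuwrapper = {
	.get_timestamp = mock_wrapper_get_timestamp,
};

/* Choose the table once at init time, so the callbacks need no checks. */
const struct mock_funcs *mock_pick_funcs(const struct mock_gpu *gpu)
{
	assert(gpu != NULL);
	return gpu->has_gmu ? &mock_funcs_gmu : &mock_funcs_gmuwrapper;
}
```

With this shape the wrapper check happens exactly once, when the table is
selected, which mirrors how the patch already picks between funcs and
funcs_gmuwrapper for the suspend/resume callbacks.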

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support
  2023-05-02  9:40     ` Konrad Dybcio
@ 2023-05-03 20:32       ` Akhil P Oommen
  2023-05-04  6:34         ` Konrad Dybcio
  0 siblings, 1 reply; 30+ messages in thread
From: Akhil P Oommen @ 2023-05-03 20:32 UTC (permalink / raw)
  To: Konrad Dybcio
  Cc: Rob Clark, Abhinav Kumar, Dmitry Baryshkov, Sean Paul,
	David Airlie, Daniel Vetter, Rob Herring, Krzysztof Kozlowski,
	Bjorn Andersson, Konrad Dybcio, linux-arm-msm, dri-devel,
	freedreno, devicetree, linux-kernel, Rob Clark, Marijn Suijten

On Tue, May 02, 2023 at 11:40:26AM +0200, Konrad Dybcio wrote:
> 
> 
> On 2.05.2023 09:49, Akhil P Oommen wrote:
> > On Sat, Apr 01, 2023 at 01:54:43PM +0200, Konrad Dybcio wrote:
> >> Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
> >> but don't implement the associated GMUs. This is due to the fact that
> >> the GMU directly pokes at RPMh. Sadly, this means we have to take care
> >> of enabling & scaling power rails, clocks and bandwidth ourselves.
> >>
> >> Reuse existing Adreno-common code and modify the deeply-GMU-infused
> >> A6XX code to facilitate these GPUs. This involves if-ing out lots
> >> of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
> >> the actual name that Qualcomm uses in their downstream kernels).
> >>
> >> This is essentially a register region which is convenient to model
> >> as a device. We'll use it for managing the GDSCs. The register
> >> layout matches the actual GMU_CX/GX regions on the "real GMU" devices
> >> and lets us reuse quite a bit of gmu_read/write/rmw calls.
> > << I sent a reply to this patch earlier, but not sure where it went.
> > Still figuring out Mutt... >>
> Answered it here:
> 
> https://lore.kernel.org/linux-arm-msm/4d3000c1-c3f9-0bfd-3eb3-23393f9a8f77@linaro.org/

Thanks. Will check and respond there if needed.

> 
> I don't think I see any new comments in this "reply revision" (heh), so please
> check that one out.
> 
> > 
> > The only convenience I found is that we can reuse the gmu register ops in a
> > few places (< 10 I think). If we just model this as another gpu memory
> > region, I think it will help keep the gmu vs gmu-wrapper/no-gmu
> > architecture code cleanly separated. Also, it looks like we need to
> > keep a dummy gmu platform device in the devicetree with the current
> > approach. That doesn't sound right.
> That's correct, but.. if we switch away from that, VDD_GX/VDD_CX will
> need additional, gmuwrapper-configuration specific code anyway, as
> OPP & genpd will no longer make use of the default behavior which
> only gets triggered if there's a single power-domains=<> entry, afaicu.
Can you please tell me which specific *default behaviour* you mean here?
I am curious to know what I am overlooking. We can always get a cxpd/gxpd device
and vote for the GDSCs directly from the driver. Is it anything related to
OPP?

-Akhil
> 
> If nothing else, this is a very convenient way to model a part of the
> GPU (as that's essentially what GMU_CX is, to my understanding) and
> the bindings people didn't shoot me in the head for proposing this, so
> I assume it'd be cool to pursue this..
> 
> Konrad
> >>
> >> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
> >> ---
> >>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c       |  72 +++++++-
> >>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c       | 255 +++++++++++++++++++++++++---
> >>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h       |   1 +
> >>  drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  14 +-
> >>  drivers/gpu/drm/msm/adreno/adreno_gpu.c     |   8 +-
> >>  drivers/gpu/drm/msm/adreno/adreno_gpu.h     |   6 +
> >>  6 files changed, 318 insertions(+), 38 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >> index 87babbb2a19f..b1acdb027205 100644
> >> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >> @@ -1469,6 +1469,7 @@ static int a6xx_gmu_get_irq(struct a6xx_gmu *gmu, struct platform_device *pdev,
> >>  
> >>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
> >>  {
> >> +	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
> >>  	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >>  	struct platform_device *pdev = to_platform_device(gmu->dev);
> >>  
> >> @@ -1494,10 +1495,12 @@ void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
> >>  	gmu->mmio = NULL;
> >>  	gmu->rscc = NULL;
> >>  
> >> -	a6xx_gmu_memory_free(gmu);
> >> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> >> +		a6xx_gmu_memory_free(gmu);
> >>  
> >> -	free_irq(gmu->gmu_irq, gmu);
> >> -	free_irq(gmu->hfi_irq, gmu);
> >> +		free_irq(gmu->gmu_irq, gmu);
> >> +		free_irq(gmu->hfi_irq, gmu);
> >> +	}
> >>  
> >>  	/* Drop reference taken in of_find_device_by_node */
> >>  	put_device(gmu->dev);
> >> @@ -1516,6 +1519,69 @@ static int cxpd_notifier_cb(struct notifier_block *nb,
> >>  	return 0;
> >>  }
> >>  
> >> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
> >> +{
> >> +	struct platform_device *pdev = of_find_device_by_node(node);
> >> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >> +	int ret;
> >> +
> >> +	if (!pdev)
> >> +		return -ENODEV;
> >> +
> >> +	gmu->dev = &pdev->dev;
> >> +
> >> +	of_dma_configure(gmu->dev, node, true);
> > Why set up DMA for a device that is not actually present?
> >> +
> >> +	pm_runtime_enable(gmu->dev);
> >> +
> >> +	/* Mark legacy for manual SPTPRAC control */
> >> +	gmu->legacy = true;
> >> +
> >> +	/* Map the GMU registers */
> >> +	gmu->mmio = a6xx_gmu_get_mmio(pdev, "gmu");
> >> +	if (IS_ERR(gmu->mmio)) {
> >> +		ret = PTR_ERR(gmu->mmio);
> >> +		goto err_mmio;
> >> +	}
> >> +
> >> +	gmu->cxpd = dev_pm_domain_attach_by_name(gmu->dev, "cx");
> >> +	if (IS_ERR(gmu->cxpd)) {
> >> +		ret = PTR_ERR(gmu->cxpd);
> >> +		goto err_mmio;
> >> +	}
> >> +
> >> +	if (!device_link_add(gmu->dev, gmu->cxpd, DL_FLAG_PM_RUNTIME)) {
> >> +		ret = -ENODEV;
> >> +		goto detach_cxpd;
> >> +	}
> >> +
> >> +	init_completion(&gmu->pd_gate);
> >> +	complete_all(&gmu->pd_gate);
> >> +	gmu->pd_nb.notifier_call = cxpd_notifier_cb;
> >> +
> >> +	/* Get a link to the GX power domain to reset the GPU */
> >> +	gmu->gxpd = dev_pm_domain_attach_by_name(gmu->dev, "gx");
> >> +	if (IS_ERR(gmu->gxpd)) {
> >> +		ret = PTR_ERR(gmu->gxpd);
> >> +		goto err_mmio;
> >> +	}
> >> +
> >> +	gmu->initialized = true;
> >> +
> >> +	return 0;
> >> +
> >> +detach_cxpd:
> >> +	dev_pm_domain_detach(gmu->cxpd, false);
> >> +
> >> +err_mmio:
> >> +	iounmap(gmu->mmio);
> >> +
> >> +	/* Drop reference taken in of_find_device_by_node */
> >> +	put_device(gmu->dev);
> >> +
> >> +	return ret;
> >> +}
> >> +
> >>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
> >>  {
> >>  	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
> >> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >> index 931f9f3b3a85..8e0345ffab81 100644
> >> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >> @@ -20,9 +20,11 @@ static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
> >>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>  
> >> -	/* Check that the GMU is idle */
> >> -	if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
> >> -		return false;
> >> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> >> +		/* Check that the GMU is idle */
> >> +		if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
> >> +			return false;
> >> +	}
> >>  
> >>  	/* Check tha the CX master is idle */
> >>  	if (gpu_read(gpu, REG_A6XX_RBBM_STATUS) &
> >> @@ -612,13 +614,15 @@ static void a6xx_set_hwcg(struct msm_gpu *gpu, bool state)
> >>  		return;
> >>  
> >>  	/* Disable SP clock before programming HWCG registers */
> >> -	gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
> >> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> >> +		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
> >>  
> >>  	for (i = 0; (reg = &adreno_gpu->info->hwcg[i], reg->offset); i++)
> >>  		gpu_write(gpu, reg->offset, state ? reg->value : 0);
> >>  
> >>  	/* Enable SP clock */
> >> -	gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
> >> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> >> +		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
> >>  
> >>  	gpu_write(gpu, REG_A6XX_RBBM_CLOCK_CNTL, state ? clock_cntl_on : 0);
> >>  }
> >> @@ -1018,10 +1022,13 @@ static int hw_init(struct msm_gpu *gpu)
> >>  {
> >>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >>  	int ret;
> >>  
> >> -	/* Make sure the GMU keeps the GPU on while we set it up */
> >> -	a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
> >> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> >> +		/* Make sure the GMU keeps the GPU on while we set it up */
> >> +		a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
> >> +	}
> >>  
> >>  	/* Clear GBIF halt in case GX domain was not collapsed */
> >>  	if (a6xx_has_gbif(adreno_gpu))
> >> @@ -1144,6 +1151,17 @@ static int hw_init(struct msm_gpu *gpu)
> >>  			0x3f0243f0);
> >>  	}
> >>  
> >> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
> >> +		/* Do it here, as GMU wrapper only inits the GMU for memory reservation etc. */
> >> +
> >> +		/* Set up the CX GMU counter 0 to count busy ticks */
> >> +		gmu_write(gmu, REG_A6XX_GPU_GMU_AO_GPU_CX_BUSY_MASK, 0xff000000);
> >> +
> >> +		/* Enable power counter 0 */
> >> +		gmu_rmw(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_SELECT_0, 0xff, BIT(5));
> >> +		gmu_write(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_ENABLE, 1);
> >> +	}
> >> +
> >>  	/* Protect registers from the CP */
> >>  	a6xx_set_cp_protect(gpu);
> >>  
> >> @@ -1233,6 +1251,8 @@ static int hw_init(struct msm_gpu *gpu)
> >>  	}
> >>  
> >>  out:
> >> +	if (adreno_has_gmu_wrapper(adreno_gpu))
> >> +		return ret;
> >>  	/*
> >>  	 * Tell the GMU that we are done touching the GPU and it can start power
> >>  	 * management
> >> @@ -1267,6 +1287,9 @@ static void a6xx_dump(struct msm_gpu *gpu)
> >>  	adreno_dump(gpu);
> >>  }
> >>  
> >> +#define GBIF_GX_HALT_MASK	BIT(0)
> >> +#define GBIF_CLIENT_HALT_MASK	BIT(0)
> >> +#define GBIF_ARB_HALT_MASK	BIT(1)
> >>  #define VBIF_RESET_ACK_TIMEOUT	100
> >>  #define VBIF_RESET_ACK_MASK	0x00f0
> >>  
> >> @@ -1299,7 +1322,8 @@ static void a6xx_recover(struct msm_gpu *gpu)
> >>  	 * Turn off keep alive that might have been enabled by the hang
> >>  	 * interrupt
> >>  	 */
> >> -	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
> >> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> >> +		gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
> > 
> > Maybe it is better to move this to a6xx_gmu_force_power_off.
> > 
> >>  
> >>  	pm_runtime_dont_use_autosuspend(&gpu->pdev->dev);
> >>  
> >> @@ -1329,6 +1353,32 @@ static void a6xx_recover(struct msm_gpu *gpu)
> >>  
> >>  	dev_pm_genpd_remove_notifier(gmu->cxpd);
> >>  
> >> +	/* Software-reset the GPU */
> > 
> > This is not a soft reset sequence. We are trying to quiesce GPU-DDR
> > traffic with this sequence.
> > 
> >> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
> >> +		/* Halt the GX side of GBIF */
> >> +		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, GBIF_GX_HALT_MASK);
> >> +		spin_until(gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT_ACK) &
> >> +			   GBIF_GX_HALT_MASK);
> >> +
> >> +		/* Halt new client requests on GBIF */
> >> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_CLIENT_HALT_MASK);
> >> +		spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
> >> +			   (GBIF_CLIENT_HALT_MASK)) == GBIF_CLIENT_HALT_MASK);
> >> +
> >> +		/* Halt all AXI requests on GBIF */
> >> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_ARB_HALT_MASK);
> >> +		spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
> >> +			   (GBIF_ARB_HALT_MASK)) == GBIF_ARB_HALT_MASK);
> >> +
> >> +		/* Clear the halts */
> >> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
> >> +
> >> +		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
> >> +
> >> +		/* This *really* needs to go through before we do anything else! */
> >> +		mb();
> >> +	}
> >> +
> > 
> > This sequence should be before we collapse cx gdsc. Also, please see if
> > we can create a subroutine to avoid code dup.
> > 
> >>  	pm_runtime_use_autosuspend(&gpu->pdev->dev);
> >>  
> >>  	if (active_submits)
> >> @@ -1463,7 +1513,8 @@ static void a6xx_fault_detect_irq(struct msm_gpu *gpu)
> >>  	 * Force the GPU to stay on until after we finish
> >>  	 * collecting information
> >>  	 */
> >> -	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
> >> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> >> +		gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
> >>  
> >>  	DRM_DEV_ERROR(&gpu->pdev->dev,
> >>  		"gpu fault ring %d fence %x status %8.8X rb %4.4x/%4.4x ib1 %16.16llX/%4.4x ib2 %16.16llX/%4.4x\n",
> >> @@ -1624,7 +1675,7 @@ static void a6xx_llc_slices_init(struct platform_device *pdev,
> >>  		a6xx_gpu->llc_mmio = ERR_PTR(-EINVAL);
> >>  }
> >>  
> >> -static int a6xx_pm_resume(struct msm_gpu *gpu)
> >> +static int a6xx_gmu_pm_resume(struct msm_gpu *gpu)
> >>  {
> >>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >> @@ -1644,10 +1695,61 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
> >>  
> >>  	a6xx_llc_activate(a6xx_gpu);
> >>  
> >> -	return 0;
> >> +	return ret;
> >>  }
> >>  
> >> -static int a6xx_pm_suspend(struct msm_gpu *gpu)
> >> +static int a6xx_pm_resume(struct msm_gpu *gpu)
> >> +{
> >> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >> +	unsigned long freq = 0;
> >> +	struct dev_pm_opp *opp;
> >> +	int ret;
> >> +
> >> +	gpu->needs_hw_init = true;
> >> +
> >> +	trace_msm_gpu_resume(0);
> >> +
> >> +	mutex_lock(&a6xx_gpu->gmu.lock);
> > I think we can ignore gmu lock as there is no real gmu device.
> > 
> >> +
> >> +	pm_runtime_resume_and_get(gmu->dev);
> >> +	pm_runtime_resume_and_get(gmu->gxpd);
> >> +
> >> +	/* Set the core clock, having VDD scaling in mind */
> >> +	ret = dev_pm_opp_set_rate(&gpu->pdev->dev, gpu->fast_rate);
> >> +	if (ret)
> >> +		goto err_core_clk;
> >> +
> >> +	ret = clk_bulk_prepare_enable(gpu->nr_clocks, gpu->grp_clks);
> >> +	if (ret)
> >> +		goto err_bulk_clk;
> >> +
> >> +	ret = clk_prepare_enable(gpu->ebi1_clk);
> >> +	if (ret)
> >> +		goto err_mem_clk;
> >> +
> >> +	/* If anything goes south, tear the GPU down piece by piece.. */
> >> +	if (ret) {
> >> +err_mem_clk:
> >> +		clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
> >> +err_bulk_clk:
> >> +		opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
> >> +		dev_pm_opp_put(opp);
> >> +		dev_pm_opp_set_rate(&gpu->pdev->dev, 0);
> >> +err_core_clk:
> >> +		pm_runtime_put(gmu->gxpd);
> >> +		pm_runtime_put(gmu->dev);
> >> +	}
> >> +	mutex_unlock(&a6xx_gpu->gmu.lock);
> >> +
> >> +	if (!ret)
> >> +		msm_devfreq_resume(gpu);
> >> +
> >> +	return ret;
> >> +}
> >> +
> >> +static int a6xx_gmu_pm_suspend(struct msm_gpu *gpu)
> >>  {
> >>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >> @@ -1674,11 +1776,62 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
> >>  	return 0;
> >>  }
> >>  
> >> +static int a6xx_pm_suspend(struct msm_gpu *gpu)
> >> +{
> >> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >> +	unsigned long freq = 0;
> >> +	struct dev_pm_opp *opp;
> >> +	int i, ret;
> >> +
> >> +	trace_msm_gpu_suspend(0);
> >> +
> >> +	opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
> >> +	dev_pm_opp_put(opp);
> >> +
> >> +	msm_devfreq_suspend(gpu);
> >> +
> >> +	mutex_lock(&a6xx_gpu->gmu.lock);
> >> +
> >> +	clk_disable_unprepare(gpu->ebi1_clk);
> >> +
> >> +	clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
> >> +
> >> +	/* Set frequency to the minimum supported level (no 27MHz on A6xx!) */
> >> +	ret = dev_pm_opp_set_rate(&gpu->pdev->dev, freq);
> >> +	if (ret)
> >> +		goto err;
> >> +
> >> +	pm_runtime_put_sync(gmu->gxpd);
> >> +	pm_runtime_put_sync(gmu->dev);
> >> +
> >> +	mutex_unlock(&a6xx_gpu->gmu.lock);
> >> +
> >> +	if (a6xx_gpu->shadow_bo)
> >> +		for (i = 0; i < gpu->nr_rings; i++)
> >> +			a6xx_gpu->shadow[i] = 0;
> >> +
> >> +	gpu->suspend_count++;
> >> +
> >> +	return 0;
> >> +
> >> +err:
> >> +	mutex_unlock(&a6xx_gpu->gmu.lock);
> >> +
> >> +	return ret;
> >> +}
> >> +
> >>  static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
> >>  {
> >>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>  
> >> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
> >> +		*value = gpu_read64(gpu, REG_A6XX_CP_ALWAYS_ON_COUNTER);
> >> +		return 0;
> >> +	}
> >> +
> > Instead of wrapper check here, we can just create a separate op. I don't
> > see any benefit in reusing the same function here.
> > 
> > 
> >>  	mutex_lock(&a6xx_gpu->gmu.lock);
> >>  
> >>  	/* Force the GPU power on so we can read this register */
> >> @@ -1716,7 +1869,8 @@ static void a6xx_destroy(struct msm_gpu *gpu)
> >>  		drm_gem_object_put(a6xx_gpu->shadow_bo);
> >>  	}
> >>  
> >> -	a6xx_llc_slices_destroy(a6xx_gpu);
> >> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> >> +		a6xx_llc_slices_destroy(a6xx_gpu);
> >>  
> >>  	mutex_lock(&a6xx_gpu->gmu.lock);
> >>  	a6xx_gmu_remove(a6xx_gpu);
> >> @@ -1957,8 +2111,8 @@ static const struct adreno_gpu_funcs funcs = {
> >>  		.set_param = adreno_set_param,
> >>  		.hw_init = a6xx_hw_init,
> >>  		.ucode_load = a6xx_ucode_load,
> >> -		.pm_suspend = a6xx_pm_suspend,
> >> -		.pm_resume = a6xx_pm_resume,
> >> +		.pm_suspend = a6xx_gmu_pm_suspend,
> >> +		.pm_resume = a6xx_gmu_pm_resume,
> >>  		.recover = a6xx_recover,
> >>  		.submit = a6xx_submit,
> >>  		.active_ring = a6xx_active_ring,
> >> @@ -1982,6 +2136,35 @@ static const struct adreno_gpu_funcs funcs = {
> >>  	.get_timestamp = a6xx_get_timestamp,
> >>  };
> >>  
> >> +static const struct adreno_gpu_funcs funcs_gmuwrapper = {
> >> +	.base = {
> >> +		.get_param = adreno_get_param,
> >> +		.set_param = adreno_set_param,
> >> +		.hw_init = a6xx_hw_init,
> >> +		.ucode_load = a6xx_ucode_load,
> >> +		.pm_suspend = a6xx_pm_suspend,
> >> +		.pm_resume = a6xx_pm_resume,
> >> +		.recover = a6xx_recover,
> >> +		.submit = a6xx_submit,
> >> +		.active_ring = a6xx_active_ring,
> >> +		.irq = a6xx_irq,
> >> +		.destroy = a6xx_destroy,
> >> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
> >> +		.show = a6xx_show,
> >> +#endif
> >> +		.gpu_busy = a6xx_gpu_busy,
> >> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
> >> +		.gpu_state_get = a6xx_gpu_state_get,
> >> +		.gpu_state_put = a6xx_gpu_state_put,
> >> +#endif
> >> +		.create_address_space = a6xx_create_address_space,
> >> +		.create_private_address_space = a6xx_create_private_address_space,
> >> +		.get_rptr = a6xx_get_rptr,
> >> +		.progress = a6xx_progress,
> >> +	},
> >> +	.get_timestamp = a6xx_get_timestamp,
> >> +};
> >> +
> >>  struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> >>  {
> >>  	struct msm_drm_private *priv = dev->dev_private;
> >> @@ -2003,18 +2186,36 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> >>  
> >>  	adreno_gpu->registers = NULL;
> >>  
> >> +	/* Check if there is a GMU phandle and set it up */
> >> +	node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
> >> +	/* FIXME: How do we gracefully handle this? */
> >> +	BUG_ON(!node);
> > How will you handle this BUG() when there is no GMU (a610 gpu)?
> > 
> >> +
> >> +	adreno_gpu->gmu_is_wrapper = of_device_is_compatible(node, "qcom,adreno-gmu-wrapper");
> >> +
> >>  	/*
> >>  	 * We need to know the platform type before calling into adreno_gpu_init
> >>  	 * so that the hw_apriv flag can be correctly set. Snoop into the info
> >>  	 * and grab the revision number
> >>  	 */
> >>  	info = adreno_info(config->rev);
> >> -
> >> -	if (info && (info->revn == 650 || info->revn == 660 ||
> >> -			adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), info->rev)))
> >> +	if (!info)
> >> +		return ERR_PTR(-EINVAL);
> >> +
> >> +	/* Assign these early so that we can use the is_aXYZ helpers */
> >> +	/* Numeric revision IDs (e.g. 630) */
> >> +	adreno_gpu->revn = info->revn;
> >> +	/* New-style ADRENO_REV()-only */
> >> +	adreno_gpu->rev = info->rev;
> >> +	/* Quirk data */
> >> +	adreno_gpu->info = info;
> >> +
> >> +	if (adreno_is_a650(adreno_gpu) || adreno_is_a660_family(adreno_gpu))
> >>  		adreno_gpu->base.hw_apriv = true;
> >>  
> >> -	a6xx_llc_slices_init(pdev, a6xx_gpu);
> >> +	/* No LLCC on non-RPMh (and by extension, non-GMU) SoCs */
> >> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> >> +		a6xx_llc_slices_init(pdev, a6xx_gpu);
> >>  
> >>  	ret = a6xx_set_supported_hw(&pdev->dev, config->rev);
> >>  	if (ret) {
> >> @@ -2022,7 +2223,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> >>  		return ERR_PTR(ret);
> >>  	}
> >>  
> >> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> >> +	if (adreno_has_gmu_wrapper(adreno_gpu))
> >> +		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_gmuwrapper, 1);
> >> +	else
> >> +		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> >>  	if (ret) {
> >>  		a6xx_destroy(&(a6xx_gpu->base.base));
> >>  		return ERR_PTR(ret);
> >> @@ -2035,13 +2239,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> >>  	if (adreno_is_a618(adreno_gpu) || adreno_is_7c3(adreno_gpu))
> >>  		priv->gpu_clamp_to_idle = true;
> >>  
> >> -	/* Check if there is a GMU phandle and set it up */
> >> -	node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
> >> -
> >> -	/* FIXME: How do we gracefully handle this? */
> >> -	BUG_ON(!node);
> >> -
> >> -	ret = a6xx_gmu_init(a6xx_gpu, node);
> >> +	if (adreno_has_gmu_wrapper(adreno_gpu))
> >> +		ret = a6xx_gmu_wrapper_init(a6xx_gpu, node);
> >> +	else
> >> +		ret = a6xx_gmu_init(a6xx_gpu, node);
> >>  	of_node_put(node);
> >>  	if (ret) {
> >>  		a6xx_destroy(&(a6xx_gpu->base.base));
> >> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> >> index eea2e60ce3b7..51a7656072fa 100644
> >> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> >> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> >> @@ -76,6 +76,7 @@ int a6xx_gmu_set_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
> >>  void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
> >>  
> >>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
> >> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
> >>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
> >>  
> >>  void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> >> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> >> index 30ecdff363e7..4e5d650578c6 100644
> >> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> >> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> >> @@ -1041,16 +1041,18 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu)
> >>  	/* Get the generic state from the adreno core */
> >>  	adreno_gpu_state_get(gpu, &a6xx_state->base);
> >>  
> >> -	a6xx_get_gmu_registers(gpu, a6xx_state);
> >> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> > nit: Kinda misleading function name to a layman. Should we invert the
> > function to "adreno_has_gmu"?
> > 
> > -Akhil
> >> +		a6xx_get_gmu_registers(gpu, a6xx_state);
> >>  
> >> -	a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
> >> -	a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
> >> -	a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
> >> +		a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
> >> +		a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
> >> +		a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
> >>  
> >> -	a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
> >> +		a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
> >> +	}
> >>  
> >>  	/* If GX isn't on the rest of the data isn't going to be accessible */
> >> -	if (!a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
> >> +	if (!adreno_has_gmu_wrapper(adreno_gpu) && !a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
> >>  		return &a6xx_state->base;
> >>  
> >>  	/* Get the banks of indexed registers */
> >> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> >> index 6934cee07d42..5c5901d65950 100644
> >> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> >> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> >> @@ -528,6 +528,10 @@ int adreno_load_fw(struct adreno_gpu *adreno_gpu)
> >>  		if (!adreno_gpu->info->fw[i])
> >>  			continue;
> >>  
> >> +		/* Skip loading GMU firmware with GMU Wrapper */
> >> +		if (adreno_has_gmu_wrapper(adreno_gpu) && i == ADRENO_FW_GMU)
> >> +			continue;
> >> +
> >>  		/* Skip if the firmware has already been loaded */
> >>  		if (adreno_gpu->fw[i])
> >>  			continue;
> >> @@ -1074,8 +1078,8 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
> >>  	u32 speedbin;
> >>  	int ret;
> >>  
> >> -	/* Only handle the core clock when GMU is not in use */
> >> -	if (config->rev.core < 6) {
> >> +	/* Only handle the core clock when GMU is not in use (or is absent). */
> >> +	if (adreno_has_gmu_wrapper(adreno_gpu) || config->rev.core < 6) {
> >>  		/*
> >>  		 * This can only be done before devm_pm_opp_of_add_table(), or
> >>  		 * dev_pm_opp_set_config() will WARN_ON()
> >> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> >> index f62612a5c70f..ee5352bc5329 100644
> >> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> >> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> >> @@ -115,6 +115,7 @@ struct adreno_gpu {
> >>  	 * code (a3xx_gpu.c) and stored in this common location.
> >>  	 */
> >>  	const unsigned int *reg_offsets;
> >> +	bool gmu_is_wrapper;
> >>  };
> >>  #define to_adreno_gpu(x) container_of(x, struct adreno_gpu, base)
> >>  
> >> @@ -145,6 +146,11 @@ struct adreno_platform_config {
> >>  
> >>  bool adreno_cmp_rev(struct adreno_rev rev1, struct adreno_rev rev2);
> >>  
> >> +static inline bool adreno_has_gmu_wrapper(struct adreno_gpu *gpu)
> >> +{
> >> +	return gpu->gmu_is_wrapper;
> >> +}
> >> +
> >>  static inline bool adreno_is_a2xx(struct adreno_gpu *gpu)
> >>  {
> >>  	return (gpu->revn < 300);
> >>
> >> -- 
> >> 2.40.0
> >>

* Re: [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support
  2023-05-03 20:32       ` Akhil P Oommen
@ 2023-05-04  6:34         ` Konrad Dybcio
  2023-05-05  8:46           ` Akhil P Oommen
  0 siblings, 1 reply; 30+ messages in thread
From: Konrad Dybcio @ 2023-05-04  6:34 UTC (permalink / raw)
  To: Akhil P Oommen
  Cc: Rob Clark, Abhinav Kumar, Dmitry Baryshkov, Sean Paul,
	David Airlie, Daniel Vetter, Rob Herring, Krzysztof Kozlowski,
	Bjorn Andersson, Konrad Dybcio, linux-arm-msm, dri-devel,
	freedreno, devicetree, linux-kernel, Rob Clark, Marijn Suijten



On 3.05.2023 22:32, Akhil P Oommen wrote:
> On Tue, May 02, 2023 at 11:40:26AM +0200, Konrad Dybcio wrote:
>>
>>
>> On 2.05.2023 09:49, Akhil P Oommen wrote:
>>> On Sat, Apr 01, 2023 at 01:54:43PM +0200, Konrad Dybcio wrote:
>>>> Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
>>>> but don't implement the associated GMUs. This is due to the fact that
>>>> the GMU directly pokes at RPMh. Sadly, this means we have to take care
>>>> of enabling & scaling power rails, clocks and bandwidth ourselves.
>>>>
>>>> Reuse existing Adreno-common code and modify the deeply-GMU-infused
>>>> A6XX code to facilitate these GPUs. This involves if-ing out lots
>>>> of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
>>>> the actual name that Qualcomm uses in their downstream kernels).
>>>>
>>>> This is essentially a register region which is convenient to model
>>>> as a device. We'll use it for managing the GDSCs. The register
>>>> layout matches the actual GMU_CX/GX regions on the "real GMU" devices
>>>> and lets us reuse quite a bit of gmu_read/write/rmw calls.
>>> << I sent a reply to this patch earlier, but not sure where it went.
>>> Still figuring out Mutt... >>
>> Answered it here:
>>
>> https://lore.kernel.org/linux-arm-msm/4d3000c1-c3f9-0bfd-3eb3-23393f9a8f77@linaro.org/
> 
> Thanks. Will check and respond there if needed.
> 
>>
>> I don't think I see any new comments in this "reply revision" (heh), so please
>> check that one out.
>>
>>>
>>> Only convenience I found is that we can reuse gmu register ops in a few
>>> places (< 10 I think). If we just model this as another gpu memory
>>> region, I think it will help to keep gmu vs gmu-wrapper/no-gmu
>>> architecture code with clean separation. Also, it looks like we need to
>>> keep a dummy gmu platform device in the devicetree with the current
>>> approach. That doesn't sound right.
>> That's correct, but.. if we switch away from that, VDD_GX/VDD_CX will
>> need additional, gmuwrapper-configuration specific code anyway, as
>> OPP & genpd will no longer make use of the default behavior which
>> only gets triggered if there's a single power-domains=<> entry, afaicu.
> Can you please tell me which specific *default behaviour* do you mean here?
> I am curious to know what I am overlooking here. We can always get a cxpd/gxpd device
> and vote for the gdscs directly from the driver. Anything related to
> OPP?
I *believe* this is true:

if (ARRAY_SIZE(power-domains) == 1) {
	the generic OF code will enable the power domain at .probe time

	the OPP APIs will default to scaling that domain with required-opps
}

and we do need to put GX/CX (with an MX parent to match) there, as the
AP is responsible for voting in this configuration.
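
To make that concrete, here is a toy userspace model of the rule as I
understand it. It is only a sketch: struct model_dev and both helpers are
made up for illustration and are not kernel API, and the real behaviour
lives in the genpd and OPP core and should be double-checked there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device node; not a kernel structure. */
struct model_dev {
	size_t num_power_domains;	/* entries in power-domains = <...> */
};

/*
 * Assumption from the paragraph above: the generic code only attaches
 * (and powers on) the domain by itself when there is exactly one entry.
 */
static bool model_core_handles_pd(const struct model_dev *dev)
{
	assert(dev != NULL);
	return dev->num_power_domains == 1;
}

/* Number of domains the driver has to attach and vote on by hand. */
static size_t model_driver_managed_pds(const struct model_dev *dev)
{
	return model_core_handles_pd(dev) ? 0 : dev->num_power_domains;
}
```

With one entry the driver gets both behaviours for free; as soon as more
than one domain is listed on the same node, each of them has to be
attached by name and voted on explicitly.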

Konrad
> 
> -Akhil
>>
>> If nothing else, this is a very convenient way to model a part of the
>> GPU (as that's essentially what GMU_CX is, to my understanding) and
>> the bindings people didn't shoot me in the head for proposing this, so
>> I assume it'd be cool to pursue this..
>>
>> Konrad
>>>>
>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
>>>> ---
>>>>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c       |  72 +++++++-
>>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c       | 255 +++++++++++++++++++++++++---
>>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h       |   1 +
>>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  14 +-
>>>>  drivers/gpu/drm/msm/adreno/adreno_gpu.c     |   8 +-
>>>>  drivers/gpu/drm/msm/adreno/adreno_gpu.h     |   6 +
>>>>  6 files changed, 318 insertions(+), 38 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>>>> index 87babbb2a19f..b1acdb027205 100644
>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>>>> @@ -1469,6 +1469,7 @@ static int a6xx_gmu_get_irq(struct a6xx_gmu *gmu, struct platform_device *pdev,
>>>>  
>>>>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
>>>>  {
>>>> +	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
>>>>  	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>>>>  	struct platform_device *pdev = to_platform_device(gmu->dev);
>>>>  
>>>> @@ -1494,10 +1495,12 @@ void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
>>>>  	gmu->mmio = NULL;
>>>>  	gmu->rscc = NULL;
>>>>  
>>>> -	a6xx_gmu_memory_free(gmu);
>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
>>>> +		a6xx_gmu_memory_free(gmu);
>>>>  
>>>> -	free_irq(gmu->gmu_irq, gmu);
>>>> -	free_irq(gmu->hfi_irq, gmu);
>>>> +		free_irq(gmu->gmu_irq, gmu);
>>>> +		free_irq(gmu->hfi_irq, gmu);
>>>> +	}
>>>>  
>>>>  	/* Drop reference taken in of_find_device_by_node */
>>>>  	put_device(gmu->dev);
>>>> @@ -1516,6 +1519,69 @@ static int cxpd_notifier_cb(struct notifier_block *nb,
>>>>  	return 0;
>>>>  }
>>>>  
>>>> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
>>>> +{
>>>> +	struct platform_device *pdev = of_find_device_by_node(node);
>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>>>> +	int ret;
>>>> +
>>>> +	if (!pdev)
>>>> +		return -ENODEV;
>>>> +
>>>> +	gmu->dev = &pdev->dev;
>>>> +
>>>> +	of_dma_configure(gmu->dev, node, true);
>>> Why set up DMA for a device that is not actually present?
>>>> +
>>>> +	pm_runtime_enable(gmu->dev);
>>>> +
>>>> +	/* Mark legacy for manual SPTPRAC control */
>>>> +	gmu->legacy = true;
>>>> +
>>>> +	/* Map the GMU registers */
>>>> +	gmu->mmio = a6xx_gmu_get_mmio(pdev, "gmu");
>>>> +	if (IS_ERR(gmu->mmio)) {
>>>> +		ret = PTR_ERR(gmu->mmio);
>>>> +		goto err_mmio;
>>>> +	}
>>>> +
>>>> +	gmu->cxpd = dev_pm_domain_attach_by_name(gmu->dev, "cx");
>>>> +	if (IS_ERR(gmu->cxpd)) {
>>>> +		ret = PTR_ERR(gmu->cxpd);
>>>> +		goto err_mmio;
>>>> +	}
>>>> +
>>>> +	if (!device_link_add(gmu->dev, gmu->cxpd, DL_FLAG_PM_RUNTIME)) {
>>>> +		ret = -ENODEV;
>>>> +		goto detach_cxpd;
>>>> +	}
>>>> +
>>>> +	init_completion(&gmu->pd_gate);
>>>> +	complete_all(&gmu->pd_gate);
>>>> +	gmu->pd_nb.notifier_call = cxpd_notifier_cb;
>>>> +
>>>> +	/* Get a link to the GX power domain to reset the GPU */
>>>> +	gmu->gxpd = dev_pm_domain_attach_by_name(gmu->dev, "gx");
>>>> +	if (IS_ERR(gmu->gxpd)) {
>>>> +		ret = PTR_ERR(gmu->gxpd);
>>>> +		goto err_mmio;
>>>> +	}
>>>> +
>>>> +	gmu->initialized = true;
>>>> +
>>>> +	return 0;
>>>> +
>>>> +detach_cxpd:
>>>> +	dev_pm_domain_detach(gmu->cxpd, false);
>>>> +
>>>> +err_mmio:
>>>> +	iounmap(gmu->mmio);
>>>> +
>>>> +	/* Drop reference taken in of_find_device_by_node */
>>>> +	put_device(gmu->dev);
>>>> +
>>>> +	return ret;
>>>> +}
>>>> +
>>>>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
>>>>  {
>>>>  	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>>> index 931f9f3b3a85..8e0345ffab81 100644
>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>>> @@ -20,9 +20,11 @@ static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>>  
>>>> -	/* Check that the GMU is idle */
>>>> -	if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
>>>> -		return false;
>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
>>>> +		/* Check that the GMU is idle */
>>>> +		if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
>>>> +			return false;
>>>> +	}
>>>>  
>>>>  	/* Check tha the CX master is idle */
>>>>  	if (gpu_read(gpu, REG_A6XX_RBBM_STATUS) &
>>>> @@ -612,13 +614,15 @@ static void a6xx_set_hwcg(struct msm_gpu *gpu, bool state)
>>>>  		return;
>>>>  
>>>>  	/* Disable SP clock before programming HWCG registers */
>>>> -	gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>>>> +		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
>>>>  
>>>>  	for (i = 0; (reg = &adreno_gpu->info->hwcg[i], reg->offset); i++)
>>>>  		gpu_write(gpu, reg->offset, state ? reg->value : 0);
>>>>  
>>>>  	/* Enable SP clock */
>>>> -	gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>>>> +		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
>>>>  
>>>>  	gpu_write(gpu, REG_A6XX_RBBM_CLOCK_CNTL, state ? clock_cntl_on : 0);
>>>>  }
>>>> @@ -1018,10 +1022,13 @@ static int hw_init(struct msm_gpu *gpu)
>>>>  {
>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>>>>  	int ret;
>>>>  
>>>> -	/* Make sure the GMU keeps the GPU on while we set it up */
>>>> -	a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
>>>> +		/* Make sure the GMU keeps the GPU on while we set it up */
>>>> +		a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
>>>> +	}
>>>>  
>>>>  	/* Clear GBIF halt in case GX domain was not collapsed */
>>>>  	if (a6xx_has_gbif(adreno_gpu))
>>>> @@ -1144,6 +1151,17 @@ static int hw_init(struct msm_gpu *gpu)
>>>>  			0x3f0243f0);
>>>>  	}
>>>>  
>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
>>>> +		/* Do it here, as GMU wrapper only inits the GMU for memory reservation etc. */
>>>> +
>>>> +		/* Set up the CX GMU counter 0 to count busy ticks */
>>>> +		gmu_write(gmu, REG_A6XX_GPU_GMU_AO_GPU_CX_BUSY_MASK, 0xff000000);
>>>> +
>>>> +		/* Enable power counter 0 */
>>>> +		gmu_rmw(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_SELECT_0, 0xff, BIT(5));
>>>> +		gmu_write(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_ENABLE, 1);
>>>> +	}
>>>> +
>>>>  	/* Protect registers from the CP */
>>>>  	a6xx_set_cp_protect(gpu);
>>>>  
>>>> @@ -1233,6 +1251,8 @@ static int hw_init(struct msm_gpu *gpu)
>>>>  	}
>>>>  
>>>>  out:
>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
>>>> +		return ret;
>>>>  	/*
>>>>  	 * Tell the GMU that we are done touching the GPU and it can start power
>>>>  	 * management
>>>> @@ -1267,6 +1287,9 @@ static void a6xx_dump(struct msm_gpu *gpu)
>>>>  	adreno_dump(gpu);
>>>>  }
>>>>  
>>>> +#define GBIF_GX_HALT_MASK	BIT(0)
>>>> +#define GBIF_CLIENT_HALT_MASK	BIT(0)
>>>> +#define GBIF_ARB_HALT_MASK	BIT(1)
>>>>  #define VBIF_RESET_ACK_TIMEOUT	100
>>>>  #define VBIF_RESET_ACK_MASK	0x00f0
>>>>  
>>>> @@ -1299,7 +1322,8 @@ static void a6xx_recover(struct msm_gpu *gpu)
>>>>  	 * Turn off keep alive that might have been enabled by the hang
>>>>  	 * interrupt
>>>>  	 */
>>>> -	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>>>> +		gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
>>>
>>> Maybe it is better to move this to a6xx_gmu_force_power_off.
>>>
>>>>  
>>>>  	pm_runtime_dont_use_autosuspend(&gpu->pdev->dev);
>>>>  
>>>> @@ -1329,6 +1353,32 @@ static void a6xx_recover(struct msm_gpu *gpu)
>>>>  
>>>>  	dev_pm_genpd_remove_notifier(gmu->cxpd);
>>>>  
>>>> +	/* Software-reset the GPU */
>>>
>>> This is not a soft reset sequence. We are trying to quiesce GPU - DDR
>>> traffic with this sequence.
>>>
>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
>>>> +		/* Halt the GX side of GBIF */
>>>> +		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, GBIF_GX_HALT_MASK);
>>>> +		spin_until(gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT_ACK) &
>>>> +			   GBIF_GX_HALT_MASK);
>>>> +
>>>> +		/* Halt new client requests on GBIF */
>>>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_CLIENT_HALT_MASK);
>>>> +		spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
>>>> +			   (GBIF_CLIENT_HALT_MASK)) == GBIF_CLIENT_HALT_MASK);
>>>> +
>>>> +		/* Halt all AXI requests on GBIF */
>>>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_ARB_HALT_MASK);
>>>> +		spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
>>>> +			   (GBIF_ARB_HALT_MASK)) == GBIF_ARB_HALT_MASK);
>>>> +
>>>> +		/* Clear the halts */
>>>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
>>>> +
>>>> +		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
>>>> +
>>>> +		/* This *really* needs to go through before we do anything else! */
>>>> +		mb();
>>>> +	}
>>>> +
>>>
>>> This sequence should be before we collapse cx gdsc. Also, please see if
>>> we can create a subroutine to avoid code dup.
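
A rough sketch of what such a subroutine could look like, modelled in
plain C so it can be compiled outside the kernel. Everything here
(struct halt_ops, model_gbif_halt(), the fake register file) is
hypothetical; in the driver the accessors would be gpu_read()/gpu_write()
and the registers would be the GBIF halt/ack pairs used in the hunk above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical register accessors standing in for gpu_read()/gpu_write(). */
struct halt_ops {
	uint32_t (*read)(void *ctx, uint32_t reg);
	void (*write)(void *ctx, uint32_t reg, uint32_t val);
};

/*
 * Request a halt on @halt_reg and poll @ack_reg until every bit in @mask
 * is acknowledged. The poll count is bounded and a timeout is reported,
 * so the caller can check the result.
 */
static int model_gbif_halt(const struct halt_ops *ops, void *ctx,
			   uint32_t halt_reg, uint32_t ack_reg,
			   uint32_t mask, unsigned int max_polls)
{
	assert(ops && ops->read && ops->write);

	ops->write(ctx, halt_reg, mask);

	while (max_polls--) {
		if ((ops->read(ctx, ack_reg) & mask) == mask)
			return 0;
	}

	return -ETIMEDOUT;
}

/* Fake register file, only for trying the helper out in userspace. */
enum { FAKE_GBIF_HALT, FAKE_GBIF_HALT_ACK, FAKE_NUM_REGS };

struct fake_gpu {
	uint32_t regs[FAKE_NUM_REGS];
	int hw_acks;	/* non-zero: the fake hardware mirrors halt into ack */
};

static uint32_t fake_read(void *ctx, uint32_t reg)
{
	struct fake_gpu *gpu = ctx;

	return gpu->regs[reg];
}

static void fake_write(void *ctx, uint32_t reg, uint32_t val)
{
	struct fake_gpu *gpu = ctx;

	gpu->regs[reg] = val;
	if (reg == FAKE_GBIF_HALT && gpu->hw_acks)
		gpu->regs[FAKE_GBIF_HALT_ACK] = val;
}

static const struct halt_ops fake_ops = {
	.read = fake_read,
	.write = fake_write,
};
```

Each of the three halt steps in the hunk would then be a single call with
a different register pair and mask, and both the recover and the suspend
path could share it.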
>>>
>>>>  	pm_runtime_use_autosuspend(&gpu->pdev->dev);
>>>>  
>>>>  	if (active_submits)
>>>> @@ -1463,7 +1513,8 @@ static void a6xx_fault_detect_irq(struct msm_gpu *gpu)
>>>>  	 * Force the GPU to stay on until after we finish
>>>>  	 * collecting information
>>>>  	 */
>>>> -	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>>>> +		gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
>>>>  
>>>>  	DRM_DEV_ERROR(&gpu->pdev->dev,
>>>>  		"gpu fault ring %d fence %x status %8.8X rb %4.4x/%4.4x ib1 %16.16llX/%4.4x ib2 %16.16llX/%4.4x\n",
>>>> @@ -1624,7 +1675,7 @@ static void a6xx_llc_slices_init(struct platform_device *pdev,
>>>>  		a6xx_gpu->llc_mmio = ERR_PTR(-EINVAL);
>>>>  }
>>>>  
>>>> -static int a6xx_pm_resume(struct msm_gpu *gpu)
>>>> +static int a6xx_gmu_pm_resume(struct msm_gpu *gpu)
>>>>  {
>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>> @@ -1644,10 +1695,61 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
>>>>  
>>>>  	a6xx_llc_activate(a6xx_gpu);
>>>>  
>>>> -	return 0;
>>>> +	return ret;
>>>>  }
>>>>  
>>>> -static int a6xx_pm_suspend(struct msm_gpu *gpu)
>>>> +static int a6xx_pm_resume(struct msm_gpu *gpu)
>>>> +{
>>>> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>>>> +	unsigned long freq = 0;
>>>> +	struct dev_pm_opp *opp;
>>>> +	int ret;
>>>> +
>>>> +	gpu->needs_hw_init = true;
>>>> +
>>>> +	trace_msm_gpu_resume(0);
>>>> +
>>>> +	mutex_lock(&a6xx_gpu->gmu.lock);
>>> I think we can ignore gmu lock as there is no real gmu device.
>>>
>>>> +
>>>> +	pm_runtime_resume_and_get(gmu->dev);
>>>> +	pm_runtime_resume_and_get(gmu->gxpd);
>>>> +
>>>> +	/* Set the core clock, having VDD scaling in mind */
>>>> +	ret = dev_pm_opp_set_rate(&gpu->pdev->dev, gpu->fast_rate);
>>>> +	if (ret)
>>>> +		goto err_core_clk;
>>>> +
>>>> +	ret = clk_bulk_prepare_enable(gpu->nr_clocks, gpu->grp_clks);
>>>> +	if (ret)
>>>> +		goto err_bulk_clk;
>>>> +
>>>> +	ret = clk_prepare_enable(gpu->ebi1_clk);
>>>> +	if (ret)
>>>> +		goto err_mem_clk;
>>>> +
>>>> +	/* If anything goes south, tear the GPU down piece by piece.. */
>>>> +	if (ret) {
>>>> +err_mem_clk:
>>>> +		clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
>>>> +err_bulk_clk:
>>>> +		opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
>>>> +		dev_pm_opp_put(opp);
>>>> +		dev_pm_opp_set_rate(&gpu->pdev->dev, 0);
>>>> +err_core_clk:
>>>> +		pm_runtime_put(gmu->gxpd);
>>>> +		pm_runtime_put(gmu->dev);
>>>> +	}
>>>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
>>>> +
>>>> +	if (!ret)
>>>> +		msm_devfreq_resume(gpu);
>>>> +
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +static int a6xx_gmu_pm_suspend(struct msm_gpu *gpu)
>>>>  {
>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>> @@ -1674,11 +1776,62 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
>>>>  	return 0;
>>>>  }
>>>>  
>>>> +static int a6xx_pm_suspend(struct msm_gpu *gpu)
>>>> +{
>>>> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>>>> +	unsigned long freq = 0;
>>>> +	struct dev_pm_opp *opp;
>>>> +	int i, ret;
>>>> +
>>>> +	trace_msm_gpu_suspend(0);
>>>> +
>>>> +	opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
>>>> +	dev_pm_opp_put(opp);
>>>> +
>>>> +	msm_devfreq_suspend(gpu);
>>>> +
>>>> +	mutex_lock(&a6xx_gpu->gmu.lock);
>>>> +
>>>> +	clk_disable_unprepare(gpu->ebi1_clk);
>>>> +
>>>> +	clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
>>>> +
>>>> +	/* Set frequency to the minimum supported level (no 27MHz on A6xx!) */
>>>> +	ret = dev_pm_opp_set_rate(&gpu->pdev->dev, freq);
>>>> +	if (ret)
>>>> +		goto err;
>>>> +
>>>> +	pm_runtime_put_sync(gmu->gxpd);
>>>> +	pm_runtime_put_sync(gmu->dev);
>>>> +
>>>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
>>>> +
>>>> +	if (a6xx_gpu->shadow_bo)
>>>> +		for (i = 0; i < gpu->nr_rings; i++)
>>>> +			a6xx_gpu->shadow[i] = 0;
>>>> +
>>>> +	gpu->suspend_count++;
>>>> +
>>>> +	return 0;
>>>> +
>>>> +err:
>>>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
>>>> +
>>>> +	return ret;
>>>> +}
>>>> +
>>>>  static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
>>>>  {
>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>>  
>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
>>>> +		*value = gpu_read64(gpu, REG_A6XX_CP_ALWAYS_ON_COUNTER);
>>>> +		return 0;
>>>> +	}
>>>> +
>>> Instead of wrapper check here, we can just create a separate op. I don't
>>> see any benefit in reusing the same function here.
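
For illustration, a minimal model of the "separate op" approach in plain
C. The names (struct model_gpu, struct model_funcs and the two
callbacks) are invented for this sketch; the only things taken from the
patch are that the GMU path forces the GPU on before the read and that
the wrapper path is a plain read of the always-on counter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the GPU state touched by get_timestamp. */
struct model_gpu {
	uint64_t always_on_counter;	/* models REG_A6XX_CP_ALWAYS_ON_COUNTER */
	int power_votes;		/* how often the GMU path forced power on */
};

struct model_funcs {
	int (*get_timestamp)(struct model_gpu *gpu, uint64_t *value);
};

/* GMU path: force the GPU on around the read (modelled as a counter). */
static int model_gmu_get_timestamp(struct model_gpu *gpu, uint64_t *value)
{
	assert(gpu && value);

	gpu->power_votes++;
	*value = gpu->always_on_counter;

	return 0;
}

/* GMU wrapper path: plain register read, no GMU handshake. */
static int model_wrapper_get_timestamp(struct model_gpu *gpu, uint64_t *value)
{
	assert(gpu && value);

	*value = gpu->always_on_counter;

	return 0;
}

static const struct model_funcs model_funcs_gmu = {
	.get_timestamp = model_gmu_get_timestamp,
};

static const struct model_funcs model_funcs_gmuwrapper = {
	.get_timestamp = model_wrapper_get_timestamp,
};
```

Since funcs_gmuwrapper is already a separate table, pointing its
get_timestamp at a dedicated function keeps the wrapper check out of the
GMU path entirely.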
>>>
>>>
>>>>  	mutex_lock(&a6xx_gpu->gmu.lock);
>>>>  
>>>>  	/* Force the GPU power on so we can read this register */
>>>> @@ -1716,7 +1869,8 @@ static void a6xx_destroy(struct msm_gpu *gpu)
>>>>  		drm_gem_object_put(a6xx_gpu->shadow_bo);
>>>>  	}
>>>>  
>>>> -	a6xx_llc_slices_destroy(a6xx_gpu);
>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>>>> +		a6xx_llc_slices_destroy(a6xx_gpu);
>>>>  
>>>>  	mutex_lock(&a6xx_gpu->gmu.lock);
>>>>  	a6xx_gmu_remove(a6xx_gpu);
>>>> @@ -1957,8 +2111,8 @@ static const struct adreno_gpu_funcs funcs = {
>>>>  		.set_param = adreno_set_param,
>>>>  		.hw_init = a6xx_hw_init,
>>>>  		.ucode_load = a6xx_ucode_load,
>>>> -		.pm_suspend = a6xx_pm_suspend,
>>>> -		.pm_resume = a6xx_pm_resume,
>>>> +		.pm_suspend = a6xx_gmu_pm_suspend,
>>>> +		.pm_resume = a6xx_gmu_pm_resume,
>>>>  		.recover = a6xx_recover,
>>>>  		.submit = a6xx_submit,
>>>>  		.active_ring = a6xx_active_ring,
>>>> @@ -1982,6 +2136,35 @@ static const struct adreno_gpu_funcs funcs = {
>>>>  	.get_timestamp = a6xx_get_timestamp,
>>>>  };
>>>>  
>>>> +static const struct adreno_gpu_funcs funcs_gmuwrapper = {
>>>> +	.base = {
>>>> +		.get_param = adreno_get_param,
>>>> +		.set_param = adreno_set_param,
>>>> +		.hw_init = a6xx_hw_init,
>>>> +		.ucode_load = a6xx_ucode_load,
>>>> +		.pm_suspend = a6xx_pm_suspend,
>>>> +		.pm_resume = a6xx_pm_resume,
>>>> +		.recover = a6xx_recover,
>>>> +		.submit = a6xx_submit,
>>>> +		.active_ring = a6xx_active_ring,
>>>> +		.irq = a6xx_irq,
>>>> +		.destroy = a6xx_destroy,
>>>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
>>>> +		.show = a6xx_show,
>>>> +#endif
>>>> +		.gpu_busy = a6xx_gpu_busy,
>>>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
>>>> +		.gpu_state_get = a6xx_gpu_state_get,
>>>> +		.gpu_state_put = a6xx_gpu_state_put,
>>>> +#endif
>>>> +		.create_address_space = a6xx_create_address_space,
>>>> +		.create_private_address_space = a6xx_create_private_address_space,
>>>> +		.get_rptr = a6xx_get_rptr,
>>>> +		.progress = a6xx_progress,
>>>> +	},
>>>> +	.get_timestamp = a6xx_get_timestamp,
>>>> +};
>>>> +
>>>>  struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>>>  {
>>>>  	struct msm_drm_private *priv = dev->dev_private;
>>>> @@ -2003,18 +2186,36 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>>>  
>>>>  	adreno_gpu->registers = NULL;
>>>>  
>>>> +	/* Check if there is a GMU phandle and set it up */
>>>> +	node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
>>>> +	/* FIXME: How do we gracefully handle this? */
>>>> +	BUG_ON(!node);
>>> How will you handle this BUG() when there is no GMU (a610 gpu)?
>>>
>>>> +
>>>> +	adreno_gpu->gmu_is_wrapper = of_device_is_compatible(node, "qcom,adreno-gmu-wrapper");
>>>> +
>>>>  	/*
>>>>  	 * We need to know the platform type before calling into adreno_gpu_init
>>>>  	 * so that the hw_apriv flag can be correctly set. Snoop into the info
>>>>  	 * and grab the revision number
>>>>  	 */
>>>>  	info = adreno_info(config->rev);
>>>> -
>>>> -	if (info && (info->revn == 650 || info->revn == 660 ||
>>>> -			adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), info->rev)))
>>>> +	if (!info)
>>>> +		return ERR_PTR(-EINVAL);
>>>> +
>>>> +	/* Assign these early so that we can use the is_aXYZ helpers */
>>>> +	/* Numeric revision IDs (e.g. 630) */
>>>> +	adreno_gpu->revn = info->revn;
>>>> +	/* New-style ADRENO_REV()-only */
>>>> +	adreno_gpu->rev = info->rev;
>>>> +	/* Quirk data */
>>>> +	adreno_gpu->info = info;
>>>> +
>>>> +	if (adreno_is_a650(adreno_gpu) || adreno_is_a660_family(adreno_gpu))
>>>>  		adreno_gpu->base.hw_apriv = true;
>>>>  
>>>> -	a6xx_llc_slices_init(pdev, a6xx_gpu);
>>>> +	/* No LLCC on non-RPMh (and by extension, non-GMU) SoCs */
>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>>>> +		a6xx_llc_slices_init(pdev, a6xx_gpu);
>>>>  
>>>>  	ret = a6xx_set_supported_hw(&pdev->dev, config->rev);
>>>>  	if (ret) {
>>>> @@ -2022,7 +2223,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>>>  		return ERR_PTR(ret);
>>>>  	}
>>>>  
>>>> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
>>>> +		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_gmuwrapper, 1);
>>>> +	else
>>>> +		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
>>>>  	if (ret) {
>>>>  		a6xx_destroy(&(a6xx_gpu->base.base));
>>>>  		return ERR_PTR(ret);
>>>> @@ -2035,13 +2239,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>>>  	if (adreno_is_a618(adreno_gpu) || adreno_is_7c3(adreno_gpu))
>>>>  		priv->gpu_clamp_to_idle = true;
>>>>  
>>>> -	/* Check if there is a GMU phandle and set it up */
>>>> -	node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
>>>> -
>>>> -	/* FIXME: How do we gracefully handle this? */
>>>> -	BUG_ON(!node);
>>>> -
>>>> -	ret = a6xx_gmu_init(a6xx_gpu, node);
>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
>>>> +		ret = a6xx_gmu_wrapper_init(a6xx_gpu, node);
>>>> +	else
>>>> +		ret = a6xx_gmu_init(a6xx_gpu, node);
>>>>  	of_node_put(node);
>>>>  	if (ret) {
>>>>  		a6xx_destroy(&(a6xx_gpu->base.base));
>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>>> index eea2e60ce3b7..51a7656072fa 100644
>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>>> @@ -76,6 +76,7 @@ int a6xx_gmu_set_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
>>>>  void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
>>>>  
>>>>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
>>>> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
>>>>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
>>>>  
>>>>  void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
>>>> index 30ecdff363e7..4e5d650578c6 100644
>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
>>>> @@ -1041,16 +1041,18 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu)
>>>>  	/* Get the generic state from the adreno core */
>>>>  	adreno_gpu_state_get(gpu, &a6xx_state->base);
>>>>  
>>>> -	a6xx_get_gmu_registers(gpu, a6xx_state);
>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
>>> nit: Kinda misleading function name to a layman. Should we invert the
>>> function to "adreno_has_gmu"?
>>>
>>> -Akhil
>>>> +		a6xx_get_gmu_registers(gpu, a6xx_state);
>>>>  
>>>> -	a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
>>>> -	a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
>>>> -	a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
>>>> +		a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
>>>> +		a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
>>>> +		a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
>>>>  
>>>> -	a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
>>>> +		a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
>>>> +	}
>>>>  
>>>>  	/* If GX isn't on the rest of the data isn't going to be accessible */
>>>> -	if (!a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu) && !a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
>>>>  		return &a6xx_state->base;
>>>>  
>>>>  	/* Get the banks of indexed registers */
>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>> index 6934cee07d42..5c5901d65950 100644
>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>> @@ -528,6 +528,10 @@ int adreno_load_fw(struct adreno_gpu *adreno_gpu)
>>>>  		if (!adreno_gpu->info->fw[i])
>>>>  			continue;
>>>>  
>>>> +		/* Skip loading GMU firmware with GMU Wrapper */
>>>> +		if (adreno_has_gmu_wrapper(adreno_gpu) && i == ADRENO_FW_GMU)
>>>> +			continue;
>>>> +
>>>>  		/* Skip if the firmware has already been loaded */
>>>>  		if (adreno_gpu->fw[i])
>>>>  			continue;
>>>> @@ -1074,8 +1078,8 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
>>>>  	u32 speedbin;
>>>>  	int ret;
>>>>  
>>>> -	/* Only handle the core clock when GMU is not in use */
>>>> -	if (config->rev.core < 6) {
>>>> +	/* Only handle the core clock when GMU is not in use (or is absent). */
>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu) || config->rev.core < 6) {
>>>>  		/*
>>>>  		 * This can only be done before devm_pm_opp_of_add_table(), or
>>>>  		 * dev_pm_opp_set_config() will WARN_ON()
>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>>>> index f62612a5c70f..ee5352bc5329 100644
>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>>>> @@ -115,6 +115,7 @@ struct adreno_gpu {
>>>>  	 * code (a3xx_gpu.c) and stored in this common location.
>>>>  	 */
>>>>  	const unsigned int *reg_offsets;
>>>> +	bool gmu_is_wrapper;
>>>>  };
>>>>  #define to_adreno_gpu(x) container_of(x, struct adreno_gpu, base)
>>>>  
>>>> @@ -145,6 +146,11 @@ struct adreno_platform_config {
>>>>  
>>>>  bool adreno_cmp_rev(struct adreno_rev rev1, struct adreno_rev rev2);
>>>>  
>>>> +static inline bool adreno_has_gmu_wrapper(struct adreno_gpu *gpu)
>>>> +{
>>>> +	return gpu->gmu_is_wrapper;
>>>> +}
>>>> +
>>>>  static inline bool adreno_is_a2xx(struct adreno_gpu *gpu)
>>>>  {
>>>>  	return (gpu->revn < 300);
>>>>
>>>> -- 
>>>> 2.40.0
>>>>

* Re: [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support
  2023-05-04  6:34         ` Konrad Dybcio
@ 2023-05-05  8:46           ` Akhil P Oommen
  2023-05-05 10:35             ` Konrad Dybcio
  0 siblings, 1 reply; 30+ messages in thread
From: Akhil P Oommen @ 2023-05-05  8:46 UTC (permalink / raw)
  To: Konrad Dybcio
  Cc: Rob Clark, Abhinav Kumar, Dmitry Baryshkov, Sean Paul,
	David Airlie, Daniel Vetter, Rob Herring, Krzysztof Kozlowski,
	Bjorn Andersson, Konrad Dybcio, linux-arm-msm, dri-devel,
	freedreno, devicetree, linux-kernel, Rob Clark, Marijn Suijten

On Thu, May 04, 2023 at 08:34:07AM +0200, Konrad Dybcio wrote:
> 
> 
> On 3.05.2023 22:32, Akhil P Oommen wrote:
> > On Tue, May 02, 2023 at 11:40:26AM +0200, Konrad Dybcio wrote:
> >>
> >>
> >> On 2.05.2023 09:49, Akhil P Oommen wrote:
> >>> On Sat, Apr 01, 2023 at 01:54:43PM +0200, Konrad Dybcio wrote:
> >>>> Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
> >>>> but don't implement the associated GMUs. This is due to the fact that
> >>>> the GMU directly pokes at RPMh. Sadly, this means we have to take care
> >>>> of enabling & scaling power rails, clocks and bandwidth ourselves.
> >>>>
> >>>> Reuse existing Adreno-common code and modify the deeply-GMU-infused
> >>>> A6XX code to facilitate these GPUs. This involves if-ing out lots
> >>>> of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
> >>>> the actual name that Qualcomm uses in their downstream kernels).
> >>>>
> >>>> This is essentially a register region which is convenient to model
> >>>> as a device. We'll use it for managing the GDSCs. The register
> >>>> layout matches the actual GMU_CX/GX regions on the "real GMU" devices
> >>>> and lets us reuse quite a bit of gmu_read/write/rmw calls.
> >>> << I sent a reply to this patch earlier, but not sure where it went.
> >>> Still figuring out Mutt... >>
> >> Answered it here:
> >>
> >> https://lore.kernel.org/linux-arm-msm/4d3000c1-c3f9-0bfd-3eb3-23393f9a8f77@linaro.org/
> > 
> > Thanks. Will check and respond there if needed.
> > 
> >>
> >> I don't think I see any new comments in this "reply revision" (heh), so please
> >> check that one out.
> >>
> >>>
> >>> Only convenience I found is that we can reuse gmu register ops in a few
> >>> places (< 10 I think). If we just model this as another gpu memory
> >>> region, I think it will help to keep gmu vs gmu-wrapper/no-gmu
> >>> architecture code with clean separation. Also, it looks like we need to
> >>> keep a dummy gmu platform device in the devicetree with the current
> >>> approach. That doesn't sound right.
> >> That's correct, but.. if we switch away from that, VDD_GX/VDD_CX will
> >> need additional, gmuwrapper-configuration specific code anyway, as
> >> OPP & genpd will no longer make use of the default behavior which
> >> only gets triggered if there's a single power-domains=<> entry, afaicu.
> > > Can you please tell me which specific *default behaviour* do you mean here?
> > I am curious to know what I am overlooking here. We can always get a cxpd/gxpd device
> > and vote for the gdscs directly from the driver. Anything related to
> > OPP?
> I *believe* this is true:
> 
> if (ARRAY_SIZE(power-domains) == 1) {
> 	of generic code will enable the power domain at .probe time
we need to handle the voting directly. I recently shared a patch to
vote cx gdsc from gpu driver. Maybe we can ignore this when gpu has
only cx rail due to this logic you quoted here.

I see that you have handled it mostly correctly from the gpu driver in the updated
a6xx_pm_suspend() callback. Just the power domain device ptrs should be moved to
gpu from gmu.

> 
> 	opp APIs will default to scaling that domain with required-opps

> }
> 
> and we do need to put GX/CX (with an MX parent to match) there, as the
> AP is responsible for voting in this configuration

We should vote to turn ON gx/cx headswitches through genpd from gpu driver. When you vote for
core clk frequency, *clock driver is supposed to scale* all the necessary
regulators. At least that is how downstream works. You can refer to the downstream
gpucc clk driver of these SoCs. I am not sure how much of that can be easily converted to
upstream.

Also, how does having a gmu dt node help in this regard? Feel free to
elaborate, I am not very familiar with clk/regulator implementations.

-Akhil.
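
[Editorial sketch, not part of the original mail.] The ordering discussed
above (vote for the CX and GX headswitches from the GPU driver, then request
the core clock rate, then enable the clocks, and undo everything in reverse
when a step fails) can be modelled in plain userspace C. This is only an
illustration under stated assumptions: struct gpu_power, do_step(),
gpu_resume(), gpu_suspend() and the fail_at test hook are hypothetical names
invented here, and none of them are msm driver or genpd/OPP APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the resources the GPU driver would vote on. */
struct gpu_power {
	int cx_votes;	/* votes held on the CX headswitch */
	int gx_votes;	/* votes held on the GX headswitch */
	bool rate_set;	/* core clock rate has been requested */
	bool clocks_on;	/* bulk clocks are enabled */
	int fail_at;	/* test hook: step that should fail, 0 = none */
};

enum { STEP_CX = 1, STEP_GX, STEP_RATE, STEP_CLOCKS };

/* Stand-in for a real call that may fail; returns 0 or a negative error. */
static int do_step(const struct gpu_power *p, int step)
{
	return p->fail_at == step ? -1 : 0;
}

/* Bring resources up in order; on failure unwind only what was acquired. */
static int gpu_resume(struct gpu_power *p)
{
	int ret;

	ret = do_step(p, STEP_CX);
	if (ret)
		return ret;
	p->cx_votes++;

	ret = do_step(p, STEP_GX);
	if (ret)
		goto err_put_cx;
	p->gx_votes++;

	ret = do_step(p, STEP_RATE);
	if (ret)
		goto err_put_gx;
	p->rate_set = true;

	ret = do_step(p, STEP_CLOCKS);
	if (ret)
		goto err_drop_rate;
	p->clocks_on = true;

	return 0;

err_drop_rate:
	p->rate_set = false;
err_put_gx:
	p->gx_votes--;
err_put_cx:
	p->cx_votes--;
	return ret;
}

/* Tear down in the reverse order of gpu_resume(). */
static void gpu_suspend(struct gpu_power *p)
{
	p->clocks_on = false;
	p->rate_set = false;
	p->gx_votes--;
	p->cx_votes--;
}
```

The error labels sit after the success return, so the normal path never
falls into them. That is the conventional kernel unwind layout, in contrast
to placing the labels inside an "if (ret)" block.
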
> 
> Konrad
> > 
> > -Akhil
> >>
> >> If nothing else, this is a very convenient way to model a part of the
> >> GPU (as that's essentially what GMU_CX is, to my understanding) and
> >> the bindings people didn't shoot me in the head for proposing this, so
> >> I assume it'd be cool to pursue this..
> >>
> >> Konrad
> >>>>
> >>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
> >>>> ---
> >>>>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c       |  72 +++++++-
> >>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c       | 255 +++++++++++++++++++++++++---
> >>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h       |   1 +
> >>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  14 +-
> >>>>  drivers/gpu/drm/msm/adreno/adreno_gpu.c     |   8 +-
> >>>>  drivers/gpu/drm/msm/adreno/adreno_gpu.h     |   6 +
> >>>>  6 files changed, 318 insertions(+), 38 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >>>> index 87babbb2a19f..b1acdb027205 100644
> >>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >>>> @@ -1469,6 +1469,7 @@ static int a6xx_gmu_get_irq(struct a6xx_gmu *gmu, struct platform_device *pdev,
> >>>>  
> >>>>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
> >>>>  {
> >>>> +	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
> >>>>  	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >>>>  	struct platform_device *pdev = to_platform_device(gmu->dev);
> >>>>  
> >>>> @@ -1494,10 +1495,12 @@ void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
> >>>>  	gmu->mmio = NULL;
> >>>>  	gmu->rscc = NULL;
> >>>>  
> >>>> -	a6xx_gmu_memory_free(gmu);
> >>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> >>>> +		a6xx_gmu_memory_free(gmu);
> >>>>  
> >>>> -	free_irq(gmu->gmu_irq, gmu);
> >>>> -	free_irq(gmu->hfi_irq, gmu);
> >>>> +		free_irq(gmu->gmu_irq, gmu);
> >>>> +		free_irq(gmu->hfi_irq, gmu);
> >>>> +	}
> >>>>  
> >>>>  	/* Drop reference taken in of_find_device_by_node */
> >>>>  	put_device(gmu->dev);
> >>>> @@ -1516,6 +1519,69 @@ static int cxpd_notifier_cb(struct notifier_block *nb,
> >>>>  	return 0;
> >>>>  }
> >>>>  
> >>>> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
> >>>> +{
> >>>> +	struct platform_device *pdev = of_find_device_by_node(node);
> >>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >>>> +	int ret;
> >>>> +
> >>>> +	if (!pdev)
> >>>> +		return -ENODEV;
> >>>> +
> >>>> +	gmu->dev = &pdev->dev;
> >>>> +
> >>>> +	of_dma_configure(gmu->dev, node, true);
> >>> why setup dma for a device that is not actually present?
> >>>> +
> >>>> +	pm_runtime_enable(gmu->dev);
> >>>> +
> >>>> +	/* Mark legacy for manual SPTPRAC control */
> >>>> +	gmu->legacy = true;
> >>>> +
> >>>> +	/* Map the GMU registers */
> >>>> +	gmu->mmio = a6xx_gmu_get_mmio(pdev, "gmu");
> >>>> +	if (IS_ERR(gmu->mmio)) {
> >>>> +		ret = PTR_ERR(gmu->mmio);
> >>>> +		goto err_mmio;
> >>>> +	}
> >>>> +
> >>>> +	gmu->cxpd = dev_pm_domain_attach_by_name(gmu->dev, "cx");
> >>>> +	if (IS_ERR(gmu->cxpd)) {
> >>>> +		ret = PTR_ERR(gmu->cxpd);
> >>>> +		goto err_mmio;
> >>>> +	}
> >>>> +
> >>>> +	if (!device_link_add(gmu->dev, gmu->cxpd, DL_FLAG_PM_RUNTIME)) {
> >>>> +		ret = -ENODEV;
> >>>> +		goto detach_cxpd;
> >>>> +	}
> >>>> +
> >>>> +	init_completion(&gmu->pd_gate);
> >>>> +	complete_all(&gmu->pd_gate);
> >>>> +	gmu->pd_nb.notifier_call = cxpd_notifier_cb;
> >>>> +
> >>>> +	/* Get a link to the GX power domain to reset the GPU */
> >>>> +	gmu->gxpd = dev_pm_domain_attach_by_name(gmu->dev, "gx");
> >>>> +	if (IS_ERR(gmu->gxpd)) {
> >>>> +		ret = PTR_ERR(gmu->gxpd);
> >>>> +		goto err_mmio;
> >>>> +	}
> >>>> +
> >>>> +	gmu->initialized = true;
> >>>> +
> >>>> +	return 0;
> >>>> +
> >>>> +detach_cxpd:
> >>>> +	dev_pm_domain_detach(gmu->cxpd, false);
> >>>> +
> >>>> +err_mmio:
> >>>> +	iounmap(gmu->mmio);
> >>>> +
> >>>> +	/* Drop reference taken in of_find_device_by_node */
> >>>> +	put_device(gmu->dev);
> >>>> +
> >>>> +	return ret;
> >>>> +}
> >>>> +
> >>>>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
> >>>>  {
> >>>>  	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
> >>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>>> index 931f9f3b3a85..8e0345ffab81 100644
> >>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>>> @@ -20,9 +20,11 @@ static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
> >>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>>>  
> >>>> -	/* Check that the GMU is idle */
> >>>> -	if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
> >>>> -		return false;
> >>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> >>>> +		/* Check that the GMU is idle */
> >>>> +		if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
> >>>> +			return false;
> >>>> +	}
> >>>>  
> >>>>  	/* Check tha the CX master is idle */
> >>>>  	if (gpu_read(gpu, REG_A6XX_RBBM_STATUS) &
> >>>> @@ -612,13 +614,15 @@ static void a6xx_set_hwcg(struct msm_gpu *gpu, bool state)
> >>>>  		return;
> >>>>  
> >>>>  	/* Disable SP clock before programming HWCG registers */
> >>>> -	gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
> >>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> >>>> +		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
> >>>>  
> >>>>  	for (i = 0; (reg = &adreno_gpu->info->hwcg[i], reg->offset); i++)
> >>>>  		gpu_write(gpu, reg->offset, state ? reg->value : 0);
> >>>>  
> >>>>  	/* Enable SP clock */
> >>>> -	gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
> >>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> >>>> +		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
> >>>>  
> >>>>  	gpu_write(gpu, REG_A6XX_RBBM_CLOCK_CNTL, state ? clock_cntl_on : 0);
> >>>>  }
> >>>> @@ -1018,10 +1022,13 @@ static int hw_init(struct msm_gpu *gpu)
> >>>>  {
> >>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >>>>  	int ret;
> >>>>  
> >>>> -	/* Make sure the GMU keeps the GPU on while we set it up */
> >>>> -	a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
> >>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> >>>> +		/* Make sure the GMU keeps the GPU on while we set it up */
> >>>> +		a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
> >>>> +	}
> >>>>  
> >>>>  	/* Clear GBIF halt in case GX domain was not collapsed */
> >>>>  	if (a6xx_has_gbif(adreno_gpu))
> >>>> @@ -1144,6 +1151,17 @@ static int hw_init(struct msm_gpu *gpu)
> >>>>  			0x3f0243f0);
> >>>>  	}
> >>>>  
> >>>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
> >>>> +		/* Do it here, as GMU wrapper only inits the GMU for memory reservation etc. */
> >>>> +
> >>>> +		/* Set up the CX GMU counter 0 to count busy ticks */
> >>>> +		gmu_write(gmu, REG_A6XX_GPU_GMU_AO_GPU_CX_BUSY_MASK, 0xff000000);
> >>>> +
> >>>> +		/* Enable power counter 0 */
> >>>> +		gmu_rmw(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_SELECT_0, 0xff, BIT(5));
> >>>> +		gmu_write(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_ENABLE, 1);
> >>>> +	}
> >>>> +
> >>>>  	/* Protect registers from the CP */
> >>>>  	a6xx_set_cp_protect(gpu);
> >>>>  
> >>>> @@ -1233,6 +1251,8 @@ static int hw_init(struct msm_gpu *gpu)
> >>>>  	}
> >>>>  
> >>>>  out:
> >>>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
> >>>> +		return ret;
> >>>>  	/*
> >>>>  	 * Tell the GMU that we are done touching the GPU and it can start power
> >>>>  	 * management
> >>>> @@ -1267,6 +1287,9 @@ static void a6xx_dump(struct msm_gpu *gpu)
> >>>>  	adreno_dump(gpu);
> >>>>  }
> >>>>  
> >>>> +#define GBIF_GX_HALT_MASK	BIT(0)
> >>>> +#define GBIF_CLIENT_HALT_MASK	BIT(0)
> >>>> +#define GBIF_ARB_HALT_MASK	BIT(1)
> >>>>  #define VBIF_RESET_ACK_TIMEOUT	100
> >>>>  #define VBIF_RESET_ACK_MASK	0x00f0
> >>>>  
> >>>> @@ -1299,7 +1322,8 @@ static void a6xx_recover(struct msm_gpu *gpu)
> >>>>  	 * Turn off keep alive that might have been enabled by the hang
> >>>>  	 * interrupt
> >>>>  	 */
> >>>> -	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
> >>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> >>>> +		gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
> >>>
> >>> Maybe it is better to move this to a6xx_gmu_force_power_off.
> >>>
> >>>>  
> >>>>  	pm_runtime_dont_use_autosuspend(&gpu->pdev->dev);
> >>>>  
> >>>> @@ -1329,6 +1353,32 @@ static void a6xx_recover(struct msm_gpu *gpu)
> >>>>  
> >>>>  	dev_pm_genpd_remove_notifier(gmu->cxpd);
> >>>>  
> >>>> +	/* Software-reset the GPU */
> >>>
> >>> This is not soft reset sequence. We are trying to quiescent gpu - ddr
> >>> traffic with this sequence.
> >>>
> >>>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
> >>>> +		/* Halt the GX side of GBIF */
> >>>> +		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, GBIF_GX_HALT_MASK);
> >>>> +		spin_until(gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT_ACK) &
> >>>> +			   GBIF_GX_HALT_MASK);
> >>>> +
> >>>> +		/* Halt new client requests on GBIF */
> >>>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_CLIENT_HALT_MASK);
> >>>> +		spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
> >>>> +			   (GBIF_CLIENT_HALT_MASK)) == GBIF_CLIENT_HALT_MASK);
> >>>> +
> >>>> +		/* Halt all AXI requests on GBIF */
> >>>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_ARB_HALT_MASK);
> >>>> +		spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
> >>>> +			   (GBIF_ARB_HALT_MASK)) == GBIF_ARB_HALT_MASK);
> >>>> +
> >>>> +		/* Clear the halts */
> >>>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
> >>>> +
> >>>> +		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
> >>>> +
> >>>> +		/* This *really* needs to go through before we do anything else! */
> >>>> +		mb();
> >>>> +	}
> >>>> +
> >>>
> >>> This sequence should be before we collapse cx gdsc. Also, please see if
> >>> we can create a subroutine to avoid code dup.
> >>>
> >>>>  	pm_runtime_use_autosuspend(&gpu->pdev->dev);
> >>>>  
> >>>>  	if (active_submits)
> >>>> @@ -1463,7 +1513,8 @@ static void a6xx_fault_detect_irq(struct msm_gpu *gpu)
> >>>>  	 * Force the GPU to stay on until after we finish
> >>>>  	 * collecting information
> >>>>  	 */
> >>>> -	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
> >>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> >>>> +		gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
> >>>>  
> >>>>  	DRM_DEV_ERROR(&gpu->pdev->dev,
> >>>>  		"gpu fault ring %d fence %x status %8.8X rb %4.4x/%4.4x ib1 %16.16llX/%4.4x ib2 %16.16llX/%4.4x\n",
> >>>> @@ -1624,7 +1675,7 @@ static void a6xx_llc_slices_init(struct platform_device *pdev,
> >>>>  		a6xx_gpu->llc_mmio = ERR_PTR(-EINVAL);
> >>>>  }
> >>>>  
> >>>> -static int a6xx_pm_resume(struct msm_gpu *gpu)
> >>>> +static int a6xx_gmu_pm_resume(struct msm_gpu *gpu)
> >>>>  {
> >>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>>> @@ -1644,10 +1695,61 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
> >>>>  
> >>>>  	a6xx_llc_activate(a6xx_gpu);
> >>>>  
> >>>> -	return 0;
> >>>> +	return ret;
> >>>>  }
> >>>>  
> >>>> -static int a6xx_pm_suspend(struct msm_gpu *gpu)
> >>>> +static int a6xx_pm_resume(struct msm_gpu *gpu)
> >>>> +{
> >>>> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>>> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >>>> +	unsigned long freq = 0;
> >>>> +	struct dev_pm_opp *opp;
> >>>> +	int ret;
> >>>> +
> >>>> +	gpu->needs_hw_init = true;
> >>>> +
> >>>> +	trace_msm_gpu_resume(0);
> >>>> +
> >>>> +	mutex_lock(&a6xx_gpu->gmu.lock);
> >>> I think we can ignore gmu lock as there is no real gmu device.
> >>>
> >>>> +
> >>>> +	pm_runtime_resume_and_get(gmu->dev);
> >>>> +	pm_runtime_resume_and_get(gmu->gxpd);
> >>>> +
> >>>> +	/* Set the core clock, having VDD scaling in mind */
> >>>> +	ret = dev_pm_opp_set_rate(&gpu->pdev->dev, gpu->fast_rate);
> >>>> +	if (ret)
> >>>> +		goto err_core_clk;
> >>>> +
> >>>> +	ret = clk_bulk_prepare_enable(gpu->nr_clocks, gpu->grp_clks);
> >>>> +	if (ret)
> >>>> +		goto err_bulk_clk;
> >>>> +
> >>>> +	ret = clk_prepare_enable(gpu->ebi1_clk);
> >>>> +	if (ret)
> >>>> +		goto err_mem_clk;
> >>>> +
> >>>> +	/* If anything goes south, tear the GPU down piece by piece.. */
> >>>> +	if (ret) {
> >>>> +err_mem_clk:
> >>>> +		clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
> >>>> +err_bulk_clk:
> >>>> +		opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
> >>>> +		dev_pm_opp_put(opp);
> >>>> +		dev_pm_opp_set_rate(&gpu->pdev->dev, 0);
> >>>> +err_core_clk:
> >>>> +		pm_runtime_put(gmu->gxpd);
> >>>> +		pm_runtime_put(gmu->dev);
> >>>> +	}
> >>>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
> >>>> +
> >>>> +	if (!ret)
> >>>> +		msm_devfreq_resume(gpu);
> >>>> +
> >>>> +	return ret;
> >>>> +}
> >>>> +
> >>>> +static int a6xx_gmu_pm_suspend(struct msm_gpu *gpu)
> >>>>  {
> >>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>>> @@ -1674,11 +1776,62 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
> >>>>  	return 0;
> >>>>  }
> >>>>  
> >>>> +static int a6xx_pm_suspend(struct msm_gpu *gpu)
> >>>> +{
> >>>> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>>> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >>>> +	unsigned long freq = 0;
> >>>> +	struct dev_pm_opp *opp;
> >>>> +	int i, ret;
> >>>> +
> >>>> +	trace_msm_gpu_suspend(0);
> >>>> +
> >>>> +	opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
> >>>> +	dev_pm_opp_put(opp);
> >>>> +
> >>>> +	msm_devfreq_suspend(gpu);
> >>>> +
> >>>> +	mutex_lock(&a6xx_gpu->gmu.lock);
> >>>> +
> >>>> +	clk_disable_unprepare(gpu->ebi1_clk);
> >>>> +
> >>>> +	clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
> >>>> +
> >>>> +	/* Set frequency to the minimum supported level (no 27MHz on A6xx!) */
> >>>> +	ret = dev_pm_opp_set_rate(&gpu->pdev->dev, freq);
> >>>> +	if (ret)
> >>>> +		goto err;
> >>>> +
> >>>> +	pm_runtime_put_sync(gmu->gxpd);
> >>>> +	pm_runtime_put_sync(gmu->dev);
> >>>> +
> >>>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
> >>>> +
> >>>> +	if (a6xx_gpu->shadow_bo)
> >>>> +		for (i = 0; i < gpu->nr_rings; i++)
> >>>> +			a6xx_gpu->shadow[i] = 0;
> >>>> +
> >>>> +	gpu->suspend_count++;
> >>>> +
> >>>> +	return 0;
> >>>> +
> >>>> +err:
> >>>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
> >>>> +
> >>>> +	return ret;
> >>>> +}
> >>>> +
> >>>>  static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
> >>>>  {
> >>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>>>  
> >>>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
> >>>> +		*value = gpu_read64(gpu, REG_A6XX_CP_ALWAYS_ON_COUNTER);
> >>>> +		return 0;
> >>>> +	}
> >>>> +
> >>> Instead of wrapper check here, we can just create a separate op. I don't
> >>> see any benefit in reusing the same function here.
> >>>
> >>>
> >>>>  	mutex_lock(&a6xx_gpu->gmu.lock);
> >>>>  
> >>>>  	/* Force the GPU power on so we can read this register */
> >>>> @@ -1716,7 +1869,8 @@ static void a6xx_destroy(struct msm_gpu *gpu)
> >>>>  		drm_gem_object_put(a6xx_gpu->shadow_bo);
> >>>>  	}
> >>>>  
> >>>> -	a6xx_llc_slices_destroy(a6xx_gpu);
> >>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> >>>> +		a6xx_llc_slices_destroy(a6xx_gpu);
> >>>>  
> >>>>  	mutex_lock(&a6xx_gpu->gmu.lock);
> >>>>  	a6xx_gmu_remove(a6xx_gpu);
> >>>> @@ -1957,8 +2111,8 @@ static const struct adreno_gpu_funcs funcs = {
> >>>>  		.set_param = adreno_set_param,
> >>>>  		.hw_init = a6xx_hw_init,
> >>>>  		.ucode_load = a6xx_ucode_load,
> >>>> -		.pm_suspend = a6xx_pm_suspend,
> >>>> -		.pm_resume = a6xx_pm_resume,
> >>>> +		.pm_suspend = a6xx_gmu_pm_suspend,
> >>>> +		.pm_resume = a6xx_gmu_pm_resume,
> >>>>  		.recover = a6xx_recover,
> >>>>  		.submit = a6xx_submit,
> >>>>  		.active_ring = a6xx_active_ring,
> >>>> @@ -1982,6 +2136,35 @@ static const struct adreno_gpu_funcs funcs = {
> >>>>  	.get_timestamp = a6xx_get_timestamp,
> >>>>  };
> >>>>  
> >>>> +static const struct adreno_gpu_funcs funcs_gmuwrapper = {
> >>>> +	.base = {
> >>>> +		.get_param = adreno_get_param,
> >>>> +		.set_param = adreno_set_param,
> >>>> +		.hw_init = a6xx_hw_init,
> >>>> +		.ucode_load = a6xx_ucode_load,
> >>>> +		.pm_suspend = a6xx_pm_suspend,
> >>>> +		.pm_resume = a6xx_pm_resume,
> >>>> +		.recover = a6xx_recover,
> >>>> +		.submit = a6xx_submit,
> >>>> +		.active_ring = a6xx_active_ring,
> >>>> +		.irq = a6xx_irq,
> >>>> +		.destroy = a6xx_destroy,
> >>>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
> >>>> +		.show = a6xx_show,
> >>>> +#endif
> >>>> +		.gpu_busy = a6xx_gpu_busy,
> >>>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
> >>>> +		.gpu_state_get = a6xx_gpu_state_get,
> >>>> +		.gpu_state_put = a6xx_gpu_state_put,
> >>>> +#endif
> >>>> +		.create_address_space = a6xx_create_address_space,
> >>>> +		.create_private_address_space = a6xx_create_private_address_space,
> >>>> +		.get_rptr = a6xx_get_rptr,
> >>>> +		.progress = a6xx_progress,
> >>>> +	},
> >>>> +	.get_timestamp = a6xx_get_timestamp,
> >>>> +};
> >>>> +
> >>>>  struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> >>>>  {
> >>>>  	struct msm_drm_private *priv = dev->dev_private;
> >>>> @@ -2003,18 +2186,36 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> >>>>  
> >>>>  	adreno_gpu->registers = NULL;
> >>>>  
> >>>> +	/* Check if there is a GMU phandle and set it up */
> >>>> +	node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
> >>>> +	/* FIXME: How do we gracefully handle this? */
> >>>> +	BUG_ON(!node);
> >>> How will you handle this BUG() when there is no GMU (a610 gpu)?
> >>>
> >>>> +
> >>>> +	adreno_gpu->gmu_is_wrapper = of_device_is_compatible(node, "qcom,adreno-gmu-wrapper");
> >>>> +
> >>>>  	/*
> >>>>  	 * We need to know the platform type before calling into adreno_gpu_init
> >>>>  	 * so that the hw_apriv flag can be correctly set. Snoop into the info
> >>>>  	 * and grab the revision number
> >>>>  	 */
> >>>>  	info = adreno_info(config->rev);
> >>>> -
> >>>> -	if (info && (info->revn == 650 || info->revn == 660 ||
> >>>> -			adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), info->rev)))
> >>>> +	if (!info)
> >>>> +		return ERR_PTR(-EINVAL);
> >>>> +
> >>>> +	/* Assign these early so that we can use the is_aXYZ helpers */
> >>>> +	/* Numeric revision IDs (e.g. 630) */
> >>>> +	adreno_gpu->revn = info->revn;
> >>>> +	/* New-style ADRENO_REV()-only */
> >>>> +	adreno_gpu->rev = info->rev;
> >>>> +	/* Quirk data */
> >>>> +	adreno_gpu->info = info;
> >>>> +
> >>>> +	if (adreno_is_a650(adreno_gpu) || adreno_is_a660_family(adreno_gpu))
> >>>>  		adreno_gpu->base.hw_apriv = true;
> >>>>  
> >>>> -	a6xx_llc_slices_init(pdev, a6xx_gpu);
> >>>> +	/* No LLCC on non-RPMh (and by extension, non-GMU) SoCs */
> >>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> >>>> +		a6xx_llc_slices_init(pdev, a6xx_gpu);
> >>>>  
> >>>>  	ret = a6xx_set_supported_hw(&pdev->dev, config->rev);
> >>>>  	if (ret) {
> >>>> @@ -2022,7 +2223,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> >>>>  		return ERR_PTR(ret);
> >>>>  	}
> >>>>  
> >>>> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> >>>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
> >>>> +		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_gmuwrapper, 1);
> >>>> +	else
> >>>> +		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> >>>>  	if (ret) {
> >>>>  		a6xx_destroy(&(a6xx_gpu->base.base));
> >>>>  		return ERR_PTR(ret);
> >>>> @@ -2035,13 +2239,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> >>>>  	if (adreno_is_a618(adreno_gpu) || adreno_is_7c3(adreno_gpu))
> >>>>  		priv->gpu_clamp_to_idle = true;
> >>>>  
> >>>> -	/* Check if there is a GMU phandle and set it up */
> >>>> -	node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
> >>>> -
> >>>> -	/* FIXME: How do we gracefully handle this? */
> >>>> -	BUG_ON(!node);
> >>>> -
> >>>> -	ret = a6xx_gmu_init(a6xx_gpu, node);
> >>>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
> >>>> +		ret = a6xx_gmu_wrapper_init(a6xx_gpu, node);
> >>>> +	else
> >>>> +		ret = a6xx_gmu_init(a6xx_gpu, node);
> >>>>  	of_node_put(node);
> >>>>  	if (ret) {
> >>>>  		a6xx_destroy(&(a6xx_gpu->base.base));
> >>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> >>>> index eea2e60ce3b7..51a7656072fa 100644
> >>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> >>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> >>>> @@ -76,6 +76,7 @@ int a6xx_gmu_set_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
> >>>>  void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
> >>>>  
> >>>>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
> >>>> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
> >>>>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
> >>>>  
> >>>>  void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> >>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> >>>> index 30ecdff363e7..4e5d650578c6 100644
> >>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> >>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> >>>> @@ -1041,16 +1041,18 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu)
> >>>>  	/* Get the generic state from the adreno core */
> >>>>  	adreno_gpu_state_get(gpu, &a6xx_state->base);
> >>>>  
> >>>> -	a6xx_get_gmu_registers(gpu, a6xx_state);
> >>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> >>> nit: Kinda misleading function name to a layman. Should we invert the
> >>> function to "adreno_has_gmu"?
> >>>
> >>> -Akhil
> >>>> +		a6xx_get_gmu_registers(gpu, a6xx_state);
> >>>>  
> >>>> -	a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
> >>>> -	a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
> >>>> -	a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
> >>>> +		a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
> >>>> +		a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
> >>>> +		a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
> >>>>  
> >>>> -	a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
> >>>> +		a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
> >>>> +	}
> >>>>  
> >>>>  	/* If GX isn't on the rest of the data isn't going to be accessible */
> >>>> -	if (!a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
> >>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu) && !a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
> >>>>  		return &a6xx_state->base;
> >>>>  
> >>>>  	/* Get the banks of indexed registers */
> >>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> >>>> index 6934cee07d42..5c5901d65950 100644
> >>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> >>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> >>>> @@ -528,6 +528,10 @@ int adreno_load_fw(struct adreno_gpu *adreno_gpu)
> >>>>  		if (!adreno_gpu->info->fw[i])
> >>>>  			continue;
> >>>>  
> >>>> +		/* Skip loading GMU firmware with GMU Wrapper */
> >>>> +		if (adreno_has_gmu_wrapper(adreno_gpu) && i == ADRENO_FW_GMU)
> >>>> +			continue;
> >>>> +
> >>>>  		/* Skip if the firmware has already been loaded */
> >>>>  		if (adreno_gpu->fw[i])
> >>>>  			continue;
> >>>> @@ -1074,8 +1078,8 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
> >>>>  	u32 speedbin;
> >>>>  	int ret;
> >>>>  
> >>>> -	/* Only handle the core clock when GMU is not in use */
> >>>> -	if (config->rev.core < 6) {
> >>>> +	/* Only handle the core clock when GMU is not in use (or is absent). */
> >>>> +	if (adreno_has_gmu_wrapper(adreno_gpu) || config->rev.core < 6) {
> >>>>  		/*
> >>>>  		 * This can only be done before devm_pm_opp_of_add_table(), or
> >>>>  		 * dev_pm_opp_set_config() will WARN_ON()
> >>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> >>>> index f62612a5c70f..ee5352bc5329 100644
> >>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> >>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> >>>> @@ -115,6 +115,7 @@ struct adreno_gpu {
> >>>>  	 * code (a3xx_gpu.c) and stored in this common location.
> >>>>  	 */
> >>>>  	const unsigned int *reg_offsets;
> >>>> +	bool gmu_is_wrapper;
> >>>>  };
> >>>>  #define to_adreno_gpu(x) container_of(x, struct adreno_gpu, base)
> >>>>  
> >>>> @@ -145,6 +146,11 @@ struct adreno_platform_config {
> >>>>  
> >>>>  bool adreno_cmp_rev(struct adreno_rev rev1, struct adreno_rev rev2);
> >>>>  
> >>>> +static inline bool adreno_has_gmu_wrapper(struct adreno_gpu *gpu)
> >>>> +{
> >>>> +	return gpu->gmu_is_wrapper;
> >>>> +}
> >>>> +
> >>>>  static inline bool adreno_is_a2xx(struct adreno_gpu *gpu)
> >>>>  {
> >>>>  	return (gpu->revn < 300);
> >>>>
> >>>> -- 
> >>>> 2.40.0
> >>>>
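
[Editorial sketch, not part of the original mail.] The review of the
a6xx_recover() hunk quoted above asks for the GBIF halt sequence to be
factored into a subroutine. One possible shape for such a helper is shown
below as a userspace model. Everything here is an assumption made for
illustration: the register indices point into a mock array rather than real
A6XX offsets, reg_write()/reg_read(), halt_and_wait(), gbif_quiesce() and
the ack_stuck hook are hypothetical names, and the bounded poll stands in
for the unbounded spin_until() used in the patch. Only the three mask values
and the order of the steps are taken from the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Indices into a mock register file; NOT real hardware offsets. */
enum { RBBM_GBIF_HALT, RBBM_GBIF_HALT_ACK, GBIF_HALT, GBIF_HALT_ACK, NUM_REGS };

#define GBIF_GX_HALT_MASK	(1u << 0)
#define GBIF_CLIENT_HALT_MASK	(1u << 0)
#define GBIF_ARB_HALT_MASK	(1u << 1)

static uint32_t regs[NUM_REGS];
static bool ack_stuck;	/* test hook: mock hardware never acknowledges */

/* Mock write: the ack register mirrors the halt request unless stuck. */
static void reg_write(int reg, uint32_t val)
{
	regs[reg] = val;
	if (ack_stuck)
		return;
	if (reg == RBBM_GBIF_HALT)
		regs[RBBM_GBIF_HALT_ACK] = val;
	if (reg == GBIF_HALT)
		regs[GBIF_HALT_ACK] = val;
}

static uint32_t reg_read(int reg)
{
	return regs[reg];
}

/* Request a halt and poll the ack register a bounded number of times. */
static bool halt_and_wait(int halt_reg, int ack_reg, uint32_t mask, int polls)
{
	reg_write(halt_reg, mask);
	while (polls-- > 0) {
		if ((reg_read(ack_reg) & mask) == mask)
			return true;
	}
	return false;
}

/* GX side first, then client requests, then AXI; always clear the halts. */
static bool gbif_quiesce(void)
{
	bool ok;

	ok = halt_and_wait(RBBM_GBIF_HALT, RBBM_GBIF_HALT_ACK,
			   GBIF_GX_HALT_MASK, 100) &&
	     halt_and_wait(GBIF_HALT, GBIF_HALT_ACK,
			   GBIF_CLIENT_HALT_MASK, 100) &&
	     halt_and_wait(GBIF_HALT, GBIF_HALT_ACK,
			   GBIF_ARB_HALT_MASK, 100);

	reg_write(GBIF_HALT, 0);
	reg_write(RBBM_GBIF_HALT, 0);

	return ok;
}
```

Returning a status instead of spinning forever lets the caller decide what
to do when the hardware never acknowledges. Whether that is appropriate for
the real recovery path is a design question this sketch does not settle.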

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support
  2023-05-05  8:46           ` Akhil P Oommen
@ 2023-05-05 10:35             ` Konrad Dybcio
  2023-05-06 14:46               ` Akhil P Oommen
  0 siblings, 1 reply; 30+ messages in thread
From: Konrad Dybcio @ 2023-05-05 10:35 UTC (permalink / raw)
  To: Akhil P Oommen
  Cc: Rob Clark, Abhinav Kumar, Dmitry Baryshkov, Sean Paul,
	David Airlie, Daniel Vetter, Rob Herring, Krzysztof Kozlowski,
	Bjorn Andersson, Konrad Dybcio, linux-arm-msm, dri-devel,
	freedreno, devicetree, linux-kernel, Rob Clark, Marijn Suijten



On 5.05.2023 10:46, Akhil P Oommen wrote:
> On Thu, May 04, 2023 at 08:34:07AM +0200, Konrad Dybcio wrote:
>>
>>
>> On 3.05.2023 22:32, Akhil P Oommen wrote:
>>> On Tue, May 02, 2023 at 11:40:26AM +0200, Konrad Dybcio wrote:
>>>>
>>>>
>>>> On 2.05.2023 09:49, Akhil P Oommen wrote:
>>>>> On Sat, Apr 01, 2023 at 01:54:43PM +0200, Konrad Dybcio wrote:
>>>>>> Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
>>>>>> but don't implement the associated GMUs. This is due to the fact that
>>>>>> the GMU directly pokes at RPMh. Sadly, this means we have to take care
>>>>>> of enabling & scaling power rails, clocks and bandwidth ourselves.
>>>>>>
>>>>>> Reuse existing Adreno-common code and modify the deeply-GMU-infused
>>>>>> A6XX code to facilitate these GPUs. This involves if-ing out lots
>>>>>> of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
>>>>>> the actual name that Qualcomm uses in their downstream kernels).
>>>>>>
>>>>>> This is essentially a register region which is convenient to model
>>>>>> as a device. We'll use it for managing the GDSCs. The register
>>>>>> layout matches the actual GMU_CX/GX regions on the "real GMU" devices
>>>>>> and lets us reuse quite a bit of gmu_read/write/rmw calls.
>>>>> << I sent a reply to this patch earlier, but not sure where it went.
>>>>> Still figuring out Mutt... >>
>>>> Answered it here:
>>>>
>>>> https://lore.kernel.org/linux-arm-msm/4d3000c1-c3f9-0bfd-3eb3-23393f9a8f77@linaro.org/
>>>
>>> Thanks. Will check and respond there if needed.
>>>
>>>>
>>>> I don't think I see any new comments in this "reply revision" (heh), so please
>>>> check that one out.
>>>>
>>>>>
>>>>> Only convenience I found is that we can reuse gmu register ops in a few
>>>>> places (< 10 I think). If we just model this as another gpu memory
>>>>> region, I think it will help to keep gmu vs gmu-wrapper/no-gmu
>>>>> architecture code with clean separation. Also, it looks like we need to
>>>>> keep a dummy gmu platform device in the devicetree with the current
>>>>> approach. That doesn't sound right.
>>>> That's correct, but.. if we switch away from that, VDD_GX/VDD_CX will
>>>> need additional, gmuwrapper-configuration specific code anyway, as
>>>> OPP & genpd will no longer make use of the default behavior which
>>>> only gets triggered if there's a single power-domains=<> entry, afaicu.
>>> Can you please tell me which specific *default behaviour* do you mean here?
>>> I am curious to know what I am overlooking here. We can always get a cxpd/gxpd device
>>> and vote for the gdscs directly from the driver. Anything related to
>>> OPP?
>> I *believe* this is true:
>>
>> if (ARRAY_SIZE(power-domains) == 1) {
>> 	of generic code will enable the power domain at .probe time
> we need to handle the voting directly. I recently shared a patch to
> vote cx gdsc from gpu driver. Maybe we can ignore this when gpu has
> only cx rail due to this logic you quoted here.
> 
> I see that you have handled it mostly correctly from the gpu driver in the updated
> a6xx_pm_suspend() callback. Just the power domain device ptrs should be moved to
> gpu from gmu.
> 
>>
>> 	opp APIs will default to scaling that domain with required-opps
> 
>> }
>>
>> and we do need to put GX/CX (with an MX parent to match) there, as the
>> AP is responsible for voting in this configuration
> 
> We should vote to turn ON gx/cx headswitches through genpd from gpu driver. When you vote for
> core clk frequency, *clock driver is supposed to scale* all the necessary
> regulators. At least that is how downstream works. You can refer the downstream
> gpucc clk driver of these SoCs. I am not sure how much of that can be easily converted to
> upstream.
> 
> Also, how does having a gmu dt node help in this regard? Feel free to
> elaborate, I am not very familiar with clk/regulator implementations.
Okay, so I think we have a bit of confusion here.

Currently, with this patchset we manage things like this:

1. GPU has a VDD_GX (or equivalent [1]) line passed in power-domains=<>, which
   is then used with OPP APIs to ensure it's being scaled on freq change [2].
   The VDD_xyz lines coming from RPM(h) are described as power domains upstream,
   *unlike downstream*, which represents them as regulators with preset voltage
   steps (and perhaps that's what had you confused). What's more, GDSCs
   are also modeled as genpds instead of regulators, hence they sort of "fight"
   for the spot in power-domains=<> of a given node.

2. GMU wrapper gets CX_GDSC & GX_GDSC handles in power-domains=<> (just like
   the real GMU in the current state of upstream [3]), which are then governed
   through explicit genpd calls to turn them on/off when the GPU resume/suspend/
   crash recovery functions are called.

3. GPUs with GMU, like A630, don't get any power-domains=<> entries in DT,
   instead relying on the GMU firmware to communicate necessary requests
   to the VDD_xyz resources directly to RPMh, as part of the DVFS routines.
   If GMU wasn't so smart, we would have to do the exact same VDD_xyz+OPP dance
   there - that's precisely what's going on under the hood.

4. Adreno SMMU gets a handle to CX_GDSC so that when OF probe funcs are called
   (and SMMUs probe way, way before all things drm), the headswitch is
   de-asserted and its registers and related clocks are accessible.
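
To illustrate the ordering implied by points 1 and 2, here is a rough
userspace model (not driver code; every name in it is made up for this
example). It assumes the resume path brings things up as CX GDSC, GX GDSC,
VDD via OPP, then clocks, and that suspend or a failed resume walks the same
list in reverse:

```c
/* Userspace model only: none of these names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical resources, listed in the order they are brought up. */
enum res { RES_CX_GDSC, RES_GX_GDSC, RES_VDD_OPP, RES_CLOCKS, RES_COUNT };

struct gpu_model {
	bool on[RES_COUNT];	/* which resources are currently enabled */
	int fail_at;		/* resource whose enable should fail, or -1 */
};

static int res_enable(struct gpu_model *m, enum res r)
{
	assert(r < RES_COUNT);
	if ((int)r == m->fail_at)
		return -1;	/* simulated failure */
	m->on[r] = true;
	return 0;
}

static void res_disable(struct gpu_model *m, enum res r)
{
	m->on[r] = false;
}

/* Bring resources up in dependency order; unwind in reverse on failure. */
static int model_resume(struct gpu_model *m)
{
	int r, ret;

	for (r = 0; r < RES_COUNT; r++) {
		ret = res_enable(m, (enum res)r);
		if (ret)
			goto unwind;
	}
	return 0;

unwind:
	/* Only resources enabled before the failing one are torn down. */
	while (--r >= 0)
		res_disable(m, (enum res)r);
	return ret;
}

/* Suspend is simply the reverse of a successful resume. */
static void model_suspend(struct gpu_model *m)
{
	for (int r = RES_COUNT - 1; r >= 0; r--)
		res_disable(m, (enum res)r);
}

/* Helper for inspection: number of resources currently enabled. */
static int model_count_on(const struct gpu_model *m)
{
	int n = 0;

	for (size_t i = 0; i < RES_COUNT; i++)
		n += m->on[i];
	return n;
}
```

The point of the model is only that a failure halfway through resume leaves
nothing enabled, which is what the goto labels in the real resume path are
meant to guarantee.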


All this makes me believe the way I generally architected things in
this series is correct.


[1] A610 (and I think A612) lack a VDD_GX line, so they power the GPU from
    VDD_CX, but that's just an implementation detail which is handled by
    simply passing the correct one in DTS; the code doesn't care.

[2] Hence my recent changes to use dev_pm_opp_set_rate() wherever possible;
    this func reads required-opps in OPP table entries and makes sure to raise
    the genpd's performance state before switching frequencies

[3] Please take a look at the "end product" here:
    https://github.com/SoMainline/linux/commit/fb16757c3bf4c087ac597d70c7a98755d46bb323
    you can open e.g. sdm845.dtsi for comparison with real GMU
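
To make footnote [2] a bit more concrete, below is a small userspace model
of the ordering I'm relying on. It is a sketch under assumptions, not the OPP
core: the table values are invented, and opp_find_ceil(), model_set_rate()
and struct gpu_rail are hypothetical names. The only behaviour it tries to
show is "raise the performance level before raising the clock, lower it only
after lowering the clock":

```c
/* Userspace model only: this is not the kernel OPP implementation. */
#include <assert.h>
#include <stddef.h>

struct opp {
	unsigned long freq_hz;
	unsigned int level;	/* stand-in for a required-opps level */
};

/* Hypothetical table; the values are illustrative, not from any DTS. */
static const struct opp table[] = {
	{ 200000000UL, 1 },
	{ 400000000UL, 2 },
	{ 800000000UL, 3 },
};
#define N_OPPS (sizeof(table) / sizeof(table[0]))

struct gpu_rail {
	unsigned long freq_hz;	/* current clock rate */
	unsigned int level;	/* current power domain level */
	int violations;		/* times the clock outran the rail */
};

/* Lowest table entry whose frequency is >= freq_hz, or NULL. */
static const struct opp *opp_find_ceil(unsigned long freq_hz)
{
	for (size_t i = 0; i < N_OPPS; i++)
		if (table[i].freq_hz >= freq_hz)
			return &table[i];
	return NULL;
}

/* Record a violation if the current level is too low for the clock. */
static void check(struct gpu_rail *g)
{
	const struct opp *need = opp_find_ceil(g->freq_hz);

	if (!need || g->level < need->level)
		g->violations++;
}

static int model_set_rate(struct gpu_rail *g, unsigned long target_hz)
{
	const struct opp *opp = opp_find_ceil(target_hz);

	assert(g != NULL);
	if (!opp)
		return -1;

	if (opp->freq_hz > g->freq_hz) {
		/* Scaling up: level first, then clock. */
		g->level = opp->level;
		check(g);
		g->freq_hz = opp->freq_hz;
		check(g);
	} else {
		/* Scaling down (or no change): clock first, then level. */
		g->freq_hz = opp->freq_hz;
		check(g);
		g->level = opp->level;
		check(g);
	}
	return 0;
}
```

With that ordering the clock never runs faster than the rail level allows at
any intermediate step, which is why the driver itself doesn't need to touch
VDD_GX/VDD_CX by hand.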

I hope this answers your concerns. If not, I'll be happy to elaborate.

Konrad
> 
> -Akhil.
>>
>> Konrad
>>>
>>> -Akhil
>>>>
>>>> If nothing else, this is a very convenient way to model a part of the
>>>> GPU (as that's essentially what GMU_CX is, to my understanding) and
>>>> the bindings people didn't shoot me in the head for proposing this, so
>>>> I assume it'd be cool to pursue this..
>>>>
>>>> Konrad
>>>>>>
>>>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
>>>>>> ---
>>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c       |  72 +++++++-
>>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c       | 255 +++++++++++++++++++++++++---
>>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h       |   1 +
>>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  14 +-
>>>>>>  drivers/gpu/drm/msm/adreno/adreno_gpu.c     |   8 +-
>>>>>>  drivers/gpu/drm/msm/adreno/adreno_gpu.h     |   6 +
>>>>>>  6 files changed, 318 insertions(+), 38 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>>>>>> index 87babbb2a19f..b1acdb027205 100644
>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>>>>>> @@ -1469,6 +1469,7 @@ static int a6xx_gmu_get_irq(struct a6xx_gmu *gmu, struct platform_device *pdev,
>>>>>>  
>>>>>>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
>>>>>>  {
>>>>>> +	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
>>>>>>  	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>>>>>>  	struct platform_device *pdev = to_platform_device(gmu->dev);
>>>>>>  
>>>>>> @@ -1494,10 +1495,12 @@ void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
>>>>>>  	gmu->mmio = NULL;
>>>>>>  	gmu->rscc = NULL;
>>>>>>  
>>>>>> -	a6xx_gmu_memory_free(gmu);
>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
>>>>>> +		a6xx_gmu_memory_free(gmu);
>>>>>>  
>>>>>> -	free_irq(gmu->gmu_irq, gmu);
>>>>>> -	free_irq(gmu->hfi_irq, gmu);
>>>>>> +		free_irq(gmu->gmu_irq, gmu);
>>>>>> +		free_irq(gmu->hfi_irq, gmu);
>>>>>> +	}
>>>>>>  
>>>>>>  	/* Drop reference taken in of_find_device_by_node */
>>>>>>  	put_device(gmu->dev);
>>>>>> @@ -1516,6 +1519,69 @@ static int cxpd_notifier_cb(struct notifier_block *nb,
>>>>>>  	return 0;
>>>>>>  }
>>>>>>  
>>>>>> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
>>>>>> +{
>>>>>> +	struct platform_device *pdev = of_find_device_by_node(node);
>>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>>>>>> +	int ret;
>>>>>> +
>>>>>> +	if (!pdev)
>>>>>> +		return -ENODEV;
>>>>>> +
>>>>>> +	gmu->dev = &pdev->dev;
>>>>>> +
>>>>>> +	of_dma_configure(gmu->dev, node, true);
>>>>> why setup dma for a device that is not actually present?
>>>>>> +
>>>>>> +	pm_runtime_enable(gmu->dev);
>>>>>> +
>>>>>> +	/* Mark legacy for manual SPTPRAC control */
>>>>>> +	gmu->legacy = true;
>>>>>> +
>>>>>> +	/* Map the GMU registers */
>>>>>> +	gmu->mmio = a6xx_gmu_get_mmio(pdev, "gmu");
>>>>>> +	if (IS_ERR(gmu->mmio)) {
>>>>>> +		ret = PTR_ERR(gmu->mmio);
>>>>>> +		goto err_mmio;
>>>>>> +	}
>>>>>> +
>>>>>> +	gmu->cxpd = dev_pm_domain_attach_by_name(gmu->dev, "cx");
>>>>>> +	if (IS_ERR(gmu->cxpd)) {
>>>>>> +		ret = PTR_ERR(gmu->cxpd);
>>>>>> +		goto err_mmio;
>>>>>> +	}
>>>>>> +
>>>>>> +	if (!device_link_add(gmu->dev, gmu->cxpd, DL_FLAG_PM_RUNTIME)) {
>>>>>> +		ret = -ENODEV;
>>>>>> +		goto detach_cxpd;
>>>>>> +	}
>>>>>> +
>>>>>> +	init_completion(&gmu->pd_gate);
>>>>>> +	complete_all(&gmu->pd_gate);
>>>>>> +	gmu->pd_nb.notifier_call = cxpd_notifier_cb;
>>>>>> +
>>>>>> +	/* Get a link to the GX power domain to reset the GPU */
>>>>>> +	gmu->gxpd = dev_pm_domain_attach_by_name(gmu->dev, "gx");
>>>>>> +	if (IS_ERR(gmu->gxpd)) {
>>>>>> +		ret = PTR_ERR(gmu->gxpd);
>>>>>> +		goto err_mmio;
>>>>>> +	}
>>>>>> +
>>>>>> +	gmu->initialized = true;
>>>>>> +
>>>>>> +	return 0;
>>>>>> +
>>>>>> +detach_cxpd:
>>>>>> +	dev_pm_domain_detach(gmu->cxpd, false);
>>>>>> +
>>>>>> +err_mmio:
>>>>>> +	iounmap(gmu->mmio);
>>>>>> +
>>>>>> +	/* Drop reference taken in of_find_device_by_node */
>>>>>> +	put_device(gmu->dev);
>>>>>> +
>>>>>> +	return ret;
>>>>>> +}
>>>>>> +
>>>>>>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
>>>>>>  {
>>>>>>  	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>>>>> index 931f9f3b3a85..8e0345ffab81 100644
>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>>>>> @@ -20,9 +20,11 @@ static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
>>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>>>>  
>>>>>> -	/* Check that the GMU is idle */
>>>>>> -	if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
>>>>>> -		return false;
>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
>>>>>> +		/* Check that the GMU is idle */
>>>>>> +		if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
>>>>>> +			return false;
>>>>>> +	}
>>>>>>  
>>>>>>  	/* Check tha the CX master is idle */
>>>>>>  	if (gpu_read(gpu, REG_A6XX_RBBM_STATUS) &
>>>>>> @@ -612,13 +614,15 @@ static void a6xx_set_hwcg(struct msm_gpu *gpu, bool state)
>>>>>>  		return;
>>>>>>  
>>>>>>  	/* Disable SP clock before programming HWCG registers */
>>>>>> -	gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>>>>>> +		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
>>>>>>  
>>>>>>  	for (i = 0; (reg = &adreno_gpu->info->hwcg[i], reg->offset); i++)
>>>>>>  		gpu_write(gpu, reg->offset, state ? reg->value : 0);
>>>>>>  
>>>>>>  	/* Enable SP clock */
>>>>>> -	gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>>>>>> +		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
>>>>>>  
>>>>>>  	gpu_write(gpu, REG_A6XX_RBBM_CLOCK_CNTL, state ? clock_cntl_on : 0);
>>>>>>  }
>>>>>> @@ -1018,10 +1022,13 @@ static int hw_init(struct msm_gpu *gpu)
>>>>>>  {
>>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>>>>>>  	int ret;
>>>>>>  
>>>>>> -	/* Make sure the GMU keeps the GPU on while we set it up */
>>>>>> -	a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
>>>>>> +		/* Make sure the GMU keeps the GPU on while we set it up */
>>>>>> +		a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
>>>>>> +	}
>>>>>>  
>>>>>>  	/* Clear GBIF halt in case GX domain was not collapsed */
>>>>>>  	if (a6xx_has_gbif(adreno_gpu))
>>>>>> @@ -1144,6 +1151,17 @@ static int hw_init(struct msm_gpu *gpu)
>>>>>>  			0x3f0243f0);
>>>>>>  	}
>>>>>>  
>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
>>>>>> +		/* Do it here, as GMU wrapper only inits the GMU for memory reservation etc. */
>>>>>> +
>>>>>> +		/* Set up the CX GMU counter 0 to count busy ticks */
>>>>>> +		gmu_write(gmu, REG_A6XX_GPU_GMU_AO_GPU_CX_BUSY_MASK, 0xff000000);
>>>>>> +
>>>>>> +		/* Enable power counter 0 */
>>>>>> +		gmu_rmw(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_SELECT_0, 0xff, BIT(5));
>>>>>> +		gmu_write(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_ENABLE, 1);
>>>>>> +	}
>>>>>> +
>>>>>>  	/* Protect registers from the CP */
>>>>>>  	a6xx_set_cp_protect(gpu);
>>>>>>  
>>>>>> @@ -1233,6 +1251,8 @@ static int hw_init(struct msm_gpu *gpu)
>>>>>>  	}
>>>>>>  
>>>>>>  out:
>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
>>>>>> +		return ret;
>>>>>>  	/*
>>>>>>  	 * Tell the GMU that we are done touching the GPU and it can start power
>>>>>>  	 * management
>>>>>> @@ -1267,6 +1287,9 @@ static void a6xx_dump(struct msm_gpu *gpu)
>>>>>>  	adreno_dump(gpu);
>>>>>>  }
>>>>>>  
>>>>>> +#define GBIF_GX_HALT_MASK	BIT(0)
>>>>>> +#define GBIF_CLIENT_HALT_MASK	BIT(0)
>>>>>> +#define GBIF_ARB_HALT_MASK	BIT(1)
>>>>>>  #define VBIF_RESET_ACK_TIMEOUT	100
>>>>>>  #define VBIF_RESET_ACK_MASK	0x00f0
>>>>>>  
>>>>>> @@ -1299,7 +1322,8 @@ static void a6xx_recover(struct msm_gpu *gpu)
>>>>>>  	 * Turn off keep alive that might have been enabled by the hang
>>>>>>  	 * interrupt
>>>>>>  	 */
>>>>>> -	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>>>>>> +		gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
>>>>>
>>>>> Maybe it is better to move this to a6xx_gmu_force_power_off.
>>>>>
>>>>>>  
>>>>>>  	pm_runtime_dont_use_autosuspend(&gpu->pdev->dev);
>>>>>>  
>>>>>> @@ -1329,6 +1353,32 @@ static void a6xx_recover(struct msm_gpu *gpu)
>>>>>>  
>>>>>>  	dev_pm_genpd_remove_notifier(gmu->cxpd);
>>>>>>  
>>>>>> +	/* Software-reset the GPU */
>>>>>
>>>>> This is not soft reset sequence. We are trying to quiescent gpu - ddr
>>>>> traffic with this sequence.
>>>>>
>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
>>>>>> +		/* Halt the GX side of GBIF */
>>>>>> +		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, GBIF_GX_HALT_MASK);
>>>>>> +		spin_until(gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT_ACK) &
>>>>>> +			   GBIF_GX_HALT_MASK);
>>>>>> +
>>>>>> +		/* Halt new client requests on GBIF */
>>>>>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_CLIENT_HALT_MASK);
>>>>>> +		spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
>>>>>> +			   (GBIF_CLIENT_HALT_MASK)) == GBIF_CLIENT_HALT_MASK);
>>>>>> +
>>>>>> +		/* Halt all AXI requests on GBIF */
>>>>>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_ARB_HALT_MASK);
>>>>>> +		spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
>>>>>> +			   (GBIF_ARB_HALT_MASK)) == GBIF_ARB_HALT_MASK);
>>>>>> +
>>>>>> +		/* Clear the halts */
>>>>>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
>>>>>> +
>>>>>> +		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
>>>>>> +
>>>>>> +		/* This *really* needs to go through before we do anything else! */
>>>>>> +		mb();
>>>>>> +	}
>>>>>> +
>>>>>
>>>>> This sequence should be before we collapse cx gdsc. Also, please see if
>>>>> we can create a subroutine to avoid code dup.
>>>>>
>>>>>>  	pm_runtime_use_autosuspend(&gpu->pdev->dev);
>>>>>>  
>>>>>>  	if (active_submits)
>>>>>> @@ -1463,7 +1513,8 @@ static void a6xx_fault_detect_irq(struct msm_gpu *gpu)
>>>>>>  	 * Force the GPU to stay on until after we finish
>>>>>>  	 * collecting information
>>>>>>  	 */
>>>>>> -	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>>>>>> +		gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
>>>>>>  
>>>>>>  	DRM_DEV_ERROR(&gpu->pdev->dev,
>>>>>>  		"gpu fault ring %d fence %x status %8.8X rb %4.4x/%4.4x ib1 %16.16llX/%4.4x ib2 %16.16llX/%4.4x\n",
>>>>>> @@ -1624,7 +1675,7 @@ static void a6xx_llc_slices_init(struct platform_device *pdev,
>>>>>>  		a6xx_gpu->llc_mmio = ERR_PTR(-EINVAL);
>>>>>>  }
>>>>>>  
>>>>>> -static int a6xx_pm_resume(struct msm_gpu *gpu)
>>>>>> +static int a6xx_gmu_pm_resume(struct msm_gpu *gpu)
>>>>>>  {
>>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>>>> @@ -1644,10 +1695,61 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
>>>>>>  
>>>>>>  	a6xx_llc_activate(a6xx_gpu);
>>>>>>  
>>>>>> -	return 0;
>>>>>> +	return ret;
>>>>>>  }
>>>>>>  
>>>>>> -static int a6xx_pm_suspend(struct msm_gpu *gpu)
>>>>>> +static int a6xx_pm_resume(struct msm_gpu *gpu)
>>>>>> +{
>>>>>> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>>> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>>>>>> +	unsigned long freq = 0;
>>>>>> +	struct dev_pm_opp *opp;
>>>>>> +	int ret;
>>>>>> +
>>>>>> +	gpu->needs_hw_init = true;
>>>>>> +
>>>>>> +	trace_msm_gpu_resume(0);
>>>>>> +
>>>>>> +	mutex_lock(&a6xx_gpu->gmu.lock);
>>>>> I think we can ignore gmu lock as there is no real gmu device.
>>>>>
>>>>>> +
>>>>>> +	pm_runtime_resume_and_get(gmu->dev);
>>>>>> +	pm_runtime_resume_and_get(gmu->gxpd);
>>>>>> +
>>>>>> +	/* Set the core clock, having VDD scaling in mind */
>>>>>> +	ret = dev_pm_opp_set_rate(&gpu->pdev->dev, gpu->fast_rate);
>>>>>> +	if (ret)
>>>>>> +		goto err_core_clk;
>>>>>> +
>>>>>> +	ret = clk_bulk_prepare_enable(gpu->nr_clocks, gpu->grp_clks);
>>>>>> +	if (ret)
>>>>>> +		goto err_bulk_clk;
>>>>>> +
>>>>>> +	ret = clk_prepare_enable(gpu->ebi1_clk);
>>>>>> +	if (ret)
>>>>>> +		goto err_mem_clk;
>>>>>> +
>>>>>> +	/* If anything goes south, tear the GPU down piece by piece.. */
>>>>>> +	if (ret) {
>>>>>> +err_mem_clk:
>>>>>> +		clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
>>>>>> +err_bulk_clk:
>>>>>> +		opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
>>>>>> +		dev_pm_opp_put(opp);
>>>>>> +		dev_pm_opp_set_rate(&gpu->pdev->dev, 0);
>>>>>> +err_core_clk:
>>>>>> +		pm_runtime_put(gmu->gxpd);
>>>>>> +		pm_runtime_put(gmu->dev);
>>>>>> +	}
>>>>>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
>>>>>> +
>>>>>> +	if (!ret)
>>>>>> +		msm_devfreq_resume(gpu);
>>>>>> +
>>>>>> +	return ret;
>>>>>> +}
>>>>>> +
>>>>>> +static int a6xx_gmu_pm_suspend(struct msm_gpu *gpu)
>>>>>>  {
>>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>>>> @@ -1674,11 +1776,62 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
>>>>>>  	return 0;
>>>>>>  }
>>>>>>  
>>>>>> +static int a6xx_pm_suspend(struct msm_gpu *gpu)
>>>>>> +{
>>>>>> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>>> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>>>>>> +	unsigned long freq = 0;
>>>>>> +	struct dev_pm_opp *opp;
>>>>>> +	int i, ret;
>>>>>> +
>>>>>> +	trace_msm_gpu_suspend(0);
>>>>>> +
>>>>>> +	opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
>>>>>> +	dev_pm_opp_put(opp);
>>>>>> +
>>>>>> +	msm_devfreq_suspend(gpu);
>>>>>> +
>>>>>> +	mutex_lock(&a6xx_gpu->gmu.lock);
>>>>>> +
>>>>>> +	clk_disable_unprepare(gpu->ebi1_clk);
>>>>>> +
>>>>>> +	clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
>>>>>> +
>>>>>> +	/* Set frequency to the minimum supported level (no 27MHz on A6xx!) */
>>>>>> +	ret = dev_pm_opp_set_rate(&gpu->pdev->dev, freq);
>>>>>> +	if (ret)
>>>>>> +		goto err;
>>>>>> +
>>>>>> +	pm_runtime_put_sync(gmu->gxpd);
>>>>>> +	pm_runtime_put_sync(gmu->dev);
>>>>>> +
>>>>>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
>>>>>> +
>>>>>> +	if (a6xx_gpu->shadow_bo)
>>>>>> +		for (i = 0; i < gpu->nr_rings; i++)
>>>>>> +			a6xx_gpu->shadow[i] = 0;
>>>>>> +
>>>>>> +	gpu->suspend_count++;
>>>>>> +
>>>>>> +	return 0;
>>>>>> +
>>>>>> +err:
>>>>>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
>>>>>> +
>>>>>> +	return ret;
>>>>>> +}
>>>>>> +
>>>>>>  static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
>>>>>>  {
>>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>>>>  
>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
>>>>>> +		*value = gpu_read64(gpu, REG_A6XX_CP_ALWAYS_ON_COUNTER);
>>>>>> +		return 0;
>>>>>> +	}
>>>>>> +
>>>>> Instead of wrapper check here, we can just create a separate op. I don't
>>>>> see any benefit in reusing the same function here.
>>>>>
>>>>>
>>>>>>  	mutex_lock(&a6xx_gpu->gmu.lock);
>>>>>>  
>>>>>>  	/* Force the GPU power on so we can read this register */
>>>>>> @@ -1716,7 +1869,8 @@ static void a6xx_destroy(struct msm_gpu *gpu)
>>>>>>  		drm_gem_object_put(a6xx_gpu->shadow_bo);
>>>>>>  	}
>>>>>>  
>>>>>> -	a6xx_llc_slices_destroy(a6xx_gpu);
>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>>>>>> +		a6xx_llc_slices_destroy(a6xx_gpu);
>>>>>>  
>>>>>>  	mutex_lock(&a6xx_gpu->gmu.lock);
>>>>>>  	a6xx_gmu_remove(a6xx_gpu);
>>>>>> @@ -1957,8 +2111,8 @@ static const struct adreno_gpu_funcs funcs = {
>>>>>>  		.set_param = adreno_set_param,
>>>>>>  		.hw_init = a6xx_hw_init,
>>>>>>  		.ucode_load = a6xx_ucode_load,
>>>>>> -		.pm_suspend = a6xx_pm_suspend,
>>>>>> -		.pm_resume = a6xx_pm_resume,
>>>>>> +		.pm_suspend = a6xx_gmu_pm_suspend,
>>>>>> +		.pm_resume = a6xx_gmu_pm_resume,
>>>>>>  		.recover = a6xx_recover,
>>>>>>  		.submit = a6xx_submit,
>>>>>>  		.active_ring = a6xx_active_ring,
>>>>>> @@ -1982,6 +2136,35 @@ static const struct adreno_gpu_funcs funcs = {
>>>>>>  	.get_timestamp = a6xx_get_timestamp,
>>>>>>  };
>>>>>>  
>>>>>> +static const struct adreno_gpu_funcs funcs_gmuwrapper = {
>>>>>> +	.base = {
>>>>>> +		.get_param = adreno_get_param,
>>>>>> +		.set_param = adreno_set_param,
>>>>>> +		.hw_init = a6xx_hw_init,
>>>>>> +		.ucode_load = a6xx_ucode_load,
>>>>>> +		.pm_suspend = a6xx_pm_suspend,
>>>>>> +		.pm_resume = a6xx_pm_resume,
>>>>>> +		.recover = a6xx_recover,
>>>>>> +		.submit = a6xx_submit,
>>>>>> +		.active_ring = a6xx_active_ring,
>>>>>> +		.irq = a6xx_irq,
>>>>>> +		.destroy = a6xx_destroy,
>>>>>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
>>>>>> +		.show = a6xx_show,
>>>>>> +#endif
>>>>>> +		.gpu_busy = a6xx_gpu_busy,
>>>>>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
>>>>>> +		.gpu_state_get = a6xx_gpu_state_get,
>>>>>> +		.gpu_state_put = a6xx_gpu_state_put,
>>>>>> +#endif
>>>>>> +		.create_address_space = a6xx_create_address_space,
>>>>>> +		.create_private_address_space = a6xx_create_private_address_space,
>>>>>> +		.get_rptr = a6xx_get_rptr,
>>>>>> +		.progress = a6xx_progress,
>>>>>> +	},
>>>>>> +	.get_timestamp = a6xx_get_timestamp,
>>>>>> +};
>>>>>> +
>>>>>>  struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>>>>>  {
>>>>>>  	struct msm_drm_private *priv = dev->dev_private;
>>>>>> @@ -2003,18 +2186,36 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>>>>>  
>>>>>>  	adreno_gpu->registers = NULL;
>>>>>>  
>>>>>> +	/* Check if there is a GMU phandle and set it up */
>>>>>> +	node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
>>>>>> +	/* FIXME: How do we gracefully handle this? */
>>>>>> +	BUG_ON(!node);
>>>>> How will you handle this BUG() when there is no GMU (a610 gpu)?
>>>>>
>>>>>> +
>>>>>> +	adreno_gpu->gmu_is_wrapper = of_device_is_compatible(node, "qcom,adreno-gmu-wrapper");
>>>>>> +
>>>>>>  	/*
>>>>>>  	 * We need to know the platform type before calling into adreno_gpu_init
>>>>>>  	 * so that the hw_apriv flag can be correctly set. Snoop into the info
>>>>>>  	 * and grab the revision number
>>>>>>  	 */
>>>>>>  	info = adreno_info(config->rev);
>>>>>> -
>>>>>> -	if (info && (info->revn == 650 || info->revn == 660 ||
>>>>>> -			adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), info->rev)))
>>>>>> +	if (!info)
>>>>>> +		return ERR_PTR(-EINVAL);
>>>>>> +
>>>>>> +	/* Assign these early so that we can use the is_aXYZ helpers */
>>>>>> +	/* Numeric revision IDs (e.g. 630) */
>>>>>> +	adreno_gpu->revn = info->revn;
>>>>>> +	/* New-style ADRENO_REV()-only */
>>>>>> +	adreno_gpu->rev = info->rev;
>>>>>> +	/* Quirk data */
>>>>>> +	adreno_gpu->info = info;
>>>>>> +
>>>>>> +	if (adreno_is_a650(adreno_gpu) || adreno_is_a660_family(adreno_gpu))
>>>>>>  		adreno_gpu->base.hw_apriv = true;
>>>>>>  
>>>>>> -	a6xx_llc_slices_init(pdev, a6xx_gpu);
>>>>>> +	/* No LLCC on non-RPMh (and by extension, non-GMU) SoCs */
>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>>>>>> +		a6xx_llc_slices_init(pdev, a6xx_gpu);
>>>>>>  
>>>>>>  	ret = a6xx_set_supported_hw(&pdev->dev, config->rev);
>>>>>>  	if (ret) {
>>>>>> @@ -2022,7 +2223,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>>>>>  		return ERR_PTR(ret);
>>>>>>  	}
>>>>>>  
>>>>>> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
>>>>>> +		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_gmuwrapper, 1);
>>>>>> +	else
>>>>>> +		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
>>>>>>  	if (ret) {
>>>>>>  		a6xx_destroy(&(a6xx_gpu->base.base));
>>>>>>  		return ERR_PTR(ret);
>>>>>> @@ -2035,13 +2239,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>>>>>  	if (adreno_is_a618(adreno_gpu) || adreno_is_7c3(adreno_gpu))
>>>>>>  		priv->gpu_clamp_to_idle = true;
>>>>>>  
>>>>>> -	/* Check if there is a GMU phandle and set it up */
>>>>>> -	node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
>>>>>> -
>>>>>> -	/* FIXME: How do we gracefully handle this? */
>>>>>> -	BUG_ON(!node);
>>>>>> -
>>>>>> -	ret = a6xx_gmu_init(a6xx_gpu, node);
>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
>>>>>> +		ret = a6xx_gmu_wrapper_init(a6xx_gpu, node);
>>>>>> +	else
>>>>>> +		ret = a6xx_gmu_init(a6xx_gpu, node);
>>>>>>  	of_node_put(node);
>>>>>>  	if (ret) {
>>>>>>  		a6xx_destroy(&(a6xx_gpu->base.base));
>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>>>>> index eea2e60ce3b7..51a7656072fa 100644
>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>>>>> @@ -76,6 +76,7 @@ int a6xx_gmu_set_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
>>>>>>  void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
>>>>>>  
>>>>>>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
>>>>>> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
>>>>>>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
>>>>>>  
>>>>>>  void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
>>>>>> index 30ecdff363e7..4e5d650578c6 100644
>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
>>>>>> @@ -1041,16 +1041,18 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu)
>>>>>>  	/* Get the generic state from the adreno core */
>>>>>>  	adreno_gpu_state_get(gpu, &a6xx_state->base);
>>>>>>  
>>>>>> -	a6xx_get_gmu_registers(gpu, a6xx_state);
>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
>>>>> nit: Kinda misleading function name to a layman. Should we invert the
>>>>> function to "adreno_has_gmu"?
>>>>>
>>>>> -Akhil
>>>>>> +		a6xx_get_gmu_registers(gpu, a6xx_state);
>>>>>>  
>>>>>> -	a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
>>>>>> -	a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
>>>>>> -	a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
>>>>>> +		a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
>>>>>> +		a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
>>>>>> +		a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
>>>>>>  
>>>>>> -	a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
>>>>>> +		a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
>>>>>> +	}
>>>>>>  
>>>>>>  	/* If GX isn't on the rest of the data isn't going to be accessible */
>>>>>> -	if (!a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu) && !a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
>>>>>>  		return &a6xx_state->base;
>>>>>>  
>>>>>>  	/* Get the banks of indexed registers */
>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>>>> index 6934cee07d42..5c5901d65950 100644
>>>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>>>> @@ -528,6 +528,10 @@ int adreno_load_fw(struct adreno_gpu *adreno_gpu)
>>>>>>  		if (!adreno_gpu->info->fw[i])
>>>>>>  			continue;
>>>>>>  
>>>>>> +		/* Skip loading GMU firwmare with GMU Wrapper */
>>>>>> +		if (adreno_has_gmu_wrapper(adreno_gpu) && i == ADRENO_FW_GMU)
>>>>>> +			continue;
>>>>>> +
>>>>>>  		/* Skip if the firmware has already been loaded */
>>>>>>  		if (adreno_gpu->fw[i])
>>>>>>  			continue;
>>>>>> @@ -1074,8 +1078,8 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
>>>>>>  	u32 speedbin;
>>>>>>  	int ret;
>>>>>>  
>>>>>> -	/* Only handle the core clock when GMU is not in use */
>>>>>> -	if (config->rev.core < 6) {
>>>>>> +	/* Only handle the core clock when GMU is not in use (or is absent). */
>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu) || config->rev.core < 6) {
>>>>>>  		/*
>>>>>>  		 * This can only be done before devm_pm_opp_of_add_table(), or
>>>>>>  		 * dev_pm_opp_set_config() will WARN_ON()
>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>>>>>> index f62612a5c70f..ee5352bc5329 100644
>>>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>>>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>>>>>> @@ -115,6 +115,7 @@ struct adreno_gpu {
>>>>>>  	 * code (a3xx_gpu.c) and stored in this common location.
>>>>>>  	 */
>>>>>>  	const unsigned int *reg_offsets;
>>>>>> +	bool gmu_is_wrapper;
>>>>>>  };
>>>>>>  #define to_adreno_gpu(x) container_of(x, struct adreno_gpu, base)
>>>>>>  
>>>>>> @@ -145,6 +146,11 @@ struct adreno_platform_config {
>>>>>>  
>>>>>>  bool adreno_cmp_rev(struct adreno_rev rev1, struct adreno_rev rev2);
>>>>>>  
>>>>>> +static inline bool adreno_has_gmu_wrapper(struct adreno_gpu *gpu)
>>>>>> +{
>>>>>> +	return gpu->gmu_is_wrapper;
>>>>>> +}
>>>>>> +
>>>>>>  static inline bool adreno_is_a2xx(struct adreno_gpu *gpu)
>>>>>>  {
>>>>>>  	return (gpu->revn < 300);
>>>>>>
>>>>>> -- 
>>>>>> 2.40.0
>>>>>>

* Re: [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support
  2023-05-05 10:35             ` Konrad Dybcio
@ 2023-05-06 14:46               ` Akhil P Oommen
  2023-05-06 20:46                 ` [Freedreno] " Akhil P Oommen
  2023-05-08  8:59                 ` Konrad Dybcio
  0 siblings, 2 replies; 30+ messages in thread
From: Akhil P Oommen @ 2023-05-06 14:46 UTC (permalink / raw)
  To: Konrad Dybcio
  Cc: Rob Clark, Abhinav Kumar, Dmitry Baryshkov, Sean Paul,
	David Airlie, Daniel Vetter, Rob Herring, Krzysztof Kozlowski,
	Bjorn Andersson, Konrad Dybcio, linux-arm-msm, dri-devel,
	freedreno, devicetree, linux-kernel, Rob Clark, Marijn Suijten

On Fri, May 05, 2023 at 12:35:18PM +0200, Konrad Dybcio wrote:
> 
> 
> On 5.05.2023 10:46, Akhil P Oommen wrote:
> > On Thu, May 04, 2023 at 08:34:07AM +0200, Konrad Dybcio wrote:
> >>
> >>
> >> On 3.05.2023 22:32, Akhil P Oommen wrote:
> >>> On Tue, May 02, 2023 at 11:40:26AM +0200, Konrad Dybcio wrote:
> >>>>
> >>>>
> >>>> On 2.05.2023 09:49, Akhil P Oommen wrote:
> >>>>> On Sat, Apr 01, 2023 at 01:54:43PM +0200, Konrad Dybcio wrote:
> >>>>>> Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
> >>>>>> but don't implement the associated GMUs. This is due to the fact that
> >>>>>> the GMU directly pokes at RPMh. Sadly, this means we have to take care
> >>>>>> of enabling & scaling power rails, clocks and bandwidth ourselves.
> >>>>>>
> >>>>>> Reuse existing Adreno-common code and modify the deeply-GMU-infused
> >>>>>> A6XX code to facilitate these GPUs. This involves if-ing out lots
> >>>>>> of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
> >>>>>> the actual name that Qualcomm uses in their downstream kernels).
> >>>>>>
> >>>>>> This is essentially a register region which is convenient to model
> >>>>>> as a device. We'll use it for managing the GDSCs. The register
> >>>>>> layout matches the actual GMU_CX/GX regions on the "real GMU" devices
> >>>>>> and lets us reuse quite a bit of gmu_read/write/rmw calls.
> >>>>> << I sent a reply to this patch earlier, but not sure where it went.
> >>>>> Still figuring out Mutt... >>
> >>>> Answered it here:
> >>>>
> >>>> https://lore.kernel.org/linux-arm-msm/4d3000c1-c3f9-0bfd-3eb3-23393f9a8f77@linaro.org/
> >>>
> >>> Thanks. Will check and respond there if needed.
> >>>
> >>>>
> >>>> I don't think I see any new comments in this "reply revision" (heh), so please
> >>>> check that one out.
> >>>>
> >>>>>
> >>>>> Only convenience I found is that we can reuse gmu register ops in a few
> >>>>> places (< 10 I think). If we just model this as another gpu memory
> >>>>> region, I think it will help to keep gmu vs gmu-wrapper/no-gmu
> >>>>> architecture code with clean separation. Also, it looks like we need to
> >>>>> keep a dummy gmu platform device in the devicetree with the current
> >>>>> approach. That doesn't sound right.
> >>>> That's correct, but.. if we switch away from that, VDD_GX/VDD_CX will
> >>>> need additional, gmuwrapper-configuration specific code anyway, as
> >>>> OPP & genpd will no longer make use of the default behavior which
> >>>> only gets triggered if there's a single power-domains=<> entry, afaicu.
> >>> Can you please tell me which specific *default behviour* do you mean here?
> >>> I am curious to know what I am overlooking here. We can always get a cxpd/gxpd device
> >>> and vote for the gdscs directly from the driver. Anything related to
> >>> OPP?
> >> I *believe* this is true:
> >>
> >> if (ARRAY_SIZE(power-domains) == 1) {
> >> 	of generic code will enable the power domain at .probe time
> > we need to handle the voting directly. I recently shared a patch to
> > vote cx gdsc from gpu driver. Maybe we can ignore this when gpu has
> > only cx rail due to this logic you quoted here.
> > 
> > I see that you have handled it mostly correctly from the gpu driver in the updated
> > a6xx_pm_suspend() callback. Just the power domain device ptrs should be moved to
> > gpu from gmu.
> > 
> >>
> >> 	opp APIs will default to scaling that domain with required-opps
> > 
> >> }
> >>
> >> and we do need to put GX/CX (with an MX parent to match) there, as the
> >> AP is responsible for voting in this configuration
> > 
> > We should vote to turn ON gx/cx headswitches through genpd from gpu driver. When you vote for
> > core clk frequency, *clock driver is supposed to scale* all the necessary
> > regulators. At least that is how downstream works. You can refer the downstream
> > gpucc clk driver of these SoCs. I am not sure how much of that can be easily converted to
> > upstream.
> > 
> > Also, how does having a gmu dt node help in this regard? Feel free to
> > elaborate, I am not very familiar with clk/regulator implementations.
> Okay so I think we have a bit of a confusion here.
> 
> Currently, with this patchset we manage things like this:
> 
> 1. GPU has a VDD_GX (or equivalent[1]) line passed in power-domains=<>, which
>    is then used with OPP APIs to ensure it's being scaled on freq change [2].
>    The VDD_lines coming from RPM(h) are described as power domains upstream
>    *unlike downstream*, which represents them as regulators with preset voltage
>    steps (and perhaps that's what had you confused). What's more is that GDSCs
>    are also modeled as genpds instead of regulators, hence they sort of "fight"
>    for the spot in power-domains=<> of a given node.

Thanks for clarifying. I didn't get this part: "hence they sort of "fight" for the spot in power-domains".
What spot exactly did you mean here? The spot for the PD to be used during scaling?

It seems like you are hinting that there is some sort of limitation in keeping all
3 power domains (cx gdsc, gx gdsc and cx rail) under the gpu node in dt. Please explain
why we can't keep all 3 power domains under the gpu node and call an API
(devm_pm_opp_attach_genpd() ??) to select the power domain which should be scaled.
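
To make the question concrete, here is a rough DT sketch of the layout I am
picturing. It is only an illustration: the node label, the phandle arguments
and the power-domain-names strings are placeholders I made up, not something
taken from this series or from an existing binding.

```dts
/* Hypothetical sketch: every label and name below is an assumption. */
gpu: gpu {
	/* compatible, reg, clocks, interrupts etc. elided */

	/* Both GDSCs and the scalable rail listed under the GPU node itself */
	power-domains = <&gpucc GPU_CX_GDSC>,	/* placeholder: CX headswitch */
			<&gpucc GPU_GX_GDSC>,	/* placeholder: GX headswitch */
			<&rpmpd VDD_RAIL>;	/* placeholder: rail scaled via OPP */
	power-domain-names = "cx", "gx", "rail";

	/* OPP entries would carry required-opps pointing at the rail only */
	operating-points-v2 = <&gpu_opp_table>;
};
```

With more than one entry in power-domains the driver would have to attach each
domain by name and tell the OPP core which one to scale, which is the part I
would like to understand the limitation of.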

> 
> 2. GMU wrapper gets CX_GDSC & GX_GDSC handles in power-domains=<> (just like
>    the real GMU in the current state of upstream [3]), which are then governed
>    through explicit genpd calls to turn them on/off when the GPU resume/suspend/
>    crash recovery functions are called.
> 
> 3. GPUs with GMU, like A630, don't get any power-domains=<> entries in DT,
>    instead relying on the GMU firmware to communicate necessary requests
>    to the VDD_xyz resources directly to RPMh, as part of the DVFS routines.
>    If GMU wasn't so smart, we would have to do the exact same VDD_xyz+OPP dance
>    there - that's precisely what's going on under the hood.
> 
> 4. Adreno SMMU gets a handle to CX_GDSC so that when OF probe funcs are called,
>    (and SMMUs probe way way before all things drm) the headswitch is de-asserted
>    and its registers and related clocks are accessible.
> 
> 
> All this makes me believe the way I generally architected things in
> this series is correct.
> 
> 
> [1] A610 (and I think A612) lack a VDD_GX line, so they power the GPU from
>     VDD_CX, but that's just an implementation detail which is handled by
>     simply passing the correct one in DTS, the code doesn't care.
> 
> [2] Hence my recent changes to use dev_pm_opp_set_rate() wherever possible,
>     this func reads requires-opps in OPP table entries and ensures to elevate
>     the GENPD's performance state before switching frequencies
> 
> [3] Please take a look at the "end product" here:
>     https://github.com/SoMainline/linux/commit/fb16757c3bf4c087ac597d70c7a98755d46bb323
>     you can open e.g. sdm845.dtsi for comparison with real GMU

This dt definition for the a610 gpu clearly shows the issue I have here. Someone
looking at this gets a very wrong picture of the platform, because there is actually nothing
resembling a gmu IP in a610. Is a gmu or gmu-cx register region really present in this hw?

Just a side note about the dt file you shared:
	1. At line 1243: it shouldn't have the gx gdsc, right?
	2. At line 1172: SM6115_VDDCX -> SM6115_VDDGX?

-Akhil

> 
> I hope this answers your concerns. If not, I'll be happy to elaborate.
> 
> Konrad
> > 
> > -Akhil.
> >>
> >> Konrad
> >>>
> >>> -Akhil
> >>>>
> >>>> If nothing else, this is a very convenient way to model a part of the
> >>>> GPU (as that's essentially what GMU_CX is, to my understanding) and
> >>>> the bindings people didn't shoot me in the head for proposing this, so
> >>>> I assume it'd be cool to pursue this..
> >>>>
> >>>> Konrad
> >>>>>>
> >>>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
> >>>>>> ---
> >>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c       |  72 +++++++-
> >>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c       | 255 +++++++++++++++++++++++++---
> >>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h       |   1 +
> >>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  14 +-
> >>>>>>  drivers/gpu/drm/msm/adreno/adreno_gpu.c     |   8 +-
> >>>>>>  drivers/gpu/drm/msm/adreno/adreno_gpu.h     |   6 +
> >>>>>>  6 files changed, 318 insertions(+), 38 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >>>>>> index 87babbb2a19f..b1acdb027205 100644
> >>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >>>>>> @@ -1469,6 +1469,7 @@ static int a6xx_gmu_get_irq(struct a6xx_gmu *gmu, struct platform_device *pdev,
> >>>>>>  
> >>>>>>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
> >>>>>>  {
> >>>>>> +	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
> >>>>>>  	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >>>>>>  	struct platform_device *pdev = to_platform_device(gmu->dev);
> >>>>>>  
> >>>>>> @@ -1494,10 +1495,12 @@ void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
> >>>>>>  	gmu->mmio = NULL;
> >>>>>>  	gmu->rscc = NULL;
> >>>>>>  
> >>>>>> -	a6xx_gmu_memory_free(gmu);
> >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> >>>>>> +		a6xx_gmu_memory_free(gmu);
> >>>>>>  
> >>>>>> -	free_irq(gmu->gmu_irq, gmu);
> >>>>>> -	free_irq(gmu->hfi_irq, gmu);
> >>>>>> +		free_irq(gmu->gmu_irq, gmu);
> >>>>>> +		free_irq(gmu->hfi_irq, gmu);
> >>>>>> +	}
> >>>>>>  
> >>>>>>  	/* Drop reference taken in of_find_device_by_node */
> >>>>>>  	put_device(gmu->dev);
> >>>>>> @@ -1516,6 +1519,69 @@ static int cxpd_notifier_cb(struct notifier_block *nb,
> >>>>>>  	return 0;
> >>>>>>  }
> >>>>>>  
> >>>>>> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
> >>>>>> +{
> >>>>>> +	struct platform_device *pdev = of_find_device_by_node(node);
> >>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >>>>>> +	int ret;
> >>>>>> +
> >>>>>> +	if (!pdev)
> >>>>>> +		return -ENODEV;
> >>>>>> +
> >>>>>> +	gmu->dev = &pdev->dev;
> >>>>>> +
> >>>>>> +	of_dma_configure(gmu->dev, node, true);
> >>>>> why setup dma for a device that is not actually present?
> >>>>>> +
> >>>>>> +	pm_runtime_enable(gmu->dev);
> >>>>>> +
> >>>>>> +	/* Mark legacy for manual SPTPRAC control */
> >>>>>> +	gmu->legacy = true;
> >>>>>> +
> >>>>>> +	/* Map the GMU registers */
> >>>>>> +	gmu->mmio = a6xx_gmu_get_mmio(pdev, "gmu");
> >>>>>> +	if (IS_ERR(gmu->mmio)) {
> >>>>>> +		ret = PTR_ERR(gmu->mmio);
> >>>>>> +		goto err_mmio;
> >>>>>> +	}
> >>>>>> +
> >>>>>> +	gmu->cxpd = dev_pm_domain_attach_by_name(gmu->dev, "cx");
> >>>>>> +	if (IS_ERR(gmu->cxpd)) {
> >>>>>> +		ret = PTR_ERR(gmu->cxpd);
> >>>>>> +		goto err_mmio;
> >>>>>> +	}
> >>>>>> +
> >>>>>> +	if (!device_link_add(gmu->dev, gmu->cxpd, DL_FLAG_PM_RUNTIME)) {
> >>>>>> +		ret = -ENODEV;
> >>>>>> +		goto detach_cxpd;
> >>>>>> +	}
> >>>>>> +
> >>>>>> +	init_completion(&gmu->pd_gate);
> >>>>>> +	complete_all(&gmu->pd_gate);
> >>>>>> +	gmu->pd_nb.notifier_call = cxpd_notifier_cb;
> >>>>>> +
> >>>>>> +	/* Get a link to the GX power domain to reset the GPU */
> >>>>>> +	gmu->gxpd = dev_pm_domain_attach_by_name(gmu->dev, "gx");
> >>>>>> +	if (IS_ERR(gmu->gxpd)) {
> >>>>>> +		ret = PTR_ERR(gmu->gxpd);
> >>>>>> +		goto err_mmio;
> >>>>>> +	}
> >>>>>> +
> >>>>>> +	gmu->initialized = true;
> >>>>>> +
> >>>>>> +	return 0;
> >>>>>> +
> >>>>>> +detach_cxpd:
> >>>>>> +	dev_pm_domain_detach(gmu->cxpd, false);
> >>>>>> +
> >>>>>> +err_mmio:
> >>>>>> +	iounmap(gmu->mmio);
> >>>>>> +
> >>>>>> +	/* Drop reference taken in of_find_device_by_node */
> >>>>>> +	put_device(gmu->dev);
> >>>>>> +
> >>>>>> +	return ret;
> >>>>>> +}
> >>>>>> +
> >>>>>>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
> >>>>>>  {
> >>>>>>  	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
> >>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>>>>> index 931f9f3b3a85..8e0345ffab81 100644
> >>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>>>>> @@ -20,9 +20,11 @@ static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
> >>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>>>>>  
> >>>>>> -	/* Check that the GMU is idle */
> >>>>>> -	if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
> >>>>>> -		return false;
> >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> >>>>>> +		/* Check that the GMU is idle */
> >>>>>> +		if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
> >>>>>> +			return false;
> >>>>>> +	}
> >>>>>>  
> >>>>>>  	/* Check tha the CX master is idle */
> >>>>>>  	if (gpu_read(gpu, REG_A6XX_RBBM_STATUS) &
> >>>>>> @@ -612,13 +614,15 @@ static void a6xx_set_hwcg(struct msm_gpu *gpu, bool state)
> >>>>>>  		return;
> >>>>>>  
> >>>>>>  	/* Disable SP clock before programming HWCG registers */
> >>>>>> -	gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
> >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> >>>>>> +		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
> >>>>>>  
> >>>>>>  	for (i = 0; (reg = &adreno_gpu->info->hwcg[i], reg->offset); i++)
> >>>>>>  		gpu_write(gpu, reg->offset, state ? reg->value : 0);
> >>>>>>  
> >>>>>>  	/* Enable SP clock */
> >>>>>> -	gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
> >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> >>>>>> +		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
> >>>>>>  
> >>>>>>  	gpu_write(gpu, REG_A6XX_RBBM_CLOCK_CNTL, state ? clock_cntl_on : 0);
> >>>>>>  }
> >>>>>> @@ -1018,10 +1022,13 @@ static int hw_init(struct msm_gpu *gpu)
> >>>>>>  {
> >>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >>>>>>  	int ret;
> >>>>>>  
> >>>>>> -	/* Make sure the GMU keeps the GPU on while we set it up */
> >>>>>> -	a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
> >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> >>>>>> +		/* Make sure the GMU keeps the GPU on while we set it up */
> >>>>>> +		a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
> >>>>>> +	}
> >>>>>>  
> >>>>>>  	/* Clear GBIF halt in case GX domain was not collapsed */
> >>>>>>  	if (a6xx_has_gbif(adreno_gpu))
> >>>>>> @@ -1144,6 +1151,17 @@ static int hw_init(struct msm_gpu *gpu)
> >>>>>>  			0x3f0243f0);
> >>>>>>  	}
> >>>>>>  
> >>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
> >>>>>> +		/* Do it here, as GMU wrapper only inits the GMU for memory reservation etc. */
> >>>>>> +
> >>>>>> +		/* Set up the CX GMU counter 0 to count busy ticks */
> >>>>>> +		gmu_write(gmu, REG_A6XX_GPU_GMU_AO_GPU_CX_BUSY_MASK, 0xff000000);
> >>>>>> +
> >>>>>> +		/* Enable power counter 0 */
> >>>>>> +		gmu_rmw(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_SELECT_0, 0xff, BIT(5));
> >>>>>> +		gmu_write(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_ENABLE, 1);
> >>>>>> +	}
> >>>>>> +
> >>>>>>  	/* Protect registers from the CP */
> >>>>>>  	a6xx_set_cp_protect(gpu);
> >>>>>>  
> >>>>>> @@ -1233,6 +1251,8 @@ static int hw_init(struct msm_gpu *gpu)
> >>>>>>  	}
> >>>>>>  
> >>>>>>  out:
> >>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
> >>>>>> +		return ret;
> >>>>>>  	/*
> >>>>>>  	 * Tell the GMU that we are done touching the GPU and it can start power
> >>>>>>  	 * management
> >>>>>> @@ -1267,6 +1287,9 @@ static void a6xx_dump(struct msm_gpu *gpu)
> >>>>>>  	adreno_dump(gpu);
> >>>>>>  }
> >>>>>>  
> >>>>>> +#define GBIF_GX_HALT_MASK	BIT(0)
> >>>>>> +#define GBIF_CLIENT_HALT_MASK	BIT(0)
> >>>>>> +#define GBIF_ARB_HALT_MASK	BIT(1)
> >>>>>>  #define VBIF_RESET_ACK_TIMEOUT	100
> >>>>>>  #define VBIF_RESET_ACK_MASK	0x00f0
> >>>>>>  
> >>>>>> @@ -1299,7 +1322,8 @@ static void a6xx_recover(struct msm_gpu *gpu)
> >>>>>>  	 * Turn off keep alive that might have been enabled by the hang
> >>>>>>  	 * interrupt
> >>>>>>  	 */
> >>>>>> -	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
> >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> >>>>>> +		gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
> >>>>>
> >>>>> Maybe it is better to move this to a6xx_gmu_force_power_off.
> >>>>>
> >>>>>>  
> >>>>>>  	pm_runtime_dont_use_autosuspend(&gpu->pdev->dev);
> >>>>>>  
> >>>>>> @@ -1329,6 +1353,32 @@ static void a6xx_recover(struct msm_gpu *gpu)
> >>>>>>  
> >>>>>>  	dev_pm_genpd_remove_notifier(gmu->cxpd);
> >>>>>>  
> >>>>>> +	/* Software-reset the GPU */
> >>>>>
> >>>>> This is not soft reset sequence. We are trying to quiescent gpu - ddr
> >>>>> traffic with this sequence.
> >>>>>
> >>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
> >>>>>> +		/* Halt the GX side of GBIF */
> >>>>>> +		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, GBIF_GX_HALT_MASK);
> >>>>>> +		spin_until(gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT_ACK) &
> >>>>>> +			   GBIF_GX_HALT_MASK);
> >>>>>> +
> >>>>>> +		/* Halt new client requests on GBIF */
> >>>>>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_CLIENT_HALT_MASK);
> >>>>>> +		spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
> >>>>>> +			   (GBIF_CLIENT_HALT_MASK)) == GBIF_CLIENT_HALT_MASK);
> >>>>>> +
> >>>>>> +		/* Halt all AXI requests on GBIF */
> >>>>>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_ARB_HALT_MASK);
> >>>>>> +		spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
> >>>>>> +			   (GBIF_ARB_HALT_MASK)) == GBIF_ARB_HALT_MASK);
> >>>>>> +
> >>>>>> +		/* Clear the halts */
> >>>>>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
> >>>>>> +
> >>>>>> +		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
> >>>>>> +
> >>>>>> +		/* This *really* needs to go through before we do anything else! */
> >>>>>> +		mb();
> >>>>>> +	}
> >>>>>> +
> >>>>>
> >>>>> This sequence should be before we collapse cx gdsc. Also, please see if
> >>>>> we can create a subroutine to avoid code dup.
> >>>>>
> >>>>>>  	pm_runtime_use_autosuspend(&gpu->pdev->dev);
> >>>>>>  
> >>>>>>  	if (active_submits)
> >>>>>> @@ -1463,7 +1513,8 @@ static void a6xx_fault_detect_irq(struct msm_gpu *gpu)
> >>>>>>  	 * Force the GPU to stay on until after we finish
> >>>>>>  	 * collecting information
> >>>>>>  	 */
> >>>>>> -	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
> >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> >>>>>> +		gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
> >>>>>>  
> >>>>>>  	DRM_DEV_ERROR(&gpu->pdev->dev,
> >>>>>>  		"gpu fault ring %d fence %x status %8.8X rb %4.4x/%4.4x ib1 %16.16llX/%4.4x ib2 %16.16llX/%4.4x\n",
> >>>>>> @@ -1624,7 +1675,7 @@ static void a6xx_llc_slices_init(struct platform_device *pdev,
> >>>>>>  		a6xx_gpu->llc_mmio = ERR_PTR(-EINVAL);
> >>>>>>  }
> >>>>>>  
> >>>>>> -static int a6xx_pm_resume(struct msm_gpu *gpu)
> >>>>>> +static int a6xx_gmu_pm_resume(struct msm_gpu *gpu)
> >>>>>>  {
> >>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>>>>> @@ -1644,10 +1695,61 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
> >>>>>>  
> >>>>>>  	a6xx_llc_activate(a6xx_gpu);
> >>>>>>  
> >>>>>> -	return 0;
> >>>>>> +	return ret;
> >>>>>>  }
> >>>>>>  
> >>>>>> -static int a6xx_pm_suspend(struct msm_gpu *gpu)
> >>>>>> +static int a6xx_pm_resume(struct msm_gpu *gpu)
> >>>>>> +{
> >>>>>> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>>>>> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >>>>>> +	unsigned long freq = 0;
> >>>>>> +	struct dev_pm_opp *opp;
> >>>>>> +	int ret;
> >>>>>> +
> >>>>>> +	gpu->needs_hw_init = true;
> >>>>>> +
> >>>>>> +	trace_msm_gpu_resume(0);
> >>>>>> +
> >>>>>> +	mutex_lock(&a6xx_gpu->gmu.lock);
> >>>>> I think we can ignore gmu lock as there is no real gmu device.
> >>>>>
> >>>>>> +
> >>>>>> +	pm_runtime_resume_and_get(gmu->dev);
> >>>>>> +	pm_runtime_resume_and_get(gmu->gxpd);
> >>>>>> +
> >>>>>> +	/* Set the core clock, having VDD scaling in mind */
> >>>>>> +	ret = dev_pm_opp_set_rate(&gpu->pdev->dev, gpu->fast_rate);
> >>>>>> +	if (ret)
> >>>>>> +		goto err_core_clk;
> >>>>>> +
> >>>>>> +	ret = clk_bulk_prepare_enable(gpu->nr_clocks, gpu->grp_clks);
> >>>>>> +	if (ret)
> >>>>>> +		goto err_bulk_clk;
> >>>>>> +
> >>>>>> +	ret = clk_prepare_enable(gpu->ebi1_clk);
> >>>>>> +	if (ret)
> >>>>>> +		goto err_mem_clk;
> >>>>>> +
> >>>>>> +	/* If anything goes south, tear the GPU down piece by piece.. */
> >>>>>> +	if (ret) {
> >>>>>> +err_mem_clk:
> >>>>>> +		clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
> >>>>>> +err_bulk_clk:
> >>>>>> +		opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
> >>>>>> +		dev_pm_opp_put(opp);
> >>>>>> +		dev_pm_opp_set_rate(&gpu->pdev->dev, 0);
> >>>>>> +err_core_clk:
> >>>>>> +		pm_runtime_put(gmu->gxpd);
> >>>>>> +		pm_runtime_put(gmu->dev);
> >>>>>> +	}
> >>>>>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
> >>>>>> +
> >>>>>> +	if (!ret)
> >>>>>> +		msm_devfreq_resume(gpu);
> >>>>>> +
> >>>>>> +	return ret;
> >>>>>> +}
> >>>>>> +
> >>>>>> +static int a6xx_gmu_pm_suspend(struct msm_gpu *gpu)
> >>>>>>  {
> >>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>>>>> @@ -1674,11 +1776,62 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
> >>>>>>  	return 0;
> >>>>>>  }
> >>>>>>  
> >>>>>> +static int a6xx_pm_suspend(struct msm_gpu *gpu)
> >>>>>> +{
> >>>>>> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>>>>> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >>>>>> +	unsigned long freq = 0;
> >>>>>> +	struct dev_pm_opp *opp;
> >>>>>> +	int i, ret;
> >>>>>> +
> >>>>>> +	trace_msm_gpu_suspend(0);
> >>>>>> +
> >>>>>> +	opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
> >>>>>> +	dev_pm_opp_put(opp);
> >>>>>> +
> >>>>>> +	msm_devfreq_suspend(gpu);
> >>>>>> +
> >>>>>> +	mutex_lock(&a6xx_gpu->gmu.lock);
> >>>>>> +
> >>>>>> +	clk_disable_unprepare(gpu->ebi1_clk);
> >>>>>> +
> >>>>>> +	clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
> >>>>>> +
> >>>>>> +	/* Set frequency to the minimum supported level (no 27MHz on A6xx!) */
> >>>>>> +	ret = dev_pm_opp_set_rate(&gpu->pdev->dev, freq);
> >>>>>> +	if (ret)
> >>>>>> +		goto err;
> >>>>>> +
> >>>>>> +	pm_runtime_put_sync(gmu->gxpd);
> >>>>>> +	pm_runtime_put_sync(gmu->dev);
> >>>>>> +
> >>>>>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
> >>>>>> +
> >>>>>> +	if (a6xx_gpu->shadow_bo)
> >>>>>> +		for (i = 0; i < gpu->nr_rings; i++)
> >>>>>> +			a6xx_gpu->shadow[i] = 0;
> >>>>>> +
> >>>>>> +	gpu->suspend_count++;
> >>>>>> +
> >>>>>> +	return 0;
> >>>>>> +
> >>>>>> +err:
> >>>>>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
> >>>>>> +
> >>>>>> +	return ret;
> >>>>>> +}
> >>>>>> +
> >>>>>>  static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
> >>>>>>  {
> >>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>>>>>  
> >>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
> >>>>>> +		*value = gpu_read64(gpu, REG_A6XX_CP_ALWAYS_ON_COUNTER);
> >>>>>> +		return 0;
> >>>>>> +	}
> >>>>>> +
> >>>>> Instead of wrapper check here, we can just create a separate op. I don't
> >>>>> see any benefit in reusing the same function here.
> >>>>>
> >>>>>
> >>>>>>  	mutex_lock(&a6xx_gpu->gmu.lock);
> >>>>>>  
> >>>>>>  	/* Force the GPU power on so we can read this register */
> >>>>>> @@ -1716,7 +1869,8 @@ static void a6xx_destroy(struct msm_gpu *gpu)
> >>>>>>  		drm_gem_object_put(a6xx_gpu->shadow_bo);
> >>>>>>  	}
> >>>>>>  
> >>>>>> -	a6xx_llc_slices_destroy(a6xx_gpu);
> >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> >>>>>> +		a6xx_llc_slices_destroy(a6xx_gpu);
> >>>>>>  
> >>>>>>  	mutex_lock(&a6xx_gpu->gmu.lock);
> >>>>>>  	a6xx_gmu_remove(a6xx_gpu);
> >>>>>> @@ -1957,8 +2111,8 @@ static const struct adreno_gpu_funcs funcs = {
> >>>>>>  		.set_param = adreno_set_param,
> >>>>>>  		.hw_init = a6xx_hw_init,
> >>>>>>  		.ucode_load = a6xx_ucode_load,
> >>>>>> -		.pm_suspend = a6xx_pm_suspend,
> >>>>>> -		.pm_resume = a6xx_pm_resume,
> >>>>>> +		.pm_suspend = a6xx_gmu_pm_suspend,
> >>>>>> +		.pm_resume = a6xx_gmu_pm_resume,
> >>>>>>  		.recover = a6xx_recover,
> >>>>>>  		.submit = a6xx_submit,
> >>>>>>  		.active_ring = a6xx_active_ring,
> >>>>>> @@ -1982,6 +2136,35 @@ static const struct adreno_gpu_funcs funcs = {
> >>>>>>  	.get_timestamp = a6xx_get_timestamp,
> >>>>>>  };
> >>>>>>  
> >>>>>> +static const struct adreno_gpu_funcs funcs_gmuwrapper = {
> >>>>>> +	.base = {
> >>>>>> +		.get_param = adreno_get_param,
> >>>>>> +		.set_param = adreno_set_param,
> >>>>>> +		.hw_init = a6xx_hw_init,
> >>>>>> +		.ucode_load = a6xx_ucode_load,
> >>>>>> +		.pm_suspend = a6xx_pm_suspend,
> >>>>>> +		.pm_resume = a6xx_pm_resume,
> >>>>>> +		.recover = a6xx_recover,
> >>>>>> +		.submit = a6xx_submit,
> >>>>>> +		.active_ring = a6xx_active_ring,
> >>>>>> +		.irq = a6xx_irq,
> >>>>>> +		.destroy = a6xx_destroy,
> >>>>>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
> >>>>>> +		.show = a6xx_show,
> >>>>>> +#endif
> >>>>>> +		.gpu_busy = a6xx_gpu_busy,
> >>>>>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
> >>>>>> +		.gpu_state_get = a6xx_gpu_state_get,
> >>>>>> +		.gpu_state_put = a6xx_gpu_state_put,
> >>>>>> +#endif
> >>>>>> +		.create_address_space = a6xx_create_address_space,
> >>>>>> +		.create_private_address_space = a6xx_create_private_address_space,
> >>>>>> +		.get_rptr = a6xx_get_rptr,
> >>>>>> +		.progress = a6xx_progress,
> >>>>>> +	},
> >>>>>> +	.get_timestamp = a6xx_get_timestamp,
> >>>>>> +};
> >>>>>> +
> >>>>>>  struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> >>>>>>  {
> >>>>>>  	struct msm_drm_private *priv = dev->dev_private;
> >>>>>> @@ -2003,18 +2186,36 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> >>>>>>  
> >>>>>>  	adreno_gpu->registers = NULL;
> >>>>>>  
> >>>>>> +	/* Check if there is a GMU phandle and set it up */
> >>>>>> +	node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
> >>>>>> +	/* FIXME: How do we gracefully handle this? */
> >>>>>> +	BUG_ON(!node);
> >>>>> How will you handle this BUG() when there is no GMU (a610 gpu)?
> >>>>>
> >>>>>> +
> >>>>>> +	adreno_gpu->gmu_is_wrapper = of_device_is_compatible(node, "qcom,adreno-gmu-wrapper");
> >>>>>> +
> >>>>>>  	/*
> >>>>>>  	 * We need to know the platform type before calling into adreno_gpu_init
> >>>>>>  	 * so that the hw_apriv flag can be correctly set. Snoop into the info
> >>>>>>  	 * and grab the revision number
> >>>>>>  	 */
> >>>>>>  	info = adreno_info(config->rev);
> >>>>>> -
> >>>>>> -	if (info && (info->revn == 650 || info->revn == 660 ||
> >>>>>> -			adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), info->rev)))
> >>>>>> +	if (!info)
> >>>>>> +		return ERR_PTR(-EINVAL);
> >>>>>> +
> >>>>>> +	/* Assign these early so that we can use the is_aXYZ helpers */
> >>>>>> +	/* Numeric revision IDs (e.g. 630) */
> >>>>>> +	adreno_gpu->revn = info->revn;
> >>>>>> +	/* New-style ADRENO_REV()-only */
> >>>>>> +	adreno_gpu->rev = info->rev;
> >>>>>> +	/* Quirk data */
> >>>>>> +	adreno_gpu->info = info;
> >>>>>> +
> >>>>>> +	if (adreno_is_a650(adreno_gpu) || adreno_is_a660_family(adreno_gpu))
> >>>>>>  		adreno_gpu->base.hw_apriv = true;
> >>>>>>  
> >>>>>> -	a6xx_llc_slices_init(pdev, a6xx_gpu);
> >>>>>> +	/* No LLCC on non-RPMh (and by extension, non-GMU) SoCs */
> >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> >>>>>> +		a6xx_llc_slices_init(pdev, a6xx_gpu);
> >>>>>>  
> >>>>>>  	ret = a6xx_set_supported_hw(&pdev->dev, config->rev);
> >>>>>>  	if (ret) {
> >>>>>> @@ -2022,7 +2223,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> >>>>>>  		return ERR_PTR(ret);
> >>>>>>  	}
> >>>>>>  
> >>>>>> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> >>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
> >>>>>> +		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_gmuwrapper, 1);
> >>>>>> +	else
> >>>>>> +		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> >>>>>>  	if (ret) {
> >>>>>>  		a6xx_destroy(&(a6xx_gpu->base.base));
> >>>>>>  		return ERR_PTR(ret);
> >>>>>> @@ -2035,13 +2239,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> >>>>>>  	if (adreno_is_a618(adreno_gpu) || adreno_is_7c3(adreno_gpu))
> >>>>>>  		priv->gpu_clamp_to_idle = true;
> >>>>>>  
> >>>>>> -	/* Check if there is a GMU phandle and set it up */
> >>>>>> -	node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
> >>>>>> -
> >>>>>> -	/* FIXME: How do we gracefully handle this? */
> >>>>>> -	BUG_ON(!node);
> >>>>>> -
> >>>>>> -	ret = a6xx_gmu_init(a6xx_gpu, node);
> >>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
> >>>>>> +		ret = a6xx_gmu_wrapper_init(a6xx_gpu, node);
> >>>>>> +	else
> >>>>>> +		ret = a6xx_gmu_init(a6xx_gpu, node);
> >>>>>>  	of_node_put(node);
> >>>>>>  	if (ret) {
> >>>>>>  		a6xx_destroy(&(a6xx_gpu->base.base));
> >>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> >>>>>> index eea2e60ce3b7..51a7656072fa 100644
> >>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> >>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> >>>>>> @@ -76,6 +76,7 @@ int a6xx_gmu_set_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
> >>>>>>  void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
> >>>>>>  
> >>>>>>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
> >>>>>> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
> >>>>>>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
> >>>>>>  
> >>>>>>  void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> >>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> >>>>>> index 30ecdff363e7..4e5d650578c6 100644
> >>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> >>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> >>>>>> @@ -1041,16 +1041,18 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu)
> >>>>>>  	/* Get the generic state from the adreno core */
> >>>>>>  	adreno_gpu_state_get(gpu, &a6xx_state->base);
> >>>>>>  
> >>>>>> -	a6xx_get_gmu_registers(gpu, a6xx_state);
> >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> >>>>> nit: Kinda misleading function name to a layman. Should we invert the
> >>>>> function to "adreno_has_gmu"?
> >>>>>
> >>>>> -Akhil
> >>>>>> +		a6xx_get_gmu_registers(gpu, a6xx_state);
> >>>>>>  
> >>>>>> -	a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
> >>>>>> -	a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
> >>>>>> -	a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
> >>>>>> +		a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
> >>>>>> +		a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
> >>>>>> +		a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
> >>>>>>  
> >>>>>> -	a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
> >>>>>> +		a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
> >>>>>> +	}
> >>>>>>  
> >>>>>>  	/* If GX isn't on the rest of the data isn't going to be accessible */
> >>>>>> -	if (!a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
> >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu) && !a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
> >>>>>>  		return &a6xx_state->base;
> >>>>>>  
> >>>>>>  	/* Get the banks of indexed registers */
> >>>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> >>>>>> index 6934cee07d42..5c5901d65950 100644
> >>>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> >>>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> >>>>>> @@ -528,6 +528,10 @@ int adreno_load_fw(struct adreno_gpu *adreno_gpu)
> >>>>>>  		if (!adreno_gpu->info->fw[i])
> >>>>>>  			continue;
> >>>>>>  
> >>>>>> +		/* Skip loading GMU firwmare with GMU Wrapper */
> >>>>>> +		if (adreno_has_gmu_wrapper(adreno_gpu) && i == ADRENO_FW_GMU)
> >>>>>> +			continue;
> >>>>>> +
> >>>>>>  		/* Skip if the firmware has already been loaded */
> >>>>>>  		if (adreno_gpu->fw[i])
> >>>>>>  			continue;
> >>>>>> @@ -1074,8 +1078,8 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
> >>>>>>  	u32 speedbin;
> >>>>>>  	int ret;
> >>>>>>  
> >>>>>> -	/* Only handle the core clock when GMU is not in use */
> >>>>>> -	if (config->rev.core < 6) {
> >>>>>> +	/* Only handle the core clock when GMU is not in use (or is absent). */
> >>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu) || config->rev.core < 6) {
> >>>>>>  		/*
> >>>>>>  		 * This can only be done before devm_pm_opp_of_add_table(), or
> >>>>>>  		 * dev_pm_opp_set_config() will WARN_ON()
> >>>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> >>>>>> index f62612a5c70f..ee5352bc5329 100644
> >>>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> >>>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> >>>>>> @@ -115,6 +115,7 @@ struct adreno_gpu {
> >>>>>>  	 * code (a3xx_gpu.c) and stored in this common location.
> >>>>>>  	 */
> >>>>>>  	const unsigned int *reg_offsets;
> >>>>>> +	bool gmu_is_wrapper;
> >>>>>>  };
> >>>>>>  #define to_adreno_gpu(x) container_of(x, struct adreno_gpu, base)
> >>>>>>  
> >>>>>> @@ -145,6 +146,11 @@ struct adreno_platform_config {
> >>>>>>  
> >>>>>>  bool adreno_cmp_rev(struct adreno_rev rev1, struct adreno_rev rev2);
> >>>>>>  
> >>>>>> +static inline bool adreno_has_gmu_wrapper(struct adreno_gpu *gpu)
> >>>>>> +{
> >>>>>> +	return gpu->gmu_is_wrapper;
> >>>>>> +}
> >>>>>> +
> >>>>>>  static inline bool adreno_is_a2xx(struct adreno_gpu *gpu)
> >>>>>>  {
> >>>>>>  	return (gpu->revn < 300);
> >>>>>>
> >>>>>> -- 
> >>>>>> 2.40.0
> >>>>>>

* Re: [Freedreno] [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support
  2023-05-06 14:46               ` Akhil P Oommen
@ 2023-05-06 20:46                 ` Akhil P Oommen
  2023-05-06 21:07                   ` Akhil P Oommen
  2023-05-08  8:59                 ` Konrad Dybcio
  1 sibling, 1 reply; 30+ messages in thread
From: Akhil P Oommen @ 2023-05-06 20:46 UTC (permalink / raw)
  To: Konrad Dybcio
  Cc: Rob Clark, freedreno, Krzysztof Kozlowski, devicetree, Sean Paul,
	Bjorn Andersson, Konrad Dybcio, Abhinav Kumar, dri-devel,
	linux-kernel, Rob Clark, Rob Herring, Daniel Vetter,
	linux-arm-msm, Dmitry Baryshkov, Marijn Suijten, David Airlie

On Sat, May 06, 2023 at 08:16:21PM +0530, Akhil P Oommen wrote:
> On Fri, May 05, 2023 at 12:35:18PM +0200, Konrad Dybcio wrote:
> > 
> > 
> > On 5.05.2023 10:46, Akhil P Oommen wrote:
> > > On Thu, May 04, 2023 at 08:34:07AM +0200, Konrad Dybcio wrote:
> > >>
> > >>
> > >> On 3.05.2023 22:32, Akhil P Oommen wrote:
> > >>> On Tue, May 02, 2023 at 11:40:26AM +0200, Konrad Dybcio wrote:
> > >>>>
> > >>>>
> > >>>> On 2.05.2023 09:49, Akhil P Oommen wrote:
> > >>>>> On Sat, Apr 01, 2023 at 01:54:43PM +0200, Konrad Dybcio wrote:
> > >>>>>> Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
> > >>>>>> but don't implement the associated GMUs. This is due to the fact that
> > >>>>>> the GMU directly pokes at RPMh. Sadly, this means we have to take care
> > >>>>>> of enabling & scaling power rails, clocks and bandwidth ourselves.
> > >>>>>>
> > >>>>>> Reuse existing Adreno-common code and modify the deeply-GMU-infused
> > >>>>>> A6XX code to facilitate these GPUs. This involves if-ing out lots
> > >>>>>> of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
> > >>>>>> the actual name that Qualcomm uses in their downstream kernels).
> > >>>>>>
> > >>>>>> This is essentially a register region which is convenient to model
> > >>>>>> as a device. We'll use it for managing the GDSCs. The register
> > >>>>>> layout matches the actual GMU_CX/GX regions on the "real GMU" devices
> > >>>>>> and lets us reuse quite a bit of gmu_read/write/rmw calls.
> > >>>>> << I sent a reply to this patch earlier, but not sure where it went.
> > >>>>> Still figuring out Mutt... >>
> > >>>> Answered it here:
> > >>>>
> > >>>> https://lore.kernel.org/linux-arm-msm/4d3000c1-c3f9-0bfd-3eb3-23393f9a8f77@linaro.org/
> > >>>
> > >>> Thanks. Will check and respond there if needed.
> > >>>
> > >>>>
> > >>>> I don't think I see any new comments in this "reply revision" (heh), so please
> > >>>> check that one out.
> > >>>>
> > >>>>>
> > >>>>> Only convenience I found is that we can reuse gmu register ops in a few
> > >>>>> places (< 10 I think). If we just model this as another gpu memory
> > >>>>> region, I think it will help to keep gmu vs gmu-wrapper/no-gmu
> > >>>>> architecture code with clean separation. Also, it looks like we need to
> > >>>>> keep a dummy gmu platform device in the devicetree with the current
> > >>>>> approach. That doesn't sound right.
> > >>>> That's correct, but.. if we switch away from that, VDD_GX/VDD_CX will
> > >>>> need additional, gmuwrapper-configuration specific code anyway, as
> > >>>> OPP & genpd will no longer make use of the default behavior which
> > >>>> only gets triggered if there's a single power-domains=<> entry, afaicu.
> > >>> Can you please tell me which specific *default behaviour* you mean here?
> > >>> I am curious to know what I am overlooking here. We can always get a cxpd/gxpd device
> > >>> and vote for the gdscs directly from the driver. Anything related to
> > >>> OPP?
> > >> I *believe* this is true:
> > >>
> > >> if (ARRAY_SIZE(power-domains) == 1) {
> > >> 	of generic code will enable the power domain at .probe time
> > > we need to handle the voting directly. I recently shared a patch to
> > > vote cx gdsc from gpu driver. Maybe we can ignore this when gpu has
> > > only cx rail due to this logic you quoted here.
> > > 
> > > I see that you have handled it mostly correctly from the gpu driver in the updated
> > > a6xx_pm_suspend() callback. Just the power domain device ptrs should be moved to
> > > gpu from gmu.
> > > 
> > >>
> > >> 	opp APIs will default to scaling that domain with required-opps
> > > 
> > >> }
> > >>
> > >> and we do need to put GX/CX (with an MX parent to match) there, as the
> > >> AP is responsible for voting in this configuration
> > > 
> > > We should vote to turn ON gx/cx headswitches through genpd from gpu driver. When you vote for
> > > core clk frequency, *clock driver is supposed to scale* all the necessary
> > > regulators. At least that is how downstream works. You can refer to the downstream
> > > gpucc clk driver of these SoCs. I am not sure how much of that can be easily converted to
> > > upstream.
> > > 
> > > Also, how does having a gmu dt node help in this regard? Feel free to
> > > elaborate, I am not very familiar with clk/regulator implementations.
> > Okay so I think we have a bit of a confusion here.
> > 
> > Currently, with this patchset we manage things like this:
> > 
> > 1. GPU has a VDD_GX (or equivalent[1]) line passed in power-domains=<>, which
> >    is then used with OPP APIs to ensure it's being scaled on freq change [2].
> >    The VDD_lines coming from RPM(h) are described as power domains upstream
> >    *unlike downstream*, which represents them as regulators with preset voltage
> >    steps (and perhaps that's what had you confused). What's more is that GDSCs
> >    are also modeled as genpds instead of regulators, hence they sort of "fight"
> >    for the spot in power-domains=<> of a given node.
> 
> Thanks for clarifying. I didn't get this part "hence they sort of "fight" for the spot in power-domains".
> What spot exactly did you mean here? The spot for PD to be used during scaling?
> 
> It seems like you are hinting that there is some sort of limitation in keeping all the
> 3 power domains (cx gdsc, gx gdsc and cx rail) under the gpu node in dt. Please explain

Typo. I meant "(Cx gdsc, Gx gdsc and Gx/Cx rail)"

> why we can't keep all the 3 power domains under gpu node and call an API
> (devm_pm_opp_attach_genpd() ??) to select the power domain which should be scaled?
> 
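To make the question concrete, here is a minimal standalone C sketch (not kernel code) of the rule Konrad described: with exactly one power-domains entry the generic code attaches the domain at probe time, while with several entries the driver has to attach each one by name itself. Everything in it (`fake_node`, `pd_attach_mode`, the example nodes and the "vdd" name) is hypothetical and only models the decision; the "cx"/"gx" names are the ones the patch passes to dev_pm_domain_attach_by_name().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a DT node's power-domains list; not a kernel type. */
struct fake_node {
	const char *const *pd_names;	/* power-domain-names entries */
	size_t num_pds;			/* number of power-domains entries */
};

enum pd_attach_mode {
	PD_NONE,		/* no power-domains property at all */
	PD_CORE_ATTACHES,	/* exactly one entry: attached for us at probe */
	PD_DRIVER_ATTACHES,	/* several entries: driver attaches each by name */
};

/* Example nodes mirroring the configurations discussed in this thread. */
static const char *const wrapper_pds[] = { "cx", "gx" };
static const char *const gpu_pds[] = { "vdd" };		/* made-up name */
static const char *const merged_pds[] = { "cx", "gx", "vdd" };

static const struct fake_node gmu_wrapper_node = { wrapper_pds, 2 };
static const struct fake_node gmuless_gpu_node = { gpu_pds, 1 };
static const struct fake_node merged_gpu_node = { merged_pds, 3 };
static const struct fake_node a630_gpu_node = { NULL, 0 };

/* Decide who is responsible for attaching the node's power domains. */
static enum pd_attach_mode pd_attach_mode(const struct fake_node *np)
{
	assert(np);
	if (np->num_pds == 0)
		return PD_NONE;
	return np->num_pds == 1 ? PD_CORE_ATTACHES : PD_DRIVER_ATTACHES;
}

/* Return the index of a named domain, or -1 if the node does not list it. */
static int pd_index_by_name(const struct fake_node *np, const char *name)
{
	for (size_t i = 0; i < np->num_pds; i++)
		if (strcmp(np->pd_names[i], name) == 0)
			return (int)i;
	return -1;
}
```

In this model, `merged_gpu_node` (all three domains under the GPU node) lands in the PD_DRIVER_ATTACHES case, so the driver would need to look up each domain by name and pick the one to scale explicitly.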
> > 
> > 2. GMU wrapper gets CX_GDSC & GX_GDSC handles in power-domains=<> (just like
> >    the real GMU in the current state of upstream [3]), which are then governed
> >    through explicit genpd calls to turn them on/off when the GPU resume/suspend/
> >    crash recovery functions are called.
> > 
> > 3. GPUs with GMU, like A630, don't get any power-domains=<> entries in DT,
> >    instead relying on the GMU firmware to communicate necessary requests
> >    to the VDD_xyz resources directly to RPMh, as part of the DVFS routines.
> >    If GMU wasn't so smart, we would have to do the exact same VDD_xyz+OPP dance
> >    there - that's precisely what's going on under the hood.
> > 
> > 4. Adreno SMMU gets a handle to CX_GDSC so that when OF probe funcs are called,
> >    (and SMMUs probe way way before all things drm) the headswitch is de-asserted
> >    and its registers and related clocks are accessible.
> > 
> > 
> > All this makes me believe the way I generally architected things in
> > this series is correct.
> > 
> > 
> > [1] A610 (and I think A612) lack a VDD_GX line, so they power the GPU from
> >     VDD_CX, but that's just an implementation detail which is handled by
> >     simply passing the correct one in DTS, the code doesn't care.
> > 
> > [2] Hence my recent changes to use dev_pm_opp_set_rate() wherever possible,
> >     this func reads required-opps in OPP table entries and makes sure to elevate
> >     the genpd's performance state before switching frequencies
> > 
> > [3] Please take a look at the "end product" here:
> >     https://github.com/SoMainline/linux/commit/fb16757c3bf4c087ac597d70c7a98755d46bb323
> >     you can open e.g. sdm845.dtsi for comparison with real GMU
> 
> This dt definition for a610 gpu clearly shows the issue I have here. Someone
> looking at this gets a very wrong picture about the platform like there is actually nothing
> resembling a gmu IP in a610. Is gmu or gmu-cx register region really present in this hw?
> 
> Just a side note about the dt file you shared:
> 	1. At line: 1243, It shouldn't have gx gdsc, right?
> 	2. At line: 1172, SM6115_VDDCX -> SM6115_VDDGX?
Aah! Ignore this. This GPU doesn't have a Gx rail.

-Akhil
> 
> -Akhil
> 
> > 
> > I hope this answers your concerns. If not, I'll be happy to elaborate.
> > 
> > Konrad
> > > 
> > > -Akhil.
> > >>
> > >> Konrad
> > >>>
> > >>> -Akhil
> > >>>>
> > >>>> If nothing else, this is a very convenient way to model a part of the
> > >>>> GPU (as that's essentially what GMU_CX is, to my understanding) and
> > >>>> the bindings people didn't shoot me in the head for proposing this, so
> > >>>> I assume it'd be cool to pursue this..
> > >>>>
> > >>>> Konrad
> > >>>>>>
> > >>>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
> > >>>>>> ---
> > >>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c       |  72 +++++++-
> > >>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c       | 255 +++++++++++++++++++++++++---
> > >>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h       |   1 +
> > >>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  14 +-
> > >>>>>>  drivers/gpu/drm/msm/adreno/adreno_gpu.c     |   8 +-
> > >>>>>>  drivers/gpu/drm/msm/adreno/adreno_gpu.h     |   6 +
> > >>>>>>  6 files changed, 318 insertions(+), 38 deletions(-)
> > >>>>>>
> > >>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > >>>>>> index 87babbb2a19f..b1acdb027205 100644
> > >>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > >>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > >>>>>> @@ -1469,6 +1469,7 @@ static int a6xx_gmu_get_irq(struct a6xx_gmu *gmu, struct platform_device *pdev,
> > >>>>>>  
> > >>>>>>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
> > >>>>>>  {
> > >>>>>> +	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
> > >>>>>>  	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> > >>>>>>  	struct platform_device *pdev = to_platform_device(gmu->dev);
> > >>>>>>  
> > >>>>>> @@ -1494,10 +1495,12 @@ void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
> > >>>>>>  	gmu->mmio = NULL;
> > >>>>>>  	gmu->rscc = NULL;
> > >>>>>>  
> > >>>>>> -	a6xx_gmu_memory_free(gmu);
> > >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> > >>>>>> +		a6xx_gmu_memory_free(gmu);
> > >>>>>>  
> > >>>>>> -	free_irq(gmu->gmu_irq, gmu);
> > >>>>>> -	free_irq(gmu->hfi_irq, gmu);
> > >>>>>> +		free_irq(gmu->gmu_irq, gmu);
> > >>>>>> +		free_irq(gmu->hfi_irq, gmu);
> > >>>>>> +	}
> > >>>>>>  
> > >>>>>>  	/* Drop reference taken in of_find_device_by_node */
> > >>>>>>  	put_device(gmu->dev);
> > >>>>>> @@ -1516,6 +1519,69 @@ static int cxpd_notifier_cb(struct notifier_block *nb,
> > >>>>>>  	return 0;
> > >>>>>>  }
> > >>>>>>  
> > >>>>>> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
> > >>>>>> +{
> > >>>>>> +	struct platform_device *pdev = of_find_device_by_node(node);
> > >>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> > >>>>>> +	int ret;
> > >>>>>> +
> > >>>>>> +	if (!pdev)
> > >>>>>> +		return -ENODEV;
> > >>>>>> +
> > >>>>>> +	gmu->dev = &pdev->dev;
> > >>>>>> +
> > >>>>>> +	of_dma_configure(gmu->dev, node, true);
> > >>>>> why setup dma for a device that is not actually present?
> > >>>>>> +
> > >>>>>> +	pm_runtime_enable(gmu->dev);
> > >>>>>> +
> > >>>>>> +	/* Mark legacy for manual SPTPRAC control */
> > >>>>>> +	gmu->legacy = true;
> > >>>>>> +
> > >>>>>> +	/* Map the GMU registers */
> > >>>>>> +	gmu->mmio = a6xx_gmu_get_mmio(pdev, "gmu");
> > >>>>>> +	if (IS_ERR(gmu->mmio)) {
> > >>>>>> +		ret = PTR_ERR(gmu->mmio);
> > >>>>>> +		goto err_mmio;
> > >>>>>> +	}
> > >>>>>> +
> > >>>>>> +	gmu->cxpd = dev_pm_domain_attach_by_name(gmu->dev, "cx");
> > >>>>>> +	if (IS_ERR(gmu->cxpd)) {
> > >>>>>> +		ret = PTR_ERR(gmu->cxpd);
> > >>>>>> +		goto err_mmio;
> > >>>>>> +	}
> > >>>>>> +
> > >>>>>> +	if (!device_link_add(gmu->dev, gmu->cxpd, DL_FLAG_PM_RUNTIME)) {
> > >>>>>> +		ret = -ENODEV;
> > >>>>>> +		goto detach_cxpd;
> > >>>>>> +	}
> > >>>>>> +
> > >>>>>> +	init_completion(&gmu->pd_gate);
> > >>>>>> +	complete_all(&gmu->pd_gate);
> > >>>>>> +	gmu->pd_nb.notifier_call = cxpd_notifier_cb;
> > >>>>>> +
> > >>>>>> +	/* Get a link to the GX power domain to reset the GPU */
> > >>>>>> +	gmu->gxpd = dev_pm_domain_attach_by_name(gmu->dev, "gx");
> > >>>>>> +	if (IS_ERR(gmu->gxpd)) {
> > >>>>>> +		ret = PTR_ERR(gmu->gxpd);
> > >>>>>> +		goto err_mmio;
> > >>>>>> +	}
> > >>>>>> +
> > >>>>>> +	gmu->initialized = true;
> > >>>>>> +
> > >>>>>> +	return 0;
> > >>>>>> +
> > >>>>>> +detach_cxpd:
> > >>>>>> +	dev_pm_domain_detach(gmu->cxpd, false);
> > >>>>>> +
> > >>>>>> +err_mmio:
> > >>>>>> +	iounmap(gmu->mmio);
> > >>>>>> +
> > >>>>>> +	/* Drop reference taken in of_find_device_by_node */
> > >>>>>> +	put_device(gmu->dev);
> > >>>>>> +
> > >>>>>> +	return ret;
> > >>>>>> +}
> > >>>>>> +
> > >>>>>>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
> > >>>>>>  {
> > >>>>>>  	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
> > >>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > >>>>>> index 931f9f3b3a85..8e0345ffab81 100644
> > >>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > >>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > >>>>>> @@ -20,9 +20,11 @@ static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
> > >>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> > >>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> > >>>>>>  
> > >>>>>> -	/* Check that the GMU is idle */
> > >>>>>> -	if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
> > >>>>>> -		return false;
> > >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> > >>>>>> +		/* Check that the GMU is idle */
> > >>>>>> +		if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
> > >>>>>> +			return false;
> > >>>>>> +	}
> > >>>>>>  
> > >>>>>>  	/* Check tha the CX master is idle */
> > >>>>>>  	if (gpu_read(gpu, REG_A6XX_RBBM_STATUS) &
> > >>>>>> @@ -612,13 +614,15 @@ static void a6xx_set_hwcg(struct msm_gpu *gpu, bool state)
> > >>>>>>  		return;
> > >>>>>>  
> > >>>>>>  	/* Disable SP clock before programming HWCG registers */
> > >>>>>> -	gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
> > >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> > >>>>>> +		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
> > >>>>>>  
> > >>>>>>  	for (i = 0; (reg = &adreno_gpu->info->hwcg[i], reg->offset); i++)
> > >>>>>>  		gpu_write(gpu, reg->offset, state ? reg->value : 0);
> > >>>>>>  
> > >>>>>>  	/* Enable SP clock */
> > >>>>>> -	gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
> > >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> > >>>>>> +		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
> > >>>>>>  
> > >>>>>>  	gpu_write(gpu, REG_A6XX_RBBM_CLOCK_CNTL, state ? clock_cntl_on : 0);
> > >>>>>>  }
> > >>>>>> @@ -1018,10 +1022,13 @@ static int hw_init(struct msm_gpu *gpu)
> > >>>>>>  {
> > >>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> > >>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> > >>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> > >>>>>>  	int ret;
> > >>>>>>  
> > >>>>>> -	/* Make sure the GMU keeps the GPU on while we set it up */
> > >>>>>> -	a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
> > >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> > >>>>>> +		/* Make sure the GMU keeps the GPU on while we set it up */
> > >>>>>> +		a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
> > >>>>>> +	}
> > >>>>>>  
> > >>>>>>  	/* Clear GBIF halt in case GX domain was not collapsed */
> > >>>>>>  	if (a6xx_has_gbif(adreno_gpu))
> > >>>>>> @@ -1144,6 +1151,17 @@ static int hw_init(struct msm_gpu *gpu)
> > >>>>>>  			0x3f0243f0);
> > >>>>>>  	}
> > >>>>>>  
> > >>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
> > >>>>>> +		/* Do it here, as GMU wrapper only inits the GMU for memory reservation etc. */
> > >>>>>> +
> > >>>>>> +		/* Set up the CX GMU counter 0 to count busy ticks */
> > >>>>>> +		gmu_write(gmu, REG_A6XX_GPU_GMU_AO_GPU_CX_BUSY_MASK, 0xff000000);
> > >>>>>> +
> > >>>>>> +		/* Enable power counter 0 */
> > >>>>>> +		gmu_rmw(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_SELECT_0, 0xff, BIT(5));
> > >>>>>> +		gmu_write(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_ENABLE, 1);
> > >>>>>> +	}
> > >>>>>> +
> > >>>>>>  	/* Protect registers from the CP */
> > >>>>>>  	a6xx_set_cp_protect(gpu);
> > >>>>>>  
> > >>>>>> @@ -1233,6 +1251,8 @@ static int hw_init(struct msm_gpu *gpu)
> > >>>>>>  	}
> > >>>>>>  
> > >>>>>>  out:
> > >>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
> > >>>>>> +		return ret;
> > >>>>>>  	/*
> > >>>>>>  	 * Tell the GMU that we are done touching the GPU and it can start power
> > >>>>>>  	 * management
> > >>>>>> @@ -1267,6 +1287,9 @@ static void a6xx_dump(struct msm_gpu *gpu)
> > >>>>>>  	adreno_dump(gpu);
> > >>>>>>  }
> > >>>>>>  
> > >>>>>> +#define GBIF_GX_HALT_MASK	BIT(0)
> > >>>>>> +#define GBIF_CLIENT_HALT_MASK	BIT(0)
> > >>>>>> +#define GBIF_ARB_HALT_MASK	BIT(1)
> > >>>>>>  #define VBIF_RESET_ACK_TIMEOUT	100
> > >>>>>>  #define VBIF_RESET_ACK_MASK	0x00f0
> > >>>>>>  
> > >>>>>> @@ -1299,7 +1322,8 @@ static void a6xx_recover(struct msm_gpu *gpu)
> > >>>>>>  	 * Turn off keep alive that might have been enabled by the hang
> > >>>>>>  	 * interrupt
> > >>>>>>  	 */
> > >>>>>> -	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
> > >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> > >>>>>> +		gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
> > >>>>>
> > >>>>> Maybe it is better to move this to a6xx_gmu_force_power_off.
> > >>>>>
> > >>>>>>  
> > >>>>>>  	pm_runtime_dont_use_autosuspend(&gpu->pdev->dev);
> > >>>>>>  
> > >>>>>> @@ -1329,6 +1353,32 @@ static void a6xx_recover(struct msm_gpu *gpu)
> > >>>>>>  
> > >>>>>>  	dev_pm_genpd_remove_notifier(gmu->cxpd);
> > >>>>>>  
> > >>>>>> +	/* Software-reset the GPU */
> > >>>>>
> > >>>>> This is not a soft reset sequence. We are trying to quiesce GPU <-> DDR
> > >>>>> traffic with this sequence.
> > >>>>>
> > >>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
> > >>>>>> +		/* Halt the GX side of GBIF */
> > >>>>>> +		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, GBIF_GX_HALT_MASK);
> > >>>>>> +		spin_until(gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT_ACK) &
> > >>>>>> +			   GBIF_GX_HALT_MASK);
> > >>>>>> +
> > >>>>>> +		/* Halt new client requests on GBIF */
> > >>>>>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_CLIENT_HALT_MASK);
> > >>>>>> +		spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
> > >>>>>> +			   (GBIF_CLIENT_HALT_MASK)) == GBIF_CLIENT_HALT_MASK);
> > >>>>>> +
> > >>>>>> +		/* Halt all AXI requests on GBIF */
> > >>>>>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_ARB_HALT_MASK);
> > >>>>>> +		spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
> > >>>>>> +			   (GBIF_ARB_HALT_MASK)) == GBIF_ARB_HALT_MASK);
> > >>>>>> +
> > >>>>>> +		/* Clear the halts */
> > >>>>>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
> > >>>>>> +
> > >>>>>> +		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
> > >>>>>> +
> > >>>>>> +		/* This *really* needs to go through before we do anything else! */
> > >>>>>> +		mb();
> > >>>>>> +	}
> > >>>>>> +
> > >>>>>
> > >>>>> This sequence should be before we collapse cx gdsc. Also, please see if
> > >>>>> we can create a subroutine to avoid code dup.
> > >>>>>
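As a rough illustration of the subroutine idea, below is a standalone C sketch that runs the same three-stage halt against a fake register file. It is not driver code: `fake_gpu`, `halt_and_wait`, `bus_quiesce`, the register indices and the bounded poll (in place of spin_until()) are all assumptions made for the example, and `gpu_write`/`gpu_read` here are simplified stand-ins for the real accessors. Only the mask values and the order of writes are taken from the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Indices into a fake register file; the real offsets are not modelled. */
enum { RBBM_GBIF_HALT, RBBM_GBIF_HALT_ACK, GBIF_HALT, GBIF_HALT_ACK, NUM_REGS };

#define GBIF_GX_HALT_MASK	(1u << 0)
#define GBIF_CLIENT_HALT_MASK	(1u << 0)
#define GBIF_ARB_HALT_MASK	(1u << 1)
#define HALT_POLL_LIMIT		100	/* arbitrary bound for this sketch */

struct fake_gpu {
	uint32_t regs[NUM_REGS];
	bool hw_acks;	/* simulated hardware: does it acknowledge halts? */
};

static void gpu_write(struct fake_gpu *gpu, int reg, uint32_t val)
{
	assert(reg >= 0 && reg < NUM_REGS);
	gpu->regs[reg] = val;
	/* Simulated hardware mirrors a halt request into its ACK register. */
	if (gpu->hw_acks && reg == RBBM_GBIF_HALT)
		gpu->regs[RBBM_GBIF_HALT_ACK] = val;
	if (gpu->hw_acks && reg == GBIF_HALT)
		gpu->regs[GBIF_HALT_ACK] = val;
}

static uint32_t gpu_read(struct fake_gpu *gpu, int reg)
{
	assert(reg >= 0 && reg < NUM_REGS);
	return gpu->regs[reg];
}

/* Request a halt and poll (bounded) until every bit in mask is acked. */
static bool halt_and_wait(struct fake_gpu *gpu, int reg, int ack, uint32_t mask)
{
	gpu_write(gpu, reg, mask);
	for (int i = 0; i < HALT_POLL_LIMIT; i++)
		if ((gpu_read(gpu, ack) & mask) == mask)
			return true;
	return false;
}

/* Quiesce GPU <-> DDR traffic; returns false if any stage timed out. */
static bool bus_quiesce(struct fake_gpu *gpu)
{
	bool ok = halt_and_wait(gpu, RBBM_GBIF_HALT, RBBM_GBIF_HALT_ACK,
				GBIF_GX_HALT_MASK) &&
		  halt_and_wait(gpu, GBIF_HALT, GBIF_HALT_ACK,
				GBIF_CLIENT_HALT_MASK) &&
		  halt_and_wait(gpu, GBIF_HALT, GBIF_HALT_ACK,
				GBIF_ARB_HALT_MASK);

	/* Clear the halts in the same order as the patch does. */
	gpu_write(gpu, GBIF_HALT, 0);
	gpu_write(gpu, RBBM_GBIF_HALT, 0);
	return ok;
}
```

A helper shaped like `bus_quiesce()` could be called from every path that needs the sequence, and called before the CX GDSC is collapsed.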
> > >>>>>>  	pm_runtime_use_autosuspend(&gpu->pdev->dev);
> > >>>>>>  
> > >>>>>>  	if (active_submits)
> > >>>>>> @@ -1463,7 +1513,8 @@ static void a6xx_fault_detect_irq(struct msm_gpu *gpu)
> > >>>>>>  	 * Force the GPU to stay on until after we finish
> > >>>>>>  	 * collecting information
> > >>>>>>  	 */
> > >>>>>> -	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
> > >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> > >>>>>> +		gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
> > >>>>>>  
> > >>>>>>  	DRM_DEV_ERROR(&gpu->pdev->dev,
> > >>>>>>  		"gpu fault ring %d fence %x status %8.8X rb %4.4x/%4.4x ib1 %16.16llX/%4.4x ib2 %16.16llX/%4.4x\n",
> > >>>>>> @@ -1624,7 +1675,7 @@ static void a6xx_llc_slices_init(struct platform_device *pdev,
> > >>>>>>  		a6xx_gpu->llc_mmio = ERR_PTR(-EINVAL);
> > >>>>>>  }
> > >>>>>>  
> > >>>>>> -static int a6xx_pm_resume(struct msm_gpu *gpu)
> > >>>>>> +static int a6xx_gmu_pm_resume(struct msm_gpu *gpu)
> > >>>>>>  {
> > >>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> > >>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> > >>>>>> @@ -1644,10 +1695,61 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
> > >>>>>>  
> > >>>>>>  	a6xx_llc_activate(a6xx_gpu);
> > >>>>>>  
> > >>>>>> -	return 0;
> > >>>>>> +	return ret;
> > >>>>>>  }
> > >>>>>>  
> > >>>>>> -static int a6xx_pm_suspend(struct msm_gpu *gpu)
> > >>>>>> +static int a6xx_pm_resume(struct msm_gpu *gpu)
> > >>>>>> +{
> > >>>>>> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> > >>>>>> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> > >>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> > >>>>>> +	unsigned long freq = 0;
> > >>>>>> +	struct dev_pm_opp *opp;
> > >>>>>> +	int ret;
> > >>>>>> +
> > >>>>>> +	gpu->needs_hw_init = true;
> > >>>>>> +
> > >>>>>> +	trace_msm_gpu_resume(0);
> > >>>>>> +
> > >>>>>> +	mutex_lock(&a6xx_gpu->gmu.lock);
> > >>>>> I think we can ignore gmu lock as there is no real gmu device.
> > >>>>>
> > >>>>>> +
> > >>>>>> +	pm_runtime_resume_and_get(gmu->dev);
> > >>>>>> +	pm_runtime_resume_and_get(gmu->gxpd);
> > >>>>>> +
> > >>>>>> +	/* Set the core clock, having VDD scaling in mind */
> > >>>>>> +	ret = dev_pm_opp_set_rate(&gpu->pdev->dev, gpu->fast_rate);
> > >>>>>> +	if (ret)
> > >>>>>> +		goto err_core_clk;
> > >>>>>> +
> > >>>>>> +	ret = clk_bulk_prepare_enable(gpu->nr_clocks, gpu->grp_clks);
> > >>>>>> +	if (ret)
> > >>>>>> +		goto err_bulk_clk;
> > >>>>>> +
> > >>>>>> +	ret = clk_prepare_enable(gpu->ebi1_clk);
> > >>>>>> +	if (ret)
> > >>>>>> +		goto err_mem_clk;
> > >>>>>> +
> > >>>>>> +	/* If anything goes south, tear the GPU down piece by piece.. */
> > >>>>>> +	if (ret) {
> > >>>>>> +err_mem_clk:
> > >>>>>> +		clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
> > >>>>>> +err_bulk_clk:
> > >>>>>> +		opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
> > >>>>>> +		dev_pm_opp_put(opp);
> > >>>>>> +		dev_pm_opp_set_rate(&gpu->pdev->dev, 0);
> > >>>>>> +err_core_clk:
> > >>>>>> +		pm_runtime_put(gmu->gxpd);
> > >>>>>> +		pm_runtime_put(gmu->dev);
> > >>>>>> +	}
> > >>>>>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
> > >>>>>> +
> > >>>>>> +	if (!ret)
> > >>>>>> +		msm_devfreq_resume(gpu);
> > >>>>>> +
> > >>>>>> +	return ret;
> > >>>>>> +}
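For comparison, here is a minimal standalone sketch of the more conventional unwind layout, with the error labels placed after the success return instead of inside an `if (ret)` block. It is not driver code: `fake_gpu_pm`, `step_enable`, `step_disable` and the step names are hypothetical stand-ins for the genpd, OPP and clock calls above, the lock is left out, and (unlike the hunk) the first step's result is checked too.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the resume steps; each can be made to fail. */
enum step { STEP_PD, STEP_OPP, STEP_BULK_CLK, STEP_MEM_CLK, NUM_STEPS };

struct fake_gpu_pm {
	bool on[NUM_STEPS];	/* which resources are currently held */
	int fail_at;		/* step that should fail, or -1 for none */
};

static int step_enable(struct fake_gpu_pm *pm, enum step s)
{
	assert(s < NUM_STEPS);
	if (pm->fail_at == (int)s)
		return -1;	/* stands in for a negative errno */
	pm->on[s] = true;
	return 0;
}

static void step_disable(struct fake_gpu_pm *pm, enum step s)
{
	assert(s < NUM_STEPS);
	pm->on[s] = false;
}

/* Acquire resources in order; on failure release them in reverse order. */
static int fake_pm_resume(struct fake_gpu_pm *pm)
{
	int ret;

	ret = step_enable(pm, STEP_PD);
	if (ret)
		return ret;

	ret = step_enable(pm, STEP_OPP);
	if (ret)
		goto err_put_pd;

	ret = step_enable(pm, STEP_BULK_CLK);
	if (ret)
		goto err_drop_opp;

	ret = step_enable(pm, STEP_MEM_CLK);
	if (ret)
		goto err_disable_bulk;

	return 0;

err_disable_bulk:
	step_disable(pm, STEP_BULK_CLK);
err_drop_opp:
	step_disable(pm, STEP_OPP);
err_put_pd:
	step_disable(pm, STEP_PD);
	return ret;
}
```

The teardown order matches the labels in the hunk (clocks, then OPP rate, then the power domains); only the control flow is arranged differently.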
> > >>>>>> +
> > >>>>>> +static int a6xx_gmu_pm_suspend(struct msm_gpu *gpu)
> > >>>>>>  {
> > >>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> > >>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> > >>>>>> @@ -1674,11 +1776,62 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
> > >>>>>>  	return 0;
> > >>>>>>  }
> > >>>>>>  
> > >>>>>> +static int a6xx_pm_suspend(struct msm_gpu *gpu)
> > >>>>>> +{
> > >>>>>> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> > >>>>>> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> > >>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> > >>>>>> +	unsigned long freq = 0;
> > >>>>>> +	struct dev_pm_opp *opp;
> > >>>>>> +	int i, ret;
> > >>>>>> +
> > >>>>>> +	trace_msm_gpu_suspend(0);
> > >>>>>> +
> > >>>>>> +	opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
> > >>>>>> +	dev_pm_opp_put(opp);
> > >>>>>> +
> > >>>>>> +	msm_devfreq_suspend(gpu);
> > >>>>>> +
> > >>>>>> +	mutex_lock(&a6xx_gpu->gmu.lock);
> > >>>>>> +
> > >>>>>> +	clk_disable_unprepare(gpu->ebi1_clk);
> > >>>>>> +
> > >>>>>> +	clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
> > >>>>>> +
> > >>>>>> +	/* Set frequency to the minimum supported level (no 27MHz on A6xx!) */
> > >>>>>> +	ret = dev_pm_opp_set_rate(&gpu->pdev->dev, freq);
> > >>>>>> +	if (ret)
> > >>>>>> +		goto err;
> > >>>>>> +
> > >>>>>> +	pm_runtime_put_sync(gmu->gxpd);
> > >>>>>> +	pm_runtime_put_sync(gmu->dev);
> > >>>>>> +
> > >>>>>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
> > >>>>>> +
> > >>>>>> +	if (a6xx_gpu->shadow_bo)
> > >>>>>> +		for (i = 0; i < gpu->nr_rings; i++)
> > >>>>>> +			a6xx_gpu->shadow[i] = 0;
> > >>>>>> +
> > >>>>>> +	gpu->suspend_count++;
> > >>>>>> +
> > >>>>>> +	return 0;
> > >>>>>> +
> > >>>>>> +err:
> > >>>>>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
> > >>>>>> +
> > >>>>>> +	return ret;
> > >>>>>> +}
> > >>>>>> +
> > >>>>>>  static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
> > >>>>>>  {
> > >>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> > >>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> > >>>>>>  
> > >>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
> > >>>>>> +		*value = gpu_read64(gpu, REG_A6XX_CP_ALWAYS_ON_COUNTER);
> > >>>>>> +		return 0;
> > >>>>>> +	}
> > >>>>>> +
> > >>>>> Instead of wrapper check here, we can just create a separate op. I don't
> > >>>>> see any benefit in reusing the same function here.
> > >>>>>
> > >>>>>
> > >>>>>>  	mutex_lock(&a6xx_gpu->gmu.lock);
> > >>>>>>  
> > >>>>>>  	/* Force the GPU power on so we can read this register */
> > >>>>>> @@ -1716,7 +1869,8 @@ static void a6xx_destroy(struct msm_gpu *gpu)
> > >>>>>>  		drm_gem_object_put(a6xx_gpu->shadow_bo);
> > >>>>>>  	}
> > >>>>>>  
> > >>>>>> -	a6xx_llc_slices_destroy(a6xx_gpu);
> > >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> > >>>>>> +		a6xx_llc_slices_destroy(a6xx_gpu);
> > >>>>>>  
> > >>>>>>  	mutex_lock(&a6xx_gpu->gmu.lock);
> > >>>>>>  	a6xx_gmu_remove(a6xx_gpu);
> > >>>>>> @@ -1957,8 +2111,8 @@ static const struct adreno_gpu_funcs funcs = {
> > >>>>>>  		.set_param = adreno_set_param,
> > >>>>>>  		.hw_init = a6xx_hw_init,
> > >>>>>>  		.ucode_load = a6xx_ucode_load,
> > >>>>>> -		.pm_suspend = a6xx_pm_suspend,
> > >>>>>> -		.pm_resume = a6xx_pm_resume,
> > >>>>>> +		.pm_suspend = a6xx_gmu_pm_suspend,
> > >>>>>> +		.pm_resume = a6xx_gmu_pm_resume,
> > >>>>>>  		.recover = a6xx_recover,
> > >>>>>>  		.submit = a6xx_submit,
> > >>>>>>  		.active_ring = a6xx_active_ring,
> > >>>>>> @@ -1982,6 +2136,35 @@ static const struct adreno_gpu_funcs funcs = {
> > >>>>>>  	.get_timestamp = a6xx_get_timestamp,
> > >>>>>>  };
> > >>>>>>  
> > >>>>>> +static const struct adreno_gpu_funcs funcs_gmuwrapper = {
> > >>>>>> +	.base = {
> > >>>>>> +		.get_param = adreno_get_param,
> > >>>>>> +		.set_param = adreno_set_param,
> > >>>>>> +		.hw_init = a6xx_hw_init,
> > >>>>>> +		.ucode_load = a6xx_ucode_load,
> > >>>>>> +		.pm_suspend = a6xx_pm_suspend,
> > >>>>>> +		.pm_resume = a6xx_pm_resume,
> > >>>>>> +		.recover = a6xx_recover,
> > >>>>>> +		.submit = a6xx_submit,
> > >>>>>> +		.active_ring = a6xx_active_ring,
> > >>>>>> +		.irq = a6xx_irq,
> > >>>>>> +		.destroy = a6xx_destroy,
> > >>>>>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
> > >>>>>> +		.show = a6xx_show,
> > >>>>>> +#endif
> > >>>>>> +		.gpu_busy = a6xx_gpu_busy,
> > >>>>>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
> > >>>>>> +		.gpu_state_get = a6xx_gpu_state_get,
> > >>>>>> +		.gpu_state_put = a6xx_gpu_state_put,
> > >>>>>> +#endif
> > >>>>>> +		.create_address_space = a6xx_create_address_space,
> > >>>>>> +		.create_private_address_space = a6xx_create_private_address_space,
> > >>>>>> +		.get_rptr = a6xx_get_rptr,
> > >>>>>> +		.progress = a6xx_progress,
> > >>>>>> +	},
> > >>>>>> +	.get_timestamp = a6xx_get_timestamp,
> > >>>>>> +};
> > >>>>>> +
> > >>>>>>  struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> > >>>>>>  {
> > >>>>>>  	struct msm_drm_private *priv = dev->dev_private;
> > >>>>>> @@ -2003,18 +2186,36 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> > >>>>>>  
> > >>>>>>  	adreno_gpu->registers = NULL;
> > >>>>>>  
> > >>>>>> +	/* Check if there is a GMU phandle and set it up */
> > >>>>>> +	node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
> > >>>>>> +	/* FIXME: How do we gracefully handle this? */
> > >>>>>> +	BUG_ON(!node);
> > >>>>> How will you handle this BUG() when there is no GMU (a610 gpu)?
> > >>>>>
> > >>>>>> +
> > >>>>>> +	adreno_gpu->gmu_is_wrapper = of_device_is_compatible(node, "qcom,adreno-gmu-wrapper");
> > >>>>>> +
> > >>>>>>  	/*
> > >>>>>>  	 * We need to know the platform type before calling into adreno_gpu_init
> > >>>>>>  	 * so that the hw_apriv flag can be correctly set. Snoop into the info
> > >>>>>>  	 * and grab the revision number
> > >>>>>>  	 */
> > >>>>>>  	info = adreno_info(config->rev);
> > >>>>>> -
> > >>>>>> -	if (info && (info->revn == 650 || info->revn == 660 ||
> > >>>>>> -			adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), info->rev)))
> > >>>>>> +	if (!info)
> > >>>>>> +		return ERR_PTR(-EINVAL);
> > >>>>>> +
> > >>>>>> +	/* Assign these early so that we can use the is_aXYZ helpers */
> > >>>>>> +	/* Numeric revision IDs (e.g. 630) */
> > >>>>>> +	adreno_gpu->revn = info->revn;
> > >>>>>> +	/* New-style ADRENO_REV()-only */
> > >>>>>> +	adreno_gpu->rev = info->rev;
> > >>>>>> +	/* Quirk data */
> > >>>>>> +	adreno_gpu->info = info;
> > >>>>>> +
> > >>>>>> +	if (adreno_is_a650(adreno_gpu) || adreno_is_a660_family(adreno_gpu))
> > >>>>>>  		adreno_gpu->base.hw_apriv = true;
> > >>>>>>  
> > >>>>>> -	a6xx_llc_slices_init(pdev, a6xx_gpu);
> > >>>>>> +	/* No LLCC on non-RPMh (and by extension, non-GMU) SoCs */
> > >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> > >>>>>> +		a6xx_llc_slices_init(pdev, a6xx_gpu);
> > >>>>>>  
> > >>>>>>  	ret = a6xx_set_supported_hw(&pdev->dev, config->rev);
> > >>>>>>  	if (ret) {
> > >>>>>> @@ -2022,7 +2223,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> > >>>>>>  		return ERR_PTR(ret);
> > >>>>>>  	}
> > >>>>>>  
> > >>>>>> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> > >>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
> > >>>>>> +		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_gmuwrapper, 1);
> > >>>>>> +	else
> > >>>>>> +		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> > >>>>>>  	if (ret) {
> > >>>>>>  		a6xx_destroy(&(a6xx_gpu->base.base));
> > >>>>>>  		return ERR_PTR(ret);
> > >>>>>> @@ -2035,13 +2239,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> > >>>>>>  	if (adreno_is_a618(adreno_gpu) || adreno_is_7c3(adreno_gpu))
> > >>>>>>  		priv->gpu_clamp_to_idle = true;
> > >>>>>>  
> > >>>>>> -	/* Check if there is a GMU phandle and set it up */
> > >>>>>> -	node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
> > >>>>>> -
> > >>>>>> -	/* FIXME: How do we gracefully handle this? */
> > >>>>>> -	BUG_ON(!node);
> > >>>>>> -
> > >>>>>> -	ret = a6xx_gmu_init(a6xx_gpu, node);
> > >>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
> > >>>>>> +		ret = a6xx_gmu_wrapper_init(a6xx_gpu, node);
> > >>>>>> +	else
> > >>>>>> +		ret = a6xx_gmu_init(a6xx_gpu, node);
> > >>>>>>  	of_node_put(node);
> > >>>>>>  	if (ret) {
> > >>>>>>  		a6xx_destroy(&(a6xx_gpu->base.base));
> > >>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> > >>>>>> index eea2e60ce3b7..51a7656072fa 100644
> > >>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> > >>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> > >>>>>> @@ -76,6 +76,7 @@ int a6xx_gmu_set_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
> > >>>>>>  void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
> > >>>>>>  
> > >>>>>>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
> > >>>>>> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
> > >>>>>>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
> > >>>>>>  
> > >>>>>>  void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> > >>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> > >>>>>> index 30ecdff363e7..4e5d650578c6 100644
> > >>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> > >>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> > >>>>>> @@ -1041,16 +1041,18 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu)
> > >>>>>>  	/* Get the generic state from the adreno core */
> > >>>>>>  	adreno_gpu_state_get(gpu, &a6xx_state->base);
> > >>>>>>  
> > >>>>>> -	a6xx_get_gmu_registers(gpu, a6xx_state);
> > >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> > >>>>> nit: Kinda misleading function name to a layman. Should we invert the
> > >>>>> function to "adreno_has_gmu"?
> > >>>>>
> > >>>>> -Akhil
> > >>>>>> +		a6xx_get_gmu_registers(gpu, a6xx_state);
> > >>>>>>  
> > >>>>>> -	a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
> > >>>>>> -	a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
> > >>>>>> -	a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
> > >>>>>> +		a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
> > >>>>>> +		a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
> > >>>>>> +		a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
> > >>>>>>  
> > >>>>>> -	a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
> > >>>>>> +		a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
> > >>>>>> +	}
> > >>>>>>  
> > >>>>>>  	/* If GX isn't on the rest of the data isn't going to be accessible */
> > >>>>>> -	if (!a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
> > >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu) && !a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
> > >>>>>>  		return &a6xx_state->base;
> > >>>>>>  
> > >>>>>>  	/* Get the banks of indexed registers */
> > >>>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > >>>>>> index 6934cee07d42..5c5901d65950 100644
> > >>>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > >>>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > >>>>>> @@ -528,6 +528,10 @@ int adreno_load_fw(struct adreno_gpu *adreno_gpu)
> > >>>>>>  		if (!adreno_gpu->info->fw[i])
> > >>>>>>  			continue;
> > >>>>>>  
> > >>>>>> +		/* Skip loading GMU firmware with GMU Wrapper */
> > >>>>>> +		if (adreno_has_gmu_wrapper(adreno_gpu) && i == ADRENO_FW_GMU)
> > >>>>>> +			continue;
> > >>>>>> +
> > >>>>>>  		/* Skip if the firmware has already been loaded */
> > >>>>>>  		if (adreno_gpu->fw[i])
> > >>>>>>  			continue;
> > >>>>>> @@ -1074,8 +1078,8 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
> > >>>>>>  	u32 speedbin;
> > >>>>>>  	int ret;
> > >>>>>>  
> > >>>>>> -	/* Only handle the core clock when GMU is not in use */
> > >>>>>> -	if (config->rev.core < 6) {
> > >>>>>> +	/* Only handle the core clock when GMU is not in use (or is absent). */
> > >>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu) || config->rev.core < 6) {
> > >>>>>>  		/*
> > >>>>>>  		 * This can only be done before devm_pm_opp_of_add_table(), or
> > >>>>>>  		 * dev_pm_opp_set_config() will WARN_ON()
> > >>>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> > >>>>>> index f62612a5c70f..ee5352bc5329 100644
> > >>>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> > >>>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> > >>>>>> @@ -115,6 +115,7 @@ struct adreno_gpu {
> > >>>>>>  	 * code (a3xx_gpu.c) and stored in this common location.
> > >>>>>>  	 */
> > >>>>>>  	const unsigned int *reg_offsets;
> > >>>>>> +	bool gmu_is_wrapper;
> > >>>>>>  };
> > >>>>>>  #define to_adreno_gpu(x) container_of(x, struct adreno_gpu, base)
> > >>>>>>  
> > >>>>>> @@ -145,6 +146,11 @@ struct adreno_platform_config {
> > >>>>>>  
> > >>>>>>  bool adreno_cmp_rev(struct adreno_rev rev1, struct adreno_rev rev2);
> > >>>>>>  
> > >>>>>> +static inline bool adreno_has_gmu_wrapper(struct adreno_gpu *gpu)
> > >>>>>> +{
> > >>>>>> +	return gpu->gmu_is_wrapper;
> > >>>>>> +}
> > >>>>>> +
> > >>>>>>  static inline bool adreno_is_a2xx(struct adreno_gpu *gpu)
> > >>>>>>  {
> > >>>>>>  	return (gpu->revn < 300);
> > >>>>>>
> > >>>>>> -- 
> > >>>>>> 2.40.0
> > >>>>>>
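
The two adreno_gpu.c hunks quoted above both key off adreno_has_gmu_wrapper():
the firmware loader skips the GMU slot, and the driver takes over the core
clock. A minimal user-space sketch of that logic follows. It is not the driver
code: struct mock_gpu, the MOCK_FW_* slots, and the mock_* helpers are all
hypothetical stand-ins, and "loading" is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical firmware slots; the real driver indexes with ADRENO_FW_* values. */
enum mock_fw_slot { MOCK_FW_SQE, MOCK_FW_GMU, MOCK_FW_MAX };

/* Hypothetical stand-in for struct adreno_gpu, reduced to what the sketch needs. */
struct mock_gpu {
	bool gmu_is_wrapper;              /* set when the GMU node is only a wrapper */
	int core_rev;                     /* 6 for A6xx-class parts */
	const char *fw_name[MOCK_FW_MAX]; /* NULL means no firmware listed for the slot */
	bool fw_loaded[MOCK_FW_MAX];
};

static bool mock_has_gmu_wrapper(const struct mock_gpu *gpu)
{
	assert(gpu != NULL);
	return gpu->gmu_is_wrapper;
}

/* Returns how many firmware images this call marked as loaded. */
static int mock_load_fw(struct mock_gpu *gpu)
{
	int loaded = 0;

	for (int i = 0; i < MOCK_FW_MAX; i++) {
		/* Nothing listed for this slot */
		if (!gpu->fw_name[i])
			continue;

		/* A GMU wrapper has no GMU core to run firmware on */
		if (mock_has_gmu_wrapper(gpu) && i == MOCK_FW_GMU)
			continue;

		/* Already loaded on an earlier call */
		if (gpu->fw_loaded[i])
			continue;

		gpu->fw_loaded[i] = true;
		loaded++;
	}

	return loaded;
}

/* True when the driver itself, rather than a GMU, has to manage the core clock. */
static bool mock_driver_owns_core_clk(const struct mock_gpu *gpu)
{
	return mock_has_gmu_wrapper(gpu) || gpu->core_rev < 6;
}
```

With both slots populated, a wrapper GPU in this model loads only the non-GMU
image and reports that the driver owns the core clock, while a GPU with a real
GMU loads both images and leaves the clock to the GMU.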

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Freedreno] [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support
  2023-05-06 20:46                 ` [Freedreno] " Akhil P Oommen
@ 2023-05-06 21:07                   ` Akhil P Oommen
  0 siblings, 0 replies; 30+ messages in thread
From: Akhil P Oommen @ 2023-05-06 21:07 UTC (permalink / raw)
  To: Konrad Dybcio
  Cc: Rob Clark, devicetree, Daniel Vetter, freedreno, Bjorn Andersson,
	Konrad Dybcio, Abhinav Kumar, dri-devel, linux-kernel, Rob Clark,
	Rob Herring, Krzysztof Kozlowski, linux-arm-msm,
	Dmitry Baryshkov, Marijn Suijten, David Airlie, Sean Paul

On Sun, May 07, 2023 at 02:16:36AM +0530, Akhil P Oommen wrote:
> On Sat, May 06, 2023 at 08:16:21PM +0530, Akhil P Oommen wrote:
> > On Fri, May 05, 2023 at 12:35:18PM +0200, Konrad Dybcio wrote:
> > > 
> > > 
> > > On 5.05.2023 10:46, Akhil P Oommen wrote:
> > > > On Thu, May 04, 2023 at 08:34:07AM +0200, Konrad Dybcio wrote:
> > > >>
> > > >>
> > > >> On 3.05.2023 22:32, Akhil P Oommen wrote:
> > > >>> On Tue, May 02, 2023 at 11:40:26AM +0200, Konrad Dybcio wrote:
> > > >>>>
> > > >>>>
> > > >>>> On 2.05.2023 09:49, Akhil P Oommen wrote:
> > > >>>>> On Sat, Apr 01, 2023 at 01:54:43PM +0200, Konrad Dybcio wrote:
> > > >>>>>> Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
> > > >>>>>> but don't implement the associated GMUs. This is due to the fact that
> > > >>>>>> the GMU directly pokes at RPMh. Sadly, this means we have to take care
> > > >>>>>> of enabling & scaling power rails, clocks and bandwidth ourselves.
> > > >>>>>>
> > > >>>>>> Reuse existing Adreno-common code and modify the deeply-GMU-infused
> > > >>>>>> A6XX code to facilitate these GPUs. This involves if-ing out lots
> > > >>>>>> of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
> > > >>>>>> the actual name that Qualcomm uses in their downstream kernels).
> > > >>>>>>
> > > >>>>>> This is essentially a register region which is convenient to model
> > > >>>>>> as a device. We'll use it for managing the GDSCs. The register
> > > >>>>>> layout matches the actual GMU_CX/GX regions on the "real GMU" devices
> > > >>>>>> and lets us reuse quite a bit of gmu_read/write/rmw calls.
> > > >>>>> << I sent a reply to this patch earlier, but not sure where it went.
> > > >>>>> Still figuring out Mutt... >>
> > > >>>> Answered it here:
> > > >>>>
> > > >>>> https://lore.kernel.org/linux-arm-msm/4d3000c1-c3f9-0bfd-3eb3-23393f9a8f77@linaro.org/
> > > >>>
> > > >>> Thanks. Will check and respond there if needed.
> > > >>>
> > > >>>>
> > > >>>> I don't think I see any new comments in this "reply revision" (heh), so please
> > > >>>> check that one out.
> > > >>>>
> > > >>>>>
> > > >>>>> Only convenience I found is that we can reuse gmu register ops in a few
> > > >>>>> places (< 10 I think). If we just model this as another gpu memory
> > > >>>>> region, I think it will help to keep gmu vs gmu-wrapper/no-gmu
> > > >>>>> architecture code with clean separation. Also, it looks like we need to
> > > >>>>> keep a dummy gmu platform device in the devicetree with the current
> > > >>>>> approach. That doesn't sound right.
> > > >>>> That's correct, but.. if we switch away from that, VDD_GX/VDD_CX will
> > > >>>> need additional, gmuwrapper-configuration specific code anyway, as
> > > >>>> OPP & genpd will no longer make use of the default behavior which
> > > >>>> only gets triggered if there's a single power-domains=<> entry, afaicu.
> > > >>> Can you please tell me which specific *default behaviour* do you mean here?
> > > >>> I am curious to know what I am overlooking here. We can always get a cxpd/gxpd device
> > > >>> and vote for the gdscs directly from the driver. Anything related to
> > > >>> OPP?
> > > >> I *believe* this is true:
> > > >>
> > > >> if (ARRAY_SIZE(power-domains) == 1) {
> > > >> 	of generic code will enable the power domain at .probe time
> > > > we need to handle the voting directly. I recently shared a patch to
> > > > vote cx gdsc from gpu driver. Maybe we can ignore this when gpu has
> > > > only cx rail due to this logic you quoted here.
> > > > 
> > > > I see that you have handled it mostly correctly from the gpu driver in the updated
> > > > a6xx_pm_suspend() callback. Just the power domain device ptrs should be moved to
> > > > gpu from gmu.
> > > > 
> > > >>
> > > >> 	opp APIs will default to scaling that domain with required-opps
> > > > 
> > > >> }
> > > >>
> > > >> and we do need to put GX/CX (with an MX parent to match) there, as the
> > > >> AP is responsible for voting in this configuration
> > > > 
> > > > We should vote to turn ON gx/cx headswitches through genpd from gpu driver. When you vote for
> > > > core clk frequency, *clock driver is supposed to scale* all the necessary
> > > > regulators. At least that is how downstream works. You can refer to the downstream
> > > > gpucc clk driver of these SoCs. I am not sure how much of that can be easily converted to
> > > > upstream.
> > > > 
> > > > Also, how does having a gmu dt node help in this regard? Feel free to
> > > > elaborate, I am not very familiar with clk/regulator implementations.
> > > Okay so I think we have a bit of a confusion here.
> > > 
> > > Currently, with this patchset we manage things like this:
> > > 
> > > 1. GPU has a VDD_GX (or equivalent[1]) line passed in power-domains=<>, which
> > >    is then used with OPP APIs to ensure it's being scaled on freq change [2].
> > >    The VDD_lines coming from RPM(h) are described as power domains upstream
> > >    *unlike downstream*, which represents them as regulators with preset voltage
> > >    steps (and perhaps that's what had you confused). What's more is that GDSCs
> > >    are also modeled as genpds instead of regulators, hence they sort of "fight"
> > >    for the spot in power-domains=<> of a given node.
> > 
> > Thanks for clarifying. I didn't get this part "hence they sort of "fight" for the spot in power-domains".
> > What spot exactly did you mean here? The spot for PD to be used during scaling?
> > 
> > It seems like you are hinting that there is some sort of limitation in keeping all the
> > 3 power domains (cx gdsc, gx gdsc and cx rail) under the gpu node in dt. Please explain
> 
> Typo. I meant "(Cx gdsc, Gx gdsc and Gx/Cx rail)"
> 
> > why we can't keep all the 3 power domains under gpu node and call an API
> > (devm_pm_opp_attach_genpd() ??) to select the power domain which should be scaled?
> > 
> > > 
> > > 2. GMU wrapper gets CX_GDSC & GX_GDSC handles in power-domains=<> (just like
> > >    the real GMU in the current state of upstream [3]), which are then governed
> > >    through explicit genpd calls to turn them on/off when the GPU resume/suspend/
> > >    crash recovery functions are called.
> > > 
> > > 3. GPUs with GMU, like A630, don't get any power-domains=<> entries in DT,
> > >    instead relying on the GMU firmware to communicate necessary requests
> > >    to the VDD_xyz resources directly to RPMh, as part of the DVFS routines.
> > >    If GMU wasn't so smart, we would have to do the exact same VDD_xyz+OPP dance
> > >    there - that's precisely what's going on under the hood.
> > > 
> > > 4. Adreno SMMU gets a handle to CX_GDSC so that when OF probe funcs are called,
> > >    (and SMMUs probe way way before all things drm) the headswitch is de-asserted
> > >    and its registers and related clocks are accessible.
> > > 
> > > 
> > > All this makes me believe the way I generally architected things in
> > > this series is correct.
> > > 
> > > 
> > > [1] A610 (and I think A612) lack a VDD_GX line, so they power the GPU from
> > >     VDD_CX, but that's just an implementation detail which is handled by
> > >     simply passing the correct one in DTS, the code doesn't care.
> > > 
> > > [2] Hence my recent changes to use dev_pm_opp_set_rate() wherever possible,
> > > >     this func reads required-opps in OPP table entries and ensures to elevate
> > >     the GENPD's performance state before switching frequencies
> > > 
> > > [3] Please take a look at the "end product" here:
> > >     https://github.com/SoMainline/linux/commit/fb16757c3bf4c087ac597d70c7a98755d46bb323
> > >     you can open e.g. sdm845.dtsi for comparison with real GMU
> > 
> > This dt definition for a610 gpu clearly shows the issue I have here. Someone
> > looking at this gets a very wrong picture about the platform like there is actually nothing
> > resembling a gmu IP in a610. Is gmu or gmu-cx register region really present in this hw?
> > 
> > Just a side note about the dt file you shared:
> > 	1. At line: 1243, It shouldn't have gx gdsc, right?
ignore this one too. It can still have a gx gdsc.

> > 	2. At line: 1172, SM6115_VDDCX -> SM6115_VDDGX?
> Aah! ignore this. this gpu doesn't have Gx rail.
> 
> -Akhil
> > 
> > -Akhil
> > 
> > > 
> > > I hope this answers your concerns. If not, I'll be happy to elaborate.
> > > 
> > > Konrad
> > > > 
> > > > -Akhil.
> > > >>
> > > >> Konrad
> > > >>>
> > > >>> -Akhil
> > > >>>>
> > > >>>> If nothing else, this is a very convenient way to model a part of the
> > > >>>> GPU (as that's essentially what GMU_CX is, to my understanding) and
> > > >>>> the bindings people didn't shoot me in the head for proposing this, so
> > > >>>> I assume it'd be cool to pursue this..
> > > >>>>
> > > >>>> Konrad
> > > >>>>>>
> > > >>>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
> > > >>>>>> ---
> > > >>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c       |  72 +++++++-
> > > >>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c       | 255 +++++++++++++++++++++++++---
> > > >>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h       |   1 +
> > > >>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  14 +-
> > > >>>>>>  drivers/gpu/drm/msm/adreno/adreno_gpu.c     |   8 +-
> > > >>>>>>  drivers/gpu/drm/msm/adreno/adreno_gpu.h     |   6 +
> > > >>>>>>  6 files changed, 318 insertions(+), 38 deletions(-)
> > > >>>>>>
> > > >>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > >>>>>> index 87babbb2a19f..b1acdb027205 100644
> > > >>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > >>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > >>>>>> @@ -1469,6 +1469,7 @@ static int a6xx_gmu_get_irq(struct a6xx_gmu *gmu, struct platform_device *pdev,
> > > >>>>>>  
> > > >>>>>>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
> > > >>>>>>  {
> > > >>>>>> +	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
> > > >>>>>>  	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> > > >>>>>>  	struct platform_device *pdev = to_platform_device(gmu->dev);
> > > >>>>>>  
> > > >>>>>> @@ -1494,10 +1495,12 @@ void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
> > > >>>>>>  	gmu->mmio = NULL;
> > > >>>>>>  	gmu->rscc = NULL;
> > > >>>>>>  
> > > >>>>>> -	a6xx_gmu_memory_free(gmu);
> > > >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> > > >>>>>> +		a6xx_gmu_memory_free(gmu);
> > > >>>>>>  
> > > >>>>>> -	free_irq(gmu->gmu_irq, gmu);
> > > >>>>>> -	free_irq(gmu->hfi_irq, gmu);
> > > >>>>>> +		free_irq(gmu->gmu_irq, gmu);
> > > >>>>>> +		free_irq(gmu->hfi_irq, gmu);
> > > >>>>>> +	}
> > > >>>>>>  
> > > >>>>>>  	/* Drop reference taken in of_find_device_by_node */
> > > >>>>>>  	put_device(gmu->dev);
> > > >>>>>> @@ -1516,6 +1519,69 @@ static int cxpd_notifier_cb(struct notifier_block *nb,
> > > >>>>>>  	return 0;
> > > >>>>>>  }
> > > >>>>>>  
> > > >>>>>> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
> > > >>>>>> +{
> > > >>>>>> +	struct platform_device *pdev = of_find_device_by_node(node);
> > > >>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> > > >>>>>> +	int ret;
> > > >>>>>> +
> > > >>>>>> +	if (!pdev)
> > > >>>>>> +		return -ENODEV;
> > > >>>>>> +
> > > >>>>>> +	gmu->dev = &pdev->dev;
> > > >>>>>> +
> > > >>>>>> +	of_dma_configure(gmu->dev, node, true);
> > > >>>>> why setup dma for a device that is not actually present?
> > > >>>>>> +
> > > >>>>>> +	pm_runtime_enable(gmu->dev);
> > > >>>>>> +
> > > >>>>>> +	/* Mark legacy for manual SPTPRAC control */
> > > >>>>>> +	gmu->legacy = true;
> > > >>>>>> +
> > > >>>>>> +	/* Map the GMU registers */
> > > >>>>>> +	gmu->mmio = a6xx_gmu_get_mmio(pdev, "gmu");
> > > >>>>>> +	if (IS_ERR(gmu->mmio)) {
> > > >>>>>> +		ret = PTR_ERR(gmu->mmio);
> > > >>>>>> +		goto err_mmio;
> > > >>>>>> +	}
> > > >>>>>> +
> > > >>>>>> +	gmu->cxpd = dev_pm_domain_attach_by_name(gmu->dev, "cx");
> > > >>>>>> +	if (IS_ERR(gmu->cxpd)) {
> > > >>>>>> +		ret = PTR_ERR(gmu->cxpd);
> > > >>>>>> +		goto err_mmio;
> > > >>>>>> +	}
> > > >>>>>> +
> > > >>>>>> +	if (!device_link_add(gmu->dev, gmu->cxpd, DL_FLAG_PM_RUNTIME)) {
> > > >>>>>> +		ret = -ENODEV;
> > > >>>>>> +		goto detach_cxpd;
> > > >>>>>> +	}
> > > >>>>>> +
> > > >>>>>> +	init_completion(&gmu->pd_gate);
> > > >>>>>> +	complete_all(&gmu->pd_gate);
> > > >>>>>> +	gmu->pd_nb.notifier_call = cxpd_notifier_cb;
> > > >>>>>> +
> > > >>>>>> +	/* Get a link to the GX power domain to reset the GPU */
> > > >>>>>> +	gmu->gxpd = dev_pm_domain_attach_by_name(gmu->dev, "gx");
> > > >>>>>> +	if (IS_ERR(gmu->gxpd)) {
> > > >>>>>> +		ret = PTR_ERR(gmu->gxpd);
> > > >>>>>> +		goto err_mmio;
> > > >>>>>> +	}
> > > >>>>>> +
> > > >>>>>> +	gmu->initialized = true;
> > > >>>>>> +
> > > >>>>>> +	return 0;
> > > >>>>>> +
> > > >>>>>> +detach_cxpd:
> > > >>>>>> +	dev_pm_domain_detach(gmu->cxpd, false);
> > > >>>>>> +
> > > >>>>>> +err_mmio:
> > > >>>>>> +	iounmap(gmu->mmio);
> > > >>>>>> +
> > > >>>>>> +	/* Drop reference taken in of_find_device_by_node */
> > > >>>>>> +	put_device(gmu->dev);
> > > >>>>>> +
> > > >>>>>> +	return ret;
> > > >>>>>> +}
> > > >>>>>> +
> > > >>>>>>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
> > > >>>>>>  {
> > > >>>>>>  	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
> > > >>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > >>>>>> index 931f9f3b3a85..8e0345ffab81 100644
> > > >>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > >>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > >>>>>> @@ -20,9 +20,11 @@ static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
> > > >>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> > > >>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> > > >>>>>>  
> > > >>>>>> -	/* Check that the GMU is idle */
> > > >>>>>> -	if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
> > > >>>>>> -		return false;
> > > >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> > > >>>>>> +		/* Check that the GMU is idle */
> > > >>>>>> +		if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
> > > >>>>>> +			return false;
> > > >>>>>> +	}
> > > >>>>>>  
> > > >>>>>>  	/* Check tha the CX master is idle */
> > > >>>>>>  	if (gpu_read(gpu, REG_A6XX_RBBM_STATUS) &
> > > >>>>>> @@ -612,13 +614,15 @@ static void a6xx_set_hwcg(struct msm_gpu *gpu, bool state)
> > > >>>>>>  		return;
> > > >>>>>>  
> > > >>>>>>  	/* Disable SP clock before programming HWCG registers */
> > > >>>>>> -	gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
> > > >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> > > >>>>>> +		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
> > > >>>>>>  
> > > >>>>>>  	for (i = 0; (reg = &adreno_gpu->info->hwcg[i], reg->offset); i++)
> > > >>>>>>  		gpu_write(gpu, reg->offset, state ? reg->value : 0);
> > > >>>>>>  
> > > >>>>>>  	/* Enable SP clock */
> > > >>>>>> -	gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
> > > >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> > > >>>>>> +		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
> > > >>>>>>  
> > > >>>>>>  	gpu_write(gpu, REG_A6XX_RBBM_CLOCK_CNTL, state ? clock_cntl_on : 0);
> > > >>>>>>  }
> > > >>>>>> @@ -1018,10 +1022,13 @@ static int hw_init(struct msm_gpu *gpu)
> > > >>>>>>  {
> > > >>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> > > >>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> > > >>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> > > >>>>>>  	int ret;
> > > >>>>>>  
> > > >>>>>> -	/* Make sure the GMU keeps the GPU on while we set it up */
> > > >>>>>> -	a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
> > > >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> > > >>>>>> +		/* Make sure the GMU keeps the GPU on while we set it up */
> > > >>>>>> +		a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
> > > >>>>>> +	}
> > > >>>>>>  
> > > >>>>>>  	/* Clear GBIF halt in case GX domain was not collapsed */
> > > >>>>>>  	if (a6xx_has_gbif(adreno_gpu))
> > > >>>>>> @@ -1144,6 +1151,17 @@ static int hw_init(struct msm_gpu *gpu)
> > > >>>>>>  			0x3f0243f0);
> > > >>>>>>  	}
> > > >>>>>>  
> > > >>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
> > > >>>>>> +		/* Do it here, as GMU wrapper only inits the GMU for memory reservation etc. */
> > > >>>>>> +
> > > >>>>>> +		/* Set up the CX GMU counter 0 to count busy ticks */
> > > >>>>>> +		gmu_write(gmu, REG_A6XX_GPU_GMU_AO_GPU_CX_BUSY_MASK, 0xff000000);
> > > >>>>>> +
> > > >>>>>> +		/* Enable power counter 0 */
> > > >>>>>> +		gmu_rmw(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_SELECT_0, 0xff, BIT(5));
> > > >>>>>> +		gmu_write(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_ENABLE, 1);
> > > >>>>>> +	}
> > > >>>>>> +
> > > >>>>>>  	/* Protect registers from the CP */
> > > >>>>>>  	a6xx_set_cp_protect(gpu);
> > > >>>>>>  
> > > >>>>>> @@ -1233,6 +1251,8 @@ static int hw_init(struct msm_gpu *gpu)
> > > >>>>>>  	}
> > > >>>>>>  
> > > >>>>>>  out:
> > > >>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
> > > >>>>>> +		return ret;
> > > >>>>>>  	/*
> > > >>>>>>  	 * Tell the GMU that we are done touching the GPU and it can start power
> > > >>>>>>  	 * management
> > > >>>>>> @@ -1267,6 +1287,9 @@ static void a6xx_dump(struct msm_gpu *gpu)
> > > >>>>>>  	adreno_dump(gpu);
> > > >>>>>>  }
> > > >>>>>>  
> > > >>>>>> +#define GBIF_GX_HALT_MASK	BIT(0)
> > > >>>>>> +#define GBIF_CLIENT_HALT_MASK	BIT(0)
> > > >>>>>> +#define GBIF_ARB_HALT_MASK	BIT(1)
> > > >>>>>>  #define VBIF_RESET_ACK_TIMEOUT	100
> > > >>>>>>  #define VBIF_RESET_ACK_MASK	0x00f0
> > > >>>>>>  
> > > >>>>>> @@ -1299,7 +1322,8 @@ static void a6xx_recover(struct msm_gpu *gpu)
> > > >>>>>>  	 * Turn off keep alive that might have been enabled by the hang
> > > >>>>>>  	 * interrupt
> > > >>>>>>  	 */
> > > >>>>>> -	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
> > > >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> > > >>>>>> +		gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
> > > >>>>>
> > > >>>>> Maybe it is better to move this to a6xx_gmu_force_power_off.
> > > >>>>>
> > > >>>>>>  
> > > >>>>>>  	pm_runtime_dont_use_autosuspend(&gpu->pdev->dev);
> > > >>>>>>  
> > > >>>>>> @@ -1329,6 +1353,32 @@ static void a6xx_recover(struct msm_gpu *gpu)
> > > >>>>>>  
> > > >>>>>>  	dev_pm_genpd_remove_notifier(gmu->cxpd);
> > > >>>>>>  
> > > >>>>>> +	/* Software-reset the GPU */
> > > >>>>>
> > > >>>>> This is not a soft reset sequence. We are trying to quiesce gpu - ddr
> > > >>>>> traffic with this sequence.
> > > >>>>>
> > > >>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
> > > >>>>>> +		/* Halt the GX side of GBIF */
> > > >>>>>> +		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, GBIF_GX_HALT_MASK);
> > > >>>>>> +		spin_until(gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT_ACK) &
> > > >>>>>> +			   GBIF_GX_HALT_MASK);
> > > >>>>>> +
> > > >>>>>> +		/* Halt new client requests on GBIF */
> > > >>>>>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_CLIENT_HALT_MASK);
> > > >>>>>> +		spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
> > > >>>>>> +			   (GBIF_CLIENT_HALT_MASK)) == GBIF_CLIENT_HALT_MASK);
> > > >>>>>> +
> > > >>>>>> +		/* Halt all AXI requests on GBIF */
> > > >>>>>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_ARB_HALT_MASK);
> > > >>>>>> +		spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
> > > >>>>>> +			   (GBIF_ARB_HALT_MASK)) == GBIF_ARB_HALT_MASK);
> > > >>>>>> +
> > > >>>>>> +		/* Clear the halts */
> > > >>>>>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
> > > >>>>>> +
> > > >>>>>> +		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
> > > >>>>>> +
> > > >>>>>> +		/* This *really* needs to go through before we do anything else! */
> > > >>>>>> +		mb();
> > > >>>>>> +	}
> > > >>>>>> +
> > > >>>>>
> > > >>>>> This sequence should be before we collapse cx gdsc. Also, please see if
> > > >>>>> we can create a subroutine to avoid code dup.
> > > >>>>>
> > > >>>>>>  	pm_runtime_use_autosuspend(&gpu->pdev->dev);
> > > >>>>>>  
> > > >>>>>>  	if (active_submits)
> > > >>>>>> @@ -1463,7 +1513,8 @@ static void a6xx_fault_detect_irq(struct msm_gpu *gpu)
> > > >>>>>>  	 * Force the GPU to stay on until after we finish
> > > >>>>>>  	 * collecting information
> > > >>>>>>  	 */
> > > >>>>>> -	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
> > > >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> > > >>>>>> +		gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
> > > >>>>>>  
> > > >>>>>>  	DRM_DEV_ERROR(&gpu->pdev->dev,
> > > >>>>>>  		"gpu fault ring %d fence %x status %8.8X rb %4.4x/%4.4x ib1 %16.16llX/%4.4x ib2 %16.16llX/%4.4x\n",
> > > >>>>>> @@ -1624,7 +1675,7 @@ static void a6xx_llc_slices_init(struct platform_device *pdev,
> > > >>>>>>  		a6xx_gpu->llc_mmio = ERR_PTR(-EINVAL);
> > > >>>>>>  }
> > > >>>>>>  
> > > >>>>>> -static int a6xx_pm_resume(struct msm_gpu *gpu)
> > > >>>>>> +static int a6xx_gmu_pm_resume(struct msm_gpu *gpu)
> > > >>>>>>  {
> > > >>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> > > >>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> > > >>>>>> @@ -1644,10 +1695,61 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
> > > >>>>>>  
> > > >>>>>>  	a6xx_llc_activate(a6xx_gpu);
> > > >>>>>>  
> > > >>>>>> -	return 0;
> > > >>>>>> +	return ret;
> > > >>>>>>  }
> > > >>>>>>  
> > > >>>>>> -static int a6xx_pm_suspend(struct msm_gpu *gpu)
> > > >>>>>> +static int a6xx_pm_resume(struct msm_gpu *gpu)
> > > >>>>>> +{
> > > >>>>>> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> > > >>>>>> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> > > >>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> > > >>>>>> +	unsigned long freq = 0;
> > > >>>>>> +	struct dev_pm_opp *opp;
> > > >>>>>> +	int ret;
> > > >>>>>> +
> > > >>>>>> +	gpu->needs_hw_init = true;
> > > >>>>>> +
> > > >>>>>> +	trace_msm_gpu_resume(0);
> > > >>>>>> +
> > > >>>>>> +	mutex_lock(&a6xx_gpu->gmu.lock);
> > > >>>>> I think we can ignore gmu lock as there is no real gmu device.
> > > >>>>>
> > > >>>>>> +
> > > >>>>>> +	pm_runtime_resume_and_get(gmu->dev);
> > > >>>>>> +	pm_runtime_resume_and_get(gmu->gxpd);
> > > >>>>>> +
> > > >>>>>> +	/* Set the core clock, having VDD scaling in mind */
> > > >>>>>> +	ret = dev_pm_opp_set_rate(&gpu->pdev->dev, gpu->fast_rate);
> > > >>>>>> +	if (ret)
> > > >>>>>> +		goto err_core_clk;
> > > >>>>>> +
> > > >>>>>> +	ret = clk_bulk_prepare_enable(gpu->nr_clocks, gpu->grp_clks);
> > > >>>>>> +	if (ret)
> > > >>>>>> +		goto err_bulk_clk;
> > > >>>>>> +
> > > >>>>>> +	ret = clk_prepare_enable(gpu->ebi1_clk);
> > > >>>>>> +	if (ret)
> > > >>>>>> +		goto err_mem_clk;
> > > >>>>>> +
> > > >>>>>> +	/* If anything goes south, tear the GPU down piece by piece.. */
> > > >>>>>> +	if (ret) {
> > > >>>>>> +err_mem_clk:
> > > >>>>>> +		clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
> > > >>>>>> +err_bulk_clk:
> > > >>>>>> +		opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
> > > >>>>>> +		dev_pm_opp_put(opp);
> > > >>>>>> +		dev_pm_opp_set_rate(&gpu->pdev->dev, 0);
> > > >>>>>> +err_core_clk:
> > > >>>>>> +		pm_runtime_put(gmu->gxpd);
> > > >>>>>> +		pm_runtime_put(gmu->dev);
> > > >>>>>> +	}
> > > >>>>>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
> > > >>>>>> +
> > > >>>>>> +	if (!ret)
> > > >>>>>> +		msm_devfreq_resume(gpu);
> > > >>>>>> +
> > > >>>>>> +	return ret;
> > > >>>>>> +}
> > > >>>>>> +
> > > >>>>>> +static int a6xx_gmu_pm_suspend(struct msm_gpu *gpu)
> > > >>>>>>  {
> > > >>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> > > >>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> > > >>>>>> @@ -1674,11 +1776,62 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
> > > >>>>>>  	return 0;
> > > >>>>>>  }
> > > >>>>>>  
> > > >>>>>> +static int a6xx_pm_suspend(struct msm_gpu *gpu)
> > > >>>>>> +{
> > > >>>>>> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> > > >>>>>> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> > > >>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> > > >>>>>> +	unsigned long freq = 0;
> > > >>>>>> +	struct dev_pm_opp *opp;
> > > >>>>>> +	int i, ret;
> > > >>>>>> +
> > > >>>>>> +	trace_msm_gpu_suspend(0);
> > > >>>>>> +
> > > >>>>>> +	opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
> > > >>>>>> +	dev_pm_opp_put(opp);
> > > >>>>>> +
> > > >>>>>> +	msm_devfreq_suspend(gpu);
> > > >>>>>> +
> > > >>>>>> +	mutex_lock(&a6xx_gpu->gmu.lock);
> > > >>>>>> +
> > > >>>>>> +	clk_disable_unprepare(gpu->ebi1_clk);
> > > >>>>>> +
> > > >>>>>> +	clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
> > > >>>>>> +
> > > >>>>>> +	/* Set frequency to the minimum supported level (no 27MHz on A6xx!) */
> > > >>>>>> +	ret = dev_pm_opp_set_rate(&gpu->pdev->dev, freq);
> > > >>>>>> +	if (ret)
> > > >>>>>> +		goto err;
> > > >>>>>> +
> > > >>>>>> +	pm_runtime_put_sync(gmu->gxpd);
> > > >>>>>> +	pm_runtime_put_sync(gmu->dev);
> > > >>>>>> +
> > > >>>>>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
> > > >>>>>> +
> > > >>>>>> +	if (a6xx_gpu->shadow_bo)
> > > >>>>>> +		for (i = 0; i < gpu->nr_rings; i++)
> > > >>>>>> +			a6xx_gpu->shadow[i] = 0;
> > > >>>>>> +
> > > >>>>>> +	gpu->suspend_count++;
> > > >>>>>> +
> > > >>>>>> +	return 0;
> > > >>>>>> +
> > > >>>>>> +err:
> > > >>>>>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
> > > >>>>>> +
> > > >>>>>> +	return ret;
> > > >>>>>> +}
> > > >>>>>> +
> > > >>>>>>  static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
> > > >>>>>>  {
> > > >>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> > > >>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> > > >>>>>>  
> > > >>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
> > > >>>>>> +		*value = gpu_read64(gpu, REG_A6XX_CP_ALWAYS_ON_COUNTER);
> > > >>>>>> +		return 0;
> > > >>>>>> +	}
> > > >>>>>> +
> > > >>>>> Instead of wrapper check here, we can just create a separate op. I don't
> > > >>>>> see any benefit in reusing the same function here.
> > > >>>>>
> > > >>>>>
> > > >>>>>>  	mutex_lock(&a6xx_gpu->gmu.lock);
> > > >>>>>>  
> > > >>>>>>  	/* Force the GPU power on so we can read this register */
> > > >>>>>> @@ -1716,7 +1869,8 @@ static void a6xx_destroy(struct msm_gpu *gpu)
> > > >>>>>>  		drm_gem_object_put(a6xx_gpu->shadow_bo);
> > > >>>>>>  	}
> > > >>>>>>  
> > > >>>>>> -	a6xx_llc_slices_destroy(a6xx_gpu);
> > > >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> > > >>>>>> +		a6xx_llc_slices_destroy(a6xx_gpu);
> > > >>>>>>  
> > > >>>>>>  	mutex_lock(&a6xx_gpu->gmu.lock);
> > > >>>>>>  	a6xx_gmu_remove(a6xx_gpu);
> > > >>>>>> @@ -1957,8 +2111,8 @@ static const struct adreno_gpu_funcs funcs = {
> > > >>>>>>  		.set_param = adreno_set_param,
> > > >>>>>>  		.hw_init = a6xx_hw_init,
> > > >>>>>>  		.ucode_load = a6xx_ucode_load,
> > > >>>>>> -		.pm_suspend = a6xx_pm_suspend,
> > > >>>>>> -		.pm_resume = a6xx_pm_resume,
> > > >>>>>> +		.pm_suspend = a6xx_gmu_pm_suspend,
> > > >>>>>> +		.pm_resume = a6xx_gmu_pm_resume,
> > > >>>>>>  		.recover = a6xx_recover,
> > > >>>>>>  		.submit = a6xx_submit,
> > > >>>>>>  		.active_ring = a6xx_active_ring,
> > > >>>>>> @@ -1982,6 +2136,35 @@ static const struct adreno_gpu_funcs funcs = {
> > > >>>>>>  	.get_timestamp = a6xx_get_timestamp,
> > > >>>>>>  };
> > > >>>>>>  
> > > >>>>>> +static const struct adreno_gpu_funcs funcs_gmuwrapper = {
> > > >>>>>> +	.base = {
> > > >>>>>> +		.get_param = adreno_get_param,
> > > >>>>>> +		.set_param = adreno_set_param,
> > > >>>>>> +		.hw_init = a6xx_hw_init,
> > > >>>>>> +		.ucode_load = a6xx_ucode_load,
> > > >>>>>> +		.pm_suspend = a6xx_pm_suspend,
> > > >>>>>> +		.pm_resume = a6xx_pm_resume,
> > > >>>>>> +		.recover = a6xx_recover,
> > > >>>>>> +		.submit = a6xx_submit,
> > > >>>>>> +		.active_ring = a6xx_active_ring,
> > > >>>>>> +		.irq = a6xx_irq,
> > > >>>>>> +		.destroy = a6xx_destroy,
> > > >>>>>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
> > > >>>>>> +		.show = a6xx_show,
> > > >>>>>> +#endif
> > > >>>>>> +		.gpu_busy = a6xx_gpu_busy,
> > > >>>>>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
> > > >>>>>> +		.gpu_state_get = a6xx_gpu_state_get,
> > > >>>>>> +		.gpu_state_put = a6xx_gpu_state_put,
> > > >>>>>> +#endif
> > > >>>>>> +		.create_address_space = a6xx_create_address_space,
> > > >>>>>> +		.create_private_address_space = a6xx_create_private_address_space,
> > > >>>>>> +		.get_rptr = a6xx_get_rptr,
> > > >>>>>> +		.progress = a6xx_progress,
> > > >>>>>> +	},
> > > >>>>>> +	.get_timestamp = a6xx_get_timestamp,
> > > >>>>>> +};
> > > >>>>>> +
> > > >>>>>>  struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> > > >>>>>>  {
> > > >>>>>>  	struct msm_drm_private *priv = dev->dev_private;
> > > >>>>>> @@ -2003,18 +2186,36 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> > > >>>>>>  
> > > >>>>>>  	adreno_gpu->registers = NULL;
> > > >>>>>>  
> > > >>>>>> +	/* Check if there is a GMU phandle and set it up */
> > > >>>>>> +	node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
> > > >>>>>> +	/* FIXME: How do we gracefully handle this? */
> > > >>>>>> +	BUG_ON(!node);
> > > >>>>> How will you handle this BUG() when there is no GMU (a610 gpu)?
> > > >>>>>
> > > >>>>>> +
> > > >>>>>> +	adreno_gpu->gmu_is_wrapper = of_device_is_compatible(node, "qcom,adreno-gmu-wrapper");
> > > >>>>>> +
> > > >>>>>>  	/*
> > > >>>>>>  	 * We need to know the platform type before calling into adreno_gpu_init
> > > >>>>>>  	 * so that the hw_apriv flag can be correctly set. Snoop into the info
> > > >>>>>>  	 * and grab the revision number
> > > >>>>>>  	 */
> > > >>>>>>  	info = adreno_info(config->rev);
> > > >>>>>> -
> > > >>>>>> -	if (info && (info->revn == 650 || info->revn == 660 ||
> > > >>>>>> -			adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), info->rev)))
> > > >>>>>> +	if (!info)
> > > >>>>>> +		return ERR_PTR(-EINVAL);
> > > >>>>>> +
> > > >>>>>> +	/* Assign these early so that we can use the is_aXYZ helpers */
> > > >>>>>> +	/* Numeric revision IDs (e.g. 630) */
> > > >>>>>> +	adreno_gpu->revn = info->revn;
> > > >>>>>> +	/* New-style ADRENO_REV()-only */
> > > >>>>>> +	adreno_gpu->rev = info->rev;
> > > >>>>>> +	/* Quirk data */
> > > >>>>>> +	adreno_gpu->info = info;
> > > >>>>>> +
> > > >>>>>> +	if (adreno_is_a650(adreno_gpu) || adreno_is_a660_family(adreno_gpu))
> > > >>>>>>  		adreno_gpu->base.hw_apriv = true;
> > > >>>>>>  
> > > >>>>>> -	a6xx_llc_slices_init(pdev, a6xx_gpu);
> > > >>>>>> +	/* No LLCC on non-RPMh (and by extension, non-GMU) SoCs */
> > > >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> > > >>>>>> +		a6xx_llc_slices_init(pdev, a6xx_gpu);
> > > >>>>>>  
> > > >>>>>>  	ret = a6xx_set_supported_hw(&pdev->dev, config->rev);
> > > >>>>>>  	if (ret) {
> > > >>>>>> @@ -2022,7 +2223,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> > > >>>>>>  		return ERR_PTR(ret);
> > > >>>>>>  	}
> > > >>>>>>  
> > > >>>>>> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> > > >>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
> > > >>>>>> +		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_gmuwrapper, 1);
> > > >>>>>> +	else
> > > >>>>>> +		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> > > >>>>>>  	if (ret) {
> > > >>>>>>  		a6xx_destroy(&(a6xx_gpu->base.base));
> > > >>>>>>  		return ERR_PTR(ret);
> > > >>>>>> @@ -2035,13 +2239,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> > > >>>>>>  	if (adreno_is_a618(adreno_gpu) || adreno_is_7c3(adreno_gpu))
> > > >>>>>>  		priv->gpu_clamp_to_idle = true;
> > > >>>>>>  
> > > >>>>>> -	/* Check if there is a GMU phandle and set it up */
> > > >>>>>> -	node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
> > > >>>>>> -
> > > >>>>>> -	/* FIXME: How do we gracefully handle this? */
> > > >>>>>> -	BUG_ON(!node);
> > > >>>>>> -
> > > >>>>>> -	ret = a6xx_gmu_init(a6xx_gpu, node);
> > > >>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
> > > >>>>>> +		ret = a6xx_gmu_wrapper_init(a6xx_gpu, node);
> > > >>>>>> +	else
> > > >>>>>> +		ret = a6xx_gmu_init(a6xx_gpu, node);
> > > >>>>>>  	of_node_put(node);
> > > >>>>>>  	if (ret) {
> > > >>>>>>  		a6xx_destroy(&(a6xx_gpu->base.base));
> > > >>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> > > >>>>>> index eea2e60ce3b7..51a7656072fa 100644
> > > >>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> > > >>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> > > >>>>>> @@ -76,6 +76,7 @@ int a6xx_gmu_set_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
> > > >>>>>>  void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
> > > >>>>>>  
> > > >>>>>>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
> > > >>>>>> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
> > > >>>>>>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
> > > >>>>>>  
> > > >>>>>>  void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> > > >>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> > > >>>>>> index 30ecdff363e7..4e5d650578c6 100644
> > > >>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> > > >>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> > > >>>>>> @@ -1041,16 +1041,18 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu)
> > > >>>>>>  	/* Get the generic state from the adreno core */
> > > >>>>>>  	adreno_gpu_state_get(gpu, &a6xx_state->base);
> > > >>>>>>  
> > > >>>>>> -	a6xx_get_gmu_registers(gpu, a6xx_state);
> > > >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> > > >>>>> nit: Kinda misleading function name to a layman. Should we invert the
> > > >>>>> function to "adreno_has_gmu"?
> > > >>>>>
> > > >>>>> -Akhil
> > > >>>>>> +		a6xx_get_gmu_registers(gpu, a6xx_state);
> > > >>>>>>  
> > > >>>>>> -	a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
> > > >>>>>> -	a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
> > > >>>>>> -	a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
> > > >>>>>> +		a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
> > > >>>>>> +		a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
> > > >>>>>> +		a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
> > > >>>>>>  
> > > >>>>>> -	a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
> > > >>>>>> +		a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
> > > >>>>>> +	}
> > > >>>>>>  
> > > >>>>>>  	/* If GX isn't on the rest of the data isn't going to be accessible */
> > > >>>>>> -	if (!a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
> > > >>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu) && !a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
> > > >>>>>>  		return &a6xx_state->base;
> > > >>>>>>  
> > > >>>>>>  	/* Get the banks of indexed registers */
> > > >>>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > > >>>>>> index 6934cee07d42..5c5901d65950 100644
> > > >>>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > > >>>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > > >>>>>> @@ -528,6 +528,10 @@ int adreno_load_fw(struct adreno_gpu *adreno_gpu)
> > > >>>>>>  		if (!adreno_gpu->info->fw[i])
> > > >>>>>>  			continue;
> > > >>>>>>  
> > > >>>>>> +		/* Skip loading GMU firmware with GMU Wrapper */
> > > >>>>>> +		if (adreno_has_gmu_wrapper(adreno_gpu) && i == ADRENO_FW_GMU)
> > > >>>>>> +			continue;
> > > >>>>>> +
> > > >>>>>>  		/* Skip if the firmware has already been loaded */
> > > >>>>>>  		if (adreno_gpu->fw[i])
> > > >>>>>>  			continue;
> > > >>>>>> @@ -1074,8 +1078,8 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
> > > >>>>>>  	u32 speedbin;
> > > >>>>>>  	int ret;
> > > >>>>>>  
> > > >>>>>> -	/* Only handle the core clock when GMU is not in use */
> > > >>>>>> -	if (config->rev.core < 6) {
> > > >>>>>> +	/* Only handle the core clock when GMU is not in use (or is absent). */
> > > >>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu) || config->rev.core < 6) {
> > > >>>>>>  		/*
> > > >>>>>>  		 * This can only be done before devm_pm_opp_of_add_table(), or
> > > >>>>>>  		 * dev_pm_opp_set_config() will WARN_ON()
> > > >>>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> > > >>>>>> index f62612a5c70f..ee5352bc5329 100644
> > > >>>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> > > >>>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> > > >>>>>> @@ -115,6 +115,7 @@ struct adreno_gpu {
> > > >>>>>>  	 * code (a3xx_gpu.c) and stored in this common location.
> > > >>>>>>  	 */
> > > >>>>>>  	const unsigned int *reg_offsets;
> > > >>>>>> +	bool gmu_is_wrapper;
> > > >>>>>>  };
> > > >>>>>>  #define to_adreno_gpu(x) container_of(x, struct adreno_gpu, base)
> > > >>>>>>  
> > > >>>>>> @@ -145,6 +146,11 @@ struct adreno_platform_config {
> > > >>>>>>  
> > > >>>>>>  bool adreno_cmp_rev(struct adreno_rev rev1, struct adreno_rev rev2);
> > > >>>>>>  
> > > >>>>>> +static inline bool adreno_has_gmu_wrapper(struct adreno_gpu *gpu)
> > > >>>>>> +{
> > > >>>>>> +	return gpu->gmu_is_wrapper;
> > > >>>>>> +}
> > > >>>>>> +
> > > >>>>>>  static inline bool adreno_is_a2xx(struct adreno_gpu *gpu)
> > > >>>>>>  {
> > > >>>>>>  	return (gpu->revn < 300);
> > > >>>>>>
> > > >>>>>> -- 
> > > >>>>>> 2.40.0
> > > >>>>>>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support
  2023-05-06 14:46               ` Akhil P Oommen
  2023-05-06 20:46                 ` [Freedreno] " Akhil P Oommen
@ 2023-05-08  8:59                 ` Konrad Dybcio
  2023-05-08 21:15                   ` Akhil P Oommen
  1 sibling, 1 reply; 30+ messages in thread
From: Konrad Dybcio @ 2023-05-08  8:59 UTC (permalink / raw)
  To: Akhil P Oommen
  Cc: Rob Clark, Abhinav Kumar, Dmitry Baryshkov, Sean Paul,
	David Airlie, Daniel Vetter, Rob Herring, Krzysztof Kozlowski,
	Bjorn Andersson, Konrad Dybcio, linux-arm-msm, dri-devel,
	freedreno, devicetree, linux-kernel, Rob Clark, Marijn Suijten



On 6.05.2023 16:46, Akhil P Oommen wrote:
> On Fri, May 05, 2023 at 12:35:18PM +0200, Konrad Dybcio wrote:
>>
>>
>> On 5.05.2023 10:46, Akhil P Oommen wrote:
>>> On Thu, May 04, 2023 at 08:34:07AM +0200, Konrad Dybcio wrote:
>>>>
>>>>
>>>> On 3.05.2023 22:32, Akhil P Oommen wrote:
>>>>> On Tue, May 02, 2023 at 11:40:26AM +0200, Konrad Dybcio wrote:
>>>>>>
>>>>>>
>>>>>> On 2.05.2023 09:49, Akhil P Oommen wrote:
>>>>>>> On Sat, Apr 01, 2023 at 01:54:43PM +0200, Konrad Dybcio wrote:
>>>>>>>> Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
>>>>>>>> but don't implement the associated GMUs. This is due to the fact that
>>>>>>>> the GMU directly pokes at RPMh. Sadly, this means we have to take care
>>>>>>>> of enabling & scaling power rails, clocks and bandwidth ourselves.
>>>>>>>>
>>>>>>>> Reuse existing Adreno-common code and modify the deeply-GMU-infused
>>>>>>>> A6XX code to facilitate these GPUs. This involves if-ing out lots
>>>>>>>> of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
>>>>>>>> the actual name that Qualcomm uses in their downstream kernels).
>>>>>>>>
>>>>>>>> This is essentially a register region which is convenient to model
>>>>>>>> as a device. We'll use it for managing the GDSCs. The register
>>>>>>>> layout matches the actual GMU_CX/GX regions on the "real GMU" devices
>>>>>>>> and lets us reuse quite a bit of gmu_read/write/rmw calls.
>>>>>>> << I sent a reply to this patch earlier, but not sure where it went.
>>>>>>> Still figuring out Mutt... >>
>>>>>> Answered it here:
>>>>>>
>>>>>> https://lore.kernel.org/linux-arm-msm/4d3000c1-c3f9-0bfd-3eb3-23393f9a8f77@linaro.org/
>>>>>
>>>>> Thanks. Will check and respond there if needed.
>>>>>
>>>>>>
>>>>>> I don't think I see any new comments in this "reply revision" (heh), so please
>>>>>> check that one out.
>>>>>>
>>>>>>>
>>>>>>> Only convenience I found is that we can reuse gmu register ops in a few
>>>>>>> places (< 10 I think). If we just model this as another gpu memory
>>>>>>> region, I think it will help to keep gmu vs gmu-wrapper/no-gmu
>>>>>>> architecture code with clean separation. Also, it looks like we need to
>>>>>>> keep a dummy gmu platform device in the devicetree with the current
>>>>>>> approach. That doesn't sound right.
>>>>>> That's correct, but.. if we switch away from that, VDD_GX/VDD_CX will
>>>>>> need additional, gmuwrapper-configuration specific code anyway, as
>>>>>> OPP & genpd will no longer make use of the default behavior which
>>>>>> only gets triggered if there's a single power-domains=<> entry, afaicu.
>>>>> Can you please tell me which specific *default behaviour* do you mean here?
>>>>> I am curious to know what I am overlooking here. We can always get a cxpd/gxpd device
>>>>> and vote for the gdscs directly from the driver. Anything related to
>>>>> OPP?
>>>> I *believe* this is true:
>>>>
>>>> if (ARRAY_SIZE(power-domains) == 1) {
>>>> 	of generic code will enable the power domain at .probe time
>>> we need to handle the voting directly. I recently shared a patch to
>>> vote cx gdsc from gpu driver. Maybe we can ignore this when gpu has
>>> only cx rail due to this logic you quoted here.
>>>
>>> I see that you have handled it mostly correctly from the gpu driver in the updated
>>> a6xx_pm_suspend() callback. Just the power domain device ptrs should be moved to
>>> gpu from gmu.
>>>
>>>>
>>>> 	opp APIs will default to scaling that domain with required-opps
>>>
>>>> }
>>>>
>>>> and we do need to put GX/CX (with an MX parent to match) there, as the
>>>> AP is responsible for voting in this configuration
>>>
>>> We should vote to turn ON gx/cx headswitches through genpd from gpu driver. When you vote for
>>> core clk frequency, *clock driver is supposed to scale* all the necessary
>>> regulators. At least that is how downstream works. You can refer the downstream
>>> gpucc clk driver of these SoCs. I am not sure how much of that can be easily converted to
>>> upstream.
>>>
>>> Also, how does having a gmu dt node help in this regard? Feel free to
>>> elaborate, I am not very familiar with clk/regulator implementations.
>> Okay so I think we have a bit of a confusion here.
>>
>> Currently, with this patchset we manage things like this:
>>
>> 1. GPU has a VDD_GX (or equivalent[1]) line passed in power-domains=<>, which
>>    is then used with OPP APIs to ensure it's being scaled on freq change [2].
>>    The VDD_lines coming from RPM(h) are described as power domains upstream
>>    *unlike downstream*, which represents them as regulators with preset voltage
>>    steps (and perhaps that's what had you confused). What's more is that GDSCs
>>    are also modeled as genpds instead of regulators, hence they sort of "fight"
>>    for the spot in power-domains=<> of a given node.
> 
> Thanks for clarifying. I didn't get this part "hence they sort of "fight" for the spot in power-domains".
> What spot exactly did you mean here? The spot for PD to be used during scaling?
> 
> It seems like you are hinting that there is some sort of limitation in keeping all the
> 3 power domains (cx gdsc, gx gdsc and cx rail) under the gpu node in dt. Please explain
> why we can't keep all the 3 power domains under gpu node and call an API
> (devm_pm_opp_attach_genpd() ??) to select the power domain which should be scaled?
Eh we could, but this adds a lot of boilerplate code:

- genpd handling with get/put (I'm no genpd master but devm_pm_opp_attach_genpd
  sounds rather hacky to me)
- new r/w/rmw functions would need to be introduced for accessing
  GMU_CX registers as the reg defines wouldn't match so we'd have
  to include a new magic offset
- all the reused gmu_ callbacks would need to be separated out
- A619_holi would be even more problematic to distinguish from A619,
  similar story goes for firmware loading requirements
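
To illustrate the second point: if the wrapper were folded into the GPU's
own register window, every GMU_CX access would have to go through something
like the sketch below. This is a userspace model only; the window size, the
HYPOTHETICAL_GMU_CX_BASE value and the gmucx_* names are all made up for
illustration and do not correspond to real hardware offsets or existing
driver functions.

```c
#include <assert.h>
#include <stdint.h>

#define GPU_WINDOW_DWORDS		64
/* Made-up dword offset of the GMU_CX block inside the GPU window */
#define HYPOTHETICAL_GMU_CX_BASE	32

/* Plain array standing in for the GPU's ioremapped register window */
struct fake_gpu {
	uint32_t mmio[GPU_WINDOW_DWORDS];
};

/* Translate a GMU_CX-relative register index into the GPU window */
static uint32_t gmucx_index(uint32_t reg)
{
	assert(reg < GPU_WINDOW_DWORDS - HYPOTHETICAL_GMU_CX_BASE);
	return HYPOTHETICAL_GMU_CX_BASE + reg;
}

static uint32_t gmucx_read(struct fake_gpu *gpu, uint32_t reg)
{
	return gpu->mmio[gmucx_index(reg)];
}

static void gmucx_write(struct fake_gpu *gpu, uint32_t reg, uint32_t val)
{
	gpu->mmio[gmucx_index(reg)] = val;
}

/* Read-modify-write: clear the bits in mask, then OR in or_val */
static void gmucx_rmw(struct fake_gpu *gpu, uint32_t reg,
		      uint32_t mask, uint32_t or_val)
{
	uint32_t val = gmucx_read(gpu, reg);

	gmucx_write(gpu, reg, (val & ~mask) | or_val);
}
```

Each of these would duplicate what the existing gmu_ accessors already do
for a separately mapped region, which is the boilerplate I'd rather avoid.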

> 
>>
>> 2. GMU wrapper gets CX_GDSC & GX_GDSC handles in power-domains=<> (just like
>>    the real GMU in the current state of upstream [3]), which are then governed
>>    through explicit genpd calls to turn them on/off when the GPU resume/suspend/
>>    crash recovery functions are called.
>>
>> 3. GPUs with GMU, like A630, don't get any power-domains=<> entries in DT,
>>    instead relying on the GMU firmware to communicate necessary requests
>>    to the VDD_xyz resources directly to RPMh, as part of the DVFS routines.
>>    If GMU wasn't so smart, we would have to do the exact same VDD_xyz+OPP dance
>>    there - that's precisely what's going on under the hood.
>>
>> 4. Adreno SMMU gets a handle to CX_GDSC so that when OF probe funcs are called,
>>    (and SMMUs probe way way before all things drm) the headswitch is de-asserted
>>    and its registers and related clocks are accessible.
>>
>>
>> All this makes me believe the way I generally architected things in
>> this series is correct.
>>
>>
>> [1] A610 (and I think A612) lack a VDD_GX line, so they power the GPU from
>>     VDD_CX, but that's just an implementation detail which is handled by
>>     simply passing the correct one in DTS, the code doesn't care.
>>
>> [2] Hence my recent changes to use dev_pm_opp_set_rate() wherever possible,
>>     this func reads required-opps in OPP table entries and makes sure to
>>     elevate the GENPD's performance state before switching frequencies
>>
>> [3] Please take a look at the "end product" here:
>>     https://github.com/SoMainline/linux/commit/fb16757c3bf4c087ac597d70c7a98755d46bb323
>>     you can open e.g. sdm845.dtsi for comparison with real GMU
> 
> This dt definition for a610 gpu clearly shows the issue I have here. Someone
> looking at this gets a very wrong picture about the platform like there is actually nothing
> resembling a gmu IP in a610. Is gmu or gmu-cx register region really present in this hw?
Yes it is! Take a look at this hunk for example

if (adreno_has_gmu_wrapper(adreno_gpu)) {
	/* Do it here, as GMU wrapper only inits the GMU for memory reservation etc. */

	/* Set up the CX GMU counter 0 to count busy ticks */
	gmu_write(gmu, REG_A6XX_GPU_GMU_AO_GPU_CX_BUSY_MASK, 0xff000000);

	/* Enable power counter 0 */
	gmu_rmw(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_SELECT_0, 0xff, BIT(5));
	gmu_write(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_ENABLE, 1);
}

they are present, functional and at a predictable offset, just like
the real thing! The "only" difference is that there's no GMU_GX, i.e.
no registers responsible for interfacing with the MCU that lives (well,
would live) on the GPU island. For all I know, it may actually even
be physically there, just disabled / fused off as there's no RPMh to
communicate with.. In fact, I'd suspect that's precisely the case
on A619_holi (vs normal A619 which uses a GMU on RPMh-enabled SM6350).
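
For anyone unfamiliar with the rmw helper used in the hunk above: as far as
I understand it, gmu_rmw(gmu, reg, mask, or) reads the register, clears the
bits in mask and ORs in the new value. A minimal userspace model of that
behaviour, with a plain array standing in for the MMIO window (the fake_*
names and the 16-register size are made up for illustration and are not
driver API):

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n)		(1u << (n))
#define FAKE_NR_REGS	16

/* Hypothetical stand-in for an ioremapped, dword-indexed register window */
struct fake_regs {
	uint32_t regs[FAKE_NR_REGS];
};

static uint32_t fake_read(struct fake_regs *r, uint32_t reg)
{
	assert(reg < FAKE_NR_REGS);
	return r->regs[reg];
}

static void fake_write(struct fake_regs *r, uint32_t reg, uint32_t val)
{
	assert(reg < FAKE_NR_REGS);
	r->regs[reg] = val;
}

/* Same shape as the rmw helper: clear the bits in mask, then OR in or_val */
static void fake_rmw(struct fake_regs *r, uint32_t reg,
		     uint32_t mask, uint32_t or_val)
{
	uint32_t val = fake_read(r, reg);

	val &= ~mask;
	fake_write(r, reg, val | or_val);
}
```

With this model, fake_rmw(&r, reg, 0xff, BIT(5)) on a register holding
0xdeadbeef leaves 0xdeadbe20: the low byte is replaced by 0x20 and the upper
bits are untouched, which is the effect of the POWER_COUNTER_SELECT_0 write
in the hunk.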

Perhaps I could rename the GMU wrapper's reg-names entry to "gmu_cx" to
make things more obvious?

Konrad

> 
> Just a side note about the dt file you shared:
> 	1. At line: 1243, It shouldn't have gx gdsc, right?
> 	2. At line: 1172, SM6115_VDDCX -> SM6115_VDDGX?
> 
> -Akhil
(ignoring as agreed in your replies)

> 
>>
>> I hope this answers your concerns. If not, I'll be happy to elaborate.
>>
>> Konrad
>>>
>>> -Akhil.
>>>>
>>>> Konrad
>>>>>
>>>>> -Akhil
>>>>>>
>>>>>> If nothing else, this is a very convenient way to model a part of the
>>>>>> GPU (as that's essentially what GMU_CX is, to my understanding) and
>>>>>> the bindings people didn't shoot me in the head for proposing this, so
>>>>>> I assume it'd be cool to pursue this..
>>>>>>
>>>>>> Konrad
>>>>>>>>
>>>>>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
>>>>>>>> ---
>>>>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c       |  72 +++++++-
>>>>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c       | 255 +++++++++++++++++++++++++---
>>>>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h       |   1 +
>>>>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  14 +-
>>>>>>>>  drivers/gpu/drm/msm/adreno/adreno_gpu.c     |   8 +-
>>>>>>>>  drivers/gpu/drm/msm/adreno/adreno_gpu.h     |   6 +
>>>>>>>>  6 files changed, 318 insertions(+), 38 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>>>>>>>> index 87babbb2a19f..b1acdb027205 100644
>>>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>>>>>>>> @@ -1469,6 +1469,7 @@ static int a6xx_gmu_get_irq(struct a6xx_gmu *gmu, struct platform_device *pdev,
>>>>>>>>  
>>>>>>>>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
>>>>>>>>  {
>>>>>>>> +	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
>>>>>>>>  	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>>>>>>>>  	struct platform_device *pdev = to_platform_device(gmu->dev);
>>>>>>>>  
>>>>>>>> @@ -1494,10 +1495,12 @@ void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
>>>>>>>>  	gmu->mmio = NULL;
>>>>>>>>  	gmu->rscc = NULL;
>>>>>>>>  
>>>>>>>> -	a6xx_gmu_memory_free(gmu);
>>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
>>>>>>>> +		a6xx_gmu_memory_free(gmu);
>>>>>>>>  
>>>>>>>> -	free_irq(gmu->gmu_irq, gmu);
>>>>>>>> -	free_irq(gmu->hfi_irq, gmu);
>>>>>>>> +		free_irq(gmu->gmu_irq, gmu);
>>>>>>>> +		free_irq(gmu->hfi_irq, gmu);
>>>>>>>> +	}
>>>>>>>>  
>>>>>>>>  	/* Drop reference taken in of_find_device_by_node */
>>>>>>>>  	put_device(gmu->dev);
>>>>>>>> @@ -1516,6 +1519,69 @@ static int cxpd_notifier_cb(struct notifier_block *nb,
>>>>>>>>  	return 0;
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
>>>>>>>> +{
>>>>>>>> +	struct platform_device *pdev = of_find_device_by_node(node);
>>>>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>>>>>>>> +	int ret;
>>>>>>>> +
>>>>>>>> +	if (!pdev)
>>>>>>>> +		return -ENODEV;
>>>>>>>> +
>>>>>>>> +	gmu->dev = &pdev->dev;
>>>>>>>> +
>>>>>>>> +	of_dma_configure(gmu->dev, node, true);
>>>>>>> why setup dma for a device that is not actually present?
>>>>>>>> +
>>>>>>>> +	pm_runtime_enable(gmu->dev);
>>>>>>>> +
>>>>>>>> +	/* Mark legacy for manual SPTPRAC control */
>>>>>>>> +	gmu->legacy = true;
>>>>>>>> +
>>>>>>>> +	/* Map the GMU registers */
>>>>>>>> +	gmu->mmio = a6xx_gmu_get_mmio(pdev, "gmu");
>>>>>>>> +	if (IS_ERR(gmu->mmio)) {
>>>>>>>> +		ret = PTR_ERR(gmu->mmio);
>>>>>>>> +		goto err_mmio;
>>>>>>>> +	}
>>>>>>>> +
>>>>>>>> +	gmu->cxpd = dev_pm_domain_attach_by_name(gmu->dev, "cx");
>>>>>>>> +	if (IS_ERR(gmu->cxpd)) {
>>>>>>>> +		ret = PTR_ERR(gmu->cxpd);
>>>>>>>> +		goto err_mmio;
>>>>>>>> +	}
>>>>>>>> +
>>>>>>>> +	if (!device_link_add(gmu->dev, gmu->cxpd, DL_FLAG_PM_RUNTIME)) {
>>>>>>>> +		ret = -ENODEV;
>>>>>>>> +		goto detach_cxpd;
>>>>>>>> +	}
>>>>>>>> +
>>>>>>>> +	init_completion(&gmu->pd_gate);
>>>>>>>> +	complete_all(&gmu->pd_gate);
>>>>>>>> +	gmu->pd_nb.notifier_call = cxpd_notifier_cb;
>>>>>>>> +
>>>>>>>> +	/* Get a link to the GX power domain to reset the GPU */
>>>>>>>> +	gmu->gxpd = dev_pm_domain_attach_by_name(gmu->dev, "gx");
>>>>>>>> +	if (IS_ERR(gmu->gxpd)) {
>>>>>>>> +		ret = PTR_ERR(gmu->gxpd);
>>>>>>>> +		goto err_mmio;
>>>>>>>> +	}
>>>>>>>> +
>>>>>>>> +	gmu->initialized = true;
>>>>>>>> +
>>>>>>>> +	return 0;
>>>>>>>> +
>>>>>>>> +detach_cxpd:
>>>>>>>> +	dev_pm_domain_detach(gmu->cxpd, false);
>>>>>>>> +
>>>>>>>> +err_mmio:
>>>>>>>> +	iounmap(gmu->mmio);
>>>>>>>> +
>>>>>>>> +	/* Drop reference taken in of_find_device_by_node */
>>>>>>>> +	put_device(gmu->dev);
>>>>>>>> +
>>>>>>>> +	return ret;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
>>>>>>>>  {
>>>>>>>>  	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
>>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>>>>>>> index 931f9f3b3a85..8e0345ffab81 100644
>>>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>>>>>>> @@ -20,9 +20,11 @@ static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
>>>>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>>>>>>  
>>>>>>>> -	/* Check that the GMU is idle */
>>>>>>>> -	if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
>>>>>>>> -		return false;
>>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
>>>>>>>> +		/* Check that the GMU is idle */
>>>>>>>> +		if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
>>>>>>>> +			return false;
>>>>>>>> +	}
>>>>>>>>  
>>>>>>>>  	/* Check tha the CX master is idle */
>>>>>>>>  	if (gpu_read(gpu, REG_A6XX_RBBM_STATUS) &
>>>>>>>> @@ -612,13 +614,15 @@ static void a6xx_set_hwcg(struct msm_gpu *gpu, bool state)
>>>>>>>>  		return;
>>>>>>>>  
>>>>>>>>  	/* Disable SP clock before programming HWCG registers */
>>>>>>>> -	gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
>>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>>>>>>>> +		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
>>>>>>>>  
>>>>>>>>  	for (i = 0; (reg = &adreno_gpu->info->hwcg[i], reg->offset); i++)
>>>>>>>>  		gpu_write(gpu, reg->offset, state ? reg->value : 0);
>>>>>>>>  
>>>>>>>>  	/* Enable SP clock */
>>>>>>>> -	gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
>>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>>>>>>>> +		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
>>>>>>>>  
>>>>>>>>  	gpu_write(gpu, REG_A6XX_RBBM_CLOCK_CNTL, state ? clock_cntl_on : 0);
>>>>>>>>  }
>>>>>>>> @@ -1018,10 +1022,13 @@ static int hw_init(struct msm_gpu *gpu)
>>>>>>>>  {
>>>>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>>>>>>>>  	int ret;
>>>>>>>>  
>>>>>>>> -	/* Make sure the GMU keeps the GPU on while we set it up */
>>>>>>>> -	a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
>>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
>>>>>>>> +		/* Make sure the GMU keeps the GPU on while we set it up */
>>>>>>>> +		a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
>>>>>>>> +	}
>>>>>>>>  
>>>>>>>>  	/* Clear GBIF halt in case GX domain was not collapsed */
>>>>>>>>  	if (a6xx_has_gbif(adreno_gpu))
>>>>>>>> @@ -1144,6 +1151,17 @@ static int hw_init(struct msm_gpu *gpu)
>>>>>>>>  			0x3f0243f0);
>>>>>>>>  	}
>>>>>>>>  
>>>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
>>>>>>>> +		/* Do it here, as GMU wrapper only inits the GMU for memory reservation etc. */
>>>>>>>> +
>>>>>>>> +		/* Set up the CX GMU counter 0 to count busy ticks */
>>>>>>>> +		gmu_write(gmu, REG_A6XX_GPU_GMU_AO_GPU_CX_BUSY_MASK, 0xff000000);
>>>>>>>> +
>>>>>>>> +		/* Enable power counter 0 */
>>>>>>>> +		gmu_rmw(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_SELECT_0, 0xff, BIT(5));
>>>>>>>> +		gmu_write(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_ENABLE, 1);
>>>>>>>> +	}
>>>>>>>> +
>>>>>>>>  	/* Protect registers from the CP */
>>>>>>>>  	a6xx_set_cp_protect(gpu);
>>>>>>>>  
>>>>>>>> @@ -1233,6 +1251,8 @@ static int hw_init(struct msm_gpu *gpu)
>>>>>>>>  	}
>>>>>>>>  
>>>>>>>>  out:
>>>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
>>>>>>>> +		return ret;
>>>>>>>>  	/*
>>>>>>>>  	 * Tell the GMU that we are done touching the GPU and it can start power
>>>>>>>>  	 * management
>>>>>>>> @@ -1267,6 +1287,9 @@ static void a6xx_dump(struct msm_gpu *gpu)
>>>>>>>>  	adreno_dump(gpu);
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> +#define GBIF_GX_HALT_MASK	BIT(0)
>>>>>>>> +#define GBIF_CLIENT_HALT_MASK	BIT(0)
>>>>>>>> +#define GBIF_ARB_HALT_MASK	BIT(1)
>>>>>>>>  #define VBIF_RESET_ACK_TIMEOUT	100
>>>>>>>>  #define VBIF_RESET_ACK_MASK	0x00f0
>>>>>>>>  
>>>>>>>> @@ -1299,7 +1322,8 @@ static void a6xx_recover(struct msm_gpu *gpu)
>>>>>>>>  	 * Turn off keep alive that might have been enabled by the hang
>>>>>>>>  	 * interrupt
>>>>>>>>  	 */
>>>>>>>> -	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
>>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>>>>>>>> +		gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
>>>>>>>
>>>>>>> Maybe it is better to move this to a6xx_gmu_force_power_off.
>>>>>>>
>>>>>>>>  
>>>>>>>>  	pm_runtime_dont_use_autosuspend(&gpu->pdev->dev);
>>>>>>>>  
>>>>>>>> @@ -1329,6 +1353,32 @@ static void a6xx_recover(struct msm_gpu *gpu)
>>>>>>>>  
>>>>>>>>  	dev_pm_genpd_remove_notifier(gmu->cxpd);
>>>>>>>>  
>>>>>>>> +	/* Software-reset the GPU */
>>>>>>>
>>>>>>> This is not a soft reset sequence. We are trying to quiesce gpu - ddr
>>>>>>> traffic with this sequence.
>>>>>>>
>>>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
>>>>>>>> +		/* Halt the GX side of GBIF */
>>>>>>>> +		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, GBIF_GX_HALT_MASK);
>>>>>>>> +		spin_until(gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT_ACK) &
>>>>>>>> +			   GBIF_GX_HALT_MASK);
>>>>>>>> +
>>>>>>>> +		/* Halt new client requests on GBIF */
>>>>>>>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_CLIENT_HALT_MASK);
>>>>>>>> +		spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
>>>>>>>> +			   (GBIF_CLIENT_HALT_MASK)) == GBIF_CLIENT_HALT_MASK);
>>>>>>>> +
>>>>>>>> +		/* Halt all AXI requests on GBIF */
>>>>>>>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_ARB_HALT_MASK);
>>>>>>>> +		spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
>>>>>>>> +			   (GBIF_ARB_HALT_MASK)) == GBIF_ARB_HALT_MASK);
>>>>>>>> +
>>>>>>>> +		/* Clear the halts */
>>>>>>>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
>>>>>>>> +
>>>>>>>> +		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
>>>>>>>> +
>>>>>>>> +		/* This *really* needs to go through before we do anything else! */
>>>>>>>> +		mb();
>>>>>>>> +	}
>>>>>>>> +
>>>>>>>
>>>>>>> This sequence should be before we collapse cx gdsc. Also, please see if
>>>>>>> we can create a subroutine to avoid code dup.
>>>>>>>
>>>>>>>>  	pm_runtime_use_autosuspend(&gpu->pdev->dev);
>>>>>>>>  
>>>>>>>>  	if (active_submits)
>>>>>>>> @@ -1463,7 +1513,8 @@ static void a6xx_fault_detect_irq(struct msm_gpu *gpu)
>>>>>>>>  	 * Force the GPU to stay on until after we finish
>>>>>>>>  	 * collecting information
>>>>>>>>  	 */
>>>>>>>> -	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
>>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>>>>>>>> +		gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
>>>>>>>>  
>>>>>>>>  	DRM_DEV_ERROR(&gpu->pdev->dev,
>>>>>>>>  		"gpu fault ring %d fence %x status %8.8X rb %4.4x/%4.4x ib1 %16.16llX/%4.4x ib2 %16.16llX/%4.4x\n",
>>>>>>>> @@ -1624,7 +1675,7 @@ static void a6xx_llc_slices_init(struct platform_device *pdev,
>>>>>>>>  		a6xx_gpu->llc_mmio = ERR_PTR(-EINVAL);
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> -static int a6xx_pm_resume(struct msm_gpu *gpu)
>>>>>>>> +static int a6xx_gmu_pm_resume(struct msm_gpu *gpu)
>>>>>>>>  {
>>>>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>>>>>> @@ -1644,10 +1695,61 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
>>>>>>>>  
>>>>>>>>  	a6xx_llc_activate(a6xx_gpu);
>>>>>>>>  
>>>>>>>> -	return 0;
>>>>>>>> +	return ret;
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> -static int a6xx_pm_suspend(struct msm_gpu *gpu)
>>>>>>>> +static int a6xx_pm_resume(struct msm_gpu *gpu)
>>>>>>>> +{
>>>>>>>> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>>>>> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>>>>>>>> +	unsigned long freq = 0;
>>>>>>>> +	struct dev_pm_opp *opp;
>>>>>>>> +	int ret;
>>>>>>>> +
>>>>>>>> +	gpu->needs_hw_init = true;
>>>>>>>> +
>>>>>>>> +	trace_msm_gpu_resume(0);
>>>>>>>> +
>>>>>>>> +	mutex_lock(&a6xx_gpu->gmu.lock);
>>>>>>> I think we can ignore gmu lock as there is no real gmu device.
>>>>>>>
>>>>>>>> +
>>>>>>>> +	pm_runtime_resume_and_get(gmu->dev);
>>>>>>>> +	pm_runtime_resume_and_get(gmu->gxpd);
>>>>>>>> +
>>>>>>>> +	/* Set the core clock, having VDD scaling in mind */
>>>>>>>> +	ret = dev_pm_opp_set_rate(&gpu->pdev->dev, gpu->fast_rate);
>>>>>>>> +	if (ret)
>>>>>>>> +		goto err_core_clk;
>>>>>>>> +
>>>>>>>> +	ret = clk_bulk_prepare_enable(gpu->nr_clocks, gpu->grp_clks);
>>>>>>>> +	if (ret)
>>>>>>>> +		goto err_bulk_clk;
>>>>>>>> +
>>>>>>>> +	ret = clk_prepare_enable(gpu->ebi1_clk);
>>>>>>>> +	if (ret)
>>>>>>>> +		goto err_mem_clk;
>>>>>>>> +
>>>>>>>> +	/* If anything goes south, tear the GPU down piece by piece.. */
>>>>>>>> +	if (ret) {
>>>>>>>> +err_mem_clk:
>>>>>>>> +		clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
>>>>>>>> +err_bulk_clk:
>>>>>>>> +		opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
>>>>>>>> +		dev_pm_opp_put(opp);
>>>>>>>> +		dev_pm_opp_set_rate(&gpu->pdev->dev, 0);
>>>>>>>> +err_core_clk:
>>>>>>>> +		pm_runtime_put(gmu->gxpd);
>>>>>>>> +		pm_runtime_put(gmu->dev);
>>>>>>>> +	}
>>>>>>>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
>>>>>>>> +
>>>>>>>> +	if (!ret)
>>>>>>>> +		msm_devfreq_resume(gpu);
>>>>>>>> +
>>>>>>>> +	return ret;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static int a6xx_gmu_pm_suspend(struct msm_gpu *gpu)
>>>>>>>>  {
>>>>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>>>>>> @@ -1674,11 +1776,62 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
>>>>>>>>  	return 0;
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> +static int a6xx_pm_suspend(struct msm_gpu *gpu)
>>>>>>>> +{
>>>>>>>> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>>>>> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>>>>>>>> +	unsigned long freq = 0;
>>>>>>>> +	struct dev_pm_opp *opp;
>>>>>>>> +	int i, ret;
>>>>>>>> +
>>>>>>>> +	trace_msm_gpu_suspend(0);
>>>>>>>> +
>>>>>>>> +	opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
>>>>>>>> +	dev_pm_opp_put(opp);
>>>>>>>> +
>>>>>>>> +	msm_devfreq_suspend(gpu);
>>>>>>>> +
>>>>>>>> +	mutex_lock(&a6xx_gpu->gmu.lock);
>>>>>>>> +
>>>>>>>> +	clk_disable_unprepare(gpu->ebi1_clk);
>>>>>>>> +
>>>>>>>> +	clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
>>>>>>>> +
>>>>>>>> +	/* Set frequency to the minimum supported level (no 27MHz on A6xx!) */
>>>>>>>> +	ret = dev_pm_opp_set_rate(&gpu->pdev->dev, freq);
>>>>>>>> +	if (ret)
>>>>>>>> +		goto err;
>>>>>>>> +
>>>>>>>> +	pm_runtime_put_sync(gmu->gxpd);
>>>>>>>> +	pm_runtime_put_sync(gmu->dev);
>>>>>>>> +
>>>>>>>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
>>>>>>>> +
>>>>>>>> +	if (a6xx_gpu->shadow_bo)
>>>>>>>> +		for (i = 0; i < gpu->nr_rings; i++)
>>>>>>>> +			a6xx_gpu->shadow[i] = 0;
>>>>>>>> +
>>>>>>>> +	gpu->suspend_count++;
>>>>>>>> +
>>>>>>>> +	return 0;
>>>>>>>> +
>>>>>>>> +err:
>>>>>>>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
>>>>>>>> +
>>>>>>>> +	return ret;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>  static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
>>>>>>>>  {
>>>>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>>>>>>  
>>>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
>>>>>>>> +		*value = gpu_read64(gpu, REG_A6XX_CP_ALWAYS_ON_COUNTER);
>>>>>>>> +		return 0;
>>>>>>>> +	}
>>>>>>>> +
>>>>>>> Instead of wrapper check here, we can just create a separate op. I don't
>>>>>>> see any benefit in reusing the same function here.
>>>>>>>
>>>>>>>
>>>>>>>>  	mutex_lock(&a6xx_gpu->gmu.lock);
>>>>>>>>  
>>>>>>>>  	/* Force the GPU power on so we can read this register */
>>>>>>>> @@ -1716,7 +1869,8 @@ static void a6xx_destroy(struct msm_gpu *gpu)
>>>>>>>>  		drm_gem_object_put(a6xx_gpu->shadow_bo);
>>>>>>>>  	}
>>>>>>>>  
>>>>>>>> -	a6xx_llc_slices_destroy(a6xx_gpu);
>>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>>>>>>>> +		a6xx_llc_slices_destroy(a6xx_gpu);
>>>>>>>>  
>>>>>>>>  	mutex_lock(&a6xx_gpu->gmu.lock);
>>>>>>>>  	a6xx_gmu_remove(a6xx_gpu);
>>>>>>>> @@ -1957,8 +2111,8 @@ static const struct adreno_gpu_funcs funcs = {
>>>>>>>>  		.set_param = adreno_set_param,
>>>>>>>>  		.hw_init = a6xx_hw_init,
>>>>>>>>  		.ucode_load = a6xx_ucode_load,
>>>>>>>> -		.pm_suspend = a6xx_pm_suspend,
>>>>>>>> -		.pm_resume = a6xx_pm_resume,
>>>>>>>> +		.pm_suspend = a6xx_gmu_pm_suspend,
>>>>>>>> +		.pm_resume = a6xx_gmu_pm_resume,
>>>>>>>>  		.recover = a6xx_recover,
>>>>>>>>  		.submit = a6xx_submit,
>>>>>>>>  		.active_ring = a6xx_active_ring,
>>>>>>>> @@ -1982,6 +2136,35 @@ static const struct adreno_gpu_funcs funcs = {
>>>>>>>>  	.get_timestamp = a6xx_get_timestamp,
>>>>>>>>  };
>>>>>>>>  
>>>>>>>> +static const struct adreno_gpu_funcs funcs_gmuwrapper = {
>>>>>>>> +	.base = {
>>>>>>>> +		.get_param = adreno_get_param,
>>>>>>>> +		.set_param = adreno_set_param,
>>>>>>>> +		.hw_init = a6xx_hw_init,
>>>>>>>> +		.ucode_load = a6xx_ucode_load,
>>>>>>>> +		.pm_suspend = a6xx_pm_suspend,
>>>>>>>> +		.pm_resume = a6xx_pm_resume,
>>>>>>>> +		.recover = a6xx_recover,
>>>>>>>> +		.submit = a6xx_submit,
>>>>>>>> +		.active_ring = a6xx_active_ring,
>>>>>>>> +		.irq = a6xx_irq,
>>>>>>>> +		.destroy = a6xx_destroy,
>>>>>>>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
>>>>>>>> +		.show = a6xx_show,
>>>>>>>> +#endif
>>>>>>>> +		.gpu_busy = a6xx_gpu_busy,
>>>>>>>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
>>>>>>>> +		.gpu_state_get = a6xx_gpu_state_get,
>>>>>>>> +		.gpu_state_put = a6xx_gpu_state_put,
>>>>>>>> +#endif
>>>>>>>> +		.create_address_space = a6xx_create_address_space,
>>>>>>>> +		.create_private_address_space = a6xx_create_private_address_space,
>>>>>>>> +		.get_rptr = a6xx_get_rptr,
>>>>>>>> +		.progress = a6xx_progress,
>>>>>>>> +	},
>>>>>>>> +	.get_timestamp = a6xx_get_timestamp,
>>>>>>>> +};
>>>>>>>> +
>>>>>>>>  struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>>>>>>>  {
>>>>>>>>  	struct msm_drm_private *priv = dev->dev_private;
>>>>>>>> @@ -2003,18 +2186,36 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>>>>>>>  
>>>>>>>>  	adreno_gpu->registers = NULL;
>>>>>>>>  
>>>>>>>> +	/* Check if there is a GMU phandle and set it up */
>>>>>>>> +	node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
>>>>>>>> +	/* FIXME: How do we gracefully handle this? */
>>>>>>>> +	BUG_ON(!node);
>>>>>>> How will you handle this BUG() when there is no GMU (a610 gpu)?
>>>>>>>
>>>>>>>> +
>>>>>>>> +	adreno_gpu->gmu_is_wrapper = of_device_is_compatible(node, "qcom,adreno-gmu-wrapper");
>>>>>>>> +
>>>>>>>>  	/*
>>>>>>>>  	 * We need to know the platform type before calling into adreno_gpu_init
>>>>>>>>  	 * so that the hw_apriv flag can be correctly set. Snoop into the info
>>>>>>>>  	 * and grab the revision number
>>>>>>>>  	 */
>>>>>>>>  	info = adreno_info(config->rev);
>>>>>>>> -
>>>>>>>> -	if (info && (info->revn == 650 || info->revn == 660 ||
>>>>>>>> -			adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), info->rev)))
>>>>>>>> +	if (!info)
>>>>>>>> +		return ERR_PTR(-EINVAL);
>>>>>>>> +
>>>>>>>> +	/* Assign these early so that we can use the is_aXYZ helpers */
>>>>>>>> +	/* Numeric revision IDs (e.g. 630) */
>>>>>>>> +	adreno_gpu->revn = info->revn;
>>>>>>>> +	/* New-style ADRENO_REV()-only */
>>>>>>>> +	adreno_gpu->rev = info->rev;
>>>>>>>> +	/* Quirk data */
>>>>>>>> +	adreno_gpu->info = info;
>>>>>>>> +
>>>>>>>> +	if (adreno_is_a650(adreno_gpu) || adreno_is_a660_family(adreno_gpu))
>>>>>>>>  		adreno_gpu->base.hw_apriv = true;
>>>>>>>>  
>>>>>>>> -	a6xx_llc_slices_init(pdev, a6xx_gpu);
>>>>>>>> +	/* No LLCC on non-RPMh (and by extension, non-GMU) SoCs */
>>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>>>>>>>> +		a6xx_llc_slices_init(pdev, a6xx_gpu);
>>>>>>>>  
>>>>>>>>  	ret = a6xx_set_supported_hw(&pdev->dev, config->rev);
>>>>>>>>  	if (ret) {
>>>>>>>> @@ -2022,7 +2223,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>>>>>>>  		return ERR_PTR(ret);
>>>>>>>>  	}
>>>>>>>>  
>>>>>>>> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
>>>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
>>>>>>>> +		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_gmuwrapper, 1);
>>>>>>>> +	else
>>>>>>>> +		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
>>>>>>>>  	if (ret) {
>>>>>>>>  		a6xx_destroy(&(a6xx_gpu->base.base));
>>>>>>>>  		return ERR_PTR(ret);
>>>>>>>> @@ -2035,13 +2239,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>>>>>>>  	if (adreno_is_a618(adreno_gpu) || adreno_is_7c3(adreno_gpu))
>>>>>>>>  		priv->gpu_clamp_to_idle = true;
>>>>>>>>  
>>>>>>>> -	/* Check if there is a GMU phandle and set it up */
>>>>>>>> -	node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
>>>>>>>> -
>>>>>>>> -	/* FIXME: How do we gracefully handle this? */
>>>>>>>> -	BUG_ON(!node);
>>>>>>>> -
>>>>>>>> -	ret = a6xx_gmu_init(a6xx_gpu, node);
>>>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
>>>>>>>> +		ret = a6xx_gmu_wrapper_init(a6xx_gpu, node);
>>>>>>>> +	else
>>>>>>>> +		ret = a6xx_gmu_init(a6xx_gpu, node);
>>>>>>>>  	of_node_put(node);
>>>>>>>>  	if (ret) {
>>>>>>>>  		a6xx_destroy(&(a6xx_gpu->base.base));
>>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>>>>>>> index eea2e60ce3b7..51a7656072fa 100644
>>>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>>>>>>> @@ -76,6 +76,7 @@ int a6xx_gmu_set_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
>>>>>>>>  void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
>>>>>>>>  
>>>>>>>>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
>>>>>>>> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
>>>>>>>>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
>>>>>>>>  
>>>>>>>>  void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
>>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
>>>>>>>> index 30ecdff363e7..4e5d650578c6 100644
>>>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
>>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
>>>>>>>> @@ -1041,16 +1041,18 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu)
>>>>>>>>  	/* Get the generic state from the adreno core */
>>>>>>>>  	adreno_gpu_state_get(gpu, &a6xx_state->base);
>>>>>>>>  
>>>>>>>> -	a6xx_get_gmu_registers(gpu, a6xx_state);
>>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
>>>>>>> nit: Kinda misleading function name to a layman. Should we invert the
>>>>>>> function to "adreno_has_gmu"?
>>>>>>>
>>>>>>> -Akhil
>>>>>>>> +		a6xx_get_gmu_registers(gpu, a6xx_state);
>>>>>>>>  
>>>>>>>> -	a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
>>>>>>>> -	a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
>>>>>>>> -	a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
>>>>>>>> +		a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
>>>>>>>> +		a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
>>>>>>>> +		a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
>>>>>>>>  
>>>>>>>> -	a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
>>>>>>>> +		a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
>>>>>>>> +	}
>>>>>>>>  
>>>>>>>>  	/* If GX isn't on the rest of the data isn't going to be accessible */
>>>>>>>> -	if (!a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
>>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu) && !a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
>>>>>>>>  		return &a6xx_state->base;
>>>>>>>>  
>>>>>>>>  	/* Get the banks of indexed registers */
>>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>>>>>> index 6934cee07d42..5c5901d65950 100644
>>>>>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>>>>>> @@ -528,6 +528,10 @@ int adreno_load_fw(struct adreno_gpu *adreno_gpu)
>>>>>>>>  		if (!adreno_gpu->info->fw[i])
>>>>>>>>  			continue;
>>>>>>>>  
>>>>>>>> +		/* Skip loading GMU firwmare with GMU Wrapper */
>>>>>>>> +		if (adreno_has_gmu_wrapper(adreno_gpu) && i == ADRENO_FW_GMU)
>>>>>>>> +			continue;
>>>>>>>> +
>>>>>>>>  		/* Skip if the firmware has already been loaded */
>>>>>>>>  		if (adreno_gpu->fw[i])
>>>>>>>>  			continue;
>>>>>>>> @@ -1074,8 +1078,8 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
>>>>>>>>  	u32 speedbin;
>>>>>>>>  	int ret;
>>>>>>>>  
>>>>>>>> -	/* Only handle the core clock when GMU is not in use */
>>>>>>>> -	if (config->rev.core < 6) {
>>>>>>>> +	/* Only handle the core clock when GMU is not in use (or is absent). */
>>>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu) || config->rev.core < 6) {
>>>>>>>>  		/*
>>>>>>>>  		 * This can only be done before devm_pm_opp_of_add_table(), or
>>>>>>>>  		 * dev_pm_opp_set_config() will WARN_ON()
>>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>>>>>>>> index f62612a5c70f..ee5352bc5329 100644
>>>>>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>>>>>>>> @@ -115,6 +115,7 @@ struct adreno_gpu {
>>>>>>>>  	 * code (a3xx_gpu.c) and stored in this common location.
>>>>>>>>  	 */
>>>>>>>>  	const unsigned int *reg_offsets;
>>>>>>>> +	bool gmu_is_wrapper;
>>>>>>>>  };
>>>>>>>>  #define to_adreno_gpu(x) container_of(x, struct adreno_gpu, base)
>>>>>>>>  
>>>>>>>> @@ -145,6 +146,11 @@ struct adreno_platform_config {
>>>>>>>>  
>>>>>>>>  bool adreno_cmp_rev(struct adreno_rev rev1, struct adreno_rev rev2);
>>>>>>>>  
>>>>>>>> +static inline bool adreno_has_gmu_wrapper(struct adreno_gpu *gpu)
>>>>>>>> +{
>>>>>>>> +	return gpu->gmu_is_wrapper;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>  static inline bool adreno_is_a2xx(struct adreno_gpu *gpu)
>>>>>>>>  {
>>>>>>>>  	return (gpu->revn < 300);
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> 2.40.0
>>>>>>>>

* Re: [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support
  2023-05-08  8:59                 ` Konrad Dybcio
@ 2023-05-08 21:15                   ` Akhil P Oommen
  2023-05-09  8:46                     ` Konrad Dybcio
  0 siblings, 1 reply; 30+ messages in thread
From: Akhil P Oommen @ 2023-05-08 21:15 UTC (permalink / raw)
  To: Konrad Dybcio
  Cc: Rob Clark, Abhinav Kumar, Dmitry Baryshkov, Sean Paul,
	David Airlie, Daniel Vetter, Rob Herring, Krzysztof Kozlowski,
	Bjorn Andersson, Konrad Dybcio, linux-arm-msm, dri-devel,
	freedreno, devicetree, linux-kernel, Rob Clark, Marijn Suijten

On Mon, May 08, 2023 at 10:59:24AM +0200, Konrad Dybcio wrote:
> 
> 
> On 6.05.2023 16:46, Akhil P Oommen wrote:
> > On Fri, May 05, 2023 at 12:35:18PM +0200, Konrad Dybcio wrote:
> >>
> >>
> >> On 5.05.2023 10:46, Akhil P Oommen wrote:
> >>> On Thu, May 04, 2023 at 08:34:07AM +0200, Konrad Dybcio wrote:
> >>>>
> >>>>
> >>>> On 3.05.2023 22:32, Akhil P Oommen wrote:
> >>>>> On Tue, May 02, 2023 at 11:40:26AM +0200, Konrad Dybcio wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 2.05.2023 09:49, Akhil P Oommen wrote:
> >>>>>>> On Sat, Apr 01, 2023 at 01:54:43PM +0200, Konrad Dybcio wrote:
> >>>>>>>> Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
> >>>>>>>> but don't implement the associated GMUs. This is due to the fact that
> >>>>>>>> the GMU directly pokes at RPMh. Sadly, this means we have to take care
> >>>>>>>> of enabling & scaling power rails, clocks and bandwidth ourselves.
> >>>>>>>>
> >>>>>>>> Reuse existing Adreno-common code and modify the deeply-GMU-infused
> >>>>>>>> A6XX code to facilitate these GPUs. This involves if-ing out lots
> >>>>>>>> of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
> >>>>>>>> the actual name that Qualcomm uses in their downstream kernels).
> >>>>>>>>
> >>>>>>>> This is essentially a register region which is convenient to model
> >>>>>>>> as a device. We'll use it for managing the GDSCs. The register
> >>>>>>>> layout matches the actual GMU_CX/GX regions on the "real GMU" devices
> >>>>>>>> and lets us reuse quite a bit of gmu_read/write/rmw calls.
> >>>>>>> << I sent a reply to this patch earlier, but not sure where it went.
> >>>>>>> Still figuring out Mutt... >>
> >>>>>> Answered it here:
> >>>>>>
> >>>>>> https://lore.kernel.org/linux-arm-msm/4d3000c1-c3f9-0bfd-3eb3-23393f9a8f77@linaro.org/
> >>>>>
> >>>>> Thanks. Will check and respond there if needed.
> >>>>>
> >>>>>>
> >>>>>> I don't think I see any new comments in this "reply revision" (heh), so please
> >>>>>> check that one out.
> >>>>>>
> >>>>>>>
> >>>>>>> Only convenience I found is that we can reuse gmu register ops in a few
> >>>>>>> places (< 10 I think). If we just model this as another gpu memory
> >>>>>>> region, I think it will help to keep gmu vs gmu-wrapper/no-gmu
> >>>>>>> architecture code with clean separation. Also, it looks like we need to
> >>>>>>> keep a dummy gmu platform device in the devicetree with the current
> >>>>>>> approach. That doesn't sound right.
> >>>>>> That's correct, but.. if we switch away from that, VDD_GX/VDD_CX will
> >>>>>> need additional, gmuwrapper-configuration specific code anyway, as
> >>>>>> OPP & genpd will no longer make use of the default behavior which
> >>>>>> only gets triggered if there's a single power-domains=<> entry, afaicu.
> >>>>> Can you please tell me which specific *default behaviour* do you mean here?
> >>>>> I am curious to know what I am overlooking here. We can always get a cxpd/gxpd device
> >>>>> and vote for the gdscs directly from the driver. Anything related to
> >>>>> OPP?
> >>>> I *believe* this is true:
> >>>>
> >>>> if (ARRAY_SIZE(power-domains) == 1) {
> >>>> 	of generic code will enable the power domain at .probe time
> >>> we need to handle the voting directly. I recently shared a patch to
> >>> vote cx gdsc from gpu driver. Maybe we can ignore this when gpu has
> >>> only cx rail due to this logic you quoted here.
> >>>
> >>> I see that you have handled it mostly correctly from the gpu driver in the updated
> >>> a6xx_pm_suspend() callback. Just the power domain device ptrs should be moved to
> >>> gpu from gmu.
> >>>
> >>>>
> >>>> 	opp APIs will default to scaling that domain with required-opps
> >>>
> >>>> }
> >>>>
> >>>> and we do need to put GX/CX (with an MX parent to match) there, as the
> >>>> AP is responsible for voting in this configuration
> >>>
> >>> We should vote to turn ON gx/cx headswitches through genpd from gpu driver. When you vote for
> >>> core clk frequency, *clock driver is supposed to scale* all the necessary
> >>> regulators. At least that is how downstream works. You can refer the downstream
> >>> gpucc clk driver of these SoCs. I am not sure how much of that can be easily converted to
> >>> upstream.
> >>>
> >>> Also, how does having a gmu dt node help in this regard? Feel free to
> >>> elaborate, I am not very familiar with clk/regulator implementations.
> >> Okay so I think we have a bit of a confusion here.
> >>
> >> Currently, with this patchset we manage things like this:
> >>
> >> 1. GPU has a VDD_GX (or equivalent[1]) line passed in power-domains=<>, which
> >>    is then used with OPP APIs to ensure it's being scaled on freq change [2].
> >>    The VDD_lines coming from RPM(h) are described as power domains upstream
> >>    *unlike downstream*, which represents them as regulators with preset voltage
> >>    steps (and perhaps that's what had you confused). What's more is that GDSCs
> >>    are also modeled as genpds instead of regulators, hence they sort of "fight"
> >>    for the spot in power-domains=<> of a given node.
> > 
> > Thanks for clarifying. I didn't get this part "hence they sort of "fight" for the spot in power-domains".
> > What spot exactly did you mean here? The spot for PD to be used during scaling?
> > 
> > It seems like you are hinting that there is some sort of limitation in keeping all the
> > 3 power domains (cx gdsc, gx gdsc and cx rail) under the gpu node in dt. Please explain
> > why we can't keep all the 3 power domains under gpu node and call an API
> > (devm_pm_opp_attach_genpd() ??) to select the power domain which should be scaled?
> Eh we could, but this adds a lot of boilerplate code:
> 
> - genpd handling with get/put (I'm no genpd master but devm_pm_opp_attach_genpd
>   sounds rather hacky to me)

Not sure if it is hacky. I see a similar approach in the Venus driver here:
dt: https://elixir.bootlin.com/linux/latest/source/arch/arm64/boot/dts/qcom/sc7280.dtsi#L3699
driver: https://elixir.bootlin.com/linux/latest/source/drivers/media/platform/qcom/venus/pm_helpers.c#L882

> - new r/w/rmw functions would need to be introduced for accessing
>   GMU_CX registers as the reg defines wouldn't match so we'd have
>   to include a new magic offset
Yeah, this is the ugly part. On the bright side, there are very few such register
accesses, and they are mostly perfcounter related.
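
For illustration only, here is a rough userspace sketch of what such offset-based
accessors could look like. It is not driver code: the register file is a plain
array, and GMU_CX_BASE, FAKE_MMIO_WORDS, struct fake_gpu and the gmucx_* names are
all made up. Only the read-modify-write shape (clear the mask, then OR in the new
bits) follows how the gmu_rmw() calls in the patch are used.

```c
#include <assert.h>
#include <stdint.h>

/* All names and numbers below are hypothetical, for illustration only. */
#define FAKE_MMIO_WORDS	64u
#define GMU_CX_BASE	16u	/* assumed dword offset of GMU_CX inside the GPU region */

struct fake_gpu {
	uint32_t mmio[FAKE_MMIO_WORDS];	/* stands in for the ioremapped GPU region */
};

static inline uint32_t gmucx_read(struct fake_gpu *gpu, uint32_t reg)
{
	assert(reg < FAKE_MMIO_WORDS - GMU_CX_BASE);
	return gpu->mmio[GMU_CX_BASE + reg];
}

static inline void gmucx_write(struct fake_gpu *gpu, uint32_t reg, uint32_t val)
{
	assert(reg < FAKE_MMIO_WORDS - GMU_CX_BASE);
	gpu->mmio[GMU_CX_BASE + reg] = val;
}

/* Read-modify-write: clear the bits in mask, then OR in the bits in set. */
static inline void gmucx_rmw(struct fake_gpu *gpu, uint32_t reg,
			     uint32_t mask, uint32_t set)
{
	uint32_t val = gmucx_read(gpu, reg);

	val &= ~mask;
	gmucx_write(gpu, reg, val | set);
}
```

With this shape the existing register defines could stay relative to the GMU_CX
block, and the single base offset would live in one place.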

> - all the reused gmu_ callbacks would need to be separated out

More LoC of course, but it would be more readable/maintainable if we can make a clean
separation with different callbacks between the gmu and no-gmu paths.
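
A minimal userspace sketch of that separation, assuming one op table per flavour
that is picked once at init instead of a wrapper check inside a shared function.
The struct layout, the oob_votes counter and every function name here are invented
for the example; none of this is the msm driver's API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the GPU object. */
struct fake_gpu {
	bool has_gmu;			/* assumed flag: a real GMU is present */
	unsigned int oob_votes;		/* counts "keep the GPU on" requests */
	uint64_t always_on_counter;	/* stands in for the always-on counter */
};

struct fake_funcs {
	int (*get_timestamp)(struct fake_gpu *gpu, uint64_t *value);
};

/* GMU flavour: ask the GMU to keep the GPU powered before reading. */
static int gmu_get_timestamp(struct fake_gpu *gpu, uint64_t *value)
{
	gpu->oob_votes++;
	*value = gpu->always_on_counter;
	return 0;
}

/* No-GMU flavour: nothing to ask, just read the counter. */
static int nogmu_get_timestamp(struct fake_gpu *gpu, uint64_t *value)
{
	*value = gpu->always_on_counter;
	return 0;
}

static const struct fake_funcs funcs_gmu = {
	.get_timestamp = gmu_get_timestamp,
};

static const struct fake_funcs funcs_nogmu = {
	.get_timestamp = nogmu_get_timestamp,
};

/* Select the op table once; callers never test has_gmu again. */
static const struct fake_funcs *pick_funcs(const struct fake_gpu *gpu)
{
	return gpu->has_gmu ? &funcs_gmu : &funcs_nogmu;
}
```

The cost is a second table and a few duplicated lines; the gain is that neither
implementation needs to know the other exists.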

> - A619_holi would be even more problematic to distinguish from A619,
>   similar story goes for firmware loading requirements

I suppose we can check for the presence of a gmu to identify holi?
For example, dynamically allocate 'struct a6xx_gmu' within a6xx_gmu_init(), make
a6xx_gpu->gmu a pointer, and then null-check it.
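
Roughly, as a userspace sketch under the assumption that the gmu member becomes a
pointer. The fake_* types and helpers are hypothetical and calloc()/free() stand in
for whatever allocation the driver would actually use:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct a6xx_gmu and struct a6xx_gpu. */
struct fake_gmu {
	bool initialized;
};

struct fake_a6xx_gpu {
	struct fake_gmu *gmu;	/* stays NULL on GPUs without a GMU */
};

/* Only called on GPUs that really have a GMU. */
static int fake_gmu_init(struct fake_a6xx_gpu *gpu)
{
	gpu->gmu = calloc(1, sizeof(*gpu->gmu));
	if (!gpu->gmu)
		return -1;

	gpu->gmu->initialized = true;
	return 0;
}

static void fake_gmu_remove(struct fake_a6xx_gpu *gpu)
{
	free(gpu->gmu);		/* free(NULL) is a no-op */
	gpu->gmu = NULL;
}

/* The presence check reduces to a pointer test. */
static bool fake_has_gmu(const struct fake_a6xx_gpu *gpu)
{
	return gpu->gmu != NULL;
}
```

A GPU that never calls the init helper simply reports no GMU, so no separate flag
has to be kept in sync with the devicetree.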

For holi's gmu fw loading, moving gmu fw loading to the gmu start sequence might be the
cleanest approach.


> > 
> >>
> >> 2. GMU wrapper gets CX_GDSC & GX_GDSC handles in power-domains=<> (just like
> >>    the real GMU in the current state of upstream [3]), which are then governed
> >>    through explicit genpd calls to turn them on/off when the GPU resume/suspend/
> >>    crash recovery functions are called.
> >>
> >> 3. GPUs with GMU, like A630, don't get any power-domains=<> entries in DT,
> >>    instead relying on the GMU firmware to communicate necessary requests
> >>    to the VDD_xyz resources directly to RPMh, as part of the DVFS routines.
> >>    If GMU wasn't so smart, we would have to do the exact same VDD_xyz+OPP dance
> >>    there - that's precisely what's going on under the hood.
> >>
> >> 4. Adreno SMMU gets a handle to CX_GDSC so that when OF probe funcs are called,
> >>    (and SMMUs probe way way before all things drm) the headswitch is de-asserted
> >>    and its registers and related clocks are accessible.
> >>
> >>
> >> All this makes me believe the way I generally architected things in
> >> this series is correct.
> >>
> >>
> >> [1] A610 (and I think A612) lack a VDD_GX line, so they power the GPU from
> >>     VDD_CX, but that's just an implementation detail which is handled by
> >>     simply passing the correct one in DTS, the code doesn't care.
> >>
> >> [2] Hence my recent changes to use dev_pm_opp_set_rate() wherever possible,
> >>     this func reads requires-opps in OPP table entries and ensures to elevate
> >>     the GENPD's performance state before switching frequencies
> >>
> >> [3] Please take a look at the "end product" here:
> >>     https://github.com/SoMainline/linux/commit/fb16757c3bf4c087ac597d70c7a98755d46bb323
> >>     you can open e.g. sdm845.dtsi for comparison with real GMU
> > 
> > This dt definition for a610 gpu clearly shows the issue I have here. Someone
> > looking at this gets a very wrong picture about the platform like there is actually nothing
> > resembling a gmu IP in a610. Is gmu or gmu-cx register region really present in this hw?
> Yes it is! Take a look at this hunk for example
> 
> if (adreno_has_gmu_wrapper(adreno_gpu)) {
> 	/* Do it here, as GMU wrapper only inits the GMU for memory reservation etc. */
> 
> 	/* Set up the CX GMU counter 0 to count busy ticks */
> 	gmu_write(gmu, REG_A6XX_GPU_GMU_AO_GPU_CX_BUSY_MASK, 0xff000000);
> 
> 	/* Enable power counter 0 */
> 	gmu_rmw(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_SELECT_0, 0xff, BIT(5));
> 	gmu_write(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_ENABLE, 1);
> }
> 
> they are present, functional and at a predictable offset, just like
> the real thing! The "only" difference is that there's no GMU_GX, so
> registers responsible for interfacing with the MCU that lives (well,
> would live) on the GPU island. For all I know, it may actually even
> be physically there, just disabled / fused off as there's no RPMh to
> communicate with.. In fact, I'd suspect that's precisely the case
> on A619_holi (vs normal A619 which uses a GMU on RPMh-enabled SM6350).

I have to go with your findings. I am just finding it hard to get on board with
a dummy node in the devicetree to save a few LoC in the driver. I believe we should
start with a dt definition that is close to the platform and handle the
implementation complexity in the driver, not the other way around.

-Akhil

> 
> Perhaps I could rename the GMU wrapper's reg-names entry to "gmu_cx" to
> make things more obvious?
> 
> Konrad
> 
> > 
> > Just a side note about the dt file you shared:
> > 	1. At line: 1243, It shouldn't have gx gdsc, right?
> > 	2. At line: 1172, SM6115_VDDCX -> SM6115_VDDGX?
> > 
> > -Akhil
> (ignoring as agreed in your replies)
> 
> > 
> >>
> >> I hope this answers your concerns. If not, I'll be happy to elaborate.
> >>
> >> Konrad
> >>>
> >>> -Akhil.
> >>>>
> >>>> Konrad
> >>>>>
> >>>>> -Akhil
> >>>>>>
> >>>>>> If nothing else, this is a very convenient way to model a part of the
> >>>>>> GPU (as that's essentially what GMU_CX is, to my understanding) and
> >>>>>> the bindings people didn't shoot me in the head for proposing this, so
> >>>>>> I assume it'd be cool to pursue this..
> >>>>>>
> >>>>>> Konrad
> >>>>>>>>
> >>>>>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
> >>>>>>>> ---
> >>>>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c       |  72 +++++++-
> >>>>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c       | 255 +++++++++++++++++++++++++---
> >>>>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h       |   1 +
> >>>>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  14 +-
> >>>>>>>>  drivers/gpu/drm/msm/adreno/adreno_gpu.c     |   8 +-
> >>>>>>>>  drivers/gpu/drm/msm/adreno/adreno_gpu.h     |   6 +
> >>>>>>>>  6 files changed, 318 insertions(+), 38 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >>>>>>>> index 87babbb2a19f..b1acdb027205 100644
> >>>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >>>>>>>> @@ -1469,6 +1469,7 @@ static int a6xx_gmu_get_irq(struct a6xx_gmu *gmu, struct platform_device *pdev,
> >>>>>>>>  
> >>>>>>>>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
> >>>>>>>>  {
> >>>>>>>> +	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
> >>>>>>>>  	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >>>>>>>>  	struct platform_device *pdev = to_platform_device(gmu->dev);
> >>>>>>>>  
> >>>>>>>> @@ -1494,10 +1495,12 @@ void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
> >>>>>>>>  	gmu->mmio = NULL;
> >>>>>>>>  	gmu->rscc = NULL;
> >>>>>>>>  
> >>>>>>>> -	a6xx_gmu_memory_free(gmu);
> >>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> >>>>>>>> +		a6xx_gmu_memory_free(gmu);
> >>>>>>>>  
> >>>>>>>> -	free_irq(gmu->gmu_irq, gmu);
> >>>>>>>> -	free_irq(gmu->hfi_irq, gmu);
> >>>>>>>> +		free_irq(gmu->gmu_irq, gmu);
> >>>>>>>> +		free_irq(gmu->hfi_irq, gmu);
> >>>>>>>> +	}
> >>>>>>>>  
> >>>>>>>>  	/* Drop reference taken in of_find_device_by_node */
> >>>>>>>>  	put_device(gmu->dev);
> >>>>>>>> @@ -1516,6 +1519,69 @@ static int cxpd_notifier_cb(struct notifier_block *nb,
> >>>>>>>>  	return 0;
> >>>>>>>>  }
> >>>>>>>>  
> >>>>>>>> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
> >>>>>>>> +{
> >>>>>>>> +	struct platform_device *pdev = of_find_device_by_node(node);
> >>>>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >>>>>>>> +	int ret;
> >>>>>>>> +
> >>>>>>>> +	if (!pdev)
> >>>>>>>> +		return -ENODEV;
> >>>>>>>> +
> >>>>>>>> +	gmu->dev = &pdev->dev;
> >>>>>>>> +
> >>>>>>>> +	of_dma_configure(gmu->dev, node, true);
> >>>>>>> why setup dma for a device that is not actually present?
> >>>>>>>> +
> >>>>>>>> +	pm_runtime_enable(gmu->dev);
> >>>>>>>> +
> >>>>>>>> +	/* Mark legacy for manual SPTPRAC control */
> >>>>>>>> +	gmu->legacy = true;
> >>>>>>>> +
> >>>>>>>> +	/* Map the GMU registers */
> >>>>>>>> +	gmu->mmio = a6xx_gmu_get_mmio(pdev, "gmu");
> >>>>>>>> +	if (IS_ERR(gmu->mmio)) {
> >>>>>>>> +		ret = PTR_ERR(gmu->mmio);
> >>>>>>>> +		goto err_mmio;
> >>>>>>>> +	}
> >>>>>>>> +
> >>>>>>>> +	gmu->cxpd = dev_pm_domain_attach_by_name(gmu->dev, "cx");
> >>>>>>>> +	if (IS_ERR(gmu->cxpd)) {
> >>>>>>>> +		ret = PTR_ERR(gmu->cxpd);
> >>>>>>>> +		goto err_mmio;
> >>>>>>>> +	}
> >>>>>>>> +
> >>>>>>>> +	if (!device_link_add(gmu->dev, gmu->cxpd, DL_FLAG_PM_RUNTIME)) {
> >>>>>>>> +		ret = -ENODEV;
> >>>>>>>> +		goto detach_cxpd;
> >>>>>>>> +	}
> >>>>>>>> +
> >>>>>>>> +	init_completion(&gmu->pd_gate);
> >>>>>>>> +	complete_all(&gmu->pd_gate);
> >>>>>>>> +	gmu->pd_nb.notifier_call = cxpd_notifier_cb;
> >>>>>>>> +
> >>>>>>>> +	/* Get a link to the GX power domain to reset the GPU */
> >>>>>>>> +	gmu->gxpd = dev_pm_domain_attach_by_name(gmu->dev, "gx");
> >>>>>>>> +	if (IS_ERR(gmu->gxpd)) {
> >>>>>>>> +		ret = PTR_ERR(gmu->gxpd);
> >>>>>>>> +		goto err_mmio;
> >>>>>>>> +	}
> >>>>>>>> +
> >>>>>>>> +	gmu->initialized = true;
> >>>>>>>> +
> >>>>>>>> +	return 0;
> >>>>>>>> +
> >>>>>>>> +detach_cxpd:
> >>>>>>>> +	dev_pm_domain_detach(gmu->cxpd, false);
> >>>>>>>> +
> >>>>>>>> +err_mmio:
> >>>>>>>> +	iounmap(gmu->mmio);
> >>>>>>>> +
> >>>>>>>> +	/* Drop reference taken in of_find_device_by_node */
> >>>>>>>> +	put_device(gmu->dev);
> >>>>>>>> +
> >>>>>>>> +	return ret;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
> >>>>>>>>  {
> >>>>>>>>  	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
> >>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>>>>>>> index 931f9f3b3a85..8e0345ffab81 100644
> >>>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>>>>>>> @@ -20,9 +20,11 @@ static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
> >>>>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>>>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>>>>>>>  
> >>>>>>>> -	/* Check that the GMU is idle */
> >>>>>>>> -	if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
> >>>>>>>> -		return false;
> >>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> >>>>>>>> +		/* Check that the GMU is idle */
> >>>>>>>> +		if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
> >>>>>>>> +			return false;
> >>>>>>>> +	}
> >>>>>>>>  
> >>>>>>>>  	/* Check tha the CX master is idle */
> >>>>>>>>  	if (gpu_read(gpu, REG_A6XX_RBBM_STATUS) &
> >>>>>>>> @@ -612,13 +614,15 @@ static void a6xx_set_hwcg(struct msm_gpu *gpu, bool state)
> >>>>>>>>  		return;
> >>>>>>>>  
> >>>>>>>>  	/* Disable SP clock before programming HWCG registers */
> >>>>>>>> -	gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
> >>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> >>>>>>>> +		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
> >>>>>>>>  
> >>>>>>>>  	for (i = 0; (reg = &adreno_gpu->info->hwcg[i], reg->offset); i++)
> >>>>>>>>  		gpu_write(gpu, reg->offset, state ? reg->value : 0);
> >>>>>>>>  
> >>>>>>>>  	/* Enable SP clock */
> >>>>>>>> -	gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
> >>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> >>>>>>>> +		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
> >>>>>>>>  
> >>>>>>>>  	gpu_write(gpu, REG_A6XX_RBBM_CLOCK_CNTL, state ? clock_cntl_on : 0);
> >>>>>>>>  }
> >>>>>>>> @@ -1018,10 +1022,13 @@ static int hw_init(struct msm_gpu *gpu)
> >>>>>>>>  {
> >>>>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>>>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>>>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >>>>>>>>  	int ret;
> >>>>>>>>  
> >>>>>>>> -	/* Make sure the GMU keeps the GPU on while we set it up */
> >>>>>>>> -	a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
> >>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> >>>>>>>> +		/* Make sure the GMU keeps the GPU on while we set it up */
> >>>>>>>> +		a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
> >>>>>>>> +	}
> >>>>>>>>  
> >>>>>>>>  	/* Clear GBIF halt in case GX domain was not collapsed */
> >>>>>>>>  	if (a6xx_has_gbif(adreno_gpu))
> >>>>>>>> @@ -1144,6 +1151,17 @@ static int hw_init(struct msm_gpu *gpu)
> >>>>>>>>  			0x3f0243f0);
> >>>>>>>>  	}
> >>>>>>>>  
> >>>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
> >>>>>>>> +		/* Do it here, as GMU wrapper only inits the GMU for memory reservation etc. */
> >>>>>>>> +
> >>>>>>>> +		/* Set up the CX GMU counter 0 to count busy ticks */
> >>>>>>>> +		gmu_write(gmu, REG_A6XX_GPU_GMU_AO_GPU_CX_BUSY_MASK, 0xff000000);
> >>>>>>>> +
> >>>>>>>> +		/* Enable power counter 0 */
> >>>>>>>> +		gmu_rmw(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_SELECT_0, 0xff, BIT(5));
> >>>>>>>> +		gmu_write(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_ENABLE, 1);
> >>>>>>>> +	}
> >>>>>>>> +
> >>>>>>>>  	/* Protect registers from the CP */
> >>>>>>>>  	a6xx_set_cp_protect(gpu);
> >>>>>>>>  
> >>>>>>>> @@ -1233,6 +1251,8 @@ static int hw_init(struct msm_gpu *gpu)
> >>>>>>>>  	}
> >>>>>>>>  
> >>>>>>>>  out:
> >>>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
> >>>>>>>> +		return ret;
> >>>>>>>>  	/*
> >>>>>>>>  	 * Tell the GMU that we are done touching the GPU and it can start power
> >>>>>>>>  	 * management
> >>>>>>>> @@ -1267,6 +1287,9 @@ static void a6xx_dump(struct msm_gpu *gpu)
> >>>>>>>>  	adreno_dump(gpu);
> >>>>>>>>  }
> >>>>>>>>  
> >>>>>>>> +#define GBIF_GX_HALT_MASK	BIT(0)
> >>>>>>>> +#define GBIF_CLIENT_HALT_MASK	BIT(0)
> >>>>>>>> +#define GBIF_ARB_HALT_MASK	BIT(1)
> >>>>>>>>  #define VBIF_RESET_ACK_TIMEOUT	100
> >>>>>>>>  #define VBIF_RESET_ACK_MASK	0x00f0
> >>>>>>>>  
> >>>>>>>> @@ -1299,7 +1322,8 @@ static void a6xx_recover(struct msm_gpu *gpu)
> >>>>>>>>  	 * Turn off keep alive that might have been enabled by the hang
> >>>>>>>>  	 * interrupt
> >>>>>>>>  	 */
> >>>>>>>> -	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
> >>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> >>>>>>>> +		gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
> >>>>>>>
> >>>>>>> Maybe it is better to move this to a6xx_gmu_force_power_off.
> >>>>>>>
> >>>>>>>>  
> >>>>>>>>  	pm_runtime_dont_use_autosuspend(&gpu->pdev->dev);
> >>>>>>>>  
> >>>>>>>> @@ -1329,6 +1353,32 @@ static void a6xx_recover(struct msm_gpu *gpu)
> >>>>>>>>  
> >>>>>>>>  	dev_pm_genpd_remove_notifier(gmu->cxpd);
> >>>>>>>>  
> >>>>>>>> +	/* Software-reset the GPU */
> >>>>>>>
> >>>>>>> This is not a soft reset sequence. We are trying to quiesce gpu - ddr
> >>>>>>> traffic with this sequence.
> >>>>>>>
> >>>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
> >>>>>>>> +		/* Halt the GX side of GBIF */
> >>>>>>>> +		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, GBIF_GX_HALT_MASK);
> >>>>>>>> +		spin_until(gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT_ACK) &
> >>>>>>>> +			   GBIF_GX_HALT_MASK);
> >>>>>>>> +
> >>>>>>>> +		/* Halt new client requests on GBIF */
> >>>>>>>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_CLIENT_HALT_MASK);
> >>>>>>>> +		spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
> >>>>>>>> +			   (GBIF_CLIENT_HALT_MASK)) == GBIF_CLIENT_HALT_MASK);
> >>>>>>>> +
> >>>>>>>> +		/* Halt all AXI requests on GBIF */
> >>>>>>>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_ARB_HALT_MASK);
> >>>>>>>> +		spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
> >>>>>>>> +			   (GBIF_ARB_HALT_MASK)) == GBIF_ARB_HALT_MASK);
> >>>>>>>> +
> >>>>>>>> +		/* Clear the halts */
> >>>>>>>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
> >>>>>>>> +
> >>>>>>>> +		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
> >>>>>>>> +
> >>>>>>>> +		/* This *really* needs to go through before we do anything else! */
> >>>>>>>> +		mb();
> >>>>>>>> +	}
> >>>>>>>> +
> >>>>>>>
> >>>>>>> This sequence should be before we collapse cx gdsc. Also, please see if
> >>>>>>> we can create a subroutine to avoid code dup.
> >>>>>>>
> >>>>>>>>  	pm_runtime_use_autosuspend(&gpu->pdev->dev);
> >>>>>>>>  
> >>>>>>>>  	if (active_submits)
> >>>>>>>> @@ -1463,7 +1513,8 @@ static void a6xx_fault_detect_irq(struct msm_gpu *gpu)
> >>>>>>>>  	 * Force the GPU to stay on until after we finish
> >>>>>>>>  	 * collecting information
> >>>>>>>>  	 */
> >>>>>>>> -	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
> >>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> >>>>>>>> +		gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
> >>>>>>>>  
> >>>>>>>>  	DRM_DEV_ERROR(&gpu->pdev->dev,
> >>>>>>>>  		"gpu fault ring %d fence %x status %8.8X rb %4.4x/%4.4x ib1 %16.16llX/%4.4x ib2 %16.16llX/%4.4x\n",
> >>>>>>>> @@ -1624,7 +1675,7 @@ static void a6xx_llc_slices_init(struct platform_device *pdev,
> >>>>>>>>  		a6xx_gpu->llc_mmio = ERR_PTR(-EINVAL);
> >>>>>>>>  }
> >>>>>>>>  
> >>>>>>>> -static int a6xx_pm_resume(struct msm_gpu *gpu)
> >>>>>>>> +static int a6xx_gmu_pm_resume(struct msm_gpu *gpu)
> >>>>>>>>  {
> >>>>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>>>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>>>>>>> @@ -1644,10 +1695,61 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
> >>>>>>>>  
> >>>>>>>>  	a6xx_llc_activate(a6xx_gpu);
> >>>>>>>>  
> >>>>>>>> -	return 0;
> >>>>>>>> +	return ret;
> >>>>>>>>  }
> >>>>>>>>  
> >>>>>>>> -static int a6xx_pm_suspend(struct msm_gpu *gpu)
> >>>>>>>> +static int a6xx_pm_resume(struct msm_gpu *gpu)
> >>>>>>>> +{
> >>>>>>>> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>>>>>>> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>>>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >>>>>>>> +	unsigned long freq = 0;
> >>>>>>>> +	struct dev_pm_opp *opp;
> >>>>>>>> +	int ret;
> >>>>>>>> +
> >>>>>>>> +	gpu->needs_hw_init = true;
> >>>>>>>> +
> >>>>>>>> +	trace_msm_gpu_resume(0);
> >>>>>>>> +
> >>>>>>>> +	mutex_lock(&a6xx_gpu->gmu.lock);
> >>>>>>> I think we can ignore gmu lock as there is no real gmu device.
> >>>>>>>
> >>>>>>>> +
> >>>>>>>> +	pm_runtime_resume_and_get(gmu->dev);
> >>>>>>>> +	pm_runtime_resume_and_get(gmu->gxpd);
> >>>>>>>> +
> >>>>>>>> +	/* Set the core clock, having VDD scaling in mind */
> >>>>>>>> +	ret = dev_pm_opp_set_rate(&gpu->pdev->dev, gpu->fast_rate);
> >>>>>>>> +	if (ret)
> >>>>>>>> +		goto err_core_clk;
> >>>>>>>> +
> >>>>>>>> +	ret = clk_bulk_prepare_enable(gpu->nr_clocks, gpu->grp_clks);
> >>>>>>>> +	if (ret)
> >>>>>>>> +		goto err_bulk_clk;
> >>>>>>>> +
> >>>>>>>> +	ret = clk_prepare_enable(gpu->ebi1_clk);
> >>>>>>>> +	if (ret)
> >>>>>>>> +		goto err_mem_clk;
> >>>>>>>> +
> >>>>>>>> +	/* If anything goes south, tear the GPU down piece by piece.. */
> >>>>>>>> +	if (ret) {
> >>>>>>>> +err_mem_clk:
> >>>>>>>> +		clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
> >>>>>>>> +err_bulk_clk:
> >>>>>>>> +		opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
> >>>>>>>> +		dev_pm_opp_put(opp);
> >>>>>>>> +		dev_pm_opp_set_rate(&gpu->pdev->dev, 0);
> >>>>>>>> +err_core_clk:
> >>>>>>>> +		pm_runtime_put(gmu->gxpd);
> >>>>>>>> +		pm_runtime_put(gmu->dev);
> >>>>>>>> +	}
> >>>>>>>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
> >>>>>>>> +
> >>>>>>>> +	if (!ret)
> >>>>>>>> +		msm_devfreq_resume(gpu);
> >>>>>>>> +
> >>>>>>>> +	return ret;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +static int a6xx_gmu_pm_suspend(struct msm_gpu *gpu)
> >>>>>>>>  {
> >>>>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>>>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>>>>>>> @@ -1674,11 +1776,62 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
> >>>>>>>>  	return 0;
> >>>>>>>>  }
> >>>>>>>>  
> >>>>>>>> +static int a6xx_pm_suspend(struct msm_gpu *gpu)
> >>>>>>>> +{
> >>>>>>>> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>>>>>>> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>>>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >>>>>>>> +	unsigned long freq = 0;
> >>>>>>>> +	struct dev_pm_opp *opp;
> >>>>>>>> +	int i, ret;
> >>>>>>>> +
> >>>>>>>> +	trace_msm_gpu_suspend(0);
> >>>>>>>> +
> >>>>>>>> +	opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
> >>>>>>>> +	dev_pm_opp_put(opp);
> >>>>>>>> +
> >>>>>>>> +	msm_devfreq_suspend(gpu);
> >>>>>>>> +
> >>>>>>>> +	mutex_lock(&a6xx_gpu->gmu.lock);
> >>>>>>>> +
> >>>>>>>> +	clk_disable_unprepare(gpu->ebi1_clk);
> >>>>>>>> +
> >>>>>>>> +	clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
> >>>>>>>> +
> >>>>>>>> +	/* Set frequency to the minimum supported level (no 27MHz on A6xx!) */
> >>>>>>>> +	ret = dev_pm_opp_set_rate(&gpu->pdev->dev, freq);
> >>>>>>>> +	if (ret)
> >>>>>>>> +		goto err;
> >>>>>>>> +
> >>>>>>>> +	pm_runtime_put_sync(gmu->gxpd);
> >>>>>>>> +	pm_runtime_put_sync(gmu->dev);
> >>>>>>>> +
> >>>>>>>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
> >>>>>>>> +
> >>>>>>>> +	if (a6xx_gpu->shadow_bo)
> >>>>>>>> +		for (i = 0; i < gpu->nr_rings; i++)
> >>>>>>>> +			a6xx_gpu->shadow[i] = 0;
> >>>>>>>> +
> >>>>>>>> +	gpu->suspend_count++;
> >>>>>>>> +
> >>>>>>>> +	return 0;
> >>>>>>>> +
> >>>>>>>> +err:
> >>>>>>>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
> >>>>>>>> +
> >>>>>>>> +	return ret;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>>  static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
> >>>>>>>>  {
> >>>>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>>>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >>>>>>>>  
> >>>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
> >>>>>>>> +		*value = gpu_read64(gpu, REG_A6XX_CP_ALWAYS_ON_COUNTER);
> >>>>>>>> +		return 0;
> >>>>>>>> +	}
> >>>>>>>> +
> >>>>>>> Instead of wrapper check here, we can just create a separate op. I don't
> >>>>>>> see any benefit in reusing the same function here.
> >>>>>>>
> >>>>>>>
> >>>>>>>>  	mutex_lock(&a6xx_gpu->gmu.lock);
> >>>>>>>>  
> >>>>>>>>  	/* Force the GPU power on so we can read this register */
> >>>>>>>> @@ -1716,7 +1869,8 @@ static void a6xx_destroy(struct msm_gpu *gpu)
> >>>>>>>>  		drm_gem_object_put(a6xx_gpu->shadow_bo);
> >>>>>>>>  	}
> >>>>>>>>  
> >>>>>>>> -	a6xx_llc_slices_destroy(a6xx_gpu);
> >>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> >>>>>>>> +		a6xx_llc_slices_destroy(a6xx_gpu);
> >>>>>>>>  
> >>>>>>>>  	mutex_lock(&a6xx_gpu->gmu.lock);
> >>>>>>>>  	a6xx_gmu_remove(a6xx_gpu);
> >>>>>>>> @@ -1957,8 +2111,8 @@ static const struct adreno_gpu_funcs funcs = {
> >>>>>>>>  		.set_param = adreno_set_param,
> >>>>>>>>  		.hw_init = a6xx_hw_init,
> >>>>>>>>  		.ucode_load = a6xx_ucode_load,
> >>>>>>>> -		.pm_suspend = a6xx_pm_suspend,
> >>>>>>>> -		.pm_resume = a6xx_pm_resume,
> >>>>>>>> +		.pm_suspend = a6xx_gmu_pm_suspend,
> >>>>>>>> +		.pm_resume = a6xx_gmu_pm_resume,
> >>>>>>>>  		.recover = a6xx_recover,
> >>>>>>>>  		.submit = a6xx_submit,
> >>>>>>>>  		.active_ring = a6xx_active_ring,
> >>>>>>>> @@ -1982,6 +2136,35 @@ static const struct adreno_gpu_funcs funcs = {
> >>>>>>>>  	.get_timestamp = a6xx_get_timestamp,
> >>>>>>>>  };
> >>>>>>>>  
> >>>>>>>> +static const struct adreno_gpu_funcs funcs_gmuwrapper = {
> >>>>>>>> +	.base = {
> >>>>>>>> +		.get_param = adreno_get_param,
> >>>>>>>> +		.set_param = adreno_set_param,
> >>>>>>>> +		.hw_init = a6xx_hw_init,
> >>>>>>>> +		.ucode_load = a6xx_ucode_load,
> >>>>>>>> +		.pm_suspend = a6xx_pm_suspend,
> >>>>>>>> +		.pm_resume = a6xx_pm_resume,
> >>>>>>>> +		.recover = a6xx_recover,
> >>>>>>>> +		.submit = a6xx_submit,
> >>>>>>>> +		.active_ring = a6xx_active_ring,
> >>>>>>>> +		.irq = a6xx_irq,
> >>>>>>>> +		.destroy = a6xx_destroy,
> >>>>>>>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
> >>>>>>>> +		.show = a6xx_show,
> >>>>>>>> +#endif
> >>>>>>>> +		.gpu_busy = a6xx_gpu_busy,
> >>>>>>>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
> >>>>>>>> +		.gpu_state_get = a6xx_gpu_state_get,
> >>>>>>>> +		.gpu_state_put = a6xx_gpu_state_put,
> >>>>>>>> +#endif
> >>>>>>>> +		.create_address_space = a6xx_create_address_space,
> >>>>>>>> +		.create_private_address_space = a6xx_create_private_address_space,
> >>>>>>>> +		.get_rptr = a6xx_get_rptr,
> >>>>>>>> +		.progress = a6xx_progress,
> >>>>>>>> +	},
> >>>>>>>> +	.get_timestamp = a6xx_get_timestamp,
> >>>>>>>> +};
> >>>>>>>> +
> >>>>>>>>  struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> >>>>>>>>  {
> >>>>>>>>  	struct msm_drm_private *priv = dev->dev_private;
> >>>>>>>> @@ -2003,18 +2186,36 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> >>>>>>>>  
> >>>>>>>>  	adreno_gpu->registers = NULL;
> >>>>>>>>  
> >>>>>>>> +	/* Check if there is a GMU phandle and set it up */
> >>>>>>>> +	node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
> >>>>>>>> +	/* FIXME: How do we gracefully handle this? */
> >>>>>>>> +	BUG_ON(!node);
> >>>>>>> How will you handle this BUG() when there is no GMU (a610 gpu)?
> >>>>>>>
> >>>>>>>> +
> >>>>>>>> +	adreno_gpu->gmu_is_wrapper = of_device_is_compatible(node, "qcom,adreno-gmu-wrapper");
> >>>>>>>> +
> >>>>>>>>  	/*
> >>>>>>>>  	 * We need to know the platform type before calling into adreno_gpu_init
> >>>>>>>>  	 * so that the hw_apriv flag can be correctly set. Snoop into the info
> >>>>>>>>  	 * and grab the revision number
> >>>>>>>>  	 */
> >>>>>>>>  	info = adreno_info(config->rev);
> >>>>>>>> -
> >>>>>>>> -	if (info && (info->revn == 650 || info->revn == 660 ||
> >>>>>>>> -			adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), info->rev)))
> >>>>>>>> +	if (!info)
> >>>>>>>> +		return ERR_PTR(-EINVAL);
> >>>>>>>> +
> >>>>>>>> +	/* Assign these early so that we can use the is_aXYZ helpers */
> >>>>>>>> +	/* Numeric revision IDs (e.g. 630) */
> >>>>>>>> +	adreno_gpu->revn = info->revn;
> >>>>>>>> +	/* New-style ADRENO_REV()-only */
> >>>>>>>> +	adreno_gpu->rev = info->rev;
> >>>>>>>> +	/* Quirk data */
> >>>>>>>> +	adreno_gpu->info = info;
> >>>>>>>> +
> >>>>>>>> +	if (adreno_is_a650(adreno_gpu) || adreno_is_a660_family(adreno_gpu))
> >>>>>>>>  		adreno_gpu->base.hw_apriv = true;
> >>>>>>>>  
> >>>>>>>> -	a6xx_llc_slices_init(pdev, a6xx_gpu);
> >>>>>>>> +	/* No LLCC on non-RPMh (and by extension, non-GMU) SoCs */
> >>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
> >>>>>>>> +		a6xx_llc_slices_init(pdev, a6xx_gpu);
> >>>>>>>>  
> >>>>>>>>  	ret = a6xx_set_supported_hw(&pdev->dev, config->rev);
> >>>>>>>>  	if (ret) {
> >>>>>>>> @@ -2022,7 +2223,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> >>>>>>>>  		return ERR_PTR(ret);
> >>>>>>>>  	}
> >>>>>>>>  
> >>>>>>>> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> >>>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
> >>>>>>>> +		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_gmuwrapper, 1);
> >>>>>>>> +	else
> >>>>>>>> +		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> >>>>>>>>  	if (ret) {
> >>>>>>>>  		a6xx_destroy(&(a6xx_gpu->base.base));
> >>>>>>>>  		return ERR_PTR(ret);
> >>>>>>>> @@ -2035,13 +2239,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> >>>>>>>>  	if (adreno_is_a618(adreno_gpu) || adreno_is_7c3(adreno_gpu))
> >>>>>>>>  		priv->gpu_clamp_to_idle = true;
> >>>>>>>>  
> >>>>>>>> -	/* Check if there is a GMU phandle and set it up */
> >>>>>>>> -	node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
> >>>>>>>> -
> >>>>>>>> -	/* FIXME: How do we gracefully handle this? */
> >>>>>>>> -	BUG_ON(!node);
> >>>>>>>> -
> >>>>>>>> -	ret = a6xx_gmu_init(a6xx_gpu, node);
> >>>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
> >>>>>>>> +		ret = a6xx_gmu_wrapper_init(a6xx_gpu, node);
> >>>>>>>> +	else
> >>>>>>>> +		ret = a6xx_gmu_init(a6xx_gpu, node);
> >>>>>>>>  	of_node_put(node);
> >>>>>>>>  	if (ret) {
> >>>>>>>>  		a6xx_destroy(&(a6xx_gpu->base.base));
> >>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> >>>>>>>> index eea2e60ce3b7..51a7656072fa 100644
> >>>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> >>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> >>>>>>>> @@ -76,6 +76,7 @@ int a6xx_gmu_set_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
> >>>>>>>>  void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
> >>>>>>>>  
> >>>>>>>>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
> >>>>>>>> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
> >>>>>>>>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
> >>>>>>>>  
> >>>>>>>>  void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> >>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> >>>>>>>> index 30ecdff363e7..4e5d650578c6 100644
> >>>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> >>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> >>>>>>>> @@ -1041,16 +1041,18 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu)
> >>>>>>>>  	/* Get the generic state from the adreno core */
> >>>>>>>>  	adreno_gpu_state_get(gpu, &a6xx_state->base);
> >>>>>>>>  
> >>>>>>>> -	a6xx_get_gmu_registers(gpu, a6xx_state);
> >>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
> >>>>>>> nit: Kinda misleading function name to a layman. Should we invert the
> >>>>>>> function to "adreno_has_gmu"?
> >>>>>>>
> >>>>>>> -Akhil
> >>>>>>>> +		a6xx_get_gmu_registers(gpu, a6xx_state);
> >>>>>>>>  
> >>>>>>>> -	a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
> >>>>>>>> -	a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
> >>>>>>>> -	a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
> >>>>>>>> +		a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
> >>>>>>>> +		a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
> >>>>>>>> +		a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
> >>>>>>>>  
> >>>>>>>> -	a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
> >>>>>>>> +		a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
> >>>>>>>> +	}
> >>>>>>>>  
> >>>>>>>>  	/* If GX isn't on the rest of the data isn't going to be accessible */
> >>>>>>>> -	if (!a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
> >>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu) && !a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
> >>>>>>>>  		return &a6xx_state->base;
> >>>>>>>>  
> >>>>>>>>  	/* Get the banks of indexed registers */
> >>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> >>>>>>>> index 6934cee07d42..5c5901d65950 100644
> >>>>>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> >>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> >>>>>>>> @@ -528,6 +528,10 @@ int adreno_load_fw(struct adreno_gpu *adreno_gpu)
> >>>>>>>>  		if (!adreno_gpu->info->fw[i])
> >>>>>>>>  			continue;
> >>>>>>>>  
> >>>>>>>> +		/* Skip loading GMU firmware with GMU Wrapper */
> >>>>>>>> +		if (adreno_has_gmu_wrapper(adreno_gpu) && i == ADRENO_FW_GMU)
> >>>>>>>> +			continue;
> >>>>>>>> +
> >>>>>>>>  		/* Skip if the firmware has already been loaded */
> >>>>>>>>  		if (adreno_gpu->fw[i])
> >>>>>>>>  			continue;
> >>>>>>>> @@ -1074,8 +1078,8 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
> >>>>>>>>  	u32 speedbin;
> >>>>>>>>  	int ret;
> >>>>>>>>  
> >>>>>>>> -	/* Only handle the core clock when GMU is not in use */
> >>>>>>>> -	if (config->rev.core < 6) {
> >>>>>>>> +	/* Only handle the core clock when GMU is not in use (or is absent). */
> >>>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu) || config->rev.core < 6) {
> >>>>>>>>  		/*
> >>>>>>>>  		 * This can only be done before devm_pm_opp_of_add_table(), or
> >>>>>>>>  		 * dev_pm_opp_set_config() will WARN_ON()
> >>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> >>>>>>>> index f62612a5c70f..ee5352bc5329 100644
> >>>>>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> >>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> >>>>>>>> @@ -115,6 +115,7 @@ struct adreno_gpu {
> >>>>>>>>  	 * code (a3xx_gpu.c) and stored in this common location.
> >>>>>>>>  	 */
> >>>>>>>>  	const unsigned int *reg_offsets;
> >>>>>>>> +	bool gmu_is_wrapper;
> >>>>>>>>  };
> >>>>>>>>  #define to_adreno_gpu(x) container_of(x, struct adreno_gpu, base)
> >>>>>>>>  
> >>>>>>>> @@ -145,6 +146,11 @@ struct adreno_platform_config {
> >>>>>>>>  
> >>>>>>>>  bool adreno_cmp_rev(struct adreno_rev rev1, struct adreno_rev rev2);
> >>>>>>>>  
> >>>>>>>> +static inline bool adreno_has_gmu_wrapper(struct adreno_gpu *gpu)
> >>>>>>>> +{
> >>>>>>>> +	return gpu->gmu_is_wrapper;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>>  static inline bool adreno_is_a2xx(struct adreno_gpu *gpu)
> >>>>>>>>  {
> >>>>>>>>  	return (gpu->revn < 300);
> >>>>>>>>
> >>>>>>>> -- 
> >>>>>>>> 2.40.0
> >>>>>>>>


* Re: [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support
  2023-05-08 21:15                   ` Akhil P Oommen
@ 2023-05-09  8:46                     ` Konrad Dybcio
  0 siblings, 0 replies; 30+ messages in thread
From: Konrad Dybcio @ 2023-05-09  8:46 UTC (permalink / raw)
  To: Akhil P Oommen
  Cc: Rob Clark, Abhinav Kumar, Dmitry Baryshkov, Sean Paul,
	David Airlie, Daniel Vetter, Rob Herring, Krzysztof Kozlowski,
	Bjorn Andersson, Konrad Dybcio, linux-arm-msm, dri-devel,
	freedreno, devicetree, linux-kernel, Rob Clark, Marijn Suijten



On 8.05.2023 23:15, Akhil P Oommen wrote:
> On Mon, May 08, 2023 at 10:59:24AM +0200, Konrad Dybcio wrote:
>>
>>
>> On 6.05.2023 16:46, Akhil P Oommen wrote:
>>> On Fri, May 05, 2023 at 12:35:18PM +0200, Konrad Dybcio wrote:
>>>>
>>>>
>>>> On 5.05.2023 10:46, Akhil P Oommen wrote:
>>>>> On Thu, May 04, 2023 at 08:34:07AM +0200, Konrad Dybcio wrote:
>>>>>>
>>>>>>
>>>>>> On 3.05.2023 22:32, Akhil P Oommen wrote:
>>>>>>> On Tue, May 02, 2023 at 11:40:26AM +0200, Konrad Dybcio wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2.05.2023 09:49, Akhil P Oommen wrote:
>>>>>>>>> On Sat, Apr 01, 2023 at 01:54:43PM +0200, Konrad Dybcio wrote:
>>>>>>>>>> Some (particularly SMD_RPM, a.k.a non-RPMh) SoCs implement A6XX GPUs
>>>>>>>>>> but don't implement the associated GMUs. This is due to the fact that
>>>>>>>>>> the GMU directly pokes at RPMh. Sadly, this means we have to take care
>>>>>>>>>> of enabling & scaling power rails, clocks and bandwidth ourselves.
>>>>>>>>>>
>>>>>>>>>> Reuse existing Adreno-common code and modify the deeply-GMU-infused
>>>>>>>>>> A6XX code to facilitate these GPUs. This involves if-ing out lots
>>>>>>>>>> of GMU callbacks and introducing a new type of GMU - GMU wrapper (it's
>>>>>>>>>> the actual name that Qualcomm uses in their downstream kernels).
>>>>>>>>>>
>>>>>>>>>> This is essentially a register region which is convenient to model
>>>>>>>>>> as a device. We'll use it for managing the GDSCs. The register
>>>>>>>>>> layout matches the actual GMU_CX/GX regions on the "real GMU" devices
>>>>>>>>>> and lets us reuse quite a bit of gmu_read/write/rmw calls.
>>>>>>>>> << I sent a reply to this patch earlier, but not sure where it went.
>>>>>>>>> Still figuring out Mutt... >>
>>>>>>>> Answered it here:
>>>>>>>>
>>>>>>>> https://lore.kernel.org/linux-arm-msm/4d3000c1-c3f9-0bfd-3eb3-23393f9a8f77@linaro.org/
>>>>>>>
>>>>>>> Thanks. Will check and respond there if needed.
>>>>>>>
>>>>>>>>
>>>>>>>> I don't think I see any new comments in this "reply revision" (heh), so please
>>>>>>>> check that one out.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Only convenience I found is that we can reuse gmu register ops in a few
>>>>>>>>> places (< 10 I think). If we just model this as another gpu memory
>>>>>>>>> region, I think it will help to keep gmu vs gmu-wrapper/no-gmu
>>>>>>>>> architecture code with clean separation. Also, it looks like we need to
>>>>>>>>> keep a dummy gmu platform device in the devicetree with the current
>>>>>>>>> approach. That doesn't sound right.
>>>>>>>> That's correct, but.. if we switch away from that, VDD_GX/VDD_CX will
>>>>>>>> need additional, gmuwrapper-configuration specific code anyway, as
>>>>>>>> OPP & genpd will no longer make use of the default behavior which
>>>>>>>> only gets triggered if there's a single power-domains=<> entry, afaicu.
> >>>>>>> Can you please tell me which specific *default behaviour* you mean here?
>>>>>>> I am curious to know what I am overlooking here. We can always get a cxpd/gxpd device
>>>>>>> and vote for the gdscs directly from the driver. Anything related to
>>>>>>> OPP?
>>>>>> I *believe* this is true:
>>>>>>
>>>>>> if (ARRAY_SIZE(power-domains) == 1) {
>>>>>> 	of generic code will enable the power domain at .probe time
>>>>> we need to handle the voting directly. I recently shared a patch to
>>>>> vote cx gdsc from gpu driver. Maybe we can ignore this when gpu has
>>>>> only cx rail due to this logic you quoted here.
>>>>>
>>>>> I see that you have handled it mostly correctly from the gpu driver in the updated
>>>>> a6xx_pm_suspend() callback. Just the power domain device ptrs should be moved to
>>>>> gpu from gmu.
>>>>>
>>>>>>
>>>>>> 	opp APIs will default to scaling that domain with required-opps
>>>>>
>>>>>> }
>>>>>>
>>>>>> and we do need to put GX/CX (with an MX parent to match) there, as the
>>>>>> AP is responsible for voting in this configuration
>>>>>
>>>>> We should vote to turn ON gx/cx headswitches through genpd from gpu driver. When you vote for
>>>>> core clk frequency, *clock driver is supposed to scale* all the necessary
>>>>> regulators. At least that is how downstream works. You can refer the downstream
>>>>> gpucc clk driver of these SoCs. I am not sure how much of that can be easily converted to
>>>>> upstream.
>>>>>
>>>>> Also, how does having a gmu dt node help in this regard? Feel free to
>>>>> elaborate, I am not very familiar with clk/regulator implementations.
>>>> Okay so I think we have a bit of a confusion here.
>>>>
>>>> Currently, with this patchset we manage things like this:
>>>>
>>>> 1. GPU has a VDD_GX (or equivalent[1]) line passed in power-domains=<>, which
>>>>    is then used with OPP APIs to ensure it's being scaled on freq change [2].
>>>>    The VDD_lines coming from RPM(h) are described as power domains upstream
>>>>    *unlike downstream*, which represents them as regulators with preset voltage
>>>>    steps (and perhaps that's what had you confused). What's more is that GDSCs
>>>>    are also modeled as genpds instead of regulators, hence they sort of "fight"
>>>>    for the spot in power-domains=<> of a given node.
>>>
>>> Thanks for clarifying. I didn't get this part "hence they sort of "fight" for the spot in power-domains".
>>> What spot exactly did you mean here? The spot for PD to be used during scaling?
>>>
>>> It seems like you are hinting that there is some sort of limitation in keeping all the
>>> 3 power domains (cx gdsc, gx gdsc and cx rail) under the gpu node in dt. Please explain
>>> why we can't keep all the 3 power domains under gpu node and call an API
>>> (devm_pm_opp_attach_genpd() ??) to select the power domain which should be scaled?
>> Eh we could, but this adds a lot of boilerplate code:
>>
>> - genpd handling with get/put (I'm no genpd master but devm_pm_opp_attach_genpd
>>   sounds rather hacky to me)
> 
> Not sure if it is hacky. I see a similar approach in the Venus driver here:
> dt: https://elixir.bootlin.com/linux/latest/source/arch/arm64/boot/dts/qcom/sc7280.dtsi#L3699
> driver: https://elixir.bootlin.com/linux/latest/source/drivers/media/platform/qcom/venus/pm_helpers.c#L882
The venus driver is not a very strong argument for something not being
hacky in my eyes.. I have some opinions on that one :P

If not hacky per se, it's still very explicit and error-prone. And
we know what happens when one mismanages Adreno's PDs..

> 
>> - new r/w/rmw functions would need to be introduced for accessing
>>   GMU_CX registers as the reg defines wouldn't match so we'd have
>>   to include a new magic offset
> Yeah, this is the ugly part. On the bright side, there are very few register accesses which are
> mostly perfcounter related.
> 
>> - all the reused gmu_ callbacks would need to be separated out
> 
> More LoC of course, but it would be more readable/maintainable if we can make a clean
> separation with different callbacks btw gmu vs no-gmu.
> 
>> - A619_holi would be even more problematic to distinguish from A619,
>>   similar story goes for firmware loading requirements
> 
> I suppose we can check the presence of gmu to identify holi?
> E.g., dynamically allocate 'struct a6xx_gmu' within a6xx_gmu_init() and
> then we can null check a6xx_gpu->(*gmu).
> 
> For holi's gmu fw loading, moving gmu fw loading to the gmu start sequence might be the
> cleanest approach.
Eeeh.. I'd say the current approach I took is better than this..

Otherwise we'd have to hack around calling fw loader functions from
two places, take care of how they defer/interact with the userland
fw helper, how that plays around with pm ops etc.

> 
> 
>>>
>>>>
>>>> 2. GMU wrapper gets CX_GDSC & GX_GDSC handles in power-domains=<> (just like
>>>>    the real GMU in the current state of upstream [3]), which are then governed
>>>>    through explicit genpd calls to turn them on/off when the GPU resume/suspend/
>>>>    crash recovery functions are called.
>>>>
>>>> 3. GPUs with GMU, like A630, don't get any power-domains=<> entries in DT,
>>>>    instead relying on the GMU firmware to communicate necessary requests
>>>>    to the VDD_xyz resources directly to RPMh, as part of the DVFS routines.
>>>>    If GMU wasn't so smart, we would have to do the exact same VDD_xyz+OPP dance
>>>>    there - that's precisely what's going on under the hood.
>>>>
>>>> 4. Adreno SMMU gets a handle to CX_GDSC so that when OF probe funcs are called,
>>>>    (and SMMUs probe way way before all things drm) the headswitch is de-asserted
>>>>    and its registers and related clocks are accessible.
>>>>
>>>>
>>>> All this makes me believe the way I generally architected things in
>>>> this series is correct.
>>>>
>>>>
>>>> [1] A610 (and I think A612) lack a VDD_GX line, so they power the GPU from
>>>>     VDD_CX, but that's just an implementation detail which is handled by
>>>>     simply passing the correct one in DTS, the code doesn't care.
>>>>
>>>> [2] Hence my recent changes to use dev_pm_opp_set_rate() wherever possible,
>>>>     this func reads required-opps in OPP table entries and makes sure to elevate
>>>>     the GENPD's performance state before switching frequencies
>>>>
>>>> [3] Please take a look at the "end product" here:
>>>>     https://github.com/SoMainline/linux/commit/fb16757c3bf4c087ac597d70c7a98755d46bb323
>>>>     you can open e.g. sdm845.dtsi for comparison with real GMU
>>>
>>> This dt definition for a610 gpu clearly shows the issue I have here. Someone
>>> looking at this gets a very wrong picture about the platform like there is actually nothing
>>> resembling a gmu IP in a610. Is gmu or gmu-cx register region really present in this hw?
>> Yes it is! Take a look at this hunk for example
>>
>> if (adreno_has_gmu_wrapper(adreno_gpu)) {
>> 	/* Do it here, as GMU wrapper only inits the GMU for memory reservation etc. */
>>
>> 	/* Set up the CX GMU counter 0 to count busy ticks */
>> 	gmu_write(gmu, REG_A6XX_GPU_GMU_AO_GPU_CX_BUSY_MASK, 0xff000000);
>>
>> 	/* Enable power counter 0 */
>> 	gmu_rmw(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_SELECT_0, 0xff, BIT(5));
>> 	gmu_write(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_ENABLE, 1);
>> }
>>
>> They are present, functional and at a predictable offset, just like
>> the real thing! The "only" difference is that there's no GMU_GX, i.e. no
>> registers responsible for interfacing with the MCU that lives (well,
>> would live) on the GPU island. For all I know, it may actually even
>> be physically there, just disabled / fused off as there's no RPMh to
>> communicate with.. In fact, I'd suspect that's precisely the case
>> on A619_holi (vs normal A619 which uses a GMU on RPMh-enabled SM6350).
> 
> I have to go with your findings. I am just finding it hard to get onboard with
> a dummy node in devicetree to save a few LoC in the driver. I believe we should
> start with a dt definition that is close to the platform and handle the
> implementation complexity in the driver, not the other way around.
As I mentioned, the definition is not "dummy", there are physical registers
associated with it that we interact with. The only difference is that there
are fewer of them and the M3 core that would usually drive the remaining
part is disabled/removed.
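
To make the "registers we interact with" point a bit more tangible, here is a
small userspace mock of the three accesses from the hunk quoted above. It is
only a sketch: struct mock_gmu, the MOCK_* indices and the mock_* helpers are
made up for illustration (the real offsets come from the generated register
headers), and it assumes the rmw helper does the usual "read, clear the mask
bits, OR in the new value".

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n) (UINT32_C(1) << (n))

/* Hypothetical stand-ins for the three GMU_CX registers; NOT real offsets. */
enum mock_reg {
	MOCK_CX_BUSY_MASK,
	MOCK_POWER_COUNTER_SELECT_0,
	MOCK_POWER_COUNTER_ENABLE,
	MOCK_REG_COUNT,
};

/* A plain array plays the role of the ioremapped GMU_CX region. */
struct mock_gmu {
	uint32_t regs[MOCK_REG_COUNT];
};

static uint32_t mock_gmu_read(const struct mock_gmu *gmu, enum mock_reg reg)
{
	return gmu->regs[reg];
}

static void mock_gmu_write(struct mock_gmu *gmu, enum mock_reg reg, uint32_t val)
{
	gmu->regs[reg] = val;
}

/* Assumed semantics: clear the bits in @mask, then OR in @or_val. */
static void mock_gmu_rmw(struct mock_gmu *gmu, enum mock_reg reg,
			 uint32_t mask, uint32_t or_val)
{
	uint32_t val = mock_gmu_read(gmu, reg);

	val &= ~mask;
	mock_gmu_write(gmu, reg, val | or_val);
}

/* Same three steps as the wrapper-only block in hw_init(). */
static void mock_setup_busy_counter(struct mock_gmu *gmu)
{
	/* Set up counter 0 to count busy ticks */
	mock_gmu_write(gmu, MOCK_CX_BUSY_MASK, 0xff000000);

	/* Select the countable in the low byte, leave the other bits alone */
	mock_gmu_rmw(gmu, MOCK_POWER_COUNTER_SELECT_0, 0xff, BIT(5));

	/* Enable power counter 0 */
	mock_gmu_write(gmu, MOCK_POWER_COUNTER_ENABLE, 1);
}
```

None of this needs the M3 to be alive, the accesses are plain MMIO on the CX
side, which is the whole reason the wrapper node carries a reg entry.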

This disagreement seems to come from our (perhaps wrong) assumption that
"GMU" === "cortex M3", whereas the register naming scheme suggests that it's
only a part of a bigger picture.. Here's my attempt at drawing the A6xx:

                   +--------------------------------+--------------------+
                   |                                |                    |
                   |                               <+>                   |
                   |                                |                    |
                   |        part of GPUCC           |      GBIF/VBIF     |       MEMNoC/BIMC
        CNoC       |        RBBM+CP+SQE+...         |   SPTPRAC/GFXreg   +-----------------------
    ---------------+                                |                    |
                   |                                |                    |
                   |         GPU GX_core            |       GPU_GX       |
                   |          ^                 ^   |                 ^  |
                   +----------+------------+----+---+-----------------+--+
                   |          v            |    v   |                 v  |
                   |                       |        |     RPMh client    |
                   |     part of GPUCC     |        |      PWR mgmt      |
                   |     HWCG+LLC intf     | CNTRs  |         M3         |
                   |                      <+>       |                    |
CX_GDSC            |       GPU_CX          | GMU_CX |       GMU_GX       |
      ------------->                       |        |                    |
                   +-+-------^--------^----+--------+-+-----+------------+
                     |       |        |               |     |
                     |       |        |               |     |
                     |       |        |               |     |
                     |       |        |               |     |
                     |       |        |               |     |
                     |       |        |               |     |
                     |       |        |               |     |
                     v       |        |               |     |
                                 GPUSS_PLL           RSC  AOP mbx
                    LLCC  SW_RES

It's simplified and contains some obvious mistakes, but I believe
the "gmu wrapper" parts still contain all that silicon.

Take a look at the clock controller drivers for these platforms
and you'll notice they even have GMU and GX domain clocks!

Hence I'm still trying to convince you the approach in this patch
is correct, or at least "correct enough" :D

Associating GMU purely with the MCU may be a legacy of the A5xx
GPMU where your GPU either had it or not, but this is not so binary
on A6xx (and A7xx for that matter) :/
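
Purely as an illustration of "not so binary", the three cases could be written
down like below. This is a hypothetical model, not what the series does (the
patch only adds a bool, gmu_is_wrapper); the enum and function names are
invented, and the block flags are named after the boxes in my drawing above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical block flags, named after the boxes in the drawing. */
enum gmu_block {
	GMU_BLOCK_CX = 1 << 0,	/* CX side: counters and friends */
	GMU_BLOCK_GX = 1 << 1,	/* GX side: M3 core, RPMh client, power mgmt */
};

/* Hypothetical classification; the driver itself only keeps a bool. */
enum gmu_kind {
	GMU_KIND_NONE,		/* nothing GMU-like to describe at all */
	GMU_KIND_WRAPPER,	/* A610 / A619_holi: GMU_CX registers only */
	GMU_KIND_FULL,		/* e.g. A630: GMU_CX plus GMU_GX */
};

static unsigned int gmu_blocks(enum gmu_kind kind)
{
	switch (kind) {
	case GMU_KIND_FULL:
		return GMU_BLOCK_CX | GMU_BLOCK_GX;
	case GMU_KIND_WRAPPER:
		return GMU_BLOCK_CX;
	case GMU_KIND_NONE:
	default:
		return 0;
	}
}

/* There is register space, so a DT node with a reg entry describes real hw. */
static bool gmu_has_registers(enum gmu_kind kind)
{
	return (gmu_blocks(kind) & GMU_BLOCK_CX) != 0;
}

/* Firmware, HFI and OOB only make sense when the M3 side is there. */
static bool gmu_has_mcu(enum gmu_kind kind)
{
	return (gmu_blocks(kind) & GMU_BLOCK_GX) != 0;
}
```

With that split, "has a GMU node in DT" maps to gmu_has_registers() and "needs
GMU firmware" maps to gmu_has_mcu(), and the wrapper is simply the case where
the two answers differ.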

Konrad


P.S

Note to Qualcomm Legal in case you read this: I have no access to
Adreno docs, this is a best-guess visualisation of what I see
based on the register layout that comes from public downstream
kernel sources

> 
> -Akhil
> 
>>
>> Perhaps I could rename the GMU wrapper's reg-names entry to "gmu_cx" to
>> make things more obvious?
>>
>> Konrad
>>
>>>
>>> Just a side note about the dt file you shared:
>>> 	1. At line: 1243, It shouldn't have gx gdsc, right?
>>> 	2. At line: 1172, SM6115_VDDCX -> SM6115_VDDGX?
>>>
>>> -Akhil
>> (ignoring as agreed in your replies)
>>
>>>
>>>>
>>>> I hope this answers your concerns. If not, I'll be happy to elaborate.
>>>>
>>>> Konrad
>>>>>
>>>>> -Akhil.
>>>>>>
>>>>>> Konrad
>>>>>>>
>>>>>>> -Akhil
>>>>>>>>
>>>>>>>> If nothing else, this is a very convenient way to model a part of the
>>>>>>>> GPU (as that's essentially what GMU_CX is, to my understanding) and
>>>>>>>> the bindings people didn't shoot me in the head for proposing this, so
>>>>>>>> I assume it'd be cool to pursue this..
>>>>>>>>
>>>>>>>> Konrad
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
>>>>>>>>>> ---
>>>>>>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c       |  72 +++++++-
>>>>>>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c       | 255 +++++++++++++++++++++++++---
>>>>>>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h       |   1 +
>>>>>>>>>>  drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  14 +-
>>>>>>>>>>  drivers/gpu/drm/msm/adreno/adreno_gpu.c     |   8 +-
>>>>>>>>>>  drivers/gpu/drm/msm/adreno/adreno_gpu.h     |   6 +
>>>>>>>>>>  6 files changed, 318 insertions(+), 38 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>>>>>>>>>> index 87babbb2a19f..b1acdb027205 100644
>>>>>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>>>>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>>>>>>>>>> @@ -1469,6 +1469,7 @@ static int a6xx_gmu_get_irq(struct a6xx_gmu *gmu, struct platform_device *pdev,
>>>>>>>>>>  
>>>>>>>>>>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
>>>>>>>>>>  {
>>>>>>>>>> +	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
>>>>>>>>>>  	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>>>>>>>>>>  	struct platform_device *pdev = to_platform_device(gmu->dev);
>>>>>>>>>>  
>>>>>>>>>> @@ -1494,10 +1495,12 @@ void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
>>>>>>>>>>  	gmu->mmio = NULL;
>>>>>>>>>>  	gmu->rscc = NULL;
>>>>>>>>>>  
>>>>>>>>>> -	a6xx_gmu_memory_free(gmu);
>>>>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
>>>>>>>>>> +		a6xx_gmu_memory_free(gmu);
>>>>>>>>>>  
>>>>>>>>>> -	free_irq(gmu->gmu_irq, gmu);
>>>>>>>>>> -	free_irq(gmu->hfi_irq, gmu);
>>>>>>>>>> +		free_irq(gmu->gmu_irq, gmu);
>>>>>>>>>> +		free_irq(gmu->hfi_irq, gmu);
>>>>>>>>>> +	}
>>>>>>>>>>  
>>>>>>>>>>  	/* Drop reference taken in of_find_device_by_node */
>>>>>>>>>>  	put_device(gmu->dev);
>>>>>>>>>> @@ -1516,6 +1519,69 @@ static int cxpd_notifier_cb(struct notifier_block *nb,
>>>>>>>>>>  	return 0;
>>>>>>>>>>  }
>>>>>>>>>>  
>>>>>>>>>> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
>>>>>>>>>> +{
>>>>>>>>>> +	struct platform_device *pdev = of_find_device_by_node(node);
>>>>>>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>>>>>>>>>> +	int ret;
>>>>>>>>>> +
>>>>>>>>>> +	if (!pdev)
>>>>>>>>>> +		return -ENODEV;
>>>>>>>>>> +
>>>>>>>>>> +	gmu->dev = &pdev->dev;
>>>>>>>>>> +
>>>>>>>>>> +	of_dma_configure(gmu->dev, node, true);
>>>>>>>>> why setup dma for a device that is not actually present?
>>>>>>>>>> +
>>>>>>>>>> +	pm_runtime_enable(gmu->dev);
>>>>>>>>>> +
>>>>>>>>>> +	/* Mark legacy for manual SPTPRAC control */
>>>>>>>>>> +	gmu->legacy = true;
>>>>>>>>>> +
>>>>>>>>>> +	/* Map the GMU registers */
>>>>>>>>>> +	gmu->mmio = a6xx_gmu_get_mmio(pdev, "gmu");
>>>>>>>>>> +	if (IS_ERR(gmu->mmio)) {
>>>>>>>>>> +		ret = PTR_ERR(gmu->mmio);
>>>>>>>>>> +		goto err_mmio;
>>>>>>>>>> +	}
>>>>>>>>>> +
>>>>>>>>>> +	gmu->cxpd = dev_pm_domain_attach_by_name(gmu->dev, "cx");
>>>>>>>>>> +	if (IS_ERR(gmu->cxpd)) {
>>>>>>>>>> +		ret = PTR_ERR(gmu->cxpd);
>>>>>>>>>> +		goto err_mmio;
>>>>>>>>>> +	}
>>>>>>>>>> +
>>>>>>>>>> +	if (!device_link_add(gmu->dev, gmu->cxpd, DL_FLAG_PM_RUNTIME)) {
>>>>>>>>>> +		ret = -ENODEV;
>>>>>>>>>> +		goto detach_cxpd;
>>>>>>>>>> +	}
>>>>>>>>>> +
>>>>>>>>>> +	init_completion(&gmu->pd_gate);
>>>>>>>>>> +	complete_all(&gmu->pd_gate);
>>>>>>>>>> +	gmu->pd_nb.notifier_call = cxpd_notifier_cb;
>>>>>>>>>> +
>>>>>>>>>> +	/* Get a link to the GX power domain to reset the GPU */
>>>>>>>>>> +	gmu->gxpd = dev_pm_domain_attach_by_name(gmu->dev, "gx");
>>>>>>>>>> +	if (IS_ERR(gmu->gxpd)) {
>>>>>>>>>> +		ret = PTR_ERR(gmu->gxpd);
>>>>>>>>>> +		goto err_mmio;
>>>>>>>>>> +	}
>>>>>>>>>> +
>>>>>>>>>> +	gmu->initialized = true;
>>>>>>>>>> +
>>>>>>>>>> +	return 0;
>>>>>>>>>> +
>>>>>>>>>> +detach_cxpd:
>>>>>>>>>> +	dev_pm_domain_detach(gmu->cxpd, false);
>>>>>>>>>> +
>>>>>>>>>> +err_mmio:
>>>>>>>>>> +	iounmap(gmu->mmio);
>>>>>>>>>> +
>>>>>>>>>> +	/* Drop reference taken in of_find_device_by_node */
>>>>>>>>>> +	put_device(gmu->dev);
>>>>>>>>>> +
>>>>>>>>>> +	return ret;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
>>>>>>>>>>  {
>>>>>>>>>>  	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
>>>>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>>>>>>>>> index 931f9f3b3a85..8e0345ffab81 100644
>>>>>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>>>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>>>>>>>>> @@ -20,9 +20,11 @@ static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
>>>>>>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>>>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>>>>>>>>  
>>>>>>>>>> -	/* Check that the GMU is idle */
>>>>>>>>>> -	if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
>>>>>>>>>> -		return false;
>>>>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
>>>>>>>>>> +		/* Check that the GMU is idle */
>>>>>>>>>> +		if (!a6xx_gmu_isidle(&a6xx_gpu->gmu))
>>>>>>>>>> +			return false;
>>>>>>>>>> +	}
>>>>>>>>>>  
>>>>>>>>>>  	/* Check tha the CX master is idle */
>>>>>>>>>>  	if (gpu_read(gpu, REG_A6XX_RBBM_STATUS) &
>>>>>>>>>> @@ -612,13 +614,15 @@ static void a6xx_set_hwcg(struct msm_gpu *gpu, bool state)
>>>>>>>>>>  		return;
>>>>>>>>>>  
>>>>>>>>>>  	/* Disable SP clock before programming HWCG registers */
>>>>>>>>>> -	gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
>>>>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>>>>>>>>>> +		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 1, 0);
>>>>>>>>>>  
>>>>>>>>>>  	for (i = 0; (reg = &adreno_gpu->info->hwcg[i], reg->offset); i++)
>>>>>>>>>>  		gpu_write(gpu, reg->offset, state ? reg->value : 0);
>>>>>>>>>>  
>>>>>>>>>>  	/* Enable SP clock */
>>>>>>>>>> -	gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
>>>>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>>>>>>>>>> +		gmu_rmw(gmu, REG_A6XX_GPU_GMU_GX_SPTPRAC_CLOCK_CONTROL, 0, 1);
>>>>>>>>>>  
>>>>>>>>>>  	gpu_write(gpu, REG_A6XX_RBBM_CLOCK_CNTL, state ? clock_cntl_on : 0);
>>>>>>>>>>  }
>>>>>>>>>> @@ -1018,10 +1022,13 @@ static int hw_init(struct msm_gpu *gpu)
>>>>>>>>>>  {
>>>>>>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>>>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>>>>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>>>>>>>>>>  	int ret;
>>>>>>>>>>  
>>>>>>>>>> -	/* Make sure the GMU keeps the GPU on while we set it up */
>>>>>>>>>> -	a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
>>>>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
>>>>>>>>>> +		/* Make sure the GMU keeps the GPU on while we set it up */
>>>>>>>>>> +		a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
>>>>>>>>>> +	}
>>>>>>>>>>  
>>>>>>>>>>  	/* Clear GBIF halt in case GX domain was not collapsed */
>>>>>>>>>>  	if (a6xx_has_gbif(adreno_gpu))
>>>>>>>>>> @@ -1144,6 +1151,17 @@ static int hw_init(struct msm_gpu *gpu)
>>>>>>>>>>  			0x3f0243f0);
>>>>>>>>>>  	}
>>>>>>>>>>  
>>>>>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
>>>>>>>>>> +		/* Do it here, as GMU wrapper only inits the GMU for memory reservation etc. */
>>>>>>>>>> +
>>>>>>>>>> +		/* Set up the CX GMU counter 0 to count busy ticks */
>>>>>>>>>> +		gmu_write(gmu, REG_A6XX_GPU_GMU_AO_GPU_CX_BUSY_MASK, 0xff000000);
>>>>>>>>>> +
>>>>>>>>>> +		/* Enable power counter 0 */
>>>>>>>>>> +		gmu_rmw(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_SELECT_0, 0xff, BIT(5));
>>>>>>>>>> +		gmu_write(gmu, REG_A6XX_GMU_CX_GMU_POWER_COUNTER_ENABLE, 1);
>>>>>>>>>> +	}
>>>>>>>>>> +
>>>>>>>>>>  	/* Protect registers from the CP */
>>>>>>>>>>  	a6xx_set_cp_protect(gpu);
>>>>>>>>>>  
>>>>>>>>>> @@ -1233,6 +1251,8 @@ static int hw_init(struct msm_gpu *gpu)
>>>>>>>>>>  	}
>>>>>>>>>>  
>>>>>>>>>>  out:
>>>>>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
>>>>>>>>>> +		return ret;
>>>>>>>>>>  	/*
>>>>>>>>>>  	 * Tell the GMU that we are done touching the GPU and it can start power
>>>>>>>>>>  	 * management
>>>>>>>>>> @@ -1267,6 +1287,9 @@ static void a6xx_dump(struct msm_gpu *gpu)
>>>>>>>>>>  	adreno_dump(gpu);
>>>>>>>>>>  }
>>>>>>>>>>  
>>>>>>>>>> +#define GBIF_GX_HALT_MASK	BIT(0)
>>>>>>>>>> +#define GBIF_CLIENT_HALT_MASK	BIT(0)
>>>>>>>>>> +#define GBIF_ARB_HALT_MASK	BIT(1)
>>>>>>>>>>  #define VBIF_RESET_ACK_TIMEOUT	100
>>>>>>>>>>  #define VBIF_RESET_ACK_MASK	0x00f0
>>>>>>>>>>  
>>>>>>>>>> @@ -1299,7 +1322,8 @@ static void a6xx_recover(struct msm_gpu *gpu)
>>>>>>>>>>  	 * Turn off keep alive that might have been enabled by the hang
>>>>>>>>>>  	 * interrupt
>>>>>>>>>>  	 */
>>>>>>>>>> -	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
>>>>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>>>>>>>>>> +		gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
>>>>>>>>>
>>>>>>>>> Maybe it is better to move this to a6xx_gmu_force_power_off.
>>>>>>>>>
>>>>>>>>>>  
>>>>>>>>>>  	pm_runtime_dont_use_autosuspend(&gpu->pdev->dev);
>>>>>>>>>>  
>>>>>>>>>> @@ -1329,6 +1353,32 @@ static void a6xx_recover(struct msm_gpu *gpu)
>>>>>>>>>>  
>>>>>>>>>>  	dev_pm_genpd_remove_notifier(gmu->cxpd);
>>>>>>>>>>  
>>>>>>>>>> +	/* Software-reset the GPU */
>>>>>>>>>
>>>>>>>>> This is not a soft reset sequence. We are trying to quiesce GPU-DDR
>>>>>>>>> traffic with this sequence.
>>>>>>>>>
>>>>>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
>>>>>>>>>> +		/* Halt the GX side of GBIF */
>>>>>>>>>> +		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, GBIF_GX_HALT_MASK);
>>>>>>>>>> +		spin_until(gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT_ACK) &
>>>>>>>>>> +			   GBIF_GX_HALT_MASK);
>>>>>>>>>> +
>>>>>>>>>> +		/* Halt new client requests on GBIF */
>>>>>>>>>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_CLIENT_HALT_MASK);
>>>>>>>>>> +		spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
>>>>>>>>>> +			   (GBIF_CLIENT_HALT_MASK)) == GBIF_CLIENT_HALT_MASK);
>>>>>>>>>> +
>>>>>>>>>> +		/* Halt all AXI requests on GBIF */
>>>>>>>>>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, GBIF_ARB_HALT_MASK);
>>>>>>>>>> +		spin_until((gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK) &
>>>>>>>>>> +			   (GBIF_ARB_HALT_MASK)) == GBIF_ARB_HALT_MASK);
>>>>>>>>>> +
>>>>>>>>>> +		/* Clear the halts */
>>>>>>>>>> +		gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
>>>>>>>>>> +
>>>>>>>>>> +		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
>>>>>>>>>> +
>>>>>>>>>> +		/* This *really* needs to go through before we do anything else! */
>>>>>>>>>> +		mb();
>>>>>>>>>> +	}
>>>>>>>>>> +
>>>>>>>>>
>>>>>>>>> This sequence should be before we collapse cx gdsc. Also, please see if
>>>>>>>>> we can create a subroutine to avoid code dup.
>>>>>>>>>
>>>>>>>>>>  	pm_runtime_use_autosuspend(&gpu->pdev->dev);
>>>>>>>>>>  
>>>>>>>>>>  	if (active_submits)
>>>>>>>>>> @@ -1463,7 +1513,8 @@ static void a6xx_fault_detect_irq(struct msm_gpu *gpu)
>>>>>>>>>>  	 * Force the GPU to stay on until after we finish
>>>>>>>>>>  	 * collecting information
>>>>>>>>>>  	 */
>>>>>>>>>> -	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
>>>>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>>>>>>>>>> +		gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 1);
>>>>>>>>>>  
>>>>>>>>>>  	DRM_DEV_ERROR(&gpu->pdev->dev,
>>>>>>>>>>  		"gpu fault ring %d fence %x status %8.8X rb %4.4x/%4.4x ib1 %16.16llX/%4.4x ib2 %16.16llX/%4.4x\n",
>>>>>>>>>> @@ -1624,7 +1675,7 @@ static void a6xx_llc_slices_init(struct platform_device *pdev,
>>>>>>>>>>  		a6xx_gpu->llc_mmio = ERR_PTR(-EINVAL);
>>>>>>>>>>  }
>>>>>>>>>>  
>>>>>>>>>> -static int a6xx_pm_resume(struct msm_gpu *gpu)
>>>>>>>>>> +static int a6xx_gmu_pm_resume(struct msm_gpu *gpu)
>>>>>>>>>>  {
>>>>>>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>>>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>>>>>>>> @@ -1644,10 +1695,61 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
>>>>>>>>>>  
>>>>>>>>>>  	a6xx_llc_activate(a6xx_gpu);
>>>>>>>>>>  
>>>>>>>>>> -	return 0;
>>>>>>>>>> +	return ret;
>>>>>>>>>>  }
>>>>>>>>>>  
>>>>>>>>>> -static int a6xx_pm_suspend(struct msm_gpu *gpu)
>>>>>>>>>> +static int a6xx_pm_resume(struct msm_gpu *gpu)
>>>>>>>>>> +{
>>>>>>>>>> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>>>>>>> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>>>>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>>>>>>>>>> +	unsigned long freq = 0;
>>>>>>>>>> +	struct dev_pm_opp *opp;
>>>>>>>>>> +	int ret;
>>>>>>>>>> +
>>>>>>>>>> +	gpu->needs_hw_init = true;
>>>>>>>>>> +
>>>>>>>>>> +	trace_msm_gpu_resume(0);
>>>>>>>>>> +
>>>>>>>>>> +	mutex_lock(&a6xx_gpu->gmu.lock);
>>>>>>>>> I think we can ignore gmu lock as there is no real gmu device.
>>>>>>>>>
>>>>>>>>>> +
>>>>>>>>>> +	pm_runtime_resume_and_get(gmu->dev);
>>>>>>>>>> +	pm_runtime_resume_and_get(gmu->gxpd);
>>>>>>>>>> +
>>>>>>>>>> +	/* Set the core clock, having VDD scaling in mind */
>>>>>>>>>> +	ret = dev_pm_opp_set_rate(&gpu->pdev->dev, gpu->fast_rate);
>>>>>>>>>> +	if (ret)
>>>>>>>>>> +		goto err_core_clk;
>>>>>>>>>> +
>>>>>>>>>> +	ret = clk_bulk_prepare_enable(gpu->nr_clocks, gpu->grp_clks);
>>>>>>>>>> +	if (ret)
>>>>>>>>>> +		goto err_bulk_clk;
>>>>>>>>>> +
>>>>>>>>>> +	ret = clk_prepare_enable(gpu->ebi1_clk);
>>>>>>>>>> +	if (ret)
>>>>>>>>>> +		goto err_mem_clk;
>>>>>>>>>> +
>>>>>>>>>> +	/* If anything goes south, tear the GPU down piece by piece.. */
>>>>>>>>>> +	if (ret) {
>>>>>>>>>> +err_mem_clk:
>>>>>>>>>> +		clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
>>>>>>>>>> +err_bulk_clk:
>>>>>>>>>> +		opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
>>>>>>>>>> +		dev_pm_opp_put(opp);
>>>>>>>>>> +		dev_pm_opp_set_rate(&gpu->pdev->dev, 0);
>>>>>>>>>> +err_core_clk:
>>>>>>>>>> +		pm_runtime_put(gmu->gxpd);
>>>>>>>>>> +		pm_runtime_put(gmu->dev);
>>>>>>>>>> +	}
>>>>>>>>>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
>>>>>>>>>> +
>>>>>>>>>> +	if (!ret)
>>>>>>>>>> +		msm_devfreq_resume(gpu);
>>>>>>>>>> +
>>>>>>>>>> +	return ret;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +static int a6xx_gmu_pm_suspend(struct msm_gpu *gpu)
>>>>>>>>>>  {
>>>>>>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>>>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>>>>>>>> @@ -1674,11 +1776,62 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
>>>>>>>>>>  	return 0;
>>>>>>>>>>  }
>>>>>>>>>>  
>>>>>>>>>> +static int a6xx_pm_suspend(struct msm_gpu *gpu)
>>>>>>>>>> +{
>>>>>>>>>> +	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>>>>>>> +	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>>>>>>>> +	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
>>>>>>>>>> +	unsigned long freq = 0;
>>>>>>>>>> +	struct dev_pm_opp *opp;
>>>>>>>>>> +	int i, ret;
>>>>>>>>>> +
>>>>>>>>>> +	trace_msm_gpu_suspend(0);
>>>>>>>>>> +
>>>>>>>>>> +	opp = dev_pm_opp_find_freq_ceil(&gpu->pdev->dev, &freq);
>>>>>>>>>> +	dev_pm_opp_put(opp);
>>>>>>>>>> +
>>>>>>>>>> +	msm_devfreq_suspend(gpu);
>>>>>>>>>> +
>>>>>>>>>> +	mutex_lock(&a6xx_gpu->gmu.lock);
>>>>>>>>>> +
>>>>>>>>>> +	clk_disable_unprepare(gpu->ebi1_clk);
>>>>>>>>>> +
>>>>>>>>>> +	clk_bulk_disable_unprepare(gpu->nr_clocks, gpu->grp_clks);
>>>>>>>>>> +
>>>>>>>>>> +	/* Set frequency to the minimum supported level (no 27MHz on A6xx!) */
>>>>>>>>>> +	ret = dev_pm_opp_set_rate(&gpu->pdev->dev, freq);
>>>>>>>>>> +	if (ret)
>>>>>>>>>> +		goto err;
>>>>>>>>>> +
>>>>>>>>>> +	pm_runtime_put_sync(gmu->gxpd);
>>>>>>>>>> +	pm_runtime_put_sync(gmu->dev);
>>>>>>>>>> +
>>>>>>>>>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
>>>>>>>>>> +
>>>>>>>>>> +	if (a6xx_gpu->shadow_bo)
>>>>>>>>>> +		for (i = 0; i < gpu->nr_rings; i++)
>>>>>>>>>> +			a6xx_gpu->shadow[i] = 0;
>>>>>>>>>> +
>>>>>>>>>> +	gpu->suspend_count++;
>>>>>>>>>> +
>>>>>>>>>> +	return 0;
>>>>>>>>>> +
>>>>>>>>>> +err:
>>>>>>>>>> +	mutex_unlock(&a6xx_gpu->gmu.lock);
>>>>>>>>>> +
>>>>>>>>>> +	return ret;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>>  static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
>>>>>>>>>>  {
>>>>>>>>>>  	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>>>>>>>>>>  	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>>>>>>>>>>  
>>>>>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu)) {
>>>>>>>>>> +		*value = gpu_read64(gpu, REG_A6XX_CP_ALWAYS_ON_COUNTER);
>>>>>>>>>> +		return 0;
>>>>>>>>>> +	}
>>>>>>>>>> +
>>>>>>>>> Instead of wrapper check here, we can just create a separate op. I don't
>>>>>>>>> see any benefit in reusing the same function here.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>  	mutex_lock(&a6xx_gpu->gmu.lock);
>>>>>>>>>>  
>>>>>>>>>>  	/* Force the GPU power on so we can read this register */
>>>>>>>>>> @@ -1716,7 +1869,8 @@ static void a6xx_destroy(struct msm_gpu *gpu)
>>>>>>>>>>  		drm_gem_object_put(a6xx_gpu->shadow_bo);
>>>>>>>>>>  	}
>>>>>>>>>>  
>>>>>>>>>> -	a6xx_llc_slices_destroy(a6xx_gpu);
>>>>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>>>>>>>>>> +		a6xx_llc_slices_destroy(a6xx_gpu);
>>>>>>>>>>  
>>>>>>>>>>  	mutex_lock(&a6xx_gpu->gmu.lock);
>>>>>>>>>>  	a6xx_gmu_remove(a6xx_gpu);
>>>>>>>>>> @@ -1957,8 +2111,8 @@ static const struct adreno_gpu_funcs funcs = {
>>>>>>>>>>  		.set_param = adreno_set_param,
>>>>>>>>>>  		.hw_init = a6xx_hw_init,
>>>>>>>>>>  		.ucode_load = a6xx_ucode_load,
>>>>>>>>>> -		.pm_suspend = a6xx_pm_suspend,
>>>>>>>>>> -		.pm_resume = a6xx_pm_resume,
>>>>>>>>>> +		.pm_suspend = a6xx_gmu_pm_suspend,
>>>>>>>>>> +		.pm_resume = a6xx_gmu_pm_resume,
>>>>>>>>>>  		.recover = a6xx_recover,
>>>>>>>>>>  		.submit = a6xx_submit,
>>>>>>>>>>  		.active_ring = a6xx_active_ring,
>>>>>>>>>> @@ -1982,6 +2136,35 @@ static const struct adreno_gpu_funcs funcs = {
>>>>>>>>>>  	.get_timestamp = a6xx_get_timestamp,
>>>>>>>>>>  };
>>>>>>>>>>  
>>>>>>>>>> +static const struct adreno_gpu_funcs funcs_gmuwrapper = {
>>>>>>>>>> +	.base = {
>>>>>>>>>> +		.get_param = adreno_get_param,
>>>>>>>>>> +		.set_param = adreno_set_param,
>>>>>>>>>> +		.hw_init = a6xx_hw_init,
>>>>>>>>>> +		.ucode_load = a6xx_ucode_load,
>>>>>>>>>> +		.pm_suspend = a6xx_pm_suspend,
>>>>>>>>>> +		.pm_resume = a6xx_pm_resume,
>>>>>>>>>> +		.recover = a6xx_recover,
>>>>>>>>>> +		.submit = a6xx_submit,
>>>>>>>>>> +		.active_ring = a6xx_active_ring,
>>>>>>>>>> +		.irq = a6xx_irq,
>>>>>>>>>> +		.destroy = a6xx_destroy,
>>>>>>>>>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
>>>>>>>>>> +		.show = a6xx_show,
>>>>>>>>>> +#endif
>>>>>>>>>> +		.gpu_busy = a6xx_gpu_busy,
>>>>>>>>>> +#if defined(CONFIG_DRM_MSM_GPU_STATE)
>>>>>>>>>> +		.gpu_state_get = a6xx_gpu_state_get,
>>>>>>>>>> +		.gpu_state_put = a6xx_gpu_state_put,
>>>>>>>>>> +#endif
>>>>>>>>>> +		.create_address_space = a6xx_create_address_space,
>>>>>>>>>> +		.create_private_address_space = a6xx_create_private_address_space,
>>>>>>>>>> +		.get_rptr = a6xx_get_rptr,
>>>>>>>>>> +		.progress = a6xx_progress,
>>>>>>>>>> +	},
>>>>>>>>>> +	.get_timestamp = a6xx_get_timestamp,
>>>>>>>>>> +};
>>>>>>>>>> +
>>>>>>>>>>  struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>>>>>>>>>  {
>>>>>>>>>>  	struct msm_drm_private *priv = dev->dev_private;
>>>>>>>>>> @@ -2003,18 +2186,36 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>>>>>>>>>  
>>>>>>>>>>  	adreno_gpu->registers = NULL;
>>>>>>>>>>  
>>>>>>>>>> +	/* Check if there is a GMU phandle and set it up */
>>>>>>>>>> +	node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
>>>>>>>>>> +	/* FIXME: How do we gracefully handle this? */
>>>>>>>>>> +	BUG_ON(!node);
>>>>>>>>> How will you handle this BUG() when there is no GMU (a610 gpu)?
>>>>>>>>>
>>>>>>>>>> +
>>>>>>>>>> +	adreno_gpu->gmu_is_wrapper = of_device_is_compatible(node, "qcom,adreno-gmu-wrapper");
>>>>>>>>>> +
>>>>>>>>>>  	/*
>>>>>>>>>>  	 * We need to know the platform type before calling into adreno_gpu_init
>>>>>>>>>>  	 * so that the hw_apriv flag can be correctly set. Snoop into the info
>>>>>>>>>>  	 * and grab the revision number
>>>>>>>>>>  	 */
>>>>>>>>>>  	info = adreno_info(config->rev);
>>>>>>>>>> -
>>>>>>>>>> -	if (info && (info->revn == 650 || info->revn == 660 ||
>>>>>>>>>> -			adreno_cmp_rev(ADRENO_REV(6, 3, 5, ANY_ID), info->rev)))
>>>>>>>>>> +	if (!info)
>>>>>>>>>> +		return ERR_PTR(-EINVAL);
>>>>>>>>>> +
>>>>>>>>>> +	/* Assign these early so that we can use the is_aXYZ helpers */
>>>>>>>>>> +	/* Numeric revision IDs (e.g. 630) */
>>>>>>>>>> +	adreno_gpu->revn = info->revn;
>>>>>>>>>> +	/* New-style ADRENO_REV()-only */
>>>>>>>>>> +	adreno_gpu->rev = info->rev;
>>>>>>>>>> +	/* Quirk data */
>>>>>>>>>> +	adreno_gpu->info = info;
>>>>>>>>>> +
>>>>>>>>>> +	if (adreno_is_a650(adreno_gpu) || adreno_is_a660_family(adreno_gpu))
>>>>>>>>>>  		adreno_gpu->base.hw_apriv = true;
>>>>>>>>>>  
>>>>>>>>>> -	a6xx_llc_slices_init(pdev, a6xx_gpu);
>>>>>>>>>> +	/* No LLCC on non-RPMh (and by extension, non-GMU) SoCs */
>>>>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu))
>>>>>>>>>> +		a6xx_llc_slices_init(pdev, a6xx_gpu);
>>>>>>>>>>  
>>>>>>>>>>  	ret = a6xx_set_supported_hw(&pdev->dev, config->rev);
>>>>>>>>>>  	if (ret) {
>>>>>>>>>> @@ -2022,7 +2223,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>>>>>>>>>  		return ERR_PTR(ret);
>>>>>>>>>>  	}
>>>>>>>>>>  
>>>>>>>>>> -	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
>>>>>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
>>>>>>>>>> +		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_gmuwrapper, 1);
>>>>>>>>>> +	else
>>>>>>>>>> +		ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
>>>>>>>>>>  	if (ret) {
>>>>>>>>>>  		a6xx_destroy(&(a6xx_gpu->base.base));
>>>>>>>>>>  		return ERR_PTR(ret);
>>>>>>>>>> @@ -2035,13 +2239,10 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>>>>>>>>>>  	if (adreno_is_a618(adreno_gpu) || adreno_is_7c3(adreno_gpu))
>>>>>>>>>>  		priv->gpu_clamp_to_idle = true;
>>>>>>>>>>  
>>>>>>>>>> -	/* Check if there is a GMU phandle and set it up */
>>>>>>>>>> -	node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);
>>>>>>>>>> -
>>>>>>>>>> -	/* FIXME: How do we gracefully handle this? */
>>>>>>>>>> -	BUG_ON(!node);
>>>>>>>>>> -
>>>>>>>>>> -	ret = a6xx_gmu_init(a6xx_gpu, node);
>>>>>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu))
>>>>>>>>>> +		ret = a6xx_gmu_wrapper_init(a6xx_gpu, node);
>>>>>>>>>> +	else
>>>>>>>>>> +		ret = a6xx_gmu_init(a6xx_gpu, node);
>>>>>>>>>>  	of_node_put(node);
>>>>>>>>>>  	if (ret) {
>>>>>>>>>>  		a6xx_destroy(&(a6xx_gpu->base.base));
>>>>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>>>>>>>>> index eea2e60ce3b7..51a7656072fa 100644
>>>>>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>>>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
>>>>>>>>>> @@ -76,6 +76,7 @@ int a6xx_gmu_set_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
>>>>>>>>>>  void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
>>>>>>>>>>  
>>>>>>>>>>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
>>>>>>>>>> +int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
>>>>>>>>>>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
>>>>>>>>>>  
>>>>>>>>>>  void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
>>>>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
>>>>>>>>>> index 30ecdff363e7..4e5d650578c6 100644
>>>>>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
>>>>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
>>>>>>>>>> @@ -1041,16 +1041,18 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu)
>>>>>>>>>>  	/* Get the generic state from the adreno core */
>>>>>>>>>>  	adreno_gpu_state_get(gpu, &a6xx_state->base);
>>>>>>>>>>  
>>>>>>>>>> -	a6xx_get_gmu_registers(gpu, a6xx_state);
>>>>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu)) {
>>>>>>>>> nit: Kinda misleading function name to a layman. Should we invert the
>>>>>>>>> function to "adreno_has_gmu"?
>>>>>>>>>
>>>>>>>>> -Akhil
>>>>>>>>>> +		a6xx_get_gmu_registers(gpu, a6xx_state);
>>>>>>>>>>  
>>>>>>>>>> -	a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
>>>>>>>>>> -	a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
>>>>>>>>>> -	a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
>>>>>>>>>> +		a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log);
>>>>>>>>>> +		a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi);
>>>>>>>>>> +		a6xx_state->gmu_debug = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.debug);
>>>>>>>>>>  
>>>>>>>>>> -	a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
>>>>>>>>>> +		a6xx_snapshot_gmu_hfi_history(gpu, a6xx_state);
>>>>>>>>>> +	}
>>>>>>>>>>  
>>>>>>>>>>  	/* If GX isn't on the rest of the data isn't going to be accessible */
>>>>>>>>>> -	if (!a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
>>>>>>>>>> +	if (!adreno_has_gmu_wrapper(adreno_gpu) && !a6xx_gmu_gx_is_on(&a6xx_gpu->gmu))
>>>>>>>>>>  		return &a6xx_state->base;
>>>>>>>>>>  
>>>>>>>>>>  	/* Get the banks of indexed registers */
>>>>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>>>>>>>> index 6934cee07d42..5c5901d65950 100644
>>>>>>>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>>>>>>>> @@ -528,6 +528,10 @@ int adreno_load_fw(struct adreno_gpu *adreno_gpu)
>>>>>>>>>>  		if (!adreno_gpu->info->fw[i])
>>>>>>>>>>  			continue;
>>>>>>>>>>  
>>>>>>>>>> +		/* Skip loading GMU firmware with GMU Wrapper */
>>>>>>>>>> +		if (adreno_has_gmu_wrapper(adreno_gpu) && i == ADRENO_FW_GMU)
>>>>>>>>>> +			continue;
>>>>>>>>>> +
>>>>>>>>>>  		/* Skip if the firmware has already been loaded */
>>>>>>>>>>  		if (adreno_gpu->fw[i])
>>>>>>>>>>  			continue;
>>>>>>>>>> @@ -1074,8 +1078,8 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
>>>>>>>>>>  	u32 speedbin;
>>>>>>>>>>  	int ret;
>>>>>>>>>>  
>>>>>>>>>> -	/* Only handle the core clock when GMU is not in use */
>>>>>>>>>> -	if (config->rev.core < 6) {
>>>>>>>>>> +	/* Only handle the core clock when GMU is not in use (or is absent). */
>>>>>>>>>> +	if (adreno_has_gmu_wrapper(adreno_gpu) || config->rev.core < 6) {
>>>>>>>>>>  		/*
>>>>>>>>>>  		 * This can only be done before devm_pm_opp_of_add_table(), or
>>>>>>>>>>  		 * dev_pm_opp_set_config() will WARN_ON()
>>>>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>>>>>>>>>> index f62612a5c70f..ee5352bc5329 100644
>>>>>>>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>>>>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>>>>>>>>>> @@ -115,6 +115,7 @@ struct adreno_gpu {
>>>>>>>>>>  	 * code (a3xx_gpu.c) and stored in this common location.
>>>>>>>>>>  	 */
>>>>>>>>>>  	const unsigned int *reg_offsets;
>>>>>>>>>> +	bool gmu_is_wrapper;
>>>>>>>>>>  };
>>>>>>>>>>  #define to_adreno_gpu(x) container_of(x, struct adreno_gpu, base)
>>>>>>>>>>  
>>>>>>>>>> @@ -145,6 +146,11 @@ struct adreno_platform_config {
>>>>>>>>>>  
>>>>>>>>>>  bool adreno_cmp_rev(struct adreno_rev rev1, struct adreno_rev rev2);
>>>>>>>>>>  
>>>>>>>>>> +static inline bool adreno_has_gmu_wrapper(struct adreno_gpu *gpu)
>>>>>>>>>> +{
>>>>>>>>>> +	return gpu->gmu_is_wrapper;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>>  static inline bool adreno_is_a2xx(struct adreno_gpu *gpu)
>>>>>>>>>>  {
>>>>>>>>>>  	return (gpu->revn < 300);
>>>>>>>>>>
>>>>>>>>>> -- 
>>>>>>>>>> 2.40.0
>>>>>>>>>>


end of thread (newest message: 2023-05-09  8:46 UTC)

Thread overview: 30+ messages
2023-04-01 11:54 [PATCH v6 00/15] GMU-less A6xx support (A610, A619_holi) Konrad Dybcio
2023-04-01 11:54 ` [PATCH v6 01/15] drm/msm/adreno: adreno_gpu: Don't set OPP scaling clock w/ GMU Konrad Dybcio
2023-04-02 15:43   ` Dmitry Baryshkov
2023-04-01 11:54 ` [PATCH v6 02/15] dt-bindings: display/msm: gpu: Document GMU wrapper-equipped A6xx Konrad Dybcio
2023-04-05  6:36   ` Krzysztof Kozlowski
2023-04-01 11:54 ` [PATCH v6 03/15] dt-bindings: display/msm/gmu: Add GMU wrapper Konrad Dybcio
2023-04-01 11:54 ` [PATCH v6 04/15] drm/msm/a6xx: Remove static keyword from sptprac en/disable functions Konrad Dybcio
2023-04-01 11:54 ` [PATCH v6 05/15] drm/msm/a6xx: Extend and explain UBWC config Konrad Dybcio
2023-04-01 11:54 ` [PATCH v6 06/15] drm/msm/a6xx: Introduce GMU wrapper support Konrad Dybcio
2023-05-02  7:49   ` Akhil P Oommen
2023-05-02  9:40     ` Konrad Dybcio
2023-05-03 20:32       ` Akhil P Oommen
2023-05-04  6:34         ` Konrad Dybcio
2023-05-05  8:46           ` Akhil P Oommen
2023-05-05 10:35             ` Konrad Dybcio
2023-05-06 14:46               ` Akhil P Oommen
2023-05-06 20:46                 ` [Freedreno] " Akhil P Oommen
2023-05-06 21:07                   ` Akhil P Oommen
2023-05-08  8:59                 ` Konrad Dybcio
2023-05-08 21:15                   ` Akhil P Oommen
2023-05-09  8:46                     ` Konrad Dybcio
2023-04-01 11:54 ` [PATCH v6 07/15] drm/msm/a6xx: Remove both GBIF and RBBM GBIF halt on hw init Konrad Dybcio
2023-04-01 11:54 ` [PATCH v6 08/15] drm/msm/adreno: Disable has_cached_coherent in GMU wrapper configurations Konrad Dybcio
2023-04-01 11:54 ` [PATCH v6 09/15] drm/msm/a6xx: Add support for A619_holi Konrad Dybcio
2023-04-01 11:54 ` [PATCH v6 10/15] drm/msm/a6xx: Add A610 support Konrad Dybcio
2023-04-01 11:54 ` [PATCH v6 11/15] drm/msm/a6xx: Fix some A619 tunables Konrad Dybcio
2023-04-01 11:54 ` [PATCH v6 12/15] drm/msm/a6xx: Use "else if" in GPU speedbin rev matching Konrad Dybcio
2023-04-01 11:54 ` [PATCH v6 13/15] drm/msm/a6xx: Use adreno_is_aXYZ macros in speedbin matching Konrad Dybcio
2023-04-01 11:54 ` [PATCH v6 14/15] drm/msm/a6xx: Add A619_holi speedbin support Konrad Dybcio
2023-04-01 11:54 ` [PATCH v6 15/15] drm/msm/a6xx: Add A610 " Konrad Dybcio
