* [Intel-gfx] [PATCH] drm/i915/selftests: Measure the energy consumed while in RC6
@ 2020-03-24 20:15 Chris Wilson
  2020-03-24 20:44 ` Chris Wilson
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Chris Wilson @ 2020-03-24 20:15 UTC (permalink / raw)
  To: intel-gfx

Measure and compare the energy consumed, as reported by the RAPL MSR,
by the GPU while in RC0 and RC6 states. Throw an error if RC6 does not
at least halve the energy consumption of RC0, as this more than likely
means we failed to enter RC6 correctly.

If we can't measure the energy draw with the MSR, then it will report 0
for both measurements. This may be worth flagging as a warning? On the
other hand, it is reported and we can inspect the various machines in CI
to see if the values are reasonable.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
---
 drivers/gpu/drm/i915/gt/selftest_rc6.c | 29 ++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/selftest_rc6.c b/drivers/gpu/drm/i915/gt/selftest_rc6.c
index 95b165faeba7..7339758b0c77 100644
--- a/drivers/gpu/drm/i915/gt/selftest_rc6.c
+++ b/drivers/gpu/drm/i915/gt/selftest_rc6.c
@@ -12,6 +12,22 @@
 
 #include "selftests/i915_random.h"
 
+#define MCH_SECP_NRG_STTS              _MMIO(MCHBAR_MIRROR_BASE_SNB + 0x592c)
+
+static u64 energy_uJ(struct intel_rc6 *rc6)
+{
+	unsigned long long power;
+	u32 units;
+
+	if (rdmsrl_safe(MSR_RAPL_POWER_UNIT, &power))
+		return 0;
+
+	units = (power & 0x1f00) >> 8;
+	power = intel_uncore_read_fw(rc6_to_uncore(rc6), MCH_SECP_NRG_STTS);
+
+	return (1000000 * power) >> units; /* convert to uJ */
+}
+
 static u64 rc6_residency(struct intel_rc6 *rc6)
 {
 	u64 result;
@@ -31,6 +47,7 @@ int live_rc6_manual(void *arg)
 {
 	struct intel_gt *gt = arg;
 	struct intel_rc6 *rc6 = &gt->rc6;
+	u64 rc0_energy, rc6_energy;
 	intel_wakeref_t wakeref;
 	u64 res[2];
 	int err = 0;
@@ -53,9 +70,11 @@ int live_rc6_manual(void *arg)
 	__intel_rc6_disable(rc6);
 	msleep(1); /* wakeup is not immediate, takes about 100us on icl */
 
+	rc0_energy = -energy_uJ(rc6);
 	res[0] = rc6_residency(rc6);
 	msleep(250);
 	res[1] = rc6_residency(rc6);
+	rc0_energy += energy_uJ(rc6);
 	if ((res[1] - res[0]) >> 10) {
 		pr_err("RC6 residency increased by %lldus while disabled for 250ms!\n",
 		       (res[1] - res[0]) >> 10);
@@ -66,9 +85,11 @@ int live_rc6_manual(void *arg)
 	/* Manually enter RC6 */
 	intel_rc6_park(rc6);
 
+	rc6_energy = -energy_uJ(rc6);
 	res[0] = rc6_residency(rc6);
 	msleep(100);
 	res[1] = rc6_residency(rc6);
+	rc6_energy += energy_uJ(rc6);
 
 	if (res[1] == res[0]) {
 		pr_err("Did not enter RC6! RC6_STATE=%08x, RC6_CONTROL=%08x, residency=%lld\n",
@@ -78,6 +99,14 @@ int live_rc6_manual(void *arg)
 		err = -EINVAL;
 	}
 
+	pr_info("GPU consumed %llduJ in RC0 and %llduJ in RC6\n",
+		rc0_energy, rc6_energy);
+	if ((rc6_energy >> 10) > (rc0_energy >> 10) / 2) { /* compare mJ */
+		pr_err("GPU leaked energy while in RC6!\n");
+		err = -EINVAL;
+		goto out_unlock;
+	}
+
 	/* Restore what should have been the original state! */
 	intel_rc6_unpark(rc6);
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
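
The unit handling in energy_uJ() can be checked in isolation. What follows
is a minimal userspace sketch, not part of the patch. It assumes that the
energy status unit (ESU) sits in bits 12:8 of MSR_RAPL_POWER_UNIT, as the
0x1f00 mask implies, and that one counter tick is worth 1/2^ESU joules. The
name rapl_raw_to_uj() is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the arithmetic of energy_uJ(). */
static uint64_t rapl_raw_to_uj(uint64_t power_unit_msr, uint32_t raw)
{
	/* Energy status units: bits 12:8, the same mask the patch uses. */
	uint32_t esu = (power_unit_msr & 0x1f00) >> 8;

	/* One tick is 1/2^esu J, so scale to uJ first and then shift. */
	return (1000000ull * raw) >> esu;
}
```

With a sample ESU of 14 (about 61uJ per tick), 16384 ticks come out as
1000000uJ, i.e. one joule. Multiplying before shifting keeps the sub-joule
precision that would be lost if the shift were applied first.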

* [Intel-gfx] [PATCH] drm/i915/selftests: Measure the energy consumed while in RC6
  2020-03-24 20:15 [Intel-gfx] [PATCH] drm/i915/selftests: Measure the energy consumed while in RC6 Chris Wilson
@ 2020-03-24 20:44 ` Chris Wilson
  2020-03-24 20:56   ` Chris Wilson
  2020-09-28 23:56   ` Lucas De Marchi
  2020-03-24 20:58 ` Chris Wilson
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 13+ messages in thread
From: Chris Wilson @ 2020-03-24 20:44 UTC (permalink / raw)
  To: intel-gfx

Measure and compare the energy consumed, as reported by the RAPL MSR,
by the GPU while in RC0 and RC6 states. Throw an error if RC6 does not
at least halve the energy consumption of RC0, as this more than likely
means we failed to enter RC6 correctly.

If we can't measure the energy draw with the MSR, then it will report 0
for both measurements. Since the measurement works on all gen6+, this seems
worth flagging as an error.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
---
 drivers/gpu/drm/i915/gt/selftest_rc6.c | 39 ++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/selftest_rc6.c b/drivers/gpu/drm/i915/gt/selftest_rc6.c
index 95b165faeba7..3ac9a8925218 100644
--- a/drivers/gpu/drm/i915/gt/selftest_rc6.c
+++ b/drivers/gpu/drm/i915/gt/selftest_rc6.c
@@ -12,6 +12,22 @@
 
 #include "selftests/i915_random.h"
 
+#define MCH_SECP_NRG_STTS              _MMIO(MCHBAR_MIRROR_BASE_SNB + 0x592c)
+
+static u64 energy_uJ(struct intel_rc6 *rc6)
+{
+	unsigned long long power;
+	u32 units;
+
+	if (rdmsrl_safe(MSR_RAPL_POWER_UNIT, &power))
+		return 0;
+
+	units = (power & 0x1f00) >> 8;
+	power = intel_uncore_read_fw(rc6_to_uncore(rc6), MCH_SECP_NRG_STTS);
+
+	return (1000000 * power) >> units; /* convert to uJ */
+}
+
 static u64 rc6_residency(struct intel_rc6 *rc6)
 {
 	u64 result;
@@ -31,7 +47,9 @@ int live_rc6_manual(void *arg)
 {
 	struct intel_gt *gt = arg;
 	struct intel_rc6 *rc6 = &gt->rc6;
+	u64 rc0_power, rc6_power;
 	intel_wakeref_t wakeref;
+	ktime_t dt;
 	u64 res[2];
 	int err = 0;
 
@@ -53,22 +71,35 @@ int live_rc6_manual(void *arg)
 	__intel_rc6_disable(rc6);
 	msleep(1); /* wakeup is not immediate, takes about 100us on icl */
 
+	dt = ktime_get();
+	rc0_power = energy_uJ(rc6);
 	res[0] = rc6_residency(rc6);
 	msleep(250);
 	res[1] = rc6_residency(rc6);
+	rc0_power = div64_u64(energy_uJ(rc6) - rc0_power,
+			      ktime_to_ns(ktime_sub(ktime_get(), dt)));
 	if ((res[1] - res[0]) >> 10) {
 		pr_err("RC6 residency increased by %lldus while disabled for 250ms!\n",
 		       (res[1] - res[0]) >> 10);
 		err = -EINVAL;
 		goto out_unlock;
 	}
+	if (!rc0_power) {
+		pr_err("No power measured while in RC0\n");
+		err = -EINVAL;
+		goto out_unlock;
+	}
 
 	/* Manually enter RC6 */
 	intel_rc6_park(rc6);
 
+	dt = ktime_get();
+	rc6_power = energy_uJ(rc6);
 	res[0] = rc6_residency(rc6);
 	msleep(100);
 	res[1] = rc6_residency(rc6);
+	rc6_power = div64_u64(energy_uJ(rc6) - rc6_power,
+			      ktime_to_ns(ktime_sub(ktime_get(), dt)));
 
 	if (res[1] == res[0]) {
 		pr_err("Did not enter RC6! RC6_STATE=%08x, RC6_CONTROL=%08x, residency=%lld\n",
@@ -78,6 +109,14 @@ int live_rc6_manual(void *arg)
 		err = -EINVAL;
 	}
 
+	pr_info("GPU consumed %llduW in RC0 and %llduW in RC6\n",
+		rc0_power, rc6_power);
+	if ((rc6_power >> 10) > (rc0_power >> 10) / 2) { /* compare mW */
+		pr_err("GPU leaked energy while in RC6!\n");
+		err = -EINVAL;
+		goto out_unlock;
+	}
+
 	/* Restore what should have been the original state! */
 	intel_rc6_unpark(rc6);
 
-- 
2.20.1

* Re: [Intel-gfx] [PATCH] drm/i915/selftests: Measure the energy consumed while in RC6
  2020-03-24 20:44 ` Chris Wilson
@ 2020-03-24 20:56   ` Chris Wilson
  2020-09-28 23:56   ` Lucas De Marchi
  1 sibling, 0 replies; 13+ messages in thread
From: Chris Wilson @ 2020-03-24 20:56 UTC (permalink / raw)
  To: intel-gfx

Quoting Chris Wilson (2020-03-24 20:44:55)
> +       dt = ktime_get();
> +       rc0_power = energy_uJ(rc6);
>         res[0] = rc6_residency(rc6);
>         msleep(250);
>         res[1] = rc6_residency(rc6);
> +       rc0_power = div64_u64(energy_uJ(rc6) - rc0_power,
> +                             ktime_to_ns(ktime_sub(ktime_get(), dt)));

Did you forget this was in ns? You did!
-Chris
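
The unit problem can be shown with plain integer arithmetic. This is a
userspace sketch under stated assumptions: ordinary 64-bit division stands
in for div64_u64(), and the function names are hypothetical. Microjoules
divided by nanoseconds is kilowatts, so the quotient truncates to zero for
any realistic GPU load unless the numerator is scaled first.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ull

/* uJ / ns is kW: this truncates to 0 for loads below 1kW. */
static uint64_t power_unscaled(uint64_t delta_uj, uint64_t dt_ns)
{
	return delta_uj / dt_ns;
}

/* Scaling by NSEC_PER_SEC gives uJ per second, which is uW. */
static uint64_t power_uw(uint64_t delta_uj, uint64_t dt_ns)
{
	return (NSEC_PER_SEC * delta_uj) / dt_ns;
}
```

For 250000uJ measured over 250ms, power_unscaled() returns 0 while
power_uw() returns 1000000, i.e. 1W.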

* [Intel-gfx] [PATCH] drm/i915/selftests: Measure the energy consumed while in RC6
  2020-03-24 20:15 [Intel-gfx] [PATCH] drm/i915/selftests: Measure the energy consumed while in RC6 Chris Wilson
  2020-03-24 20:44 ` Chris Wilson
@ 2020-03-24 20:58 ` Chris Wilson
  2020-03-24 23:48 ` [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915/selftests: Measure the energy consumed while in RC6 (rev3) Patchwork
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Chris Wilson @ 2020-03-24 20:58 UTC (permalink / raw)
  To: intel-gfx

Measure and compare the energy consumed, as reported by the RAPL MSR,
by the GPU while in RC0 and RC6 states. Throw an error if RC6 does not
at least halve the energy consumption of RC0, as this more than likely
means we failed to enter RC6 correctly.

If we can't measure the energy draw with the MSR, then it will report 0
for both measurements. Since the measurement works on all gen6+, this seems
worth flagging as an error.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
---
 drivers/gpu/drm/i915/gt/selftest_rc6.c | 39 ++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/selftest_rc6.c b/drivers/gpu/drm/i915/gt/selftest_rc6.c
index 95b165faeba7..0b3368cb6d89 100644
--- a/drivers/gpu/drm/i915/gt/selftest_rc6.c
+++ b/drivers/gpu/drm/i915/gt/selftest_rc6.c
@@ -12,6 +12,22 @@
 
 #include "selftests/i915_random.h"
 
+#define MCH_SECP_NRG_STTS              _MMIO(MCHBAR_MIRROR_BASE_SNB + 0x592c)
+
+static u64 energy_uJ(struct intel_rc6 *rc6)
+{
+	unsigned long long power;
+	u32 units;
+
+	if (rdmsrl_safe(MSR_RAPL_POWER_UNIT, &power))
+		return 0;
+
+	units = (power & 0x1f00) >> 8;
+	power = intel_uncore_read_fw(rc6_to_uncore(rc6), MCH_SECP_NRG_STTS);
+
+	return (1000000 * power) >> units; /* convert to uJ */
+}
+
 static u64 rc6_residency(struct intel_rc6 *rc6)
 {
 	u64 result;
@@ -31,7 +47,9 @@ int live_rc6_manual(void *arg)
 {
 	struct intel_gt *gt = arg;
 	struct intel_rc6 *rc6 = &gt->rc6;
+	u64 rc0_power, rc6_power;
 	intel_wakeref_t wakeref;
+	ktime_t dt;
 	u64 res[2];
 	int err = 0;
 
@@ -53,22 +71,35 @@ int live_rc6_manual(void *arg)
 	__intel_rc6_disable(rc6);
 	msleep(1); /* wakeup is not immediate, takes about 100us on icl */
 
+	dt = ktime_get();
+	rc0_power = energy_uJ(rc6);
 	res[0] = rc6_residency(rc6);
 	msleep(250);
 	res[1] = rc6_residency(rc6);
+	rc0_power = div64_u64(NSEC_PER_SEC * (energy_uJ(rc6) - rc0_power),
+			      ktime_to_ns(ktime_sub(ktime_get(), dt)));
 	if ((res[1] - res[0]) >> 10) {
 		pr_err("RC6 residency increased by %lldus while disabled for 250ms!\n",
 		       (res[1] - res[0]) >> 10);
 		err = -EINVAL;
 		goto out_unlock;
 	}
+	if (!rc0_power) {
+		pr_err("No power measured while in RC0\n");
+		err = -EINVAL;
+		goto out_unlock;
+	}
 
 	/* Manually enter RC6 */
 	intel_rc6_park(rc6);
 
+	dt = ktime_get();
+	rc6_power = energy_uJ(rc6);
 	res[0] = rc6_residency(rc6);
 	msleep(100);
 	res[1] = rc6_residency(rc6);
+	rc6_power = div64_u64(NSEC_PER_SEC * (energy_uJ(rc6) - rc6_power),
+			      ktime_to_ns(ktime_sub(ktime_get(), dt)));
 
 	if (res[1] == res[0]) {
 		pr_err("Did not enter RC6! RC6_STATE=%08x, RC6_CONTROL=%08x, residency=%lld\n",
@@ -78,6 +109,14 @@ int live_rc6_manual(void *arg)
 		err = -EINVAL;
 	}
 
+	pr_info("GPU consumed %llduW in RC0 and %llduW in RC6\n",
+		rc0_power, rc6_power);
+	if ((rc6_power >> 10) > (rc0_power >> 10) / 2) { /* compare mW */
+		pr_err("GPU leaked energy while in RC6!\n");
+		err = -EINVAL;
+		goto out_unlock;
+	}
+
 	/* Restore what should have been the original state! */
 	intel_rc6_unpark(rc6);
 
-- 
2.20.1

* [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915/selftests: Measure the energy consumed while in RC6 (rev3)
  2020-03-24 20:15 [Intel-gfx] [PATCH] drm/i915/selftests: Measure the energy consumed while in RC6 Chris Wilson
  2020-03-24 20:44 ` Chris Wilson
  2020-03-24 20:58 ` Chris Wilson
@ 2020-03-24 23:48 ` Patchwork
  2020-03-25  0:08 ` [Intel-gfx] [PATCH] drm/i915/selftests: Measure the energy consumed while in RC6 Chris Wilson
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Patchwork @ 2020-03-24 23:48 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/selftests: Measure the energy consumed while in RC6 (rev3)
URL   : https://patchwork.freedesktop.org/series/75035/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_8184 -> Patchwork_17075
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_17075 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_17075, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17075/index.html

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_17075:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@live@gt_pm:
    - fi-glk-dsi:         [PASS][1] -> [DMESG-FAIL][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8184/fi-glk-dsi/igt@i915_selftest@live@gt_pm.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17075/fi-glk-dsi/igt@i915_selftest@live@gt_pm.html
    - fi-apl-guc:         [PASS][3] -> [DMESG-FAIL][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8184/fi-apl-guc/igt@i915_selftest@live@gt_pm.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17075/fi-apl-guc/igt@i915_selftest@live@gt_pm.html
    - fi-icl-guc:         [PASS][5] -> [DMESG-FAIL][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8184/fi-icl-guc/igt@i915_selftest@live@gt_pm.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17075/fi-icl-guc/igt@i915_selftest@live@gt_pm.html
    - fi-bxt-dsi:         [PASS][7] -> [DMESG-FAIL][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8184/fi-bxt-dsi/igt@i915_selftest@live@gt_pm.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17075/fi-bxt-dsi/igt@i915_selftest@live@gt_pm.html

  
Known issues
------------

  Here are the changes found in Patchwork_17075 that come from known issues:

### IGT changes ###

#### Possible fixes ####

  * igt@i915_selftest@live@gem_contexts:
    - fi-cml-s:           [DMESG-FAIL][9] ([i915#877]) -> [PASS][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8184/fi-cml-s/igt@i915_selftest@live@gem_contexts.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17075/fi-cml-s/igt@i915_selftest@live@gem_contexts.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [i915#656]: https://gitlab.freedesktop.org/drm/intel/issues/656
  [i915#877]: https://gitlab.freedesktop.org/drm/intel/issues/877


Participating hosts (40 -> 42)
------------------------------

  Additional (6): fi-kbl-soraka fi-hsw-4770r fi-bsw-n3050 fi-byt-j1900 fi-kbl-7500u fi-cfl-8109u 
  Missing    (4): fi-ctg-p8600 fi-byt-squawks fi-bsw-cyan fi-bdw-samus 


Build changes
-------------

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_8184 -> Patchwork_17075

  CI-20190529: 20190529
  CI_DRM_8184: 1a72c9d9d3140e92190485d766b9d165932c5b86 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5535: d1dcf40cc6869ac858586c5ad9f09af6617ce2ee @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17075: 73d97df3bf7eca669a5e20e97110111df29f45d3 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

73d97df3bf7e drm/i915/selftests: Measure the energy consumed while in RC6

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17075/index.html

* [Intel-gfx] [PATCH] drm/i915/selftests: Measure the energy consumed while in RC6
  2020-03-24 20:15 [Intel-gfx] [PATCH] drm/i915/selftests: Measure the energy consumed while in RC6 Chris Wilson
                   ` (2 preceding siblings ...)
  2020-03-24 23:48 ` [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915/selftests: Measure the energy consumed while in RC6 (rev3) Patchwork
@ 2020-03-25  0:08 ` Chris Wilson
  2020-03-25  0:41 ` [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915/selftests: Measure the energy consumed while in RC6 (rev4) Patchwork
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Chris Wilson @ 2020-03-25  0:08 UTC (permalink / raw)
  To: intel-gfx

Measure and compare the energy consumed, as reported by the RAPL MSR,
by the GPU while in RC0 and RC6 states. Throw an error if RC6 does not
at least halve the energy consumption of RC0, as this more than likely
means we failed to enter RC6 correctly.

If we can't measure the energy draw with the MSR, then it will report 0
for both measurements. Since the measurement works on all gen6+, this seems
worth flagging as an error.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
---
 drivers/gpu/drm/i915/gt/selftest_rc6.c | 39 ++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/selftest_rc6.c b/drivers/gpu/drm/i915/gt/selftest_rc6.c
index 95b165faeba7..04c3ba71749f 100644
--- a/drivers/gpu/drm/i915/gt/selftest_rc6.c
+++ b/drivers/gpu/drm/i915/gt/selftest_rc6.c
@@ -12,6 +12,22 @@
 
 #include "selftests/i915_random.h"
 
+static u64 energy_uJ(struct intel_rc6 *rc6)
+{
+	unsigned long long power;
+	u32 units;
+
+	if (rdmsrl_safe(MSR_RAPL_POWER_UNIT, &power))
+		return 0;
+
+	units = (power & 0x1f00) >> 8;
+
+	if (rdmsrl_safe(MSR_PP1_ENERGY_STATUS, &power))
+		return 0;
+
+	return (1000000 * power) >> units; /* convert to uJ */
+}
+
 static u64 rc6_residency(struct intel_rc6 *rc6)
 {
 	u64 result;
@@ -31,7 +47,9 @@ int live_rc6_manual(void *arg)
 {
 	struct intel_gt *gt = arg;
 	struct intel_rc6 *rc6 = &gt->rc6;
+	u64 rc0_power, rc6_power;
 	intel_wakeref_t wakeref;
+	ktime_t dt;
 	u64 res[2];
 	int err = 0;
 
@@ -53,22 +71,35 @@ int live_rc6_manual(void *arg)
 	__intel_rc6_disable(rc6);
 	msleep(1); /* wakeup is not immediate, takes about 100us on icl */
 
+	dt = ktime_get();
+	rc0_power = energy_uJ(rc6);
 	res[0] = rc6_residency(rc6);
 	msleep(250);
 	res[1] = rc6_residency(rc6);
+	rc0_power = div64_u64(NSEC_PER_SEC * (energy_uJ(rc6) - rc0_power),
+			      ktime_to_ns(ktime_sub(ktime_get(), dt)));
 	if ((res[1] - res[0]) >> 10) {
 		pr_err("RC6 residency increased by %lldus while disabled for 250ms!\n",
 		       (res[1] - res[0]) >> 10);
 		err = -EINVAL;
 		goto out_unlock;
 	}
+	if (!rc0_power) {
+		pr_err("No power measured while in RC0\n");
+		err = -EINVAL;
+		goto out_unlock;
+	}
 
 	/* Manually enter RC6 */
 	intel_rc6_park(rc6);
 
+	dt = ktime_get();
+	rc6_power = energy_uJ(rc6);
 	res[0] = rc6_residency(rc6);
 	msleep(100);
 	res[1] = rc6_residency(rc6);
+	rc6_power = div64_u64(NSEC_PER_SEC * (energy_uJ(rc6) - rc6_power),
+			      ktime_to_ns(ktime_sub(ktime_get(), dt)));
 
 	if (res[1] == res[0]) {
 		pr_err("Did not enter RC6! RC6_STATE=%08x, RC6_CONTROL=%08x, residency=%lld\n",
@@ -78,6 +109,14 @@ int live_rc6_manual(void *arg)
 		err = -EINVAL;
 	}
 
+	pr_info("GPU consumed %llduW in RC0 and %llduW in RC6\n",
+		rc0_power, rc6_power);
+	if ((rc6_power >> 10) > (rc0_power >> 10) / 2) { /* compare mW */
+		pr_err("GPU leaked energy while in RC6!\n");
+		err = -EINVAL;
+		goto out_unlock;
+	}
+
 	/* Restore what should have been the original state! */
 	intel_rc6_unpark(rc6);
 
-- 
2.20.1
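
One thing the patch does not address is counter wrap-around. The sketch
below is not part of the patch and rests on an assumption: that the energy
status register holds a 32-bit tick count that wraps back to zero. If that
holds, converting each sample to uJ and then subtracting would yield a very
large u64 whenever a wrap falls between the two samples. Taking the
difference on the raw 32-bit values first avoids that. The helper name
rapl_delta_uj() is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: energy consumed between two raw counter samples. */
static uint64_t rapl_delta_uj(uint32_t start, uint32_t end, uint32_t esu)
{
	/* Unsigned 32-bit subtraction is modulo 2^32: one wrap is absorbed. */
	uint32_t ticks = end - start;

	/* Convert the tick delta, not the absolute samples, to uJ. */
	return (1000000ull * ticks) >> esu;
}
```

This only tolerates a single wrap per interval, so the sampling period has
to stay well below the time the counter needs to overflow.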

* [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915/selftests: Measure the energy consumed while in RC6 (rev4)
  2020-03-24 20:15 [Intel-gfx] [PATCH] drm/i915/selftests: Measure the energy consumed while in RC6 Chris Wilson
                   ` (3 preceding siblings ...)
  2020-03-25  0:08 ` [Intel-gfx] [PATCH] drm/i915/selftests: Measure the energy consumed while in RC6 Chris Wilson
@ 2020-03-25  0:41 ` Patchwork
  2020-03-25  8:10 ` [Intel-gfx] [PATCH] drm/i915/selftests: Measure the energy consumed while in RC6 Chris Wilson
  2020-03-25  9:07 ` [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915/selftests: Measure the energy consumed while in RC6 (rev5) Patchwork
  6 siblings, 0 replies; 13+ messages in thread
From: Patchwork @ 2020-03-25  0:41 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/selftests: Measure the energy consumed while in RC6 (rev4)
URL   : https://patchwork.freedesktop.org/series/75035/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_8184 -> Patchwork_17076
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_17076 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_17076, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17076/index.html

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_17076:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@live@gt_pm:
    - fi-icl-guc:         [PASS][1] -> [DMESG-FAIL][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8184/fi-icl-guc/igt@i915_selftest@live@gt_pm.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17076/fi-icl-guc/igt@i915_selftest@live@gt_pm.html

  
Known issues
------------

  Here are the changes found in Patchwork_17076 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_selftest@live@execlists:
    - fi-bxt-dsi:         [PASS][3] -> [INCOMPLETE][4] ([fdo#103927] / [i915#529])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8184/fi-bxt-dsi/igt@i915_selftest@live@execlists.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17076/fi-bxt-dsi/igt@i915_selftest@live@execlists.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@gem_contexts:
    - fi-cml-s:           [DMESG-FAIL][5] ([i915#877]) -> [PASS][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8184/fi-cml-s/igt@i915_selftest@live@gem_contexts.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17076/fi-cml-s/igt@i915_selftest@live@gem_contexts.html

  
#### Warnings ####

  * igt@i915_selftest@live@gem_contexts:
    - fi-cfl-guc:         [DMESG-FAIL][7] ([i915#481]) -> [DMESG-FAIL][8] ([i915#730] / [i915#933])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8184/fi-cfl-guc/igt@i915_selftest@live@gem_contexts.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17076/fi-cfl-guc/igt@i915_selftest@live@gem_contexts.html

  
  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [i915#481]: https://gitlab.freedesktop.org/drm/intel/issues/481
  [i915#529]: https://gitlab.freedesktop.org/drm/intel/issues/529
  [i915#730]: https://gitlab.freedesktop.org/drm/intel/issues/730
  [i915#877]: https://gitlab.freedesktop.org/drm/intel/issues/877
  [i915#933]: https://gitlab.freedesktop.org/drm/intel/issues/933


Participating hosts (40 -> 35)
------------------------------

  Additional (5): fi-kbl-soraka fi-hsw-4770r fi-byt-j1900 fi-kbl-7500u fi-cfl-8109u 
  Missing    (10): fi-bdw-samus fi-hsw-peppy fi-byt-squawks fi-bsw-cyan fi-ilk-650 fi-ctg-p8600 fi-blb-e6850 fi-byt-n2820 fi-bsw-nick fi-skl-6600u 


Build changes
-------------

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_8184 -> Patchwork_17076

  CI-20190529: 20190529
  CI_DRM_8184: 1a72c9d9d3140e92190485d766b9d165932c5b86 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5535: d1dcf40cc6869ac858586c5ad9f09af6617ce2ee @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17076: be4a0fbdaf4cb7df9f6278ba72cf83e802314730 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

be4a0fbdaf4c drm/i915/selftests: Measure the energy consumed while in RC6

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17076/index.html

* [Intel-gfx] [PATCH] drm/i915/selftests: Measure the energy consumed while in RC6
  2020-03-24 20:15 [Intel-gfx] [PATCH] drm/i915/selftests: Measure the energy consumed while in RC6 Chris Wilson
                   ` (4 preceding siblings ...)
  2020-03-25  0:41 ` [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915/selftests: Measure the energy consumed while in RC6 (rev4) Patchwork
@ 2020-03-25  8:10 ` Chris Wilson
  2020-03-25  8:58   ` Andi Shyti
  2020-03-25  9:07 ` [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915/selftests: Measure the energy consumed while in RC6 (rev5) Patchwork
  6 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2020-03-25  8:10 UTC (permalink / raw)
  To: intel-gfx

Measure and compare the energy consumed, as reported by the RAPL MSR,
by the GPU while in RC0 and RC6 states. Throw an error if RC6 does not
at least halve the energy consumption of RC0, as this more than likely
means we failed to enter RC6 correctly.

If we can't measure the energy draw with the MSR, then it will report 0
for both measurements. Since the measurement works on all gen6+, this seems
worth flagging as an error.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
---
 drivers/gpu/drm/i915/gt/selftest_rc6.c | 43 +++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_rc6.c b/drivers/gpu/drm/i915/gt/selftest_rc6.c
index 95b165faeba7..48f8901d83e8 100644
--- a/drivers/gpu/drm/i915/gt/selftest_rc6.c
+++ b/drivers/gpu/drm/i915/gt/selftest_rc6.c
@@ -12,6 +12,22 @@
 
 #include "selftests/i915_random.h"
 
+static u64 energy_uJ(struct intel_rc6 *rc6)
+{
+	unsigned long long power;
+	u32 units;
+
+	if (rdmsrl_safe(MSR_RAPL_POWER_UNIT, &power))
+		return 0;
+
+	units = (power & 0x1f00) >> 8;
+
+	if (rdmsrl_safe(MSR_PP1_ENERGY_STATUS, &power))
+		return 0;
+
+	return (1000000 * power) >> units; /* convert to uJ */
+}
+
 static u64 rc6_residency(struct intel_rc6 *rc6)
 {
 	u64 result;
@@ -31,7 +47,9 @@ int live_rc6_manual(void *arg)
 {
 	struct intel_gt *gt = arg;
 	struct intel_rc6 *rc6 = &gt->rc6;
+	u64 rc0_power, rc6_power;
 	intel_wakeref_t wakeref;
+	ktime_t dt;
 	u64 res[2];
 	int err = 0;
 
@@ -54,7 +72,11 @@ int live_rc6_manual(void *arg)
 	msleep(1); /* wakeup is not immediate, takes about 100us on icl */
 
 	res[0] = rc6_residency(rc6);
+	dt = ktime_get();
+	rc0_power = energy_uJ(rc6);
 	msleep(250);
+	rc0_power = energy_uJ(rc6) - rc0_power;
+	dt = ktime_sub(ktime_get(), dt);
 	res[1] = rc6_residency(rc6);
 	if ((res[1] - res[0]) >> 10) {
 		pr_err("RC6 residency increased by %lldus while disabled for 250ms!\n",
@@ -63,13 +85,23 @@ int live_rc6_manual(void *arg)
 		goto out_unlock;
 	}
 
+	rc0_power = div64_u64(NSEC_PER_SEC * rc0_power, ktime_to_ns(dt));
+	if (!rc0_power) {
+		pr_err("No power measured while in RC0\n");
+		err = -EINVAL;
+		goto out_unlock;
+	}
+
 	/* Manually enter RC6 */
 	intel_rc6_park(rc6);
 
 	res[0] = rc6_residency(rc6);
+	dt = ktime_get();
+	rc6_power = energy_uJ(rc6);
 	msleep(100);
+	rc6_power = energy_uJ(rc6) - rc6_power;
+	dt = ktime_sub(ktime_get(), dt);
 	res[1] = rc6_residency(rc6);
-
 	if (res[1] == res[0]) {
 		pr_err("Did not enter RC6! RC6_STATE=%08x, RC6_CONTROL=%08x, residency=%lld\n",
 		       intel_uncore_read_fw(gt->uncore, GEN6_RC_STATE),
@@ -78,6 +110,15 @@ int live_rc6_manual(void *arg)
 		err = -EINVAL;
 	}
 
+	rc6_power = div64_u64(NSEC_PER_SEC * rc6_power, ktime_to_ns(dt));
+	pr_info("GPU consumed %llduW in RC0 and %llduW in RC6\n",
+		rc0_power, rc6_power);
+	if (2 * rc6_power > rc0_power) {
+		pr_err("GPU leaked energy while in RC6!\n");
+		err = -EINVAL;
+		goto out_unlock;
+	}
+
 	/* Restore what should have been the original state! */
 	intel_rc6_unpark(rc6);
 
-- 
2.20.1

* Re: [Intel-gfx] [PATCH] drm/i915/selftests: Measure the energy consumed while in RC6
  2020-03-25  8:10 ` [Intel-gfx] [PATCH] drm/i915/selftests: Measure the energy consumed while in RC6 Chris Wilson
@ 2020-03-25  8:58   ` Andi Shyti
  2020-03-25  9:10     ` Chris Wilson
  0 siblings, 1 reply; 13+ messages in thread
From: Andi Shyti @ 2020-03-25  8:58 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

Hi Chris,

On Wed, Mar 25, 2020 at 08:10:56AM +0000, Chris Wilson wrote:
> Measure and compare the energy consumed, as reported by the RAPL MSR,
> by the GPU while in RC0 and RC6 states. Throw an error if RC6 does not
> at least halve the energy consumption of RC0, as this more than likely
> means we failed to enter RC6 correctly.
> 
> If we can't measure the energy draw with the MSR, then it will report 0
> for both measurements. Since the measurement works on all gen6+, this seems
> worth flagging as an error.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Andi Shyti <andi.shyti@intel.com>

It would be nice to have a revision history, given that I received quite
a few versions of this patch.

> +static u64 energy_uJ(struct intel_rc6 *rc6)
> +{
> +	unsigned long long power;
> +	u32 units;
> +
> +	if (rdmsrl_safe(MSR_RAPL_POWER_UNIT, &power))
> +		return 0;
> +
> +	units = (power & 0x1f00) >> 8;
> +
> +	if (rdmsrl_safe(MSR_PP1_ENERGY_STATUS, &power))
> +		return 0;
> +
> +	return (1000000 * power) >> units; /* convert to uJ */
> +}

shall we put this in a library?

>  	res[0] = rc6_residency(rc6);
> +	dt = ktime_get();
> +	rc0_power = energy_uJ(rc6);
>  	msleep(250);
> +	rc0_power = energy_uJ(rc6) - rc0_power;
> +	dt = ktime_sub(ktime_get(), dt);
>  	res[1] = rc6_residency(rc6);
>  	if ((res[1] - res[0]) >> 10) {
>  		pr_err("RC6 residency increased by %lldus while disabled for 250ms!\n",
> @@ -63,13 +85,23 @@ int live_rc6_manual(void *arg)
>  		goto out_unlock;
>  	}
>  
> +	rc0_power = div64_u64(NSEC_PER_SEC * rc0_power, ktime_to_ns(dt));
> +	if (!rc0_power) {

is this likely to happen?

>  	res[0] = rc6_residency(rc6);
> +	dt = ktime_get();
> +	rc6_power = energy_uJ(rc6);
>  	msleep(100);
> +	rc6_power = energy_uJ(rc6) - rc6_power;
> +	dt = ktime_sub(ktime_get(), dt);
>  	res[1] = rc6_residency(rc6);
> -
>  	if (res[1] == res[0]) {
>  		pr_err("Did not enter RC6! RC6_STATE=%08x, RC6_CONTROL=%08x, residency=%lld\n",
>  		       intel_uncore_read_fw(gt->uncore, GEN6_RC_STATE),
> @@ -78,6 +110,15 @@ int live_rc6_manual(void *arg)
>  		err = -EINVAL;
>  	}
>  
> +	rc6_power = div64_u64(NSEC_PER_SEC * rc6_power, ktime_to_ns(dt));
> +	pr_info("GPU consumed %llduW in RC0 and %llduW in RC6\n",
> +		rc0_power, rc6_power);
> +	if (2 * rc6_power > rc0_power) {
> +		pr_err("GPU leaked energy while in RC6!\n");
> +		err = -EINVAL;
> +		goto out_unlock;
> +	}

nice,

Reviewed-by: Andi Shyti <andi.shyti@intel.com>

Thanks,
Andi
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915/selftests: Measure the energy consumed while in RC6 (rev5)
  2020-03-24 20:15 [Intel-gfx] [PATCH] drm/i915/selftests: Measure the energy consumed while in RC6 Chris Wilson
                   ` (5 preceding siblings ...)
  2020-03-25  8:10 ` [Intel-gfx] [PATCH] drm/i915/selftests: Measure the energy consumed while in RC6 Chris Wilson
@ 2020-03-25  9:07 ` Patchwork
  6 siblings, 0 replies; 13+ messages in thread
From: Patchwork @ 2020-03-25  9:07 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/selftests: Measure the energy consumed while in RC6 (rev5)
URL   : https://patchwork.freedesktop.org/series/75035/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_8185 -> Patchwork_17079
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_17079 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_17079, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17079/index.html

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_17079:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@live@gt_pm:
    - fi-icl-guc:         [PASS][1] -> [DMESG-FAIL][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8185/fi-icl-guc/igt@i915_selftest@live@gt_pm.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17079/fi-icl-guc/igt@i915_selftest@live@gt_pm.html

  
Known issues
------------

  Here are the changes found in Patchwork_17079 that come from known issues:

### IGT changes ###

#### Possible fixes ####

  * igt@i915_pm_backlight@basic-brightness:
    - fi-icl-dsi:         [INCOMPLETE][3] -> [PASS][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8185/fi-icl-dsi/igt@i915_pm_backlight@basic-brightness.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17079/fi-icl-dsi/igt@i915_pm_backlight@basic-brightness.html

  * igt@i915_selftest@live@gem_contexts:
    - fi-cml-s:           [DMESG-FAIL][5] ([i915#877]) -> [PASS][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8185/fi-cml-s/igt@i915_selftest@live@gem_contexts.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17079/fi-cml-s/igt@i915_selftest@live@gem_contexts.html

  
#### Warnings ####

  * igt@runner@aborted:
    - fi-kbl-8809g:       [FAIL][7] ([i915#1209]) -> [FAIL][8] ([i915#1485] / [i915#192] / [i915#193] / [i915#194])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8185/fi-kbl-8809g/igt@runner@aborted.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17079/fi-kbl-8809g/igt@runner@aborted.html

  
  [i915#1209]: https://gitlab.freedesktop.org/drm/intel/issues/1209
  [i915#1485]: https://gitlab.freedesktop.org/drm/intel/issues/1485
  [i915#192]: https://gitlab.freedesktop.org/drm/intel/issues/192
  [i915#193]: https://gitlab.freedesktop.org/drm/intel/issues/193
  [i915#194]: https://gitlab.freedesktop.org/drm/intel/issues/194
  [i915#877]: https://gitlab.freedesktop.org/drm/intel/issues/877


Participating hosts (45 -> 40)
------------------------------

  Additional (4): fi-byt-j1900 fi-kbl-7560u fi-skl-6700k2 fi-kbl-7500u 
  Missing    (9): fi-bdw-5557u fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-bwr-2160 fi-ctg-p8600 fi-byt-n2820 fi-byt-clapper fi-bdw-samus 


Build changes
-------------

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_8185 -> Patchwork_17079

  CI-20190529: 20190529
  CI_DRM_8185: dbd2532fc5cf023b28bd631b51eea8452739b421 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5537: 190245120758e754813d76b2c6c613413a0dba29 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17079: 64d049bdd569e362465b0038b71aa8a90379714f @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

64d049bdd569 drm/i915/selftests: Measure the energy consumed while in RC6

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17079/index.html

* Re: [Intel-gfx] [PATCH] drm/i915/selftests: Measure the energy consumed while in RC6
  2020-03-25  8:58   ` Andi Shyti
@ 2020-03-25  9:10     ` Chris Wilson
  0 siblings, 0 replies; 13+ messages in thread
From: Chris Wilson @ 2020-03-25  9:10 UTC (permalink / raw)
  To: Andi Shyti; +Cc: intel-gfx

Quoting Andi Shyti (2020-03-25 08:58:54)
> Hi Chris,
> 
> On Wed, Mar 25, 2020 at 08:10:56AM +0000, Chris Wilson wrote:
> > Measure and compare the energy consumed, as reported by the rapl MSR,
> > by the GPU while in RC0 and RC6 states. Throw an error if RC6 does not
> > at least halve the energy consumption of RC0, as this more than likely
> > means we failed to enter RC0 correctly.
> > 
> > If we can't measure the energy draw with the MSR, then it will report 0
> > for both measurements. Since the measurement works on all gen6+, this seems
> > worth flagging as an error.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Cc: Andi Shyti <andi.shyti@intel.com>
> 
> It would be nice to have a revision history, given that I received quite
> a few versions of this patch.

Nothing that interesting happened, I told myself.

> > +static u64 energy_uJ(struct intel_rc6 *rc6)
> > +{
> > +     unsigned long long power;
> > +     u32 units;
> > +
> > +     if (rdmsrl_safe(MSR_RAPL_POWER_UNIT, &power))
> > +             return 0;
> > +
> > +     units = (power & 0x1f00) >> 8;
> > +
> > +     if (rdmsrl_safe(MSR_PP1_ENERGY_STATUS, &power))
> > +             return 0;
> > +
> > +     return (1000000 * power) >> units; /* convert to uJ */
> > +}
> 
> shall we put this in a library?

Call it rapl and make it available via perf? Done.

More seriously, outside of measuring idle power usage, I haven't had an
idea where it makes sense. As an optimisation metric, you want work done
per joule, but we have no concept of the user's work in the kernel.
Other things like "operating point power" (the cost of running at a
particular frequency) are mostly constant and not tunable.
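
For readers following the quoted helper, here is a minimal userspace sketch
of the same unit conversion. It assumes, as the 0x1f00 mask in the patch
implies, that the energy status units sit in bits 12:8 of the unit register
and that one counter increment is 1/2^units joules. The function names and
the sample register value are illustrative only and are not part of the
patch; the raw counter is taken as 32 bits here so the multiplication
cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Energy status units: bits 12:8 of the RAPL unit register (0x1f00 mask). */
static inline uint32_t rapl_energy_shift(uint64_t unit_msr)
{
	return (uint32_t)((unit_msr & 0x1f00) >> 8);
}

/*
 * One counter increment is 1 / 2^shift joules, so scale to microjoules
 * first and shift afterwards to avoid losing the fractional part early.
 */
uint64_t energy_raw_to_uJ(uint64_t unit_msr, uint32_t raw)
{
	uint32_t shift = rapl_energy_shift(unit_msr);

	assert(shift < 32); /* always true for a 5-bit field */
	return (1000000ull * raw) >> shift;
}
```

With a shift of 14, a raw count of 16384 corresponds to exactly 1 J, so
energy_raw_to_uJ(0xA0E03, 16384) yields 1000000.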

> >       res[0] = rc6_residency(rc6);
> > +     dt = ktime_get();
> > +     rc0_power = energy_uJ(rc6);
> >       msleep(250);
> > +     rc0_power = energy_uJ(rc6) - rc0_power;
> > +     dt = ktime_sub(ktime_get(), dt);
> >       res[1] = rc6_residency(rc6);
> >       if ((res[1] - res[0]) >> 10) {
> >               pr_err("RC6 residency increased by %lldus while disabled for 250ms!\n",
> > @@ -63,13 +85,23 @@ int live_rc6_manual(void *arg)
> >               goto out_unlock;
> >       }
> >  
> > +     rc0_power = div64_u64(NSEC_PER_SEC * rc0_power, ktime_to_ns(dt));
> > +     if (!rc0_power) {
> 
> is this likely to happen?

Likely? Only if rapl is unable to measure the GPU energy consumption. So
no, it's not likely, unless you load the guc firmware on icl!
-Chris
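
The arithmetic behind the two checks discussed above can be sketched in
plain C. This is only an illustration of the quoted logic, not the selftest
itself: the function names are made up for the example, and plain 64-bit
division stands in for the kernel's div64_u64().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ull

/* Energy in uJ over an interval in ns gives average power in uW. */
uint64_t average_power_uW(uint64_t energy_uJ, uint64_t dt_ns)
{
	assert(dt_ns > 0); /* the caller must have measured a real interval */
	return (NSEC_PER_SEC * energy_uJ) / dt_ns;
}

/* Mirrors the quoted test: fail unless RC6 draws at most half of RC0. */
bool rc6_leaks_energy(uint64_t rc0_uW, uint64_t rc6_uW)
{
	return 2 * rc6_uW > rc0_uW;
}
```

Note that rc6_leaks_energy(0, 0) is false: if the energy counter cannot be
read and both measurements come back as 0, the halving check alone would
pass, which is why a separate zero check on the RC0 measurement is useful.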

* Re: [Intel-gfx] [PATCH] drm/i915/selftests: Measure the energy consumed while in RC6
  2020-03-24 20:44 ` Chris Wilson
  2020-03-24 20:56   ` Chris Wilson
@ 2020-09-28 23:56   ` Lucas De Marchi
  2020-09-29  7:59     ` Chris Wilson
  1 sibling, 1 reply; 13+ messages in thread
From: Lucas De Marchi @ 2020-09-28 23:56 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics, Lucas De Marchi

On Tue, Mar 24, 2020 at 1:45 PM Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> Measure and compare the energy consumed, as reported by the rapl MSR,
> by the GPU while in RC0 and RC6 states. Throw an error if RC6 does not
> at least halve the energy consumption of RC0, as this more than likely
> means we failed to enter RC0 correctly.
>
> If we can't measure the energy draw with the MSR, then it will report 0
> for both measurements. Since the measurement works on all gen6+, this seems
> worth flagging as an error.

I'm confused by this statement here. MSR is a *CPU* register and you are using
it here, mixed with RC6. How is that supposed to work with, e.g., dgfx?

thanks
Lucas De Marchi

>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Andi Shyti <andi.shyti@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/selftest_rc6.c | 39 ++++++++++++++++++++++++++
>  1 file changed, 39 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/selftest_rc6.c b/drivers/gpu/drm/i915/gt/selftest_rc6.c
> index 95b165faeba7..3ac9a8925218 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_rc6.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_rc6.c
> @@ -12,6 +12,22 @@
>
>  #include "selftests/i915_random.h"
>
> +#define MCH_SECP_NRG_STTS              _MMIO(MCHBAR_MIRROR_BASE_SNB + 0x592c)
> +
> +static u64 energy_uJ(struct intel_rc6 *rc6)
> +{
> +       unsigned long long power;
> +       u32 units;
> +
> +       if (rdmsrl_safe(MSR_RAPL_POWER_UNIT, &power))
> +               return 0;
> +
> +       units = (power & 0x1f00) >> 8;
> +       power = intel_uncore_read_fw(rc6_to_uncore(rc6), MCH_SECP_NRG_STTS);
> +
> +       return (1000000 * power) >> units; /* convert to uJ */
> +}
> +
>  static u64 rc6_residency(struct intel_rc6 *rc6)
>  {
>         u64 result;
> @@ -31,7 +47,9 @@ int live_rc6_manual(void *arg)
>  {
>         struct intel_gt *gt = arg;
>         struct intel_rc6 *rc6 = &gt->rc6;
> +       u64 rc0_power, rc6_power;
>         intel_wakeref_t wakeref;
> +       ktime_t dt;
>         u64 res[2];
>         int err = 0;
>
> @@ -53,22 +71,35 @@ int live_rc6_manual(void *arg)
>         __intel_rc6_disable(rc6);
>         msleep(1); /* wakeup is not immediate, takes about 100us on icl */
>
> +       dt = ktime_get();
> +       rc0_power = energy_uJ(rc6);
>         res[0] = rc6_residency(rc6);
>         msleep(250);
>         res[1] = rc6_residency(rc6);
> +       rc0_power = div64_u64(energy_uJ(rc6) - rc0_power,
> +                             ktime_to_ns(ktime_sub(ktime_get(), dt)));
>         if ((res[1] - res[0]) >> 10) {
>                 pr_err("RC6 residency increased by %lldus while disabled for 250ms!\n",
>                        (res[1] - res[0]) >> 10);
>                 err = -EINVAL;
>                 goto out_unlock;
>         }
> +       if (!rc0_power) {
> +               pr_err("No power measured while in RC0\n");
> +               err = -EINVAL;
> +               goto out_unlock;
> +       }
>
>         /* Manually enter RC6 */
>         intel_rc6_park(rc6);
>
> +       dt = ktime_get();
> +       rc6_power = energy_uJ(rc6);
>         res[0] = rc6_residency(rc6);
>         msleep(100);
>         res[1] = rc6_residency(rc6);
> +       rc6_power = div64_u64(energy_uJ(rc6) - rc6_power,
> +                             ktime_to_ns(ktime_sub(ktime_get(), dt)));
>
>         if (res[1] == res[0]) {
>                 pr_err("Did not enter RC6! RC6_STATE=%08x, RC6_CONTROL=%08x, residency=%lld\n",
> @@ -78,6 +109,14 @@ int live_rc6_manual(void *arg)
>                 err = -EINVAL;
>         }
>
> +       pr_info("GPU consumed %llduW in RC0 and %llduW in RC6\n",
> +               rc0_power, rc6_power);
> +       if ((rc6_power >> 10) > (rc0_power >> 10) / 2) { /* compare mW */
> +               pr_err("GPU leaked energy while in RC6!\n");
> +               err = -EINVAL;
> +               goto out_unlock;
> +       }
> +
>         /* Restore what should have been the original state! */
>         intel_rc6_unpark(rc6);
>
> --
> 2.20.1
>

* Re: [Intel-gfx] [PATCH] drm/i915/selftests: Measure the energy consumed while in RC6
  2020-09-28 23:56   ` Lucas De Marchi
@ 2020-09-29  7:59     ` Chris Wilson
  0 siblings, 0 replies; 13+ messages in thread
From: Chris Wilson @ 2020-09-29  7:59 UTC (permalink / raw)
  To: Lucas De Marchi; +Cc: Intel Graphics, Lucas De Marchi

Quoting Lucas De Marchi (2020-09-29 00:56:54)
> On Tue, Mar 24, 2020 at 1:45 PM Chris Wilson <chris@chris-wilson.co.uk> wrote:
> >
> > Measure and compare the energy consumed, as reported by the rapl MSR,
> > by the GPU while in RC0 and RC6 states. Throw an error if RC6 does not
> > at least halve the energy consumption of RC0, as this more than likely
> > means we failed to enter RC0 correctly.
> >
> > If we can't measure the energy draw with the MSR, then it will report 0
> > for both measurements. Since the measurement works on all gen6+, this seems
> > worth flagging as an error.
> 
> I'm confused by this statement here. MSR is a *CPU* register and you are using
> it here, mixed with RC6. How is that supposed to work with, e.g., dgfx?

You abstract it with the right interface for hwmon. The card reports
energy draw, so the test remains the same: verify that a low power state
does consume substantially less energy (and, if we can get fine enough
granularity, that the GT powerwells draw 0).
-Chris
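
One way to picture the abstraction described above is a small energy-source
interface, so that the measurement code does not care whether the number
comes from a CPU MSR or from the card. Everything in this sketch is
hypothetical (struct energy_source, energy_consumed_uJ and the fake meter
are invented for illustration); it is not the i915 or hwmon API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical backend: anything that reports cumulative energy in uJ. */
struct energy_source {
	uint64_t (*read_uJ)(void *ctx);
	void *ctx;
};

/* Energy drawn while run() executes, as seen by the given backend. */
uint64_t energy_consumed_uJ(const struct energy_source *src,
			    void (*run)(void *), void *run_ctx)
{
	uint64_t before, after;

	assert(src && src->read_uJ && run);
	before = src->read_uJ(src->ctx);
	run(run_ctx);
	after = src->read_uJ(src->ctx);

	return after - before;
}

/* Fake backend for illustration: a counter advanced by the workload. */
struct fake_meter {
	uint64_t total_uJ;
};

uint64_t fake_read_uJ(void *ctx)
{
	return ((struct fake_meter *)ctx)->total_uJ;
}

void fake_burn_500uJ(void *ctx)
{
	((struct fake_meter *)ctx)->total_uJ += 500;
}
```

The comparison between the active and the low power state would then sit
on top of energy_consumed_uJ() and stay unchanged when the backend is
swapped.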

Thread overview: 13+ messages
2020-03-24 20:15 [Intel-gfx] [PATCH] drm/i915/selftests: Measure the energy consumed while in RC6 Chris Wilson
2020-03-24 20:44 ` Chris Wilson
2020-03-24 20:56   ` Chris Wilson
2020-09-28 23:56   ` Lucas De Marchi
2020-09-29  7:59     ` Chris Wilson
2020-03-24 20:58 ` Chris Wilson
2020-03-24 23:48 ` [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915/selftests: Measure the energy consumed while in RC6 (rev3) Patchwork
2020-03-25  0:08 ` [Intel-gfx] [PATCH] drm/i915/selftests: Measure the energy consumed while in RC6 Chris Wilson
2020-03-25  0:41 ` [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915/selftests: Measure the energy consumed while in RC6 (rev4) Patchwork
2020-03-25  8:10 ` [Intel-gfx] [PATCH] drm/i915/selftests: Measure the energy consumed while in RC6 Chris Wilson
2020-03-25  8:58   ` Andi Shyti
2020-03-25  9:10     ` Chris Wilson
2020-03-25  9:07 ` [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915/selftests: Measure the energy consumed while in RC6 (rev5) Patchwork
