* [PATCH] x86: Downgrade clock throttling thermal event critical error
@ 2018-10-09 11:37 Chris Wilson
2018-10-09 12:16 ` ✓ Fi.CI.BAT: success for " Patchwork
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Chris Wilson @ 2018-10-09 11:37 UTC (permalink / raw)
To: intel-gfx
Under CI testing, it is common for the CPUs to overheat under the
continuous workloads and end up being throttled. As the CPUs still
function, this is less a critical error meriting urgent action than an
expected yet significant condition (pr_notice).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Petri Latvala <petri.latvala@intel.com>
---
arch/x86/kernel/cpu/mcheck/therm_throt.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
index 2da67b70ba98..bc57b5988589 100644
--- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -184,10 +184,10 @@ static void therm_throt_process(bool new_event, int event, int level)
 	/* if we just entered the thermal event */
 	if (new_event) {
 		if (event == THERMAL_THROTTLING_EVENT)
-			pr_crit("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
-				this_cpu,
-				level == CORE_LEVEL ? "Core" : "Package",
-				state->count);
+			pr_notice("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
+				  this_cpu,
+				  level == CORE_LEVEL ? "Core" : "Package",
+				  state->count);
 		return;
 	}
 	if (old_event) {
--
2.19.1
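
For readers less familiar with kernel log levels, the practical effect of
moving from pr_crit to pr_notice can be sketched in plain C. The numeric
values are the kernel's KERN_* severities; the enum and the helper
shown_on_console() are illustrative names assumed for this sketch, not
kernel API. The filter models the usual rule that the console prints only
messages whose level is numerically lower than the console loglevel.

```c
#include <stdbool.h>

/*
 * Severity numbers used by the kernel's KERN_* log levels
 * (lower number = more severe). The KLOG_* names are made up
 * for this sketch.
 */
enum klog_level {
	KLOG_EMERG   = 0,
	KLOG_ALERT   = 1,
	KLOG_CRIT    = 2,	/* level used by pr_crit() */
	KLOG_ERR     = 3,
	KLOG_WARNING = 4,
	KLOG_NOTICE  = 5,	/* level used by pr_notice() */
	KLOG_INFO    = 6,
	KLOG_DEBUG   = 7,
};

/*
 * Hypothetical helper: a message reaches the console only when its
 * level is numerically lower than the console loglevel.
 */
static bool shown_on_console(enum klog_level msg, int console_loglevel)
{
	return (int)msg < console_loglevel;
}
```

With a console loglevel of 4, for example, a crit message is printed on
the console while a notice message is only recorded in the kernel log.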
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
* ✓ Fi.CI.BAT: success for x86: Downgrade clock throttling thermal event critical error
2018-10-09 11:37 [PATCH] x86: Downgrade clock throttling thermal event critical error Chris Wilson
@ 2018-10-09 12:16 ` Patchwork
2018-10-09 14:51 ` ✓ Fi.CI.IGT: " Patchwork
2018-10-10 11:59 ` [PATCH] " Tvrtko Ursulin
2 siblings, 0 replies; 5+ messages in thread
From: Patchwork @ 2018-10-09 12:16 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: x86: Downgrade clock throttling thermal event critical error
URL : https://patchwork.freedesktop.org/series/50742/
State : success
== Summary ==
= CI Bug Log - changes from CI_DRM_4952 -> Patchwork_10399 =
== Summary - WARNING ==
Minor unknown changes coming with Patchwork_10399 need to be verified
manually.
If you think the reported changes have nothing to do with the changes
introduced in Patchwork_10399, please notify your bug team to allow them
to document this new failure mode, which will reduce false positives in CI.
External URL: https://patchwork.freedesktop.org/api/1.0/series/50742/revisions/1/mbox/
== Possible new issues ==
Here are the unknown changes that may have been introduced in Patchwork_10399:
=== IGT changes ===
==== Warnings ====
igt@pm_rpm@module-reload:
fi-hsw-4770r: PASS -> SKIP
== Known issues ==
Here are the changes found in Patchwork_10399 that come from known issues:
=== IGT changes ===
==== Issues hit ====
igt@drv_module_reload@basic-reload-inject:
fi-hsw-4770r: PASS -> DMESG-WARN (fdo#107425, fdo#107924)
==== Possible fixes ====
igt@gem_exec_suspend@basic-s3:
fi-blb-e6850: INCOMPLETE (fdo#107718) -> PASS
fdo#107425 https://bugs.freedesktop.org/show_bug.cgi?id=107425
fdo#107718 https://bugs.freedesktop.org/show_bug.cgi?id=107718
fdo#107924 https://bugs.freedesktop.org/show_bug.cgi?id=107924
== Participating hosts (47 -> 42) ==
Additional (1): fi-skl-6700hq
Missing (6): fi-kbl-soraka fi-ilk-m540 fi-byt-squawks fi-bsw-cyan fi-icl-u2 fi-ctg-p8600
== Build changes ==
* Linux: CI_DRM_4952 -> Patchwork_10399
CI_DRM_4952: a62e43ba13605a478b22307ea1790d48aea029a6 @ git://anongit.freedesktop.org/gfx-ci/linux
IGT_4671: b121f7d42c260ae3a050c3f440d1c11f7cff7d1a @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
Patchwork_10399: 732d27e4b2c9f81103635c15f5aeed7095ce75f3 @ git://anongit.freedesktop.org/gfx-ci/linux
== Linux commits ==
732d27e4b2c9 x86: Downgrade clock throttling thermal event critical error
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10399/issues.html
* ✓ Fi.CI.IGT: success for x86: Downgrade clock throttling thermal event critical error
2018-10-09 11:37 [PATCH] x86: Downgrade clock throttling thermal event critical error Chris Wilson
2018-10-09 12:16 ` ✓ Fi.CI.BAT: success for " Patchwork
@ 2018-10-09 14:51 ` Patchwork
2018-10-10 11:59 ` [PATCH] " Tvrtko Ursulin
2 siblings, 0 replies; 5+ messages in thread
From: Patchwork @ 2018-10-09 14:51 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: x86: Downgrade clock throttling thermal event critical error
URL : https://patchwork.freedesktop.org/series/50742/
State : success
== Summary ==
= CI Bug Log - changes from CI_DRM_4952_full -> Patchwork_10399_full =
== Summary - WARNING ==
Minor unknown changes coming with Patchwork_10399_full need to be verified
manually.
If you think the reported changes have nothing to do with the changes
introduced in Patchwork_10399_full, please notify your bug team to allow them
to document this new failure mode, which will reduce false positives in CI.
== Possible new issues ==
Here are the unknown changes that may have been introduced in Patchwork_10399_full:
=== IGT changes ===
==== Warnings ====
igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions:
shard-snb: PASS -> SKIP
igt@pm_rc6_residency@rc6-accuracy:
shard-snb: SKIP -> PASS
== Known issues ==
Here are the changes found in Patchwork_10399_full that come from known issues:
=== IGT changes ===
==== Issues hit ====
igt@gem_exec_schedule@pi-ringfull-render:
shard-apl: NOTRUN -> FAIL (fdo#103158)
igt@kms_busy@extended-modeset-hang-newfb-render-b:
shard-skl: NOTRUN -> DMESG-WARN (fdo#107956) +1
igt@kms_busy@extended-pageflip-hang-newfb-render-b:
shard-apl: PASS -> DMESG-WARN (fdo#107956)
igt@kms_busy@extended-pageflip-hang-newfb-render-c:
shard-glk: PASS -> DMESG-WARN (fdo#107956)
igt@kms_chv_cursor_fail@pipe-a-64x64-left-edge:
shard-glk: PASS -> DMESG-WARN (fdo#106538, fdo#105763) +1
igt@kms_color@pipe-b-legacy-gamma:
shard-apl: PASS -> FAIL (fdo#104782)
igt@kms_color@pipe-c-legacy-gamma:
shard-skl: NOTRUN -> FAIL (fdo#104782)
igt@kms_cursor_crc@cursor-128x128-sliding:
shard-apl: PASS -> FAIL (fdo#103232) +1
igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-mmap-gtt:
shard-apl: PASS -> FAIL (fdo#103167) +1
igt@kms_frontbuffer_tracking@fbc-1p-rte:
shard-glk: PASS -> FAIL (fdo#105682, fdo#103167)
igt@kms_frontbuffer_tracking@fbc-2p-primscrn-cur-indfb-move:
shard-glk: PASS -> FAIL (fdo#103167) +5
igt@kms_frontbuffer_tracking@fbcpsr-stridechange:
shard-skl: NOTRUN -> FAIL (fdo#105683)
igt@kms_panel_fitting@legacy:
shard-skl: NOTRUN -> FAIL (fdo#105456)
{igt@kms_plane_alpha_blend@pipe-a-alpha-opaque-fb}:
shard-skl: NOTRUN -> FAIL (fdo#108145) +2
{igt@kms_plane_alpha_blend@pipe-c-alpha-opaque-fb}:
shard-glk: PASS -> FAIL (fdo#108145)
igt@kms_plane_multiple@atomic-pipe-a-tiling-y:
shard-glk: PASS -> FAIL (fdo#103166) +2
igt@kms_setmode@basic:
shard-hsw: PASS -> FAIL (fdo#99912)
igt@perf@blocking:
shard-hsw: PASS -> FAIL (fdo#102252)
==== Possible fixes ====
igt@drv_suspend@sysfs-reader:
shard-apl: INCOMPLETE (fdo#103927) -> PASS
igt@gem_softpin@noreloc-s3:
shard-skl: INCOMPLETE (fdo#107773, fdo#104108) -> PASS
igt@gem_userptr_blits@readonly-unsync:
shard-skl: INCOMPLETE (fdo#108074) -> PASS
igt@kms_cursor_crc@cursor-256x256-random:
shard-apl: FAIL (fdo#103232) -> PASS
igt@kms_cursor_crc@cursor-64x64-suspend:
shard-skl: INCOMPLETE (fdo#104108) -> PASS
igt@kms_draw_crc@draw-method-rgb565-mmap-gtt-xtiled:
shard-skl: FAIL (fdo#103184) -> PASS
igt@kms_flip_tiling@flip-y-tiled:
shard-skl: FAIL (fdo#108303) -> PASS
igt@kms_frontbuffer_tracking@fbc-1p-primscrn-indfb-msflip-blt:
shard-skl: FAIL (fdo#103167) -> PASS +2
igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-render:
shard-apl: FAIL (fdo#103167) -> PASS +2
igt@kms_plane@plane-position-covered-pipe-a-planes:
shard-apl: FAIL (fdo#103166) -> PASS +2
igt@pm_rpm@basic-rte:
shard-skl: INCOMPLETE (fdo#107807) -> PASS
==== Warnings ====
igt@gem_ppgtt@blt-vs-render-ctxn:
shard-skl: INCOMPLETE -> TIMEOUT (fdo#108039)
{name}: This element is suppressed. This means it is ignored when computing
the status of the difference (SUCCESS, WARNING, or FAILURE).
fdo#102252 https://bugs.freedesktop.org/show_bug.cgi?id=102252
fdo#103158 https://bugs.freedesktop.org/show_bug.cgi?id=103158
fdo#103166 https://bugs.freedesktop.org/show_bug.cgi?id=103166
fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
fdo#103184 https://bugs.freedesktop.org/show_bug.cgi?id=103184
fdo#103232 https://bugs.freedesktop.org/show_bug.cgi?id=103232
fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927
fdo#104108 https://bugs.freedesktop.org/show_bug.cgi?id=104108
fdo#104782 https://bugs.freedesktop.org/show_bug.cgi?id=104782
fdo#105456 https://bugs.freedesktop.org/show_bug.cgi?id=105456
fdo#105682 https://bugs.freedesktop.org/show_bug.cgi?id=105682
fdo#105683 https://bugs.freedesktop.org/show_bug.cgi?id=105683
fdo#105763 https://bugs.freedesktop.org/show_bug.cgi?id=105763
fdo#106538 https://bugs.freedesktop.org/show_bug.cgi?id=106538
fdo#107773 https://bugs.freedesktop.org/show_bug.cgi?id=107773
fdo#107807 https://bugs.freedesktop.org/show_bug.cgi?id=107807
fdo#107956 https://bugs.freedesktop.org/show_bug.cgi?id=107956
fdo#108039 https://bugs.freedesktop.org/show_bug.cgi?id=108039
fdo#108074 https://bugs.freedesktop.org/show_bug.cgi?id=108074
fdo#108145 https://bugs.freedesktop.org/show_bug.cgi?id=108145
fdo#108303 https://bugs.freedesktop.org/show_bug.cgi?id=108303
fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912
== Participating hosts (6 -> 6) ==
No changes in participating hosts
== Build changes ==
* Linux: CI_DRM_4952 -> Patchwork_10399
CI_DRM_4952: a62e43ba13605a478b22307ea1790d48aea029a6 @ git://anongit.freedesktop.org/gfx-ci/linux
IGT_4671: b121f7d42c260ae3a050c3f440d1c11f7cff7d1a @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
Patchwork_10399: 732d27e4b2c9f81103635c15f5aeed7095ce75f3 @ git://anongit.freedesktop.org/gfx-ci/linux
piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10399/shards.html
* Re: [PATCH] x86: Downgrade clock throttling thermal event critical error
2018-10-09 11:37 [PATCH] x86: Downgrade clock throttling thermal event critical error Chris Wilson
2018-10-09 12:16 ` ✓ Fi.CI.BAT: success for " Patchwork
2018-10-09 14:51 ` ✓ Fi.CI.IGT: " Patchwork
@ 2018-10-10 11:59 ` Tvrtko Ursulin
2018-10-10 12:10 ` Chris Wilson
2 siblings, 1 reply; 5+ messages in thread
From: Tvrtko Ursulin @ 2018-10-10 11:59 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
On 09/10/2018 12:37, Chris Wilson wrote:
> Under CI testing, it is common for the CPUs to overheat under the
> continuous workloads and end up being throttled. As the CPUs still
> function, this is less a critical error meriting urgent action than an
> expected yet significant condition (pr_notice).
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Petri Latvala <petri.latvala@intel.com>
> ---
> arch/x86/kernel/cpu/mcheck/therm_throt.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> index 2da67b70ba98..bc57b5988589 100644
> --- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
> +++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> @@ -184,10 +184,10 @@ static void therm_throt_process(bool new_event, int event, int level)
>  	/* if we just entered the thermal event */
>  	if (new_event) {
>  		if (event == THERMAL_THROTTLING_EVENT)
> -			pr_crit("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
> -				this_cpu,
> -				level == CORE_LEVEL ? "Core" : "Package",
> -				state->count);
> +			pr_notice("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
> +				  this_cpu,
> +				  level == CORE_LEVEL ? "Core" : "Package",
> +				  state->count);
>  		return;
>  	}
>  	if (old_event) {
>
It even sounds like it wouldn't be far-fetched to argue that these days
notice is the correct log level for thermal throttling, unless there are
more sources of throttling messages. To be confirmed when I get back to my
Skull Canyon; that one certainly logs something like this shortly after
invoking make -j8.
Regards,
Tvrtko
* Re: [PATCH] x86: Downgrade clock throttling thermal event critical error
2018-10-10 11:59 ` [PATCH] " Tvrtko Ursulin
@ 2018-10-10 12:10 ` Chris Wilson
0 siblings, 0 replies; 5+ messages in thread
From: Chris Wilson @ 2018-10-10 12:10 UTC (permalink / raw)
To: Tvrtko Ursulin, intel-gfx
Quoting Tvrtko Ursulin (2018-10-10 12:59:59)
>
> On 09/10/2018 12:37, Chris Wilson wrote:
> > Under CI testing, it is common for the CPUs to overheat under the
> > continuous workloads and end up being throttled. As the CPUs still
> > function, this is less a critical error meriting urgent action than an
> > expected yet significant condition (pr_notice).
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Petri Latvala <petri.latvala@intel.com>
> > ---
> > arch/x86/kernel/cpu/mcheck/therm_throt.c | 8 ++++----
> > 1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> > index 2da67b70ba98..bc57b5988589 100644
> > --- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
> > +++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> > @@ -184,10 +184,10 @@ static void therm_throt_process(bool new_event, int event, int level)
> >  	/* if we just entered the thermal event */
> >  	if (new_event) {
> >  		if (event == THERMAL_THROTTLING_EVENT)
> > -			pr_crit("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
> > -				this_cpu,
> > -				level == CORE_LEVEL ? "Core" : "Package",
> > -				state->count);
> > +			pr_notice("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
> > +				  this_cpu,
> > +				  level == CORE_LEVEL ? "Core" : "Package",
> > +				  state->count);
> >  		return;
> >  	}
> >  	if (old_event) {
> >
>
> It even sounds it wouldn't be far fetched to argue these days notice is
> the correct log level for thermal throttling. Unless there are more
> sources of throttling messages. TBC when I get back to my Skull Canyon.
> That one certainly logs something like this shortly after invoking make -j8.
I was thinking of tarting up the language to say that most processors
nowadays can easily exceed their Thermal Design Power and are built with
that in mind. The caveat is making sure that the shutdown limit is still
reported as a critical event; iirc that comes as an MCE.
-Chris
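
The severity split discussed in this reply could be sketched roughly as
follows. This is only an illustration under stated assumptions: the enum
thermal_kind, its members, the SEV_* constants and thermal_severity() are
hypothetical names, not identifiers from therm_throt.c, and the sketch
does not model how the shutdown condition is actually delivered.

```c
/* Hypothetical event kinds; these are not kernel identifiers. */
enum thermal_kind {
	THERMAL_KIND_THROTTLE,	/* clock throttled, CPU keeps running */
	THERMAL_KIND_SHUTDOWN,	/* shutdown limit reached */
};

/* Severity numbers matching the kernel's crit and notice levels. */
enum {
	SEV_CRIT   = 2,
	SEV_NOTICE = 5,
};

/*
 * Assumed policy: routine throttling is logged as a notice, while
 * reaching the shutdown limit stays a critical event.
 */
static int thermal_severity(enum thermal_kind kind)
{
	return kind == THERMAL_KIND_SHUTDOWN ? SEV_CRIT : SEV_NOTICE;
}
```

Keeping the policy in one small function makes the caveat explicit: only
the throttling case is downgraded, and the shutdown case keeps its
critical severity.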