* [PATCH] x86: Downgrade clock throttling thermal event critical error
@ 2018-10-09 11:37 Chris Wilson
  2018-10-09 12:16 ` ✓ Fi.CI.BAT: success for " Patchwork
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Chris Wilson @ 2018-10-09 11:37 UTC (permalink / raw)
  To: intel-gfx

Under CI testing, it is common for the CPUs to overheat under the
continuous workloads and end up being throttled. As the CPUs still
function, this is less a critical error meriting urgent action than an
expected yet significant condition (pr_notice).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Petri Latvala <petri.latvala@intel.com>
---
 arch/x86/kernel/cpu/mcheck/therm_throt.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
index 2da67b70ba98..bc57b5988589 100644
--- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -184,10 +184,10 @@ static void therm_throt_process(bool new_event, int event, int level)
 	/* if we just entered the thermal event */
 	if (new_event) {
 		if (event == THERMAL_THROTTLING_EVENT)
-			pr_crit("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
-				this_cpu,
-				level == CORE_LEVEL ? "Core" : "Package",
-				state->count);
+			pr_notice("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
+				  this_cpu,
+				  level == CORE_LEVEL ? "Core" : "Package",
+				  state->count);
 		return;
 	}
 	if (old_event) {
-- 
2.19.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* ✓ Fi.CI.BAT: success for x86: Downgrade clock throttling thermal event critical error
  2018-10-09 11:37 [PATCH] x86: Downgrade clock throttling thermal event critical error Chris Wilson
@ 2018-10-09 12:16 ` Patchwork
  2018-10-09 14:51 ` ✓ Fi.CI.IGT: " Patchwork
  2018-10-10 11:59 ` [PATCH] " Tvrtko Ursulin
  2 siblings, 0 replies; 5+ messages in thread
From: Patchwork @ 2018-10-09 12:16 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: x86: Downgrade clock throttling thermal event critical error
URL   : https://patchwork.freedesktop.org/series/50742/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4952 -> Patchwork_10399 =

== Summary - WARNING ==

  Minor unknown changes coming with Patchwork_10399 need to be verified
  manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_10399, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/50742/revisions/1/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_10399:

  === IGT changes ===

    ==== Warnings ====

    igt@pm_rpm@module-reload:
      fi-hsw-4770r:       PASS -> SKIP

    
== Known issues ==

  Here are the changes found in Patchwork_10399 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@drv_module_reload@basic-reload-inject:
      fi-hsw-4770r:       PASS -> DMESG-WARN (fdo#107425, fdo#107924)

    
    ==== Possible fixes ====

    igt@gem_exec_suspend@basic-s3:
      fi-blb-e6850:       INCOMPLETE (fdo#107718) -> PASS

    
  fdo#107425 https://bugs.freedesktop.org/show_bug.cgi?id=107425
  fdo#107718 https://bugs.freedesktop.org/show_bug.cgi?id=107718
  fdo#107924 https://bugs.freedesktop.org/show_bug.cgi?id=107924


== Participating hosts (47 -> 42) ==

  Additional (1): fi-skl-6700hq 
  Missing    (6): fi-kbl-soraka fi-ilk-m540 fi-byt-squawks fi-bsw-cyan fi-icl-u2 fi-ctg-p8600 


== Build changes ==

    * Linux: CI_DRM_4952 -> Patchwork_10399

  CI_DRM_4952: a62e43ba13605a478b22307ea1790d48aea029a6 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4671: b121f7d42c260ae3a050c3f440d1c11f7cff7d1a @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_10399: 732d27e4b2c9f81103635c15f5aeed7095ce75f3 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

732d27e4b2c9 x86: Downgrade clock throttling thermal event critical error

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10399/issues.html

* ✓ Fi.CI.IGT: success for x86: Downgrade clock throttling thermal event critical error
  2018-10-09 11:37 [PATCH] x86: Downgrade clock throttling thermal event critical error Chris Wilson
  2018-10-09 12:16 ` ✓ Fi.CI.BAT: success for " Patchwork
@ 2018-10-09 14:51 ` Patchwork
  2018-10-10 11:59 ` [PATCH] " Tvrtko Ursulin
  2 siblings, 0 replies; 5+ messages in thread
From: Patchwork @ 2018-10-09 14:51 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: x86: Downgrade clock throttling thermal event critical error
URL   : https://patchwork.freedesktop.org/series/50742/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4952_full -> Patchwork_10399_full =

== Summary - WARNING ==

  Minor unknown changes coming with Patchwork_10399_full need to be verified
  manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_10399_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_10399_full:

  === IGT changes ===

    ==== Warnings ====

    igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions:
      shard-snb:          PASS -> SKIP

    igt@pm_rc6_residency@rc6-accuracy:
      shard-snb:          SKIP -> PASS

    
== Known issues ==

  Here are the changes found in Patchwork_10399_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@gem_exec_schedule@pi-ringfull-render:
      shard-apl:          NOTRUN -> FAIL (fdo#103158)

    igt@kms_busy@extended-modeset-hang-newfb-render-b:
      shard-skl:          NOTRUN -> DMESG-WARN (fdo#107956) +1

    igt@kms_busy@extended-pageflip-hang-newfb-render-b:
      shard-apl:          PASS -> DMESG-WARN (fdo#107956)

    igt@kms_busy@extended-pageflip-hang-newfb-render-c:
      shard-glk:          PASS -> DMESG-WARN (fdo#107956)

    igt@kms_chv_cursor_fail@pipe-a-64x64-left-edge:
      shard-glk:          PASS -> DMESG-WARN (fdo#106538, fdo#105763) +1

    igt@kms_color@pipe-b-legacy-gamma:
      shard-apl:          PASS -> FAIL (fdo#104782)

    igt@kms_color@pipe-c-legacy-gamma:
      shard-skl:          NOTRUN -> FAIL (fdo#104782)

    igt@kms_cursor_crc@cursor-128x128-sliding:
      shard-apl:          PASS -> FAIL (fdo#103232) +1

    igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-mmap-gtt:
      shard-apl:          PASS -> FAIL (fdo#103167) +1

    igt@kms_frontbuffer_tracking@fbc-1p-rte:
      shard-glk:          PASS -> FAIL (fdo#105682, fdo#103167)

    igt@kms_frontbuffer_tracking@fbc-2p-primscrn-cur-indfb-move:
      shard-glk:          PASS -> FAIL (fdo#103167) +5

    igt@kms_frontbuffer_tracking@fbcpsr-stridechange:
      shard-skl:          NOTRUN -> FAIL (fdo#105683)

    igt@kms_panel_fitting@legacy:
      shard-skl:          NOTRUN -> FAIL (fdo#105456)

    {igt@kms_plane_alpha_blend@pipe-a-alpha-opaque-fb}:
      shard-skl:          NOTRUN -> FAIL (fdo#108145) +2

    {igt@kms_plane_alpha_blend@pipe-c-alpha-opaque-fb}:
      shard-glk:          PASS -> FAIL (fdo#108145)

    igt@kms_plane_multiple@atomic-pipe-a-tiling-y:
      shard-glk:          PASS -> FAIL (fdo#103166) +2

    igt@kms_setmode@basic:
      shard-hsw:          PASS -> FAIL (fdo#99912)

    igt@perf@blocking:
      shard-hsw:          PASS -> FAIL (fdo#102252)

    
    ==== Possible fixes ====

    igt@drv_suspend@sysfs-reader:
      shard-apl:          INCOMPLETE (fdo#103927) -> PASS

    igt@gem_softpin@noreloc-s3:
      shard-skl:          INCOMPLETE (fdo#107773, fdo#104108) -> PASS

    igt@gem_userptr_blits@readonly-unsync:
      shard-skl:          INCOMPLETE (fdo#108074) -> PASS

    igt@kms_cursor_crc@cursor-256x256-random:
      shard-apl:          FAIL (fdo#103232) -> PASS

    igt@kms_cursor_crc@cursor-64x64-suspend:
      shard-skl:          INCOMPLETE (fdo#104108) -> PASS

    igt@kms_draw_crc@draw-method-rgb565-mmap-gtt-xtiled:
      shard-skl:          FAIL (fdo#103184) -> PASS

    igt@kms_flip_tiling@flip-y-tiled:
      shard-skl:          FAIL (fdo#108303) -> PASS

    igt@kms_frontbuffer_tracking@fbc-1p-primscrn-indfb-msflip-blt:
      shard-skl:          FAIL (fdo#103167) -> PASS +2

    igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-render:
      shard-apl:          FAIL (fdo#103167) -> PASS +2

    igt@kms_plane@plane-position-covered-pipe-a-planes:
      shard-apl:          FAIL (fdo#103166) -> PASS +2

    igt@pm_rpm@basic-rte:
      shard-skl:          INCOMPLETE (fdo#107807) -> PASS

    
    ==== Warnings ====

    igt@gem_ppgtt@blt-vs-render-ctxn:
      shard-skl:          INCOMPLETE -> TIMEOUT (fdo#108039)

    
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  fdo#102252 https://bugs.freedesktop.org/show_bug.cgi?id=102252
  fdo#103158 https://bugs.freedesktop.org/show_bug.cgi?id=103158
  fdo#103166 https://bugs.freedesktop.org/show_bug.cgi?id=103166
  fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
  fdo#103184 https://bugs.freedesktop.org/show_bug.cgi?id=103184
  fdo#103232 https://bugs.freedesktop.org/show_bug.cgi?id=103232
  fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927
  fdo#104108 https://bugs.freedesktop.org/show_bug.cgi?id=104108
  fdo#104782 https://bugs.freedesktop.org/show_bug.cgi?id=104782
  fdo#105456 https://bugs.freedesktop.org/show_bug.cgi?id=105456
  fdo#105682 https://bugs.freedesktop.org/show_bug.cgi?id=105682
  fdo#105683 https://bugs.freedesktop.org/show_bug.cgi?id=105683
  fdo#105763 https://bugs.freedesktop.org/show_bug.cgi?id=105763
  fdo#106538 https://bugs.freedesktop.org/show_bug.cgi?id=106538
  fdo#107773 https://bugs.freedesktop.org/show_bug.cgi?id=107773
  fdo#107807 https://bugs.freedesktop.org/show_bug.cgi?id=107807
  fdo#107956 https://bugs.freedesktop.org/show_bug.cgi?id=107956
  fdo#108039 https://bugs.freedesktop.org/show_bug.cgi?id=108039
  fdo#108074 https://bugs.freedesktop.org/show_bug.cgi?id=108074
  fdo#108145 https://bugs.freedesktop.org/show_bug.cgi?id=108145
  fdo#108303 https://bugs.freedesktop.org/show_bug.cgi?id=108303
  fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912


== Participating hosts (6 -> 6) ==

  No changes in participating hosts


== Build changes ==

    * Linux: CI_DRM_4952 -> Patchwork_10399

  CI_DRM_4952: a62e43ba13605a478b22307ea1790d48aea029a6 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4671: b121f7d42c260ae3a050c3f440d1c11f7cff7d1a @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_10399: 732d27e4b2c9f81103635c15f5aeed7095ce75f3 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10399/shards.html

* Re: [PATCH] x86: Downgrade clock throttling thermal event critical error
  2018-10-09 11:37 [PATCH] x86: Downgrade clock throttling thermal event critical error Chris Wilson
  2018-10-09 12:16 ` ✓ Fi.CI.BAT: success for " Patchwork
  2018-10-09 14:51 ` ✓ Fi.CI.IGT: " Patchwork
@ 2018-10-10 11:59 ` Tvrtko Ursulin
  2018-10-10 12:10   ` Chris Wilson
  2 siblings, 1 reply; 5+ messages in thread
From: Tvrtko Ursulin @ 2018-10-10 11:59 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 09/10/2018 12:37, Chris Wilson wrote:
> Under CI testing, it is common for the cpus to overheat with the
> continuous workloads and end up being throttled. As the cpus still
> function, it is less of a critical error meriting urgent action, but an
> expected yet significant condition (pr_notice).
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Petri Latvala <petri.latvala@intel.com>
> ---
>   arch/x86/kernel/cpu/mcheck/therm_throt.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> index 2da67b70ba98..bc57b5988589 100644
> --- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
> +++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> @@ -184,10 +184,10 @@ static void therm_throt_process(bool new_event, int event, int level)
>   	/* if we just entered the thermal event */
>   	if (new_event) {
>   		if (event == THERMAL_THROTTLING_EVENT)
> -			pr_crit("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
> -				this_cpu,
> -				level == CORE_LEVEL ? "Core" : "Package",
> -				state->count);
> +			pr_notice("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
> +				  this_cpu,
> +				  level == CORE_LEVEL ? "Core" : "Package",
> +				  state->count);
>   		return;
>   	}
>   	if (old_event) {
> 

It even sounds like it wouldn't be far-fetched to argue that these days
notice is the correct log level for thermal throttling, unless there are
more sources of throttling messages. TBC when I get back to my Skull
Canyon. That one certainly logs something like this shortly after
invoking make -j8.
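
To make the difference between the two log levels concrete, here is a
minimal userspace sketch (not kernel code, and not part of the patch). It
uses the numeric severities from include/linux/kern_levels.h, where
pr_crit() logs at level 2 and pr_notice() at level 5, and the kernel rule
that a message reaches the console only if its level is numerically lower
than console_loglevel. The helper name shown_on_console() is made up for
illustration, and the console levels used in the note after the block (4
for a "quiet" boot, 7 otherwise) are assumed typical defaults that a
given configuration may override.

```c
/* Userspace illustration only; shown_on_console() is a hypothetical helper. */
#include <assert.h>
#include <stdbool.h>

/* Numeric severities as in include/linux/kern_levels.h (lower = more severe). */
enum loglevel {
	LOGLEVEL_CRIT = 2,	/* level used by pr_crit() */
	LOGLEVEL_NOTICE = 5,	/* level used by pr_notice() */
};

/*
 * A message is printed on the console only if its level is numerically
 * lower than console_loglevel; it is stored in the log buffer either way.
 */
bool shown_on_console(int level, int console_loglevel)
{
	assert(level >= 0 && level <= 7);
	return level < console_loglevel;
}
```

Under those assumed defaults, shown_on_console(LOGLEVEL_CRIT, 4) is true
while shown_on_console(LOGLEVEL_NOTICE, 4) is false, so after the change
the throttling message would stay in dmesg but no longer appear on a
quiet console.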

Regards,

Tvrtko

* Re: [PATCH] x86: Downgrade clock throttling thermal event critical error
  2018-10-10 11:59 ` [PATCH] " Tvrtko Ursulin
@ 2018-10-10 12:10   ` Chris Wilson
  0 siblings, 0 replies; 5+ messages in thread
From: Chris Wilson @ 2018-10-10 12:10 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-10-10 12:59:59)
> It even sounds like it wouldn't be far-fetched to argue that these days
> notice is the correct log level for thermal throttling, unless there are
> more sources of throttling messages. TBC when I get back to my Skull
> Canyon. That one certainly logs something like this shortly after
> invoking make -j8.

I was thinking of tarting up the language to say most processors
nowadays can easily exceed their Thermal Design Point and are built with
that in mind. The caveat is making sure that the shutdown limit is still
reported as a critical event; iirc that comes as an MCE.
-Chris