From: Borislav Petkov <bp@suse.de>
To: "Raj, Ashok" <ashok.raj@intel.com>
Cc: Alexander Alemayhu <alexander@alemayhu.com>,
	Daniel J Blueman <daniel@quora.org>,
	Paul Menzel <pmenzel@molgen.mpg.de>,
	tony.luck@intel.com, linux@leemhuis.info, len.brown@intel.com,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	"Pandruvada, Srinivas" <srinivas.pandruvada@intel.com>
Subject: Re: Dell XPS13: MCE (Hardware Error) reported
Date: Fri, 6 Jan 2017 17:54:23 +0100
Message-ID: <20170106165423.7xwdlnvjlfccsrqd@pd.tnic>
In-Reply-To: <20170106155831.GA30814@otc-nc-03>

On Fri, Jan 06, 2017 at 07:58:31AM -0800, Raj, Ashok wrote:
> Looks like we don't need a return value from therm_throt_process(),
> we can fix that as void as well.

Right you are, here's v2:

---
From a8151fa6f18c2605eb7972061234f05e79b372c4 Mon Sep 17 00:00:00 2001
From: Borislav Petkov <bp@suse.de>
Date: Fri, 6 Jan 2017 12:07:08 +0100
Subject: [PATCH] x86/MCE/therm_throt: Do not log a fake MCE for a thermal event

We log a fake bank 128 MCE to note that we're handling a CPU thermal
event. However, this confuses people into thinking that their hardware
generates MCEs. Hijacking MCA for logging thermal events is a gross
misuse anyway and it shouldn't have been done in the first place. And
besides we have other means for dealing with thermal events which are
much more suitable.

So let's kill the MCE logging part.

Signed-off-by: Borislav Petkov <bp@suse.de>
---

v2: Ashok: make therm_throt_process() void.
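
For reference, what survives this patch is the syslog rate limiting in
therm_throt_process(). Below is a rough user-space sketch of that logic,
not the kernel code. The sketch_* names, the tick-based clock and the
interval value are made up for illustration, and the counter increment is
an assumption since it sits outside the hunks in this diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical interval in abstract ticks; the kernel's CHECK_INTERVAL
 * value is not part of this patch. */
#define SKETCH_CHECK_INTERVAL 300

/* Hypothetical stand-in for the fields of struct _thermal_state used here. */
struct sketch_state {
	bool new_event;
	uint64_t count;
	uint64_t last_count;
	uint64_t next_check;
};

enum sketch_action {
	SKETCH_QUIET,		/* nothing printed */
	SKETCH_LOG_EVENT,	/* "threshold exceeded" style message */
	SKETCH_LOG_NORMAL,	/* "temperature/speed normal" style message */
};

static enum sketch_action sketch_process(struct sketch_state *s,
					 bool new_event, uint64_t now)
{
	bool old_event = s->new_event;

	s->new_event = new_event;

	/* Assumed: the event counter is bumped on every new event. */
	if (new_event)
		s->count++;

	/* Still inside the quiet window and events were counted since the
	 * last message: stay silent. Plain '<' is used here; the kernel's
	 * time_before64() is wraparound-safe. */
	if (now < s->next_check && s->count != s->last_count)
		return SKETCH_QUIET;

	s->next_check = now + SKETCH_CHECK_INTERVAL;
	s->last_count = s->count;

	if (new_event)
		return SKETCH_LOG_EVENT;
	if (old_event)
		return SKETCH_LOG_NORMAL;

	return SKETCH_QUIET;
}
```

Feeding it an event, a second event inside the window and then a clear
after the window gives one "event" message, silence, and one "normal"
message, which is all the caller needs now that nothing consumes a return
value.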

 arch/x86/include/asm/mce.h               |  6 ------
 arch/x86/kernel/cpu/mcheck/mce.c         | 25 -------------------------
 arch/x86/kernel/cpu/mcheck/therm_throt.c | 30 +++++++++++-------------------
 3 files changed, 11 insertions(+), 50 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 5132f2a6c0a2..a09ed05725c2 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -97,10 +97,6 @@
 
 #define MCE_OVERFLOW 0		/* bit 0 in flags means overflow */
 
-/* Software defined banks */
-#define MCE_EXTENDED_BANK	128
-#define MCE_THERMAL_BANK	(MCE_EXTENDED_BANK + 0)
-
 #define MCE_LOG_LEN 32
 #define MCE_LOG_SIGNATURE	"MACHINECHECK"
 
@@ -306,8 +302,6 @@ extern void (*deferred_error_int_vector)(void);
 
 void intel_init_thermal(struct cpuinfo_x86 *c);
 
-void mce_log_therm_throt_event(__u64 status);
-
 /* Interrupt Handler for core thermal thresholds */
 extern int (*platform_thermal_notify)(__u64 msr_val);
 
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 00ef43233e03..6eef6fde0f02 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1331,31 +1331,6 @@ static void mce_process_work(struct work_struct *dummy)
 	mce_gen_pool_process();
 }
 
-#ifdef CONFIG_X86_MCE_INTEL
-/***
- * mce_log_therm_throt_event - Logs the thermal throttling event to mcelog
- * @cpu: The CPU on which the event occurred.
- * @status: Event status information
- *
- * This function should be called by the thermal interrupt after the
- * event has been processed and the decision was made to log the event
- * further.
- *
- * The status parameter will be saved to the 'status' field of 'struct mce'
- * and historically has been the register value of the
- * MSR_IA32_THERMAL_STATUS (Intel) msr.
- */
-void mce_log_therm_throt_event(__u64 status)
-{
-	struct mce m;
-
-	mce_setup(&m);
-	m.bank = MCE_THERMAL_BANK;
-	m.status = status;
-	mce_log(&m);
-}
-#endif /* CONFIG_X86_MCE_INTEL */
-
 /*
  * Periodic polling timer for "silent" machine check errors.  If the
  * poller finds an MCE, poll 2x faster.  When the poller finds no more
diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
index 465aca8be009..85469f84c921 100644
--- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -6,7 +6,7 @@
  *
  * Maintains a counter in /sys that keeps track of the number of thermal
  * events, such that the user knows how bad the thermal problem might be
- * (since the logging to syslog and mcelog is rate limited).
+ * (since the logging to syslog is rate limited).
  *
  * Author: Dmitriy Zavin (dmitriyz@google.com)
  *
@@ -141,13 +141,8 @@ static struct attribute_group thermal_attr_group = {
  * IRQ has been acknowledged.
  *
  * It will take care of rate limiting and printing messages to the syslog.
- *
- * Returns: 0 : Event should NOT be further logged, i.e. still in
- *              "timeout" from previous log message.
- *          1 : Event should be logged further, and a message has been
- *              printed to the syslog.
  */
-static int therm_throt_process(bool new_event, int event, int level)
+static void therm_throt_process(bool new_event, int event, int level)
 {
 	struct _thermal_state *state;
 	unsigned int this_cpu = smp_processor_id();
@@ -162,16 +157,16 @@ static int therm_throt_process(bool new_event, int event, int level)
 		else if (event == POWER_LIMIT_EVENT)
 			state = &pstate->core_power_limit;
 		else
-			 return 0;
+			return;
 	} else if (level == PACKAGE_LEVEL) {
 		if (event == THERMAL_THROTTLING_EVENT)
 			state = &pstate->package_throttle;
 		else if (event == POWER_LIMIT_EVENT)
 			state = &pstate->package_power_limit;
 		else
-			return 0;
+			return;
 	} else
-		return 0;
+		return;
 
 	old_event = state->new_event;
 	state->new_event = new_event;
@@ -181,7 +176,7 @@ static int therm_throt_process(bool new_event, int event, int level)
 
 	if (time_before64(now, state->next_check) &&
 			state->count != state->last_count)
-		return 0;
+		return;
 
 	state->next_check = now + CHECK_INTERVAL;
 	state->last_count = state->count;
@@ -193,16 +188,14 @@ static int therm_throt_process(bool new_event, int event, int level)
 				this_cpu,
 				level == CORE_LEVEL ? "Core" : "Package",
 				state->count);
-		return 1;
+		return;
 	}
 	if (old_event) {
 		if (event == THERMAL_THROTTLING_EVENT)
 			pr_info("CPU%d: %s temperature/speed normal\n", this_cpu,
 				level == CORE_LEVEL ? "Core" : "Package");
-		return 1;
+		return;
 	}
-
-	return 0;
 }
 
 static int thresh_event_valid(int level, int event)
@@ -365,10 +358,9 @@ static void intel_thermal_interrupt(void)
 	/* Check for violation of core thermal thresholds*/
 	notify_thresholds(msr_val);
 
-	if (therm_throt_process(msr_val & THERM_STATUS_PROCHOT,
-				THERMAL_THROTTLING_EVENT,
-				CORE_LEVEL) != 0)
-		mce_log_therm_throt_event(msr_val);
+	therm_throt_process(msr_val & THERM_STATUS_PROCHOT,
+			    THERMAL_THROTTLING_EVENT,
+			    CORE_LEVEL);
 
 	if (this_cpu_has(X86_FEATURE_PLN) && int_pln_enable)
 		therm_throt_process(msr_val & THERM_STATUS_POWER_LIMIT,
-- 
2.11.0

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)