From: Borislav Petkov
To: linux-edac
Cc: Steven Rostedt, Tony Luck, Yazen Ghannam, X86 ML, LKML
Subject: [PATCH 0/7] EDAC, mce_amd: Issue decoded MCE through the tracepoint
Date: Fri, 25 Aug 2017 12:24:04 +0200
Message-Id: <20170825102411.8682-1-bp@alien8.de>

Hi all,

here's v2 incorporating all the feedback from last time. The main
difference is that instead of adding yet another tracepoint, I extended
mce_record with the decoded string. This way is much more natural and we
should've done it like this since the get-go.

The TP record looks like this:

# tracer: nop
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
     kworker/1:1-91    [001] ....    97.021806: mce_record: CPU: 0, MCGc/s: 0/0, MC5: 9600410000000e0f, IPID: 0000000000000000, ADDR/MISC/SYND: 0000000097370d7b/0000000000000000/0000000000000000, RIP: 00:<0000000000000000>, TSC: 5c226747ec, PROCESSOR: 2:0, TIME: 0, SOCKET: 0, APIC: 0
MC5 Error: CPU Watchdog timer expire.

and userspace can pick apart the fields, as before.

Next step is adding that to rasdaemon.

Thanks.

Changelog:
==========

v1:

here's a first stab at adding a tracepoint which dumps the decoded MCE
string to userspace. The main idea is to have the decoding functionality
in the kernel and depending on whether you have userspace consumers
listening or not, to dump the error to the tracepoint or to dmesg.
In either case, we do the decoding in the kernel and don't need special
userspace. Furthermore, adding new CPU support will have to be done only
in one place.

First 6 patches are cleanups which are good to have regardless, IMO.

Any constructive comments and suggestions are appreciated.

Thanks.

P.S., Thanks to Rostedt for the input!

Borislav Petkov (7):
  x86/mce: Handle an in-kernel MCE decoder
  x86/mce: Extend the MCE tracepoint with a decoded string
  seq_buf: Add seq_buf_clear_buf()
  seq_buf: Export seq_buf_printf() to modules
  EDAC, mce_amd: Convert to seq_buf
  EDAC, mce_amd: Issue the decoded info through the TP or printk()
  x86/mce: Issue the mcelog --ascii message on !AMD

 arch/x86/include/asm/mce.h       |   4 +-
 arch/x86/kernel/cpu/mcheck/mce.c |  14 +-
 drivers/edac/mce_amd.c           | 279 ++++++++++++++++++++++++---------------
 include/linux/seq_buf.h          |   7 +
 include/trace/events/mce.h       |  11 +-
 lib/seq_buf.c                    |   1 +
 6 files changed, 204 insertions(+), 112 deletions(-)

-- 
2.13.0
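
For illustration only, here is a rough sketch of how a userspace consumer
might pick apart the fields of the mce_record payload quoted in the cover
letter. It is not taken from rasdaemon or from this series. The helper
name parse_mce_record() is hypothetical, and the sketch assumes that APIC
is the last fixed field and that whatever follows it is the decoded
string, as in the sample record.

```python
import re

# Payload copied from the sample mce_record line in the cover letter.
SAMPLE = (
    "CPU: 0, MCGc/s: 0/0, MC5: 9600410000000e0f, "
    "IPID: 0000000000000000, "
    "ADDR/MISC/SYND: 0000000097370d7b/0000000000000000/0000000000000000, "
    "RIP: 00:<0000000000000000>, TSC: 5c226747ec, PROCESSOR: 2:0, "
    "TIME: 0, SOCKET: 0, APIC: 0\n"
    "MC5 Error: CPU Watchdog timer expire."
)


def parse_mce_record(payload):
    """Hypothetical helper: return (fields, decoded_string).

    Assumption: APIC is the last "KEY: value" field and the remainder
    of the payload is the in-kernel decoded string.
    """
    match = re.match(r"(.*?APIC: \S+)\s*(.*)", payload, re.S)
    if match is None:
        raise ValueError("unrecognized mce_record payload")
    fixed, decoded = match.groups()

    fields = {}
    for item in fixed.split(", "):
        # Split on the first ": " only, so values such as "2:0" or
        # "00:<0000000000000000>" stay intact.
        key, _, value = item.partition(": ")
        fields[key] = value
    return fields, decoded.strip()


fields, decoded = parse_mce_record(SAMPLE)

# The bank status key depends on the bank number (MC5 in the sample).
bank_key = next(k for k in fields if re.fullmatch(r"MC\d+", k))
status = int(fields[bank_key], 16)

print(bank_key, hex(status))
print(decoded)
```

The fixed fields stay machine-parseable exactly as before, and the decoded
string is simply carried along as trailing text, so a consumer that only
cares about the raw registers could ignore it.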