From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752000AbdBMK3J (ORCPT ); Mon, 13 Feb 2017 05:29:09 -0500
Received: from mail.skyhub.de ([78.46.96.112]:41720 "EHLO mail.skyhub.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751050AbdBMK3H (ORCPT ); Mon, 13 Feb 2017 05:29:07 -0500
Date: Mon, 13 Feb 2017 11:28:51 +0100
From: Borislav Petkov
To: Gabriel C
Cc: Thomas Gleixner , Linus Torvalds , Greg KH ,
	Linux Kernel Mailing List , Andrew Morton , stable , lwn@lwn.net,
	Jiri Slaby , Ruslan Ruslichenko
Subject: Re: Linux 4.9.6 ( Restore IO-APIC irq_chip retrigger callback , breaks my box )
Message-ID: <20170213102851.npt6x5bs6l2wvolq@pd.tnic>
References: <73c6bd86-3ebb-c6db-b522-47a48b847227@gmail.com>
	<20170211142059.447eo6bgxycmp6kb@pd.tnic>
	<20170211213221.p6xs6c7qccz2w42r@pd.tnic>
	<0da9ef61-f3ee-13a8-2877-c235d710c50f@gmail.com>
	<20170212211228.iycyl76o4auxk2jy@pd.tnic>
	<20170213003804.7gt2edclorjlx52p@pd.tnic>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To:
User-Agent: NeoMutt/20161014 (1.7.1)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Feb 13, 2017 at 02:26:20AM +0100, Gabriel C wrote:
> I didn't tested your patch yet but did a boot with mce=off and nomce
> which seems to not really works since is still want to mc_device_add()
> even when off.

mc_device_add() is microcode loader's ->add_dev() subsys pointer and that's not from mce.
From mce you should be seeing only (with the debug patch applied):

[    1.717508] mce: mcheck_init_device: entry
[    1.718769] mce: Unable to init device /dev/mcelog (rc: -5)

> See :
>
> http://ftp.frugalware.org/pub/other/people/crazy/kernel/t/crash_mce_off.jpg

That looks like core 13 got the NMI from the watchdog at

	if (wait)
		csd_lock_wait(csd);

IINM and from what I could correlate to the asm it generates here, RIP
points to that READ_ONCE there in smp_cond_load_acquire() in
smp_call_function_single() which is called by collect_cpu_info() of the
microcode loader to get the microcode-relevant info from the CPU.

So this is simply a bystander CPU which got interrupted.

> I'll build an .10-rc8 with your patch tomorrow .. is somewhat late now here :)

Ok.

> Another thing is .. there seems to be a real bug in tsc code .
>
> I've build an -rc8 with a lot more debug options on an now I see the following :

Right before I went to bed I thought of telling you to enable lockdep :-)

Good. :-)

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
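
A rough userspace sketch of the wait discussed above, to show why a watchdog NMI tends to land in that loop: a caller of smp_call_function_single() with wait set spins until the target CPU clears a lock flag. This is not the kernel's code. Everything named *_sketch is hypothetical, the flag value is illustrative, C11 atomics stand in for smp_cond_load_acquire(), and the spin is bounded here only so that the example terminates.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative flag bit; stands in for the kernel's CSD_FLAG_LOCK. */
#define CSD_SKETCH_FLAG_LOCK 0x01u

/* Hypothetical stand-in for the per-call data the caller waits on. */
struct csd_sketch {
	atomic_uint flags;
};

/* Caller side: mark the request as in flight before handing it over. */
static void csd_sketch_lock(struct csd_sketch *csd)
{
	atomic_fetch_or_explicit(&csd->flags, CSD_SKETCH_FLAG_LOCK,
				 memory_order_relaxed);
}

/* Target side: clear the flag once the requested function has run. */
static void csd_sketch_unlock(struct csd_sketch *csd)
{
	atomic_fetch_and_explicit(&csd->flags, ~CSD_SKETCH_FLAG_LOCK,
				  memory_order_release);
}

/*
 * Caller side: re-read the flags with acquire semantics until the lock
 * bit is clear. Returns true if the bit was seen clear within max_spins
 * reads, false otherwise. The bound is an assumption of this sketch.
 */
static bool csd_sketch_wait(struct csd_sketch *csd, unsigned long max_spins)
{
	for (unsigned long i = 0; i < max_spins; i++) {
		unsigned int v = atomic_load_explicit(&csd->flags,
						      memory_order_acquire);
		if (!(v & CSD_SKETCH_FLAG_LOCK))
			return true;
	}
	return false;
}
```

If the target never gets to the unlock step, the waiter just keeps re-reading the flags, so an NMI that arrives in the meantime reports an instruction pointer inside that load even though the waiter itself is not the cause.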