Date: Wed, 30 Nov 2016 09:48:28 +0100
From: Borislav Petkov
To: Thomas Gleixner
Cc: Peter Zijlstra, Steven Rostedt, Jiri Olsa, "Paul E. McKenney", linux-kernel@vger.kernel.org, Ingo Molnar, Josh Triplett, Andi Kleen, Jan Stancek
Subject: Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

On Tue, Nov 29, 2016 at 02:59:01PM +0100, Thomas Gleixner wrote:
> The issue is that you obviously start with the assumption, that the machine
> has this bug. As a consequence the machine is brute forced into tick
> broadcast mode, which cannot be reverted when you clear that misfeature
> after ACPI init. So in case of !NOHZ and !HIGHRES the periodic tick is
> forced into broadcast mode, which is not what you want.
>
> As far as I understood the whole magic, this C1E misfeature takes only
> effect _after_ ACPI has been initialized.
> So instead of setting the bug in
> early boot and therefore forcing the broadcast nonsense, we should only set
> it when ACPI has actually detected it.

Problem is, select_idle_routine() runs a lot earlier than acpi_init(), so
there's a window where we don't yet know definitively whether the box is
actually going to enter C1E or not.

[ I presume the reason we have to do the proper detection after ACPI has
  been initialized is that the firmware decides whether to do C1E entry or
  not and then sets those bits in the MSR (or not). ]

If in that window we enter idle on an affected machine and we *don't*
switch to broadcast mode, we risk not waking up from C1E, i.e., the very
problem this fix was done for in the first place.

So, if we "prematurely" switch to broadcast mode on the affected CPUs, we're
ok: C1E will be detected properly later and we're already in broadcast mode.

Now, on the machines which are not affected, where we clear
X86_BUG_AMD_APIC_C1E because they don't enter C1E at all, I was thinking of
maybe calling amd_e400_remove_cpu(), clearing the e400 mask and even freeing
it, so that those machines can use default_idle().

But you're saying tick_broadcast_enter() is irreversible?

Thanks.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.