Date: Mon, 3 Jun 2019 07:23:31 -0700
From: Sean Christopherson
To: Andy Lutomirski
Cc: Jiri Kosina, Andy Lutomirski, "Rafael J. Wysocki", Josh Poimboeuf,
 "Rafael J. Wysocki", Thomas Gleixner, the arch/x86 maintainers,
 Pavel Machek, Ingo Molnar, Borislav Petkov, "H. Peter Anvin",
 Peter Zijlstra, Linux PM, Linux Kernel Mailing List
Subject: Re: [PATCH v4] x86/power: Fix 'nosmt' vs.
 hibernation triple fault during resume
Message-ID: <20190603142330.GA13384@linux.intel.com>
References: <20190531051456.fzkvn62qlkf6wqra@treble>
 <5564116.e9OFvgDRbB@kreacher>
 <98E57C7E-24E2-4EB8-A14E-FCA80316F812@amacapital.net>
In-Reply-To: <98E57C7E-24E2-4EB8-A14E-FCA80316F812@amacapital.net>

On Fri, May 31, 2019 at 02:22:27PM -0700, Andy Lutomirski wrote:
>
> > On May 31, 2019, at 2:05 PM, Jiri Kosina wrote:
> >
> >> On Fri, 31 May 2019, Andy Lutomirski wrote:
> >>
> >> The Intel SDM Vol 3 34.10 says:
> >>
> >> If the HLT instruction is restarted, the processor will generate a
> >> memory access to fetch the HLT instruction (if it is
> >> not in the internal cache), and execute a HLT bus transaction. This
> >> behavior results in multiple HLT bus transactions
> >> for the same HLT instruction.
> >
> > Which basically means that both hibernation and kexec have been broken in
> > this respect for gazillions of years, and seems like noone noticed. Makes
> > one wonder what the reason for that might be.
> >
> > Either SDM is not precise and the refetch actually never happens for real
> > (or is always in these cases satisfied from I$ perhaps?), or ... ?
> >
> > So my patch basically puts things back where they have been for ages
> > (while mwait is obviously much worse, as that gets woken up by the write
> > to the monitored address, which inevitably does happen during resume), but
> > seems like SDM is suggesting that we've been in a grey zone wrt RSM at
> > least for all those ages.
> >
> > So perhaps we really should ditch resume_play_dead() altogether
> > eventually, and replace it with sending INIT IPI around instead (and then
> > waking the CPUs properly via INIT INIT START). I'd still like to do that
> > for 5.3 though, as that'd be slightly bigger surgery, and conservatively
> > put things basically back to state they have been up to now for 5.2.
> >
>
>
> Seems reasonable to me. I would guess that it mostly works because SMI isn’t
> all that common and the window where it matters is short. Or maybe the SDM
> is misleading.

For P6 and later, i.e. all modern CPUs, Intel processors go straight to
halted state and don't fetch/decode the HLT instruction. P5 actually did
a fetch, but from what I can tell that behavior wasn't carried forward to
KNC, unlike other legacy interrupt crud from P5:

[1] https://lkml.kernel.org/r/20190430004504.GH31379@linux.intel.com