Date: Tue, 19 Jan 2021 15:57:59 -0800
From: "Luck, Tony"
To: Borislav Petkov
Cc: x86@kernel.org, Andrew Morton, Peter Zijlstra, Darren Hart, Andy Lutomirski, linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v4] x86/mce: Avoid infinite loop for copy from user recovery
Message-ID: <20210119235759.GA9970@agluck-desk2.amr.corp.intel.com>
In-Reply-To: <20210119105632.GF27433@zn.tnic>
References: <20210111214452.1826-1-tony.luck@intel.com> <20210115003817.23657-1-tony.luck@intel.com> <20210115152754.GC9138@zn.tnic> <20210115193435.GA4663@agluck-desk2.amr.corp.intel.com> <20210115205103.GA5920@agluck-desk2.amr.corp.intel.com> <20210115232346.GA7967@agluck-desk2.amr.corp.intel.com> <20210119105632.GF27433@zn.tnic>

On Tue, Jan 19, 2021 at
11:56:32AM +0100, Borislav Petkov wrote:
> On Fri, Jan 15, 2021 at 03:23:46PM -0800, Luck, Tony wrote:
> > On Fri, Jan 15, 2021 at 12:51:03PM -0800, Luck, Tony wrote:
> > >  static void kill_me_now(struct callback_head *ch)
> > >  {
> > > +	p->mce_count = 0;
> > >  	force_sig(SIGBUS);
> > >  }
> >
> > Brown paper bag time ... I just pasted that line from kill_me_maybe()
> > and I thought I did a re-compile ... but obviously not since it gives
> >
> > error: ‘p’ undeclared (first use in this function)
> >
> > Option a) (just like kill_me_maybe)
> >
> > struct task_struct *p = container_of(cb, struct task_struct, mce_kill_me);
> >
> > Option b) (simpler ... not sure why PeterZ did the container_of thing
> >
> > current->mce_count = 0;
>
> Right, he says it is the canonical way to get it out of callback_head.
> I don't think current will change while the #MC handler runs but we can
> adhere to the design pattern here and do container_of() ...

Ok ... I'll use the canonical way.

But now I've run into a weird issue. I'd run some basic tests with a
dozen machine checks in each of:
1) user access
2) kernel copyin
3) futex (multiple accesses from kernel before task_work())

and it passed my tests before I posted.

But the real validation folks took my patch and found that it has
destabilized cases 1 & 2 (and case 3 also chokes if you repeat a few
more times). System either hangs or panics. Generally before 100
injection/consumption cycles.

Their tests are still just doing one at a time (i.e. complete recovery
of one machine check before injecting the next error). So there aren't
any complicated race conditions.

So if you see anything obviously broken, let me know. Otherwise I'll
be poking around at the patch to figure out what is wrong.

-Tony
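For reference, the container_of() pattern discussed above can be sketched
in plain userspace C. This is only an illustration under stated
assumptions, not the kernel code: struct fake_task, reset_mce_count() and
run_callback_demo() are hypothetical names, and the macro below is a
simplified stand-in for the kernel's container_of().

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for the kernel's struct callback_head. */
struct callback_head {
	void (*func)(struct callback_head *head);
};

/* Hypothetical stand-in for the two task_struct fields mentioned in the thread. */
struct fake_task {
	int mce_count;
	struct callback_head mce_kill_me;
};

/*
 * Same shape as the fix being discussed: the callback only receives a
 * pointer to the embedded callback_head, so the enclosing structure is
 * recovered with container_of() before the counter is reset.
 */
static void reset_mce_count(struct callback_head *ch)
{
	struct fake_task *p;

	assert(ch != NULL);
	p = container_of(ch, struct fake_task, mce_kill_me);
	p->mce_count = 0;
}

/* Queue-free demo: invoke the callback directly and report the counter. */
static int run_callback_demo(int initial_count)
{
	struct fake_task t = { .mce_count = initial_count };

	t.mce_kill_me.func = reset_mce_count;
	t.mce_kill_me.func(&t.mce_kill_me);
	return t.mce_count;
}
```

Calling run_callback_demo(3) returns 0, because the callback finds the
enclosing structure from the embedded member alone. The sketch does not
model task_work scheduling or the #MC handler at all.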