From: Tony Luck
To: Borislav Petkov
Cc: Jue Wang, Ding Hui, naoya.horiguchi@nec.com, osalvador@suse.de,
	Youquan Song, huangcun@sangfor.com.cn, x86@kernel.org,
	linux-edac@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Tony Luck
Subject: [PATCH v2 1/3] x86/mce: Avoid infinite loop for copy from user recovery
Date: Tue, 17 Aug 2021 17:29:40 -0700
Message-Id: <20210818002942.1607544-2-tony.luck@intel.com>
In-Reply-To: <20210818002942.1607544-1-tony.luck@intel.com>
References: <20210706190620.1290391-1-tony.luck@intel.com>
	<20210818002942.1607544-1-tony.luck@intel.com>

The recovery action when get_user() triggers a machine check uses the
fixup path to make get_user() return -EFAULT. In addition,
queue_task_work() arranges for kill_me_maybe() to be called on return to
user mode to send a SIGBUS to the current process.

But there are places in the kernel where the code assumes that this
EFAULT return was simply because of a page fault. The code takes some
action to fix that, and then retries the access. This results in a
second machine check.

While processing this second machine check queue_task_work() is called
again. But since this uses the same callback_head structure that was
used in the first call, the net result is an entry on the
current->task_works list that points to itself. When task_work_run() is
called it loops forever in this code:

	do {
		next = work->next;
		work->func(work);
		work = next;
		cond_resched();
	} while (work);

Add a counter (current->mce_count) to keep track of repeated machine
checks before task_work() is called. The first machine check saves the
address information and calls task_work_add(). Subsequent machine checks
before that task_work callback is executed check that the address is in
the same page as the first machine check (since the callback will
offline exactly one page).

The expected worst case is two machine checks before moving on (e.g. one
user access with page faults disabled, then a repeat to the same address
with page faults enabled). Just in case there is some code that loops
forever, enforce a limit of 10.

Cc:
Signed-off-by: Tony Luck
---
 arch/x86/kernel/cpu/mce/core.c | 43 +++++++++++++++++++++++++---------
 include/linux/sched.h          |  1 +
 2 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 22791aadc085..94830ee9581c 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1250,6 +1250,9 @@ static void __mc_scan_banks(struct mce *m, struct pt_regs *regs, struct mce *fin
 
 static void kill_me_now(struct callback_head *ch)
 {
+	struct task_struct *p = container_of(ch, struct task_struct, mce_kill_me);
+
+	p->mce_count = 0;
 	force_sig(SIGBUS);
 }
 
@@ -1259,6 +1262,7 @@ static void kill_me_maybe(struct callback_head *cb)
 	int flags = MF_ACTION_REQUIRED;
 	int ret;
 
+	p->mce_count = 0;
 	pr_err("Uncorrected hardware memory error in user-access at %llx", p->mce_addr);
 
 	if (!p->mce_ripv)
@@ -1287,17 +1291,34 @@ static void kill_me_maybe(struct callback_head *cb)
 	}
 }
 
-static void queue_task_work(struct mce *m, int kill_current_task)
+static void queue_task_work(struct mce *m, char *msg, int kill_current_task)
 {
-	current->mce_addr = m->addr;
-	current->mce_kflags = m->kflags;
-	current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV);
-	current->mce_whole_page = whole_page(m);
+	int count = ++current->mce_count;
 
-	if (kill_current_task)
-		current->mce_kill_me.func = kill_me_now;
-	else
-		current->mce_kill_me.func = kill_me_maybe;
+	/* First call, save all the details */
+	if (count == 1) {
+		current->mce_addr = m->addr;
+		current->mce_kflags = m->kflags;
+		current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV);
+		current->mce_whole_page = whole_page(m);
+
+		if (kill_current_task)
+			current->mce_kill_me.func = kill_me_now;
+		else
+			current->mce_kill_me.func = kill_me_maybe;
+	}
+
+	/* Ten is likely overkill. Don't expect more than two faults before task_work() */
+	if (count > 10)
+		mce_panic("Too many machine checks while accessing user data", m, msg);
+
+	/* Second or later call, make sure page address matches the one from first call */
+	if (count > 1 && (current->mce_addr >> PAGE_SHIFT) != (m->addr >> PAGE_SHIFT))
+		mce_panic("Machine checks to different user pages", m, msg);
+
+	/* Do not call task_work_add() more than once */
+	if (count > 1)
+		return;
 
 	task_work_add(current, &current->mce_kill_me, TWA_RESUME);
 }
@@ -1435,7 +1456,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		/* If this triggers there is no way to recover. Die hard. */
 		BUG_ON(!on_thread_stack() || !user_mode(regs));
 
-		queue_task_work(&m, kill_current_task);
+		queue_task_work(&m, msg, kill_current_task);
 
 	} else {
 		/*
@@ -1453,7 +1474,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		}
 
 		if (m.kflags & MCE_IN_KERNEL_COPYIN)
-			queue_task_work(&m, kill_current_task);
+			queue_task_work(&m, msg, kill_current_task);
 	}
 out:
 	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ec8d07d88641..f6935787e7e8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1394,6 +1394,7 @@ struct task_struct {
 					mce_whole_page : 1,
 					__mce_reserved : 62;
 	struct callback_head		mce_kill_me;
+	int				mce_count;
 #endif
 
 #ifdef CONFIG_KRETPROBES
-- 
2.29.2