From: Andy Lutomirski
Subject: Re: [PATCH v2 1/3] x86/mce: Avoid infinite loop for copy from user recovery
Date: Mon, 11 Jan 2021 14:11:56 -0800
To: Tony Luck
Cc: Borislav Petkov, x86@kernel.org, Andrew Morton, Peter Zijlstra, Darren Hart, Andy Lutomirski, linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org, linux-mm@kvack.org
In-Reply-To: <20210111214452.1826-2-tony.luck@intel.com>
References: <20210111214452.1826-2-tony.luck@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> On Jan 11, 2021, at 1:45 PM, Tony Luck wrote:
>
> Recovery action when get_user() triggers a machine check uses the fixup
> path to make get_user() return -EFAULT. Also queue_task_work() sets up
> so that kill_me_maybe() will be called on return to user mode to send a
> SIGBUS to the current process.
>
> But there are places in the kernel where the code assumes that this
> EFAULT return was simply because of a page fault. The code takes some
> action to fix that, and then retries the access. This results in a second
> machine check.
>
> While processing this second machine check queue_task_work() is called
> again. But since this uses the same callback_head structure that
> was used in the first call, the net result is an entry on the
> current->task_works list that points to itself.

Is this happening in pagefault_disable context or normal sleepable fault context? If the latter, maybe we should reconsider finding a way for the machine check code to do its work inline instead of deferring it.

Yes, I realize this is messy, but maybe it’s not that messy. Conceptually, we just (famous last words) need to arrange for an MCE with IF=1 to switch off the IST stack and run like a normal exception.
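
The self-referencing list entry described in the quoted patch text can be reproduced with a small user-space model. This is only a sketch under stated assumptions: struct task_model, work_add() and self_loop_after_adds() are hypothetical names invented for illustration, and work_add() is a plain LIFO push, not the kernel's task_work_add(). Only the callback_head and task_works names come from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct callback_head
 * (assumption: only the link pointer matters for this demo). */
struct callback_head {
    struct callback_head *next;
};

/* Hypothetical model of a task with a pending-work list. */
struct task_model {
    struct callback_head *task_works;
};

/* Plain LIFO push: the new entry points at the old list head. */
void work_add(struct task_model *t, struct callback_head *w)
{
    w->next = t->task_works;
    t->task_works = w;
}

/* Queue the same callback_head `count` times, as repeated machine
 * checks would, and report whether the node points at itself. */
bool self_loop_after_adds(int count)
{
    struct task_model task = { .task_works = NULL };
    struct callback_head mce_work = { .next = NULL };

    assert(count >= 1);
    for (int i = 0; i < count; i++)
        work_add(&task, &mce_work);

    return mce_work.next == &mce_work;
}
```

In this model, one add leaves the node's next pointer NULL. A second add of the same node sets next to the current list head, which is the node itself, so any walk of the list never terminates.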