From: Jue Wang
Date: Mon, 19 Apr 2021 14:28:28 -0700
Subject: Re: [PATCH 4/4] x86/mce: Avoid infinite loop for copy from user recovery
To: tony.luck@intel.com
Cc: bp@alien8.de, linux-kernel@vger.kernel.org, linux-mm@kvack.org, luto@kernel.org, naoya.horiguchi@nec.com, x86@kernel.org, yaoaili@kingsoft.com

On Thu, 25 Mar 2021 17:02:35 -0700, Tony Luck wrote:
...
> But there are places in the kernel where the code assumes that this
> EFAULT return was simply because of a page fault. The code takes some
> action to fix that, and then retries the access. This results in a second
> machine check.

What about returning EHWPOISON instead of EFAULT, and updating the callers
to handle EHWPOISON explicitly, i.e., not retrying but giving up on the page?

My main concern is that the strong assumption that the kernel can't hit
more than a fixed number of poisoned cache lines before returning to user
space may simply not be true.
When a DIMM goes bad, it can easily affect an entire bank or an entire RAM
device chip. Even with memory interleaving, it's possible that a kernel
control path touches lots of poisoned cache lines in the buffer it is
working through.