From: Yang Shi
Date: Thu, 26 Aug 2021 22:02:23 -0700
Subject: Re: [PATCH] mm: hwpoison: deal
 with page cache THP
To: HORIGUCHI NAOYA(堀口 直也)
Cc: osalvador@suse.de, hughd@google.com, kirill.shutemov@linux.intel.com,
 akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Thu, Aug 26, 2021 at 8:57 PM HORIGUCHI NAOYA(堀口　直也) wrote:
>
> On Thu, Aug 26, 2021 at 03:03:57PM -0700, Yang Shi wrote:
> > On Thu, Aug 26, 2021 at 1:03 PM Yang Shi wrote:
> > >
> > > On Wed, Aug 25, 2021 at 11:17 PM HORIGUCHI NAOYA(堀口　直也)
> > > wrote:
> > > >
> > > > On Tue, Aug 24, 2021 at 03:13:22PM -0700, Yang Shi wrote:
> ...
> > > >
> > > > There was a discussion about another approach of keeping error pages in page
> > > > cache for filesystem without backend storage.
> > > > https://lore.kernel.org/lkml/alpine.LSU.2.11.2103111312310.7859@eggly.anvils/
> > > > This approach seems to me less complicated, but one concern is that this
> > > > change affects user-visible behavior of memory errors.  Keeping error pages
> > > > in page cache means that the errors are persistent until next system reboot,
> > > > so we might need to define the way to clear the errors to continue to use
> > > > the error file.
> > > > Current implementation is just to send SIGBUS to the
> > > > mapping processes (at least once), then forget about the error, so there is
> > > > no such issue.
> > > >
> > > > Another thought of possible solution might be to send SIGBUS immediately when
> > > > a memory error happens on a shmem thp. We can find all the mapping processes
> > > > before splitting shmem thp, so send SIGBUS first, then split it and contain
> > > > the error page.  This is not elegant (giving up any optional actions) but
> > > > anyway we can avoid the silent data loss.
> > >
> > > Thanks a lot. I apologize I didn't notice you already posted a similar
> > > patch before.
> > >
> > > Yes, I think I focused on the soft offline part too much and missed
> > > the uncorrected error part and I admit I did underestimate the
> > > problem.
> > >
> > > I think Hugh's suggestion makes sense if we treat tmpfs as a regular
> > > filesystem (just memory backed). AFAIK, some filesystems, e.g. btrfs,
> > > may do checksum after reading from storage block then return an error
> > > if checksum is not right since it may indicate hardware failure on
> > > disk. Then the syscalls or page fault return error or SIGBUS.
> > >
> > > So in shmem/tmpfs case, if hwpoisoned page is met, just return error
> > > (-EIO or whatever) for syscall or SIGBUS for page fault. It does align
> > > with the behavior of other filesystems. It is definitely applications'
> > > responsibility to check the return value of read/write syscalls.
> >
> > BTW, IIUC the dirty regular page cache (storage backed) would be left
> > in the page cache too, the clean page cache would be truncated since
> > they can be just reread from storage, right?
>
> A dirty page cache is also removed on error (me_pagecache_dirty() falls
> through to me_pagecache_clean(), then truncate_error_page() is called).
> The main purpose of this is to separate off the error page from existing
> data structures to minimize the risk of later accesses (maybe by race or bug).
> But we can change this behavior for specific file systems by updating
> error_remove_page() callbacks in address_space_operations.

Yeah, if fs's error_remove_page() is defined. It seems the filesystems
which have error_remove_page() defined just use
generic_error_remove_page(), except hugetlbfs. And the generic
implementation just clears the dirty flag and removes the page from
page cache.

If error_remove_page() is not defined, the page would stay in page
cache since invalidate_inode_page() can't remove a dirty page.

>
> Honestly, it seems to me that how dirty data is lost does not depend on
> file system, and I'm still not sure that this is really a right approach
> for the current issue.

IMHO the biggest problem is that applications may see
obsolete/inconsistent data silently, right? Actually keeping the
corrupted page in page cache should be able to notify applications
that they are accessing inconsistent data.

>
> Thanks,
> Naoya Horiguchi
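
To make the removal rule discussed above concrete, here is a toy
user-space sketch, for illustration only. Every toy_* name and struct is
a hypothetical stand-in, not a kernel definition; the real logic lives
in mm/memory-failure.c and mm/truncate.c and also deals with locking,
refcounts and private data, all of which this model ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects; these are NOT the kernel's structs. */
struct toy_page {
	bool dirty;	/* page holds data not yet written back */
	bool in_cache;	/* page is still present in the page cache */
};

struct toy_aops {
	/* Optional callback, playing the role of error_remove_page(). */
	int (*error_remove_page)(struct toy_page *page);
};

/* Models the generic callback: clear the dirty state, drop the page. */
int toy_generic_error_remove_page(struct toy_page *page)
{
	page->dirty = false;
	page->in_cache = false;
	return 0;
}

/* Models invalidation: only a clean page can be dropped. */
bool toy_invalidate_page(struct toy_page *page)
{
	if (page->dirty)
		return false;
	page->in_cache = false;
	return true;
}

/* Returns true if the error page left the (toy) page cache. */
bool toy_truncate_error_page(const struct toy_aops *aops,
			     struct toy_page *page)
{
	assert(aops != NULL && page != NULL);

	/* A filesystem-provided callback decides the page's fate. */
	if (aops->error_remove_page)
		return aops->error_remove_page(page) == 0;

	/* Without a callback, fall back to invalidation. */
	return toy_invalidate_page(page);
}
```

In this model, a dirty page on a filesystem without the callback stays
in the cache, while the same page is dropped (and its dirty data lost)
once the generic callback is wired up. That is the asymmetry the thread
is debating for shmem/tmpfs.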