From: Yang Shi
To: naoya.horiguchi@nec.com, hughd@google.com, kirill.shutemov@linux.intel.com, willy@infradead.org, osalvador@suse.de, akpm@linux-foundation.org
Cc: shy828301@gmail.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH 0/4] Solve silent data loss caused by poisoned page cache (shmem/tmpfs)
Date: Tue, 14 Sep 2021 11:37:14 -0700
Message-Id: <20210914183718.4236-1-shy828301@gmail.com>

While discussing the patch that splits page cache THP in order to offline the
poisoned page, Naoya pointed out a bigger problem [1] that prevents this from
working: the page cache page is truncated when an uncorrectable error happens.
Looking deeper, it turns out that this approach (truncating the poisoned page)
may cause silent data loss on all non-read-only filesystems if the page is
dirty. It is worse for in-memory filesystems, e.g. shmem/tmpfs, since the data
blocks are actually gone.

To solve this problem we could keep the poisoned dirty page in the page cache
and notify the users on any later access, e.g.
page fault, read/write, etc. Clean pages can still be truncated as they are
today, since they can be reread from disk later on.

The consequence is that filesystems may find a poisoned page and manipulate it
as if it were healthy, since no filesystem checks whether a page is poisoned in
any of the relevant paths except page fault. In general, we need to make
filesystems aware of poisoned pages before we can keep them in the page cache
to solve the data loss problem.

To make filesystems aware of poisoned pages we should consider:
- The page should not be written back: clearing the dirty flag prevents
  writeback.
- The page should not be dropped (it shows up as a clean page) by drop caches
  or other callers: the refcount pin taken by hwpoison prevents invalidation
  (called by cache drop, inode cache shrinking, etc.), but it does not prevent
  invalidation in the DIO path.
- The page should still be able to be truncated/hole punched/unlinked: this
  works as is.
- Users should be notified when the page is accessed, e.g. read/write, page
  fault and other paths (compression, encryption, etc.).

The scope of the last one is huge, since almost all filesystems need to do it
once a page is returned from page cache lookup. There are a couple of options:

1. Check the hwpoison flag in every path, the most straightforward way.
2. Return NULL for a poisoned page from page cache lookup. Most callsites
   already check whether NULL is returned, so this should be the least work,
   I think. But the error handling in filesystems just returns -ENOMEM, and
   that error code will obviously confuse users.
3. To improve #2, we could return an error pointer, e.g. ERR_PTR(-EIO), but
   this involves a significant amount of code change as well, since all the
   paths need to check whether the pointer is an ERR or not, just like
   option #1.

I prototyped both #1 and #3, and #3 seems to require more changes than #1.
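
To make the difference between the three options concrete, here is a minimal
userspace sketch. It is not kernel code: struct page, struct cache, the
lookup_*() and read_opt*() functions are all hypothetical stand-ins, and
ERR_PTR/PTR_ERR/IS_ERR are simplified local copies modeled on the kernel
helpers of the same name. It only shows which error code a caller ends up
returning under each option.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct page. */
struct page {
	int hwpoison;	/* models the hwpoison page flag */
	int dirty;
};

/* Simplified local copies modeled on the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical page cache: a fixed array of page pointers. */
#define CACHE_SLOTS 4
struct cache { struct page *slot[CACHE_SLOTS]; };

static struct page *lookup_plain(struct cache *c, size_t index)
{
	assert(c != NULL);
	return index < CACHE_SLOTS ? c->slot[index] : NULL;
}

/* Option #1: lookup is unchanged, every caller tests the flag itself. */
static int read_opt1(struct cache *c, size_t index)
{
	struct page *page = lookup_plain(c, index);

	if (!page)
		return -ENOMEM;
	if (page->hwpoison)
		return -EIO;
	return 0;
}

/* Option #2: lookup hides poisoned pages behind NULL. */
static struct page *lookup_null(struct cache *c, size_t index)
{
	struct page *page = lookup_plain(c, index);

	return (page && page->hwpoison) ? NULL : page;
}

static int read_opt2(struct cache *c, size_t index)
{
	struct page *page = lookup_null(c, index);

	if (!page)
		return -ENOMEM;	/* poison is misreported as ENOMEM */
	return 0;
}

/* Option #3: lookup returns ERR_PTR(-EIO) for poisoned pages. */
static struct page *lookup_errptr(struct cache *c, size_t index)
{
	struct page *page = lookup_plain(c, index);

	return (page && page->hwpoison) ? ERR_PTR(-EIO) : page;
}

static int read_opt3(struct cache *c, size_t index)
{
	struct page *page = lookup_errptr(c, index);

	if (IS_ERR(page))	/* must come before any dereference */
		return (int)PTR_ERR(page);
	if (!page)
		return -ENOMEM;
	return 0;
}
```

In this model, options #1 and #3 both report -EIO for a poisoned slot but
need one extra check per caller, while option #2 needs no new check in the
caller and reports -ENOMEM instead.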
For #3, ERR_PTR is returned, so all the callers need to check the return value,
otherwise an invalid pointer may be dereferenced. But not all callers really
care about the content of the page, for example partial truncate, which just
sets the truncated range in one page to 0. Such paths need additional
modification if ERR_PTR is returned. And if callers have their own way to
handle problematic pages, we need to add a new FGP flag to tell the FGP
functions to return the pointer to the page.

The problem may occur very rarely, but once it does the consequence (data
corruption) can be very bad and very hard to debug. It seems this problem was
briefly discussed before, but no action was taken at that time. [2]

As the investigation above shows, it takes a huge amount of work to solve the
potential data loss for all filesystems. But it is much easier for in-memory
filesystems, and such filesystems actually suffer more than others, since even
the data blocks are gone due to truncation. So this patchset starts with
shmem/tmpfs and takes option #1.

Patch #1 and #2: fix bugs in page fault and khugepaged. Patch #2 also does
some preparation for the later patches.
Patch #3: keep the poisoned page in page cache and handle such case for all
the paths.
Patch #4: the previous patches unblock page cache THP split, so this patch
adds page cache THP split support.

[1] https://lore.kernel.org/linux-mm/CAHbLzkqNPBh_sK09qfr4yu4WTFOzRy+MKj+PA7iG-adzi9zGsg@mail.gmail.com/T/#m0e959283380156f1d064456af01ae51fdff91265
[2] https://lore.kernel.org/lkml/20210318183350.GT3420@casper.infradead.org/
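
For illustration only, the intended life cycle of a poisoned dirty page (not
written back, not dropped, still truncatable, -EIO on access) can be modeled
in userspace roughly as below. Every name here (struct page and its fields,
the model_*() functions) is hypothetical and is not taken from the patches;
the DIO invalidation case is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page cache page. */
struct page {
	bool present;	/* still in the (modeled) page cache */
	bool dirty;
	bool hwpoison;
	int refcount;	/* 1 = held by the page cache only */
};

/* Memory failure: truncate clean pages, keep dirty ones but mark them. */
static void model_memory_failure(struct page *p)
{
	assert(p != NULL && p->present);
	if (!p->dirty) {
		p->present = false;	/* clean: can be reread later */
		return;
	}
	p->hwpoison = true;
	p->dirty = false;	/* so writeback skips the page */
	p->refcount++;		/* extra pin blocks invalidation */
}

/* Returns true if data was written out. */
static bool model_writeback(struct page *p)
{
	if (!p->present || !p->dirty)
		return false;
	p->dirty = false;
	return true;
}

/* Returns true if the page was dropped from the cache. */
static bool model_drop_caches(struct page *p)
{
	if (!p->present || p->dirty || p->refcount > 1)
		return false;
	p->present = false;
	return true;
}

/* Read path: a poisoned page is reported instead of being used. */
static int model_read(const struct page *p)
{
	if (p->present && p->hwpoison)
		return -EIO;
	return 0;	/* healthy data, or a hole that reads back as zeros */
}

/* Truncate/hole punch/unlink still remove the page. */
static void model_truncate(struct page *p)
{
	p->present = false;
}
```

The last comment in model_read() is the point of the series: once a dirty
page has been truncated, a later read succeeds with no data, which is the
silent data loss, whereas a kept page lets the read fail with -EIO.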