From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.4 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 097BAC433E6 for ; Tue, 2 Feb 2021 12:48:15 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 7D46364F49 for ; Tue, 2 Feb 2021 12:48:14 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7D46364F49 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=cn.fujitsu.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id BA2E66B0005; Tue, 2 Feb 2021 07:48:13 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B52AA6B006E; Tue, 2 Feb 2021 07:48:13 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A407F6B0070; Tue, 2 Feb 2021 07:48:13 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0022.hostedemail.com [216.40.44.22]) by kanga.kvack.org (Postfix) with ESMTP id 8EB516B0005 for ; Tue, 2 Feb 2021 07:48:13 -0500 (EST) Received: from smtpin20.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 4FF9E180AD811 for ; Tue, 2 Feb 2021 12:48:13 +0000 (UTC) X-FDA: 77773305666.20.night86_3d0b879275cb Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin20.hostedemail.com (Postfix) with ESMTP id 2B858180C07AF for ; Tue, 2 Feb 2021 12:48:13 +0000 (UTC) X-HE-Tag: night86_3d0b879275cb X-Filterd-Recvd-Size: 10512 Received: from heian.cn.fujitsu.com (mail.cn.fujitsu.com [183.91.158.132]) by imf19.hostedemail.com (Postfix) with ESMTP for ; Tue, 2 Feb 2021 12:48:11 +0000 (UTC) X-IronPort-AV: E=Sophos;i="5.79,394,1602518400"; d="scan'208";a="104103944" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 02 Feb 2021 20:48:06 +0800 Received: from G08CNEXMBPEKD05.g08.fujitsu.local (unknown [10.167.33.204]) by cn.fujitsu.com (Postfix) with ESMTP id C3DA44CE6D68; Tue, 2 Feb 2021 20:48:02 +0800 (CST) Received: from irides.mr (10.167.225.141) by G08CNEXMBPEKD05.g08.fujitsu.local (10.167.33.204) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Tue, 2 Feb 2021 20:48:03 +0800 Subject: Re: [PATCH RESEND v2 09/10] xfs: Implement ->corrupted_range() for XFS To: "Darrick J. Wong" CC: , , , , , , , , , , , , , , References: <20210129062757.1594130-1-ruansy.fnst@cn.fujitsu.com> <20210129062757.1594130-10-ruansy.fnst@cn.fujitsu.com> <20210202024147.GI7193@magnolia> From: Ruan Shiyang Message-ID: Date: Tue, 2 Feb 2021 20:48:01 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.7.0 MIME-Version: 1.0 In-Reply-To: <20210202024147.GI7193@magnolia> Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Language: en-US X-Originating-IP: [10.167.225.141] X-ClientProxiedBy: G08CNEXCHPEKD04.g08.fujitsu.local (10.167.33.200) To G08CNEXMBPEKD05.g08.fujitsu.local (10.167.33.204) X-yoursite-MailScanner-ID: C3DA44CE6D68.AE3BA X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@cn.fujitsu.com Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 2021/2/2 =E4=B8=8A=E5=8D=8810:41, Darrick J. Wong wrote: > On Fri, Jan 29, 2021 at 02:27:56PM +0800, Shiyang Ruan wrote: >> This function is used to handle errors which may cause data lost in >> filesystem. Such as memory failure in fsdax mode. >> >> In XFS, it requires "rmapbt" feature in order to query for files or >> metadata which associated to the corrupted data. Then we could call f= s >> recover functions to try to repair the corrupted data.(did not >> implemented in this patchset) >=20 > I would suggest: > "If the rmap feature of XFS enabled, we can query it to find files and > metadata which are associated with the corrupt data. For now all we do > is kill processes with that file mapped into their address spaces, but > future patches could actually do something about corrupt metadata." >=20 Yes, this is better. >> After that, the memory failure also needs to notify the processes who >> are using those files. >> >> Only support data device. Realtime device is not supported for now. >> >> Signed-off-by: Shiyang Ruan >> --- >> fs/xfs/xfs_fsops.c | 5 +++ >> fs/xfs/xfs_mount.h | 1 + >> fs/xfs/xfs_super.c | 109 +++++++++++++++++++++++++++++++++++++++++++= ++ >> 3 files changed, 115 insertions(+) >> >> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c >> index 959ce91a3755..f03901a5c673 100644 >> --- a/fs/xfs/xfs_fsops.c >> +++ b/fs/xfs/xfs_fsops.c >> @@ -498,6 +498,11 @@ xfs_do_force_shutdown( >> "Corruption of in-memory data detected. Shutting down filesystem"); >> if (XFS_ERRLEVEL_HIGH <=3D xfs_error_level) >> xfs_stack_trace(); >> + } else if (flags & SHUTDOWN_CORRUPT_META) { >> + xfs_alert_tag(mp, XFS_PTAG_SHUTDOWN_CORRUPT, >> +"Corruption of on-disk metadata detected. Shutting down filesystem")= ; >> + if (XFS_ERRLEVEL_HIGH <=3D xfs_error_level) >> + xfs_stack_trace(); >> } else if (logerror) { >> xfs_alert_tag(mp, XFS_PTAG_SHUTDOWN_LOGERROR, >> "Log I/O Error Detected. Shutting down filesystem"); >> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h >> index dfa429b77ee2..8f0df67ffcc1 100644 >> --- a/fs/xfs/xfs_mount.h >> +++ b/fs/xfs/xfs_mount.h >> @@ -274,6 +274,7 @@ void xfs_do_force_shutdown(struct xfs_mount *mp, i= nt flags, char *fname, >> #define SHUTDOWN_LOG_IO_ERROR 0x0002 /* write attempt to the log fai= led */ >> #define SHUTDOWN_FORCE_UMOUNT 0x0004 /* shutdown from a forced unmou= nt */ >> #define SHUTDOWN_CORRUPT_INCORE 0x0008 /* corrupt in-memory data str= uctures */ >> +#define SHUTDOWN_CORRUPT_META 0x0010 /* corrupt metadata on device *= / >> =20 >> /* >> * Flags for xfs_mountfs >> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c >> index 813be879a5e5..93093fe0ee8a 100644 >> --- a/fs/xfs/xfs_super.c >> +++ b/fs/xfs/xfs_super.c >> @@ -35,6 +35,11 @@ >> #include "xfs_refcount_item.h" >> #include "xfs_bmap_item.h" >> #include "xfs_reflink.h" >> +#include "xfs_alloc.h" >> +#include "xfs_rmap.h" >> +#include "xfs_rmap_btree.h" >> +#include "xfs_rtalloc.h" >> +#include "xfs_bit.h" >> =20 >> #include >> #include >> @@ -1105,6 +1110,109 @@ xfs_fs_free_cached_objects( >> return xfs_reclaim_inodes_nr(XFS_M(sb), sc->nr_to_scan); >> } >> =20 >> +static int >> +xfs_corrupt_helper( >> + struct xfs_btree_cur *cur, >> + struct xfs_rmap_irec *rec, >> + void *data) >> +{ >> + struct xfs_inode *ip; >> + struct address_space *mapping; >> + int rc =3D 0; >> + int *flags =3D data; >> + >> + if (XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) || >> + (rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK))) { >> + // TODO check and try to fix metadata >> + rc =3D -EFSCORRUPTED; >=20 > The xfs_force_shutdown() call should go here, since SHUTDOWN_CORRUPT_ME= TA > is specific to this case. OK. >=20 > I guess one could also dig through the buffer cache and delwri_submit > those buffers or something. >=20 >> + } else { >> + /* >> + * Get files that incore, filter out others that are not in use. >> + */ >> + rc =3D xfs_iget(cur->bc_mp, cur->bc_tp, rec->rm_owner, >> + XFS_IGET_INCORE, 0, &ip); >> + if (rc || !ip) >> + return rc; >> + if (!VFS_I(ip)->i_mapping) >> + goto out; >> + >> + mapping =3D VFS_I(ip)->i_mapping; >> + if (IS_DAX(VFS_I(ip))) >> + rc =3D mf_dax_mapping_kill_procs(mapping, rec->rm_offset, >> + *flags); >> + else >> + mapping_set_error(mapping, -EIO); >> + >> + // TODO try to fix data >=20 > What could we do to fix the data? If we're not in S_DAX mode and > there's actually pagecache mapped in, does that imply that we could > mark it dirty and kick off dirty pagecache writeback? But in this case, the dax page is already broken, it seems that page=20 cache should not be written back to the origin dax page. I think=20 another dax page need to be allocate for the writeback. >=20 >> +out: >> + xfs_irele(ip); >> + } >> + >> + return rc; >> +} >> + >> +static int >> +xfs_fs_corrupted_range( >> + struct super_block *sb, >> + struct block_device *bdev, >> + loff_t offset, >> + size_t len, >> + void *data) >> +{ >> + struct xfs_mount *mp =3D XFS_M(sb); >> + struct xfs_trans *tp =3D NULL; >> + struct xfs_btree_cur *cur =3D NULL; >> + struct xfs_rmap_irec rmap_low, rmap_high; >> + struct xfs_buf *agf_bp =3D NULL; >> + xfs_fsblock_t fsbno =3D XFS_B_TO_FSB(mp, offset); >> + xfs_filblks_t bcnt =3D XFS_B_TO_FSB(mp, len); >> + xfs_agnumber_t agno =3D XFS_FSB_TO_AGNO(mp, fsbno); >> + xfs_agblock_t agbno =3D XFS_FSB_TO_AGBNO(mp, fsbno); >> + int error =3D 0; >> + >> + if (mp->m_rtdev_targp && mp->m_rtdev_targp->bt_bdev =3D=3D bdev) { >> + xfs_warn(mp, "corrupted_range support not available for realtime de= vice!"); >> + return 0; >> + } >> + if (mp->m_logdev_targp && mp->m_logdev_targp->bt_bdev =3D=3D bdev && >> + mp->m_logdev_targp !=3D mp->m_ddev_targp) { >> + xfs_err(mp, "ondisk log corrupt, shutting down fs!"); >> + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_META); >=20 > Longer term question for the rest of the xfs community: Can we do bette= r > than this? If the ail has checkpointed past this part of the log then > we could just write zeroes into dead area, right? >=20 > Also, if one of the log buffers points to a dead log area and isn't the > one that's currently being written into, can we just submit_bio it to > rewrite the lost part of the log?? Yes, We should also fix the log rather than shutdown it directly. I=20 will take that into consideration in future patches. >=20 >> + return 0; >> + } >> + >> + if (!xfs_sb_version_hasrmapbt(&mp->m_sb)) { >> + xfs_warn(mp, "corrupted_range needs rmapbt enabled!"); >> + return 0; >> + } >> + >> + error =3D xfs_trans_alloc_empty(mp, &tp); >> + if (error) >> + return error; >> + >> + error =3D xfs_alloc_read_agf(mp, tp, agno, 0, &agf_bp); >> + if (error) >> + goto out_cancel_tp; >> + >> + cur =3D xfs_rmapbt_init_cursor(mp, tp, agf_bp, agno); >> + >> + /* Construct a range for rmap query */ >> + memset(&rmap_low, 0, sizeof(rmap_low)); >> + memset(&rmap_high, 0xFF, sizeof(rmap_high)); >> + rmap_low.rm_startblock =3D rmap_high.rm_startblock =3D agbno; >> + rmap_low.rm_blockcount =3D rmap_high.rm_blockcount =3D bcnt; >> + >> + error =3D xfs_rmap_query_range(cur, &rmap_low, &rmap_high, xfs_corru= pt_helper, data); >=20 > Long line here... >=20 >> + if (error =3D=3D -EFSCORRUPTED) >> + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_META); >=20 > This should go in xfs_corrupt_helper as I mentioned above. OK. -- Thanks, Ruan Shiyang. >=20 > --D >=20 >> + >> + xfs_btree_del_cursor(cur, error); >> + xfs_trans_brelse(tp, agf_bp); >> +out_cancel_tp: >> + xfs_trans_cancel(tp); >> + return error; >> +} >> + >> static const struct super_operations xfs_super_operations =3D { >> .alloc_inode =3D xfs_fs_alloc_inode, >> .destroy_inode =3D xfs_fs_destroy_inode, >> @@ -1118,6 +1226,7 @@ static const struct super_operations xfs_super_o= perations =3D { >> .show_options =3D xfs_fs_show_options, >> .nr_cached_objects =3D xfs_fs_nr_cached_objects, >> .free_cached_objects =3D xfs_fs_free_cached_objects, >> + .corrupted_range =3D xfs_fs_corrupted_range, >> }; >> =20 >> static int >> --=20 >> 2.30.0 >> >> >> >=20 >=20