From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=KBoR=HE=kvack.org=owner-linux-mm@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-15.4 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,
	USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 097BAC433E6
	for <linux-mm@archiver.kernel.org>; Tue,  2 Feb 2021 12:48:15 +0000 (UTC)
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by mail.kernel.org (Postfix) with ESMTP id 7D46364F49
	for <linux-mm@archiver.kernel.org>; Tue,  2 Feb 2021 12:48:14 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7D46364F49
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=cn.fujitsu.com
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix)
	id BA2E66B0005; Tue,  2 Feb 2021 07:48:13 -0500 (EST)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id B52AA6B006E; Tue,  2 Feb 2021 07:48:13 -0500 (EST)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id A407F6B0070; Tue,  2 Feb 2021 07:48:13 -0500 (EST)
X-Delivered-To: linux-mm@kvack.org
Received: from forelay.hostedemail.com (smtprelay0022.hostedemail.com [216.40.44.22])
	by kanga.kvack.org (Postfix) with ESMTP id 8EB516B0005
	for <linux-mm@kvack.org>; Tue,  2 Feb 2021 07:48:13 -0500 (EST)
Received: from smtpin20.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251])
	by forelay01.hostedemail.com (Postfix) with ESMTP id 4FF9E180AD811
	for <linux-mm@kvack.org>; Tue,  2 Feb 2021 12:48:13 +0000 (UTC)
X-FDA: 77773305666.20.night86_3d0b879275cb
Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251])
	by smtpin20.hostedemail.com (Postfix) with ESMTP id 2B858180C07AF
	for <linux-mm@kvack.org>; Tue,  2 Feb 2021 12:48:13 +0000 (UTC)
X-HE-Tag: night86_3d0b879275cb
X-Filterd-Recvd-Size: 10512
Received: from heian.cn.fujitsu.com (mail.cn.fujitsu.com [183.91.158.132])
	by imf19.hostedemail.com (Postfix) with ESMTP
	for <linux-mm@kvack.org>; Tue,  2 Feb 2021 12:48:11 +0000 (UTC)
X-IronPort-AV: E=Sophos;i="5.79,394,1602518400"; 
   d="scan'208";a="104103944"
Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5])
  by heian.cn.fujitsu.com with ESMTP; 02 Feb 2021 20:48:06 +0800
Received: from G08CNEXMBPEKD05.g08.fujitsu.local (unknown [10.167.33.204])
	by cn.fujitsu.com (Postfix) with ESMTP id C3DA44CE6D68;
	Tue,  2 Feb 2021 20:48:02 +0800 (CST)
Received: from irides.mr (10.167.225.141) by G08CNEXMBPEKD05.g08.fujitsu.local
 (10.167.33.204) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Tue, 2 Feb
 2021 20:48:03 +0800
Subject: Re: [PATCH RESEND v2 09/10] xfs: Implement ->corrupted_range() for
 XFS
To: "Darrick J. Wong" <djwong@kernel.org>
CC: <linux-kernel@vger.kernel.org>, <linux-xfs@vger.kernel.org>,
	<linux-nvdimm@lists.01.org>, <linux-mm@kvack.org>,
	<linux-fsdevel@vger.kernel.org>, <dm-devel@redhat.com>,
	<darrick.wong@oracle.com>, <dan.j.williams@intel.com>, <david@fromorbit.com>,
	<hch@lst.de>, <agk@redhat.com>, <snitzer@redhat.com>, <rgoldwyn@suse.de>,
	<qi.fuli@fujitsu.com>, <y-goto@fujitsu.com>
References: <20210129062757.1594130-1-ruansy.fnst@cn.fujitsu.com>
 <20210129062757.1594130-10-ruansy.fnst@cn.fujitsu.com>
 <20210202024147.GI7193@magnolia>
From: Ruan Shiyang <ruansy.fnst@cn.fujitsu.com>
Message-ID: <d9c989bc-be7d-4771-4cd8-2e4653648b4d@cn.fujitsu.com>
Date: Tue, 2 Feb 2021 20:48:01 +0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
 Thunderbird/78.7.0
MIME-Version: 1.0
In-Reply-To: <20210202024147.GI7193@magnolia>
Content-Type: text/plain; charset="UTF-8"; format=flowed
Content-Language: en-US
X-Originating-IP: [10.167.225.141]
X-ClientProxiedBy: G08CNEXCHPEKD04.g08.fujitsu.local (10.167.33.200) To
 G08CNEXMBPEKD05.g08.fujitsu.local (10.167.33.204)
X-yoursite-MailScanner-ID: C3DA44CE6D68.AE3BA
X-yoursite-MailScanner: Found to be clean
X-yoursite-MailScanner-From: ruansy.fnst@cn.fujitsu.com
Content-Transfer-Encoding: quoted-printable
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>


On 2021/2/2 =E4=B8=8A=E5=8D=8810:41, Darrick J. Wong wrote:
> On Fri, Jan 29, 2021 at 02:27:56PM +0800, Shiyang Ruan wrote:
>> This function is used to handle errors which may cause data lost in
>> filesystem.  Such as memory failure in fsdax mode.
>>
>> In XFS, it requires "rmapbt" feature in order to query for files or
>> metadata which associated to the corrupted data.  Then we could call f=
s
>> recover functions to try to repair the corrupted data.(did not
>> implemented in this patchset)
>=20
> I would suggest:
> "If the rmap feature of XFS enabled, we can query it to find files and
> metadata which are associated with the corrupt data.  For now all we do
> is kill processes with that file mapped into their address spaces, but
> future patches could actually do something about corrupt metadata."
>=20

Yes, this is better.

>> After that, the memory failure also needs to notify the processes who
>> are using those files.
>>
>> Only support data device.  Realtime device is not supported for now.
>>
>> Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
>> ---
>>   fs/xfs/xfs_fsops.c |   5 +++
>>   fs/xfs/xfs_mount.h |   1 +
>>   fs/xfs/xfs_super.c | 109 +++++++++++++++++++++++++++++++++++++++++++=
++
>>   3 files changed, 115 insertions(+)
>>
>> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
>> index 959ce91a3755..f03901a5c673 100644
>> --- a/fs/xfs/xfs_fsops.c
>> +++ b/fs/xfs/xfs_fsops.c
>> @@ -498,6 +498,11 @@ xfs_do_force_shutdown(
>>   "Corruption of in-memory data detected.  Shutting down filesystem");
>>   		if (XFS_ERRLEVEL_HIGH <=3D xfs_error_level)
>>   			xfs_stack_trace();
>> +	} else if (flags & SHUTDOWN_CORRUPT_META) {
>> +		xfs_alert_tag(mp, XFS_PTAG_SHUTDOWN_CORRUPT,
>> +"Corruption of on-disk metadata detected.  Shutting down filesystem")=
;
>> +		if (XFS_ERRLEVEL_HIGH <=3D xfs_error_level)
>> +			xfs_stack_trace();
>>   	} else if (logerror) {
>>   		xfs_alert_tag(mp, XFS_PTAG_SHUTDOWN_LOGERROR,
>>   			"Log I/O Error Detected. Shutting down filesystem");
>> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
>> index dfa429b77ee2..8f0df67ffcc1 100644
>> --- a/fs/xfs/xfs_mount.h
>> +++ b/fs/xfs/xfs_mount.h
>> @@ -274,6 +274,7 @@ void xfs_do_force_shutdown(struct xfs_mount *mp, i=
nt flags, char *fname,
>>   #define SHUTDOWN_LOG_IO_ERROR	0x0002	/* write attempt to the log fai=
led */
>>   #define SHUTDOWN_FORCE_UMOUNT	0x0004	/* shutdown from a forced unmou=
nt */
>>   #define SHUTDOWN_CORRUPT_INCORE	0x0008	/* corrupt in-memory data str=
uctures */
>> +#define SHUTDOWN_CORRUPT_META	0x0010  /* corrupt metadata on device *=
/
>>  =20
>>   /*
>>    * Flags for xfs_mountfs
>> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
>> index 813be879a5e5..93093fe0ee8a 100644
>> --- a/fs/xfs/xfs_super.c
>> +++ b/fs/xfs/xfs_super.c
>> @@ -35,6 +35,11 @@
>>   #include "xfs_refcount_item.h"
>>   #include "xfs_bmap_item.h"
>>   #include "xfs_reflink.h"
>> +#include "xfs_alloc.h"
>> +#include "xfs_rmap.h"
>> +#include "xfs_rmap_btree.h"
>> +#include "xfs_rtalloc.h"
>> +#include "xfs_bit.h"
>>  =20
>>   #include <linux/magic.h>
>>   #include <linux/fs_context.h>
>> @@ -1105,6 +1110,109 @@ xfs_fs_free_cached_objects(
>>   	return xfs_reclaim_inodes_nr(XFS_M(sb), sc->nr_to_scan);
>>   }
>>  =20
>> +static int
>> +xfs_corrupt_helper(
>> +	struct xfs_btree_cur		*cur,
>> +	struct xfs_rmap_irec		*rec,
>> +	void				*data)
>> +{
>> +	struct xfs_inode		*ip;
>> +	struct address_space		*mapping;
>> +	int				rc =3D 0;
>> +	int				*flags =3D data;
>> +
>> +	if (XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) ||
>> +	    (rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK))) {
>> +		// TODO check and try to fix metadata
>> +		rc =3D -EFSCORRUPTED;
>=20
> The xfs_force_shutdown() call should go here, since SHUTDOWN_CORRUPT_ME=
TA
> is specific to this case.

OK.
>=20
> I guess one could also dig through the buffer cache and delwri_submit
> those buffers or something.
>=20
>> +	} else {
>> +		/*
>> +		 * Get files that incore, filter out others that are not in use.
>> +		 */
>> +		rc =3D xfs_iget(cur->bc_mp, cur->bc_tp, rec->rm_owner,
>> +			      XFS_IGET_INCORE, 0, &ip);
>> +		if (rc || !ip)
>> +			return rc;
>> +		if (!VFS_I(ip)->i_mapping)
>> +			goto out;
>> +
>> +		mapping =3D VFS_I(ip)->i_mapping;
>> +		if (IS_DAX(VFS_I(ip)))
>> +			rc =3D mf_dax_mapping_kill_procs(mapping, rec->rm_offset,
>> +						       *flags);
>> +		else
>> +			mapping_set_error(mapping, -EIO);
>> +
>> +		// TODO try to fix data
>=20
> What could we do to fix the data?  If we're not in S_DAX mode and
> there's actually pagecache mapped in, does that imply that we could
> mark it dirty and kick off dirty pagecache writeback?

But in this case, the dax page is already broken, it seems that page=20
cache should not be written back to the origin dax page.  I think=20
another dax page need to be allocate for the writeback.

>=20
>> +out:
>> +		xfs_irele(ip);
>> +	}
>> +
>> +	return rc;
>> +}
>> +
>> +static int
>> +xfs_fs_corrupted_range(
>> +	struct super_block	*sb,
>> +	struct block_device	*bdev,
>> +	loff_t			offset,
>> +	size_t			len,
>> +	void			*data)
>> +{
>> +	struct xfs_mount	*mp =3D XFS_M(sb);
>> +	struct xfs_trans	*tp =3D NULL;
>> +	struct xfs_btree_cur	*cur =3D NULL;
>> +	struct xfs_rmap_irec	rmap_low, rmap_high;
>> +	struct xfs_buf		*agf_bp =3D NULL;
>> +	xfs_fsblock_t		fsbno =3D XFS_B_TO_FSB(mp, offset);
>> +	xfs_filblks_t		bcnt =3D XFS_B_TO_FSB(mp, len);
>> +	xfs_agnumber_t		agno =3D XFS_FSB_TO_AGNO(mp, fsbno);
>> +	xfs_agblock_t		agbno =3D XFS_FSB_TO_AGBNO(mp, fsbno);
>> +	int			error =3D 0;
>> +
>> +	if (mp->m_rtdev_targp && mp->m_rtdev_targp->bt_bdev =3D=3D bdev) {
>> +		xfs_warn(mp, "corrupted_range support not available for realtime de=
vice!");
>> +		return 0;
>> +	}
>> +	if (mp->m_logdev_targp && mp->m_logdev_targp->bt_bdev =3D=3D bdev &&
>> +	    mp->m_logdev_targp !=3D mp->m_ddev_targp) {
>> +		xfs_err(mp, "ondisk log corrupt, shutting down fs!");
>> +		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_META);
>=20
> Longer term question for the rest of the xfs community: Can we do bette=
r
> than this?  If the ail has checkpointed past this part of the log then
> we could just write zeroes into dead area, right?
>=20
> Also, if one of the log buffers points to a dead log area and isn't the
> one that's currently being written into, can we just submit_bio it to
> rewrite the lost part of the log??

Yes, We should also fix the log rather than shutdown it directly.  I=20
will take that into consideration in future patches.

>=20
>> +		return 0;
>> +	}
>> +
>> +	if (!xfs_sb_version_hasrmapbt(&mp->m_sb)) {
>> +		xfs_warn(mp, "corrupted_range needs rmapbt enabled!");
>> +		return 0;
>> +	}
>> +
>> +	error =3D xfs_trans_alloc_empty(mp, &tp);
>> +	if (error)
>> +		return error;
>> +
>> +	error =3D xfs_alloc_read_agf(mp, tp, agno, 0, &agf_bp);
>> +	if (error)
>> +		goto out_cancel_tp;
>> +
>> +	cur =3D xfs_rmapbt_init_cursor(mp, tp, agf_bp, agno);
>> +
>> +	/* Construct a range for rmap query */
>> +	memset(&rmap_low, 0, sizeof(rmap_low));
>> +	memset(&rmap_high, 0xFF, sizeof(rmap_high));
>> +	rmap_low.rm_startblock =3D rmap_high.rm_startblock =3D agbno;
>> +	rmap_low.rm_blockcount =3D rmap_high.rm_blockcount =3D bcnt;
>> +
>> +	error =3D xfs_rmap_query_range(cur, &rmap_low, &rmap_high, xfs_corru=
pt_helper, data);
>=20
> Long line here...
>=20
>> +	if (error =3D=3D -EFSCORRUPTED)
>> +		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_META);
>=20
> This should go in xfs_corrupt_helper as I mentioned above.

OK.


--
Thanks,
Ruan Shiyang.
>=20
> --D
>=20
>> +
>> +	xfs_btree_del_cursor(cur, error);
>> +	xfs_trans_brelse(tp, agf_bp);
>> +out_cancel_tp:
>> +	xfs_trans_cancel(tp);
>> +	return error;
>> +}
>> +
>>   static const struct super_operations xfs_super_operations =3D {
>>   	.alloc_inode		=3D xfs_fs_alloc_inode,
>>   	.destroy_inode		=3D xfs_fs_destroy_inode,
>> @@ -1118,6 +1226,7 @@ static const struct super_operations xfs_super_o=
perations =3D {
>>   	.show_options		=3D xfs_fs_show_options,
>>   	.nr_cached_objects	=3D xfs_fs_nr_cached_objects,
>>   	.free_cached_objects	=3D xfs_fs_free_cached_objects,
>> +	.corrupted_range	=3D xfs_fs_corrupted_range,
>>   };
>>  =20
>>   static int
>> --=20
>> 2.30.0
>>
>>
>>
>=20
>=20