Subject: Re: [RFC PATCH v2 0/6] fsdax: introduce fs query to support reflink
From: Ruan Shiyang <ruansy.fnst@cn.fujitsu.com>
To: Dave Chinner
Date: Wed, 2 Dec 2020 15:12:20 +0800
References: <20201123004116.2453-1-ruansy.fnst@cn.fujitsu.com>
 <20201129224723.GG2842436@dread.disaster.area>
In-Reply-To: <20201129224723.GG2842436@dread.disaster.area>

Hi Dave,

On 2020/11/30 6:47 AM, Dave Chinner wrote:
> On Mon, Nov 23, 2020 at 08:41:10AM +0800, Shiyang Ruan wrote:
>>
>> The call trace is like this:
>>  memory_failure()
>>   pgmap->ops->memory_failure()      => pmem_pgmap_memory_failure()
>>    gendisk->fops->block_lost()      => pmem_block_lost() or
>>                                       md_blk_block_lost()
>>     sb->s_ops->storage_lost()       => xfs_fs_storage_lost()
>>      xfs_rmap_query_range()
>>       xfs_storage_lost_helper()
>>        mf_recover_controller->recover_fn =>
>>                       memory_failure_dev_pagemap_kill_procs()
>>
>> The collect_procs() and kill_procs() are moved into a callback which
>> is passed from memory_failure() to xfs_storage_lost_helper(), so we
>> can call it as soon as an associated file is found, instead of
>> creating a file list and iterating over it.
>>
>> The fsdax & reflink support for XFS is not contained in this patchset.
>
> This looks promising - the overall architecture is a lot more
> generic and less dependent on knowing about memory, dax or memory
> failures. A few comments that I think would further improve
> understanding the patchset and the implementation:

Thanks for your kind comments.  They give me confidence.

>
> - the order of the patches is inverted. It should start with a
>   single patch introducing the mf_recover_controller structure for
>   callbacks, then introduce pgmap->ops->memory_failure, then
>   ->block_lost, then the pmem and md implementations of
>   ->block_lost, then ->storage_lost and the XFS implementations of
>   ->storage_lost.

Yes, it will be easier to understand the patchset in this order.

But there is something I am unsure about.  For example, if I introduce
->memory_failure() first, its implementation needs to call
->block_lost(), which is only introduced in the next patch.  So what
should the pmem implementation of ->memory_failure() contain at that
point?  To avoid this situation, I committed the patches in the
inverted order: lowest level first, then its caller, and then the
caller's caller.

I am trying to sort out the order.  How about this:

  Patch i.   Introduce ->memory_failure()
     - just introduce the interface, without an implementation
  Patch i++. Introduce ->block_lost()
     - introduce the interface and implement ->memory_failure()
       in pmem, so that it can call ->block_lost()
  Patch i++. (similar to the above, skipped...)

>
> - I think the names "block_lost" and "storage_lost" are misleading.
>   It's more like a "media failure" or a general "data corruption"
>   event at a specific physical location. The data may not be "lost"
>   but only damaged, so we might be able to recover from it without
>   "losing" anything. Hence I think they could be better named,
>   perhaps just "->corrupt_range"

'corrupt' sounds better.  (I'm not good at naming functions...)

>
> - need to pass a {offset,len} pair through the chain, not just a
>   single offset. This will allow other types of devices to report
>   different ranges of failures, from a single sector to an entire
>   device.

Yes, it's better to add the length.  I had narrowly assumed that a
memory failure on pmem affects only a single page at a time.
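To make sure I understand the range idea correctly, here is a rough
sketch of how a {offset,len} pair could be carried through each layer.
All of the names and signatures below are only my guess at what the
next version might look like, not existing kernel interfaces:

  /* sketch only: range-based corruption hooks, one per layer */

  /* pmem driver reports a range of poisoned PFNs, not a single page */
  int (*memory_failure)(struct dev_pagemap *pgmap, unsigned long pfn,
                        unsigned long nr_pfns, int flags);

  /* block driver (pmem, md) translates it to a byte range on disk */
  int (*corrupt_range)(struct gendisk *disk, loff_t offset,
                       size_t len, void *data);

  /* filesystem maps the byte range to its owners via reverse mapping */
  int (*storage_corrupt_range)(struct super_block *sb, loff_t offset,
                               size_t len, void *data);

With this, a single-sector error and a whole-device failure would go
through the same path; only the {offset,len} values differ.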
>
> - I'm not sure that passing the mf_recover_controller structure
>   through the corruption event chain is the right thing to do here.
>   A block device could generate this storage failure callback if it
>   detects an unrecoverable error (e.g. during a MD media scrub or
>   rebuild/resilver failure) and in that case we don't have PFNs or
>   memory device failure functions to perform.
>
>   IOWs, I think the action that is taken needs to be independent of
>   the source that generated the error. Even for a pmem device, we
>   can be using the page cache, so it may be possible to recover the
>   pmem error by writing the cached page (if it exists) back over the
>   pmem.
>
>   Hence I think that the recover function probably needs to be moved
>   to the address space ops, because what we do to recover from the
>   error is going to be dependent on the type of mapping the filesystem
>   is using. If it's a DAX mapping, we call back into a generic DAX
>   function that does the vma walk and process kill functions. If it
>   is a page cache mapping, then if the page is cached then we can
>   try to re-write it to disk to fix the bad data, otherwise we treat
>   it like a writeback error and report it on the next
>   write/fsync/close operation done on that file.
>
>   This gets rid of the mf_recover_controller altogether and allows
>   the interface to be used by any sort of block device for any sort
>   of bottom-up reporting of media/device failures.

Moving the recover function to the address_space ops looks like a
better idea.  But I think the error handling for page cache mappings
is already done well in memory-failure, and memory-failure is also
reused to handle anonymous pages.  If we move the recover function to
the address_space ops, I think we also need to refactor the existing
handler for page cache mappings, which may affect anonymous page
handling.  This makes me confused...

I rewrote the call trace:

memory_failure()
 * dax mapping case
 pgmap->ops->memory_failure()          => pmem_pgmap_memory_failure()
  gendisk->fops->block_corrupt_range() =>
                                   - pmem_block_corrupt_range()
                                   - md_blk_block_corrupt_range()
   sb->s_ops->storage_corrupt_range()  => xfs_fs_storage_corrupt_range()
    xfs_rmap_query_range()
     xfs_storage_lost_helper()
      mapping->a_ops->corrupt_range()  => xfs_dax_aops.xfs_dax_corrupt_range
       memory_failure_dev_pagemap_kill_procs()

 * page cache mapping case
 mapping->a_ops->corrupt_range()       => xfs_address_space_operations.xfs_xxx
  memory_failure_generic_kill_procs()

It's rough and not complete yet.  I hope for your comments.

--
Thanks,
Ruan Shiyang.

>
> Cheers,
>
> Dave.