From: Dave Chinner <david@fromorbit.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "y-goto@fujitsu.com" <y-goto@fujitsu.com>,
	"jack@suse.cz" <jack@suse.cz>,
	"fnstml-iaas@cn.fujitsu.com" <fnstml-iaas@cn.fujitsu.com>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	"darrick.wong@oracle.com" <darrick.wong@oracle.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"ruansy.fnst@fujitsu.com" <ruansy.fnst@fujitsu.com>,
	"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
	"ocfs2-devel@oss.oracle.com" <ocfs2-devel@oss.oracle.com>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"qi.fuli@fujitsu.com" <qi.fuli@fujitsu.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: [Ocfs2-devel] Question about the "EXPERIMENTAL" tag for dax in XFS
Date: Mon, 1 Mar 2021 09:38:46 +1100
Message-ID: <20210228223846.GA4662@dread.disaster.area>
In-Reply-To: <CAPcyv4h7XA3Jorcy_J+t9scw0A4KdT2WEwAhE-Nbjc=C2qmkMw@mail.gmail.com>

On Sat, Feb 27, 2021 at 03:40:24PM -0800, Dan Williams wrote:
> On Sat, Feb 27, 2021 at 2:36 PM Dave Chinner <david@fromorbit.com> wrote:
> > On Fri, Feb 26, 2021 at 02:41:34PM -0800, Dan Williams wrote:
> > > On Fri, Feb 26, 2021 at 1:28 PM Dave Chinner <david@fromorbit.com> wrote:
> > > > On Fri, Feb 26, 2021 at 12:59:53PM -0800, Dan Williams wrote:
> > it points to, check if it points to the PMEM that is being removed,
> > grab the page it points to, map that to the relevant struct page,
> > run collect_procs() on that page, then kill the user processes that
> > map that page.
> >
> > So why can't we walk the ptes, check the physical pages that they
> > map to and if they map to a pmem page we go poison that
> > page and that kills any user process that maps it.
> >
> > i.e. I can't see how unexpected pmem device unplug is any different
> > to an MCE delivering a hwpoison event to a DAX mapped page.
>
> I guess the tradeoff is walking a long list of inodes vs walking a
> large array of pages.

Not really. You're assuming all a filesystem has to do is invalidate
everything if a device goes away, and that's not true. Finding if an
inode has a mapping that spans a specific device in a multi-device
filesystem can be a lot more complex than that. Just walking inodes
is easy - determining which inodes need invalidation is the hard
part.

That's where ->corrupt_range() comes in - the filesystem is already
set up to do reverse mapping from physical range to inode(s)
offsets...

> There's likely always more pages than inodes, but perhaps it's more
> efficient to walk the 'struct page' array than sb->s_inodes?

I really don't see why you seem to be telling us that invalidation is
an either/or choice. There's more ways to convert physical block
address -> inode file offset and mapping index than brute force
inode cache walks....

.....

> > IOWs, what needs to happen at this point is very filesystem
> > specific. Assuming that "device unplug == filesystem dead" is not
> > correct, nor is specifying a generic action that assumes the
> > filesystem is dead because a device it is using went away.
>
> Ok, I think I set this discussion in the wrong direction implying any
> mapping of this action to a "filesystem dead" event. It's just a "zap
> all ptes" event and upper layers recover from there.

Yes, that's exactly what ->corrupt_range() is intended for. It
allows the filesystem to lock out access to the bad range
and then recover the data. Or metadata, if that's where the bad
range lands. If that recovery fails, it can then report a data
loss/filesystem shutdown event to userspace and kill user procs that
span the bad range...

FWIW, is this notification going to occur before or after the device
has been physically unplugged? i.e. what do we do about the
time-of-unplug-to-time-of-invalidation window where userspace can
still attempt to access the missing pmem through the
not-yet-invalidated ptes? It may not be likely that people just yank
pmem nvdimms out of machines, but with NVMe persistent memory
spaces, there's every chance that someone pulls the wrong device...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
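
[Illustration, not part of the original message.] The contrast drawn
above between a brute force inode walk and a reverse mapping from
physical range to inode offsets can be sketched in user space. This is
only a toy model under stated assumptions: Extent, ReverseMap, owners()
and owners_brute_force() are hypothetical names, the device names are
made up, and none of this reflects the actual XFS rmap btree or the
->corrupt_range() interface. It assumes that extent records on one
device are either disjoint or cover exactly the same physical range
(the shared, reflinked case).

```python
from bisect import bisect_left
from collections import namedtuple

# Hypothetical record: one physically contiguous extent owned by one inode.
Extent = namedtuple("Extent", "dev phys_start length inode file_offset")


class ReverseMap:
    """Toy index from (device, physical range) to (inode, file offset).

    Assumption: records on one device are either disjoint or cover
    exactly the same physical range (a shared extent).
    """

    def __init__(self, extents):
        self.by_dev = {}
        for ext in extents:
            self.by_dev.setdefault(ext.dev, []).append(ext)
        for exts in self.by_dev.values():
            exts.sort(key=lambda e: e.phys_start)
        self.starts = {
            dev: [e.phys_start for e in exts]
            for dev, exts in self.by_dev.items()
        }

    def owners(self, dev, bad_start, bad_len):
        """Return (inode, file_offset, length) for each piece of the bad range."""
        exts = self.by_dev.get(dev, [])
        starts = self.starts.get(dev, [])
        bad_end = bad_start + bad_len
        # First record starting at or after bad_start...
        i = bisect_left(starts, bad_start)
        # ...then step back over records that begin earlier but still cover it.
        while i > 0 and exts[i - 1].phys_start + exts[i - 1].length > bad_start:
            i -= 1
        hits = []
        while i < len(exts) and exts[i].phys_start < bad_end:
            ext = exts[i]
            lo = max(ext.phys_start, bad_start)
            hi = min(ext.phys_start + ext.length, bad_end)
            if lo < hi:
                hits.append((ext.inode, ext.file_offset + (lo - ext.phys_start), hi - lo))
            i += 1
        return hits


def owners_brute_force(extents, dev, bad_start, bad_len):
    """Same answer, found by checking every extent of every inode."""
    bad_end = bad_start + bad_len
    hits = []
    for ext in extents:
        lo = max(ext.phys_start, bad_start)
        hi = min(ext.phys_start + ext.length, bad_end)
        if ext.dev == dev and lo < hi:
            hits.append((ext.inode, ext.file_offset + (lo - ext.phys_start), hi - lo))
    return hits


# Made-up layout: inode 10 spans two devices, inodes 11 and 12 share an extent.
EXTENTS = [
    Extent("pmem0", 0, 100, 10, 0),
    Extent("pmem0", 100, 50, 11, 0),
    Extent("pmem0", 100, 50, 12, 200),
    Extent("pmem1", 0, 100, 10, 100),
    Extent("pmem1", 300, 20, 13, 0),
]
RMAP = ReverseMap(EXTENTS)
```

With this layout, RMAP.owners("pmem0", 90, 20) touches only the records
near the bad range and reports pieces of inodes 10, 11 and 12, while
losing all of "pmem1" affects only the second half of inode 10 plus
inode 13. That is the sense in which a device going away is not the
same thing as invalidating every inode in the filesystem.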