From: Dave Chinner <firstname.lastname@example.org>
To: Dan Williams <email@example.com>
Cc: "firstname.lastname@example.org" <email@example.com>,
    "firstname.lastname@example.org" <email@example.com>,
    "firstname.lastname@example.org" <email@example.com>,
    "firstname.lastname@example.org" <email@example.com>,
    "firstname.lastname@example.org" <email@example.com>,
    "firstname.lastname@example.org" <email@example.com>,
    "firstname.lastname@example.org" <email@example.com>,
    "firstname.lastname@example.org" <email@example.com>,
    "firstname.lastname@example.org" <email@example.com>,
    "firstname.lastname@example.org" <email@example.com>,
    "firstname.lastname@example.org" <email@example.com>,
    "firstname.lastname@example.org" <email@example.com>,
    "firstname.lastname@example.org" <email@example.com>
Subject: Re: [Ocfs2-devel] Question about the "EXPERIMENTAL" tag for dax in XFS
Date: Tue, 2 Mar 2021 16:38:28 +1100
Message-ID: <20210302053828.GI4662@dread.disaster.area>
In-Reply-To: <CAPcyv4ja8gnTR1E-Ge5etm+y69cHwdWN6Bg79wPPF4M=C-w79A@mail.gmail.com>

On Mon, Mar 01, 2021 at 07:33:28PM -0800, Dan Williams wrote:
> On Mon, Mar 1, 2021 at 6:42 PM Dave Chinner <firstname.lastname@example.org> wrote:
> [..]
> > We do not need a DAX specific mechanism to tell us "DAX device
> > gone", we need a generic block device interface that tells us "range
> > of block device is gone".
>
> This is the crux of the disagreement. The block_device is going away
> *and* the dax_device is going away.

No, that is not the disagreement I have with what you are saying.
You still haven't understood that it's even more basic and generic
than devices going away. At its simplest, all the filesystem wants
is to be notified when *unrecoverable media errors* occur in the
persistent storage that underlies the filesystem.

The filesystem does not care what that media is built from - PMEM,
flash, corroded spinning disks, MRAM, or any other persistent media
you can think of. It just doesn't matter.

What we care about is that the contents of a *specific LBA range* no
longer contain *valid data*. IOWs, the data in that range of the
block device has been lost, cannot be retrieved and/or cannot be
written to any more.

PMEM taking a MCE because ECC tripped is a media error because data
is lost and inaccessible until recovery actions are taken.

MD RAID failing a scrub is a media error and data is lost and
unrecoverable at that layer.

A device disappearing is a media error because the storage media is
now permanently inaccessible to the higher layers.

This "media error" categorisation is a fundamental property of
persistent storage and, as such, is a property of the block devices
used to access said persistent storage.

That's the disagreement here - that you and Christoph are saying
->corrupted_range is not a block device property because only a
pmem/DAX device currently generates it.

You both seem to be NACKing a generic interface because it's only
implemented for the first subsystem that needs it. AFAICT, you
either don't understand or are completely ignoring the architectural
need for it to be provided across the rest of the storage stack that
*block device based filesystems depend on*.

Sure, there might be dax device based filesystems around the corner.
They just require a different pmem device ->corrupted_range callout
to implement the notification - one that directs to the dax device
rather than the block device. That's simple and trivial to
implement, but such functionality for DAX devices does not replace
the need for the same generic functionality to be provided across a
*range of different block devices* as required by *block device
based filesystems*.

And that's fundamentally the problem. XFS is block device based, not
DAX device based. We require errors to be reported through block
device mechanisms. fs-dax does not change this - it is based on pmem
being presented primarily as a block device to the block device
based filesystems and only secondarily as a dax device. Hence if it
can be trivially implemented as a block device interface, that's
where it should go, because then all the other block devices that
the filesystem runs on can provide the same functionality for
similar media error events....

> The dax_device removal implies one
> set of actions (direct accessed pfns invalid) the block device removal
> implies another (block layer sector access offline).

There you go again, saying DAX requires an action, while the block
device notification is a -state change- (i.e. goes offline).

This is exactly what I said was wrong in my last email.

> corrupted_range
> is blurring the notification for 2 different failure domains. Look at
> the nascent idea to mount a filesystem on dax sans a block device.
> Look at the existing plumbing for DM to map dax_operations through a
> device stack.

Ummm, it just maps the direct_access call to the underlying device
and calls its ->direct_access method. All it's doing is LBA
mapping. That's all it needs to do for ->corrupted_range, too.
I have no clue why you think this is a problem for error
notification...

> Look at the pushback Ruan got for adding a new
> block_device operation for corrupted_range().

One person said "no". That's hardly pushback. Especially as I think
Christoph's objection about this being dax specific functionality is
simply wrong, as per above.

> > This is why we need to communicate what error occurred, not what
> > action a device driver thinks needs to be taken.
>
> The driver is only an event producer in this model, whatever the
> consumer does at the other end is not its concern. There may be a
> generic consumer and a filesystem specific consumer.

<sigh>

That's why these are all ops functions that can provide multiple
implementations to different device types. So that when we get a new
use case, the ops function structure can be replaced with one that
directs the notification to the new user instead of to the existing
one. It's a design pattern we use all over the kernel code.

Cheers,

Dave.
-- 
Dave Chinner
email@example.com
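
Editorial illustration (not part of the original message): the ops
structure pattern and the "it's just LBA mapping" point above can be
sketched in plain userspace C. This is a minimal sketch under stated
assumptions, not the interface proposed in the patch series. Every
name here (struct holder, struct holder_ops, notify_media_error, the
single offset field) is hypothetical, and real device-mapper targets
remap ranges in a more involved way than adding one offset.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical consumer of media-error notifications: a filesystem, or a
 * stacking layer that forwards to one. None of these names are kernel APIs.
 */
struct holder;

struct holder_ops {
	/* Sectors [lba, lba + len) no longer hold valid data. */
	void (*corrupted_range)(struct holder *h, uint64_t lba, uint64_t len);
};

struct holder {
	const char *name;
	const struct holder_ops *ops;
	struct holder *parent;  /* next consumer up the stack, if any */
	uint64_t offset;        /* assumed: where this layer starts in parent's LBA space */
	uint64_t seen_lba;      /* last range recorded by a filesystem-style holder */
	uint64_t seen_len;
	int seen;               /* number of notifications recorded */
};

/* Filesystem-style consumer: record and report the lost range. */
static void fs_corrupted_range(struct holder *h, uint64_t lba, uint64_t len)
{
	h->seen_lba = lba;
	h->seen_len = len;
	h->seen++;
	printf("%s: lost %" PRIu64 " sectors at LBA %" PRIu64 "\n",
	       h->name, len, lba);
}

/* Stacking layer: only translates the LBA, then passes the event upward. */
static void remap_corrupted_range(struct holder *h, uint64_t lba, uint64_t len)
{
	assert(h->parent != NULL && h->parent->ops != NULL);
	h->parent->ops->corrupted_range(h->parent, lba + h->offset, len);
}

static const struct holder_ops fs_ops = {
	.corrupted_range = fs_corrupted_range,
};

static const struct holder_ops remap_ops = {
	.corrupted_range = remap_corrupted_range,
};

/* Event producer (the driver): reports what happened, not what to do. */
static void notify_media_error(struct holder *h, uint64_t lba, uint64_t len)
{
	if (h != NULL && h->ops != NULL && h->ops->corrupted_range != NULL)
		h->ops->corrupted_range(h, lba, len);
}
```

To exercise the sketch, point a holder using fs_ops at nothing, stack
a holder using remap_ops on top of it with a non-zero offset, and
call notify_media_error() on the remapping holder. The producer never
needs to know which consumer is attached: directing the notification
to a new kind of user means swapping the ops table, which is the
design choice the message argues for.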