From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 90CB6C433E0 for ; Tue, 2 Mar 2021 05:38:37 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2D4896146B for ; Tue, 2 Mar 2021 05:38:36 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2D4896146B Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=fromorbit.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id C8EC8100EB83A; Mon, 1 Mar 2021 21:38:36 -0800 (PST) Received-SPF: Pass (helo) identity=helo; client-ip=211.29.132.53; helo=mail107.syd.optusnet.com.au; envelope-from=david@fromorbit.com; receiver= Received: from mail107.syd.optusnet.com.au (mail107.syd.optusnet.com.au [211.29.132.53]) by ml01.01.org (Postfix) with ESMTP id 99458100EB838 for ; Mon, 1 Mar 2021 21:38:34 -0800 (PST) Received: from dread.disaster.area (pa49-179-130-210.pa.nsw.optusnet.com.au [49.179.130.210]) by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id 6C9E4FA8629; Tue, 2 Mar 2021 16:38:30 +1100 (AEDT) Received: from dave by dread.disaster.area with local (Exim 4.92.3) (envelope-from ) id 1lGxjo-00AzkM-R6; Tue, 02 Mar 2021 16:38:28 +1100 Date: Tue, 2 Mar 2021 16:38:28 +1100 From: Dave Chinner To: Dan Williams Subject: Re: Question about the "EXPERIMENTAL" tag for dax in XFS Message-ID: <20210302053828.GI4662@dread.disaster.area> References: <20210226212748.GY4662@dread.disaster.area> <20210227223611.GZ4662@dread.disaster.area> <20210228223846.GA4662@dread.disaster.area> <20210301224640.GG4662@dread.disaster.area> <20210302024227.GH4662@dread.disaster.area> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.3 cv=YKPhNiOx c=1 sm=1 tr=0 cx=a_idp_d a=JD06eNgDs9tuHP7JIKoLzw==:117 a=JD06eNgDs9tuHP7JIKoLzw==:17 a=kj9zAlcOel0A:10 a=dESyimp9J3IA:10 a=7-415B0cAAAA:8 a=OJFMFRqHYWYmYpl_lvQA:9 a=CjuIK1q_8ugA:10 a=biEYGPWJfzWAr4FL6Ov7:22 Message-ID-Hash: OGSI56236YZ5O6DCLTPAAEPPO2AR572Z X-Message-ID-Hash: OGSI56236YZ5O6DCLTPAAEPPO2AR572Z X-MailFrom: david@fromorbit.com X-Mailman-Rule-Hits: nonmember-moderation X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation CC: "Darrick J. Wong" , "ruansy.fnst@fujitsu.com" , "linux-kernel@vger.kernel.org" , "linux-xfs@vger.kernel.org" , "linux-nvdimm@lists.01.org" , "linux-fsdevel@vger.kernel.org" , "darrick.wong@oracle.com" , "willy@infradead.org" , "jack@suse.cz" , "viro@zeniv.linux.org.uk" , "linux-btrfs@vger.kernel.org" , "ocfs2-devel@oss.oracle.com" , "hch@lst.de" , "rgoldwyn@suse.de" , "y-goto@fujitsu.com" , "qi.fuli@fujitsu.com" , "fnstml-iaas@cn.fujitsu.com" X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit On Mon, Mar 01, 2021 at 07:33:28PM -0800, Dan Williams wrote: > On Mon, Mar 1, 2021 at 6:42 PM Dave Chinner wrote: > [..] > > We do not need a DAX specific mechanism to tell us "DAX device > > gone", we need a generic block device interface that tells us "range > > of block device is gone". > > This is the crux of the disagreement. The block_device is going away > *and* the dax_device is going away. No, that is not the disagreement I have with what you are saying. You still haven't understand that it's even more basic and generic than devices going away. At the simplest form, all the filesystem wants is to be notified of is when *unrecoverable media errors* occur in the persistent storage that underlies the filesystem. The filesystem does not care what that media is build from - PMEM, flash, corroded spinning disks, MRAM, or any other persistent media you can think off. It just doesn't matter. What we care about is that the contents of a *specific LBA range* no longer contain *valid data*. IOWs, the data in that range of the block device has been lost, cannot be retreived and/or cannot be written to any more. PMEM taking a MCE because ECC tripped is a media error because data is lost and inaccessible until recovery actions are taken. MD RAID failing a scrub is a media error and data is lost and unrecoverable at that layer. A device disappearing is a media error because the storage media is now permanently inaccessible to the higher layers. This "media error" categorisation is a fundamental property of persistent storage and, as such, is a property of the block devices used to access said persistent storage. That's the disagreement here - that you and Christoph are saying ->corrupted_range is not a block device property because only a pmem/DAX device currently generates it. You both seem to be NACKing a generic interface because it's only implemented for the first subsystem that needs it. AFAICT, you either don't understand or are completely ignoring the architectural need for it to be provided across the rest of the storage stack that *block device based filesystems depend on*. Sure, there might be dax device based fielsystems around the corner. They just require a different pmem device ->corrupted_range callout to implement the notification - one that directs to the dax device rather than the block device. That's simple and trivial to implement, but such functionaity for DAX devices does not replace the need for the same generic functionality to be provided across a *range of different block devices* as required by *block device based filesystems*. And that's fundamentally the problem. XFS is block device based, not DAX device based. We require errors to be reported through block device mechanisms. fs-dax does not change this - it is based on pmem being presented as a primarily as a block device to the block device based filesystems and only secondarily as a dax device. Hence if it can be trivially implemented as a block device interface, that's where it should go, because then all the other block devices that the filesytem runs on can provide the same functionality for similar media error events.... > The dax_device removal implies one > set of actions (direct accessed pfns invalid) the block device removal > implies another (block layer sector access offline). There you go again, saying DAX requires an action, while the block device notification is a -state change- (i.e. goes offline). This is exactly what I said was wrong in my last email. > corrupted_range > is blurring the notification for 2 different failure domains. Look at > the nascent idea to mount a filesystem on dax sans a block device. > Look at the existing plumbing for DM to map dax_operations through a > device stack. Ummm, it just maps the direct_access call to the underlying device and calls it's ->direct_access method. All it's doing is LBA mapping. That's all it needs to do for ->corrupted_range, too. I have no clue why you think this is a problem for error notification... > Look at the pushback Ruan got for adding a new > block_device operation for corrupted_range(). one person said "no". That's hardly pushback. Especially as I think Christoph's objection about this being dax specific functionality is simply wrong, as per above. > > This is why we need to communicate what error occurred, not what > > action a device driver thinks needs to be taken. > > The driver is only an event producer in this model, whatever the > consumer does at the other end is not its concern. There may be a > generic consumer and a filesystem specific consumer. That's why these are all ops functions that can provide multiple implementations to different device types. So that when we get a new use case, the ops function structure can be replaced with one that directs the notification to the new user instead of to the existing one. It's a design pattern we use all over the kernel code. Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-leave@lists.01.org