Linux-ext4 Archive on lore.kernel.org
 help / color / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Ira Weiny <ira.weiny@intel.com>, Christoph Hellwig <hch@lst.de>,
	linux-kernel@vger.kernel.org,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Dan Williams <dan.j.williams@intel.com>,
	"Theodore Y. Ts'o" <tytso@mit.edu>, Jan Kara <jack@suse.cz>,
	linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, akpm@linux-foundation.org,
	torvalds@linux-foundation.org
Subject: Re: [PATCH V5 00/12] Enable per-file/per-directory DAX operations V5
Date: Wed, 11 Mar 2020 17:39:42 +1100
Message-ID: <20200311063942.GE10776@dread.disaster.area> (raw)
In-Reply-To: <20200311033614.GQ1752567@magnolia>

On Tue, Mar 10, 2020 at 08:36:14PM -0700, Darrick J. Wong wrote:
> There are still other things that need to be ironed out WRT pmem:
> 
> a) reflink and page/pfn/whatever sharing -- fix the mm or (ab)use the
> xfs buffer cache, or something worse?

I don't think we need either. We just need to remove the DAX page
association for hwpoison that requires the struct page to store the
mapping and index. Get rid of that and we should be able to  safely
map the same page into different inode address spaces at the same
time. When we write a shared page, we COW it immediately and replace
the page in the inode's mapping tree, so we can't actually write to
a shared page...

IOWs, the dax_associate_page() related functionality probably needs
to be a filesystem callout - part of the aops vector, I think, so
that device dax can still use it. That way XFS can go it's own way,
while ext4 and device dax can continue to use the existing mechanism
mechanisn that is currently implemented....

XFS can then make use of rmapbt to find the owners on a bad page
notification, and run the "kill userspace dead dead dead" lookup on
each mapping/index tuple rather than pass it around on a struct
page. i.e. we'll do a kill scan for each mapping/index owner tuple
we find, not just one.

That requires converting all the current vma killer code to pass
mapping/index tuples around rather than the struct page. That kill
code doesn't actually need the struct page, it just needs the
mapping/index tuple to match to the vmas that have it mapped into
userspace.

> b) getting our stories straight on how to clear poison, and whether or
> not we can come up with a better story for ZERO_FILE_RANGE on pmem.  In
> the ideal world I'd love to see Z_F_R actually memset(0) the pmem and
> clear poison, at least if the file->pmem mappings were contiguous.

Are you talking about ZFR from userspace through the filesystem (how
do you clear poison in free space?) or ZFR on the dax device fro
either userspace or the kernel filesystem code?

> c) wiring up xfs to hwpoison, or wiring up hwpoison to xfs, or otherwise
> figuring out how to get storage to tell xfs that it lost something so
> that maybe xfs can fix it quickly

Yup, I think that's a dax device callback into the filesystem. i.e
the hwpoison gets delivered to the dax device, which then calls the
fs function rather than do it's current "dax_lock_page(), kill
userspace dead dead dead, dax_unlock_page()" dance. The filesystem
can do a much more intricate dance and wreak far more havoc on
userspace than what the dax device can do.....

Copious amounts of unused time are things I don't have,
unfortunately. Only having 7 fingers to type with right now doesn't
help, either.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  parent reply index

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-27  5:24 ira.weiny
2020-02-27  5:24 ` [PATCH V5 01/12] fs/xfs: Remove unnecessary initialization of i_rwsem ira.weiny
2020-02-27 17:25   ` Ira Weiny
2020-02-27  5:24 ` [PATCH V5 02/12] fs: Remove unneeded IS_DAX() check ira.weiny
2020-02-27  5:24 ` [PATCH V5 03/12] fs/stat: Define DAX statx attribute ira.weiny
2020-02-27  5:24 ` [PATCH V5 04/12] fs/xfs: Isolate the physical DAX flag from enabled ira.weiny
2020-02-27  5:24 ` [PATCH V5 05/12] fs/xfs: Create function xfs_inode_enable_dax() ira.weiny
2020-03-01 22:37   ` Dave Chinner
2020-02-27  5:24 ` [PATCH V5 06/12] fs: Add locking for a dynamic address space operations state ira.weiny
2020-03-02  1:26   ` Dave Chinner
2020-03-02  1:36     ` Dave Chinner
2020-02-27  5:24 ` [PATCH V5 07/12] fs: Prevent DAX state change if file is mmap'ed ira.weiny
2020-02-27  5:24 ` [PATCH V5 08/12] fs/xfs: Hold off aops users while changing DAX state ira.weiny
2020-02-27  5:24 ` [PATCH V5 09/12] fs/xfs: Clean up locking in dax invalidate ira.weiny
2020-02-27  5:24 ` [PATCH V5 10/12] fs/xfs: Allow toggle of effective DAX flag ira.weiny
2020-02-27  5:24 ` [PATCH V5 11/12] fs/xfs: Remove xfs_diflags_to_linux() ira.weiny
2020-02-27  5:24 ` [PATCH V5 12/12] Documentation/dax: Update Usage section ira.weiny
2020-03-05 15:51 ` [PATCH V5 00/12] Enable per-file/per-directory DAX operations V5 Christoph Hellwig
2020-03-09 17:04   ` Ira Weiny
2020-03-11  3:36     ` Darrick J. Wong
2020-03-11  6:29       ` Christoph Hellwig
2020-03-11 17:07         ` Dan Williams
2020-03-16  9:52           ` Jan Kara
2020-03-16  9:55             ` Christoph Hellwig
2020-04-01  4:00               ` Darrick J. Wong
2020-04-01 10:25                 ` Jan Kara
2020-04-02  8:53                   ` Christoph Hellwig
2020-04-02 20:55                     ` Ira Weiny
2020-04-03  7:27                       ` Christoph Hellwig
2020-04-03 15:48                         ` Ira Weiny
2020-04-03 17:03                           ` Jan Kara
2020-04-03 18:18                             ` Ira Weiny
2020-04-03 18:21                               ` Ira Weiny
2020-04-03 18:37                               ` Darrick J. Wong
2020-04-05  6:19                                 ` Ira Weiny
2020-04-06 10:00                               ` Jan Kara
2020-04-03 18:29                             ` Darrick J. Wong
2020-04-03 16:05                         ` Darrick J. Wong
2020-04-03  4:39                 ` Ira Weiny
2020-03-11  6:39       ` Dave Chinner [this message]
2020-03-11  6:44         ` Christoph Hellwig
2020-03-11 17:07           ` Dan Williams
2020-03-12  0:49           ` Dave Chinner
2020-03-12  3:00             ` Darrick J. Wong
2020-03-12  7:27             ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200311063942.GE10776@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=akpm@linux-foundation.org \
    --cc=dan.j.williams@intel.com \
    --cc=darrick.wong@oracle.com \
    --cc=hch@lst.de \
    --cc=ira.weiny@intel.com \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-ext4 Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-ext4/0 linux-ext4/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-ext4 linux-ext4/ https://lore.kernel.org/linux-ext4 \
		linux-ext4@vger.kernel.org
	public-inbox-index linux-ext4

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-ext4


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git