From: "Darrick J. Wong" <djwong@kernel.org>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>,
Jane Chu <jane.chu@oracle.com>,
Vishal L Verma <vishal.l.verma@intel.com>,
Dave Jiang <dave.jiang@intel.com>,
"Weiny, Ira" <ira.weiny@intel.com>,
Al Viro <viro@zeniv.linux.org.uk>,
Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
Linux NVDIMM <nvdimm@lists.linux.dev>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH 0/3] dax: clear poison on the fly along pwrite
Date: Fri, 17 Sep 2021 17:07:00 -0700
Message-ID: <20210918000700.GA10182@magnolia>
In-Reply-To: <CAPcyv4iAr_Vwwgqw+4wz0RQUXhUUJGGz7_T+p+W6tC4T+k+zNw@mail.gmail.com>
On Fri, Sep 17, 2021 at 01:21:25PM -0700, Dan Williams wrote:
> On Fri, Sep 17, 2021 at 8:27 AM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > On Fri, Sep 17, 2021 at 01:53:33PM +0100, Christoph Hellwig wrote:
> > > On Thu, Sep 16, 2021 at 11:40:28AM -0700, Dan Williams wrote:
> > > > > That was my gut feeling. If everyone feels 100% comfortable with
> > > > > zeroing as the mechanism to clear poisoning I'll cave in. The most
> > > > > important bit is that we do that through a dedicated DAX path instead
> > > > > of abusing the block layer even more.
> > > >
> > > > ...or just rename dax_zero_page_range() to dax_reset_page_range()?
> > > > Where reset == "zero + clear-poison"?
> > >
> > > I'd say that naming is more confusing than overloading zero.
> >
> > How about dax_zeroinit_range() ?
>
> Works for me.
>
> >
> > To go with its fallocate flag, FALLOC_FL_ZEROINIT_RANGE (yeah, I've been
> > too busy sorting out -rc1 regressions to repost this), which will reset the
> > hardware (whatever that means) and set the contents to the known value
> > zero.
> >
> > Userspace usage model:
> >
> > void handle_media_error(int fd, loff_t pos, size_t len)
> > {
> > int ret;
> >
> > /* yell about this for posterior's sake */
> >
> > ret = fallocate(fd, FALLOC_FL_ZEROINIT_RANGE, pos, len);
> >
> > /* yay our disk drive / pmem / stone table engraver is online */
>
> The fallocate mode can still be error-aware though, right? When the FS
> has knowledge of the error locations the fallocate mode could be
> fallocate(fd, FALLOC_FL_OVERWRITE_ERRORS, pos, len) with the semantics
> of attempting to zero out any known poison extents in the given file
> range? At the risk of going overboard on new fallocate modes there
> could also (or instead) be FALLOC_FL_PUNCH_ERRORS to skip trying to
> clear them and just ask the FS to throw error extents away.

It /could/ be, but for now I've stuck to "what you see is what you get":
if you tell it to 'zero initialize' 1MB of pmem, it'll write zeroes and
clear the poison on all 1MB, regardless of the old contents.
IOWs, you can use it from a poison handler on just the range that it
told you about, or you could use it to bulk-clear a lot of space all at
once.
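As a rough illustration of how the two semantics differ: an error-aware mode such as the proposed FALLOC_FL_OVERWRITE_ERRORS would only act on the parts of the requested range that overlap known poison, whereas zero-initializing acts on the whole range. Below is a minimal sketch of that overlap computation, assuming the filesystem can report known-bad byte extents; struct bad_extent and clip_bad_extents() are hypothetical names, not an existing kernel or libc API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical representation of a known-bad byte range in a file. */
struct bad_extent {
	uint64_t start;	/* byte offset of the first bad byte */
	uint64_t len;	/* length in bytes */
};

/*
 * Clip each known bad extent to [pos, pos + len) and copy the non-empty
 * pieces to @out, which must hold at least @nr entries.  Returns the number
 * of pieces written.  An error-aware mode would zero only these pieces; a
 * zero-initialize mode ignores @bad and touches the whole requested range.
 * Assumes pos + len and start + len do not overflow.
 */
static size_t clip_bad_extents(const struct bad_extent *bad, size_t nr,
			       uint64_t pos, uint64_t len,
			       struct bad_extent *out)
{
	const uint64_t end = pos + len;
	size_t n = 0;

	for (size_t i = 0; i < nr; i++) {
		uint64_t s = bad[i].start;
		uint64_t e = bad[i].start + bad[i].len;

		if (s < pos)
			s = pos;	/* trim the part before the request */
		if (e > end)
			e = end;	/* trim the part after the request */
		if (s < e) {
			out[n].start = s;
			out[n].len = e - s;
			n++;
		}
	}
	return n;
}
```

With known bad extents at [0, 4096) and [8192, 12288), a request for [2048, 10240) yields two pieces of 2048 bytes each; zero-initializing the same request would instead touch all 8192 bytes.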
A dorky thing here is that the dax_zero_page_range function returns EIO
if you tell it to do more than one page...
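Until that single-page restriction is lifted, anything layered on top has to issue requests one page at a time. Below is a minimal sketch of that splitting, assuming 4 KiB pages. zero_one_page_fn, zeroinit_by_page() and the log_page() example callback are hypothetical stand-ins, not the kernel's dax_zero_page_range() signature. The sketch simply rounds the byte range out to whole pages and does not try to handle sub-page heads or tails.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumption: 4 KiB pages */

/*
 * Hypothetical per-page primitive: zero one page and clear its poison.
 * Returns 0 on success or a nonzero error code.
 */
typedef int (*zero_one_page_fn)(uint64_t pgoff, void *priv);

/*
 * Hand every page touched by [pos, pos + len) to @zero_one_page, one page
 * per call, so the backend is never asked for more than a single page.
 * The range is rounded out to whole pages; stops at the first error.
 */
static int zeroinit_by_page(uint64_t pos, uint64_t len,
			    zero_one_page_fn zero_one_page, void *priv)
{
	if (len == 0)
		return 0;

	const uint64_t first = pos / SKETCH_PAGE_SIZE;
	const uint64_t last = (pos + len - 1) / SKETCH_PAGE_SIZE;

	for (uint64_t pgoff = first; pgoff <= last; pgoff++) {
		int ret = zero_one_page(pgoff, priv);

		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback for illustration only: records which pages it was given. */
struct page_log {
	uint64_t pgoff[16];
	size_t nr;
};

static int log_page(uint64_t pgoff, void *priv)
{
	struct page_log *plog = priv;

	if (plog->nr == sizeof(plog->pgoff) / sizeof(plog->pgoff[0]))
		return -1;	/* log full: behave like a failing backend */
	plog->pgoff[plog->nr++] = pgoff;
	return 0;
}
```

For example, a 5000-byte request starting at byte 6000 touches pages 1 and 2, so the callback is invoked twice.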
>
> > }
> >
> > > > > I'm really worried about both partitions on DAX and DM passing through
> > > > > DAX because they deeply bind DAX to the block layer, which is just a bad
> > > > > idea. I think we also need to sort that whole story out before removing
> > > > > the EXPERIMENTAL tags.
> > > >
> > > > I do think it was a mistake to allow for DAX on partitions of a pmemX
> > > > block-device.
> > > >
> > > > DAX-reflink support may be the opportunity to start deprecating that
> > > > support. Only enable DAX-reflink for direct mounting on /dev/pmemX
> > > > without partitions (later add dax-device direct mounting),
> > >
> > > I think we need to fully or almost fully sort this out.
> > >
> > > Here are my bold suggestions:
> > >
> > > 1) do not drop the EXPERIMENTAL tag on the current block layer overload
> > > at all
> >
> > I don't understand this.
> >
> > > 2) add direct mounting of the nvdimm namespaces ASAP. Because all
> > > the filesystems currently also need the /dev/pmem0 device, add a way
> > > to open the block device by the dax_device instead of our current
> > > way of doing the reverse
> > > 3) deprecate DAX support through block layer mounts with a say 2 year
> > > deprecation period
> > > 4) add DAX remapping devices as needed
> >
> > What devices are needed? linear for lvm, and maybe error so we can
> > actually test all this stuff?
>
> The proposal would be zero lvm support. The nvdimm namespace
> definition would need to grow support for concatenation + striping.
Ah, ok.
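For reference, the address translation a striped namespace would need is the usual RAID0-style mapping. Below is a minimal sketch, assuming a fixed stripe unit dealt round-robin across equally sized members; struct stripe_geom and stripe_map() are hypothetical and say nothing about how the namespace label format would actually encode this.

```c
#include <stdint.h>

/* Hypothetical description of a namespace striped across several members. */
struct stripe_geom {
	uint64_t stripe_size;	/* bytes per stripe unit, must be > 0 */
	uint32_t nr_members;	/* number of underlying namespaces, > 0 */
};

/*
 * Translate a byte offset in the striped (logical) namespace into the member
 * that holds it and the byte offset inside that member: stripe units are
 * dealt out to the members round-robin, RAID0-style.
 */
static void stripe_map(const struct stripe_geom *g, uint64_t logical,
		       uint32_t *member, uint64_t *member_off)
{
	const uint64_t unit = logical / g->stripe_size;		/* which stripe unit */
	const uint64_t within = logical % g->stripe_size;	/* offset inside it */

	*member = (uint32_t)(unit % g->nr_members);
	*member_off = (unit / g->nr_members) * g->stripe_size + within;
}
```

With a 4096-byte stripe unit across 3 members, logical offset 16484 (stripe unit 4, offset 100) lands on member 1 at member offset 4196. Concatenation is the simpler case: subtract each member's size in order until the offset falls inside one.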
> Soft error injection could be achieved by writing to the badblocks
> interface.
<nod>
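Below is a minimal sketch of that kind of soft error injection from userspace. It assumes the block device's badblocks sysfs attribute (e.g. /sys/block/pmem0/badblocks, writable by root) accepts a "first_sector nr_sectors" line in 512-byte sectors. format_badblocks_range() and inject_soft_error() are hypothetical helper names.

```c
#include <stdint.h>
#include <stdio.h>

#define BB_SECTOR_SIZE 512ULL	/* badblocks ranges are in 512-byte sectors */

/*
 * Format the byte range [pos, pos + len) as "first_sector nr_sectors\n",
 * rounded out to whole sectors.  len must be nonzero (a zero length is
 * rejected), and nr_sectors is assumed to fit in an int on the kernel side.
 * Returns the snprintf() result.
 */
static int format_badblocks_range(char *buf, size_t size,
				  uint64_t pos, uint64_t len)
{
	const uint64_t first = pos / BB_SECTOR_SIZE;
	const uint64_t end = (pos + len + BB_SECTOR_SIZE - 1) / BB_SECTOR_SIZE;

	return snprintf(buf, size, "%llu %llu\n",
			(unsigned long long)first,
			(unsigned long long)(end - first));
}

/* Write the range to a badblocks attribute; returns 0 on success, -1 on error. */
static int inject_soft_error(const char *badblocks_path,
			     uint64_t pos, uint64_t len)
{
	char buf[64];
	FILE *f = fopen(badblocks_path, "w");

	if (!f)
		return -1;
	format_badblocks_range(buf, sizeof(buf), pos, len);
	int ok = fputs(buf, f) >= 0;
	if (fclose(f) != 0)	/* sysfs write errors surface at flush time */
		ok = 0;
	return ok ? 0 : -1;
}
```

Injecting one 4 KiB page at byte offset 4096 writes "8 8". Whether the pmem driver then treats that range exactly like real poison is not something this sketch checks.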
I'll send out an RFC of what I have currently.
--D
Thread overview: 24+ messages
2021-09-14 23:31 [PATCH 0/3] dax: clear poison on the fly along pwrite Jane Chu
2021-09-14 23:31 ` [PATCH 1/3] dax: introduce dax_operation dax_clear_poison Jane Chu
2021-11-04 17:53 ` Christoph Hellwig
2021-09-14 23:31 ` [PATCH 2/3] dax: introduce dax_clear_poison to dax pwrite operation Jane Chu
2021-11-04 17:53 ` Christoph Hellwig
2021-09-14 23:31 ` [PATCH 2/3] dax: introduce dax clear poison to page aligned " Jane Chu
2021-09-14 23:31 ` [PATCH 3/3] libnvdimm/pmem: Provide pmem_dax_clear_poison for dax operation Jane Chu
2021-11-04 17:55 ` Christoph Hellwig
2021-11-04 20:27 ` Jane Chu
2021-09-15 4:44 ` [PATCH 0/3] dax: clear poison on the fly along pwrite Dan Williams
2021-09-15 7:22 ` Jane Chu
2021-09-15 16:15 ` Darrick J. Wong
2021-09-15 20:27 ` Dan Williams
2021-09-16 0:05 ` Darrick J. Wong
2021-09-16 7:11 ` Christoph Hellwig
2021-09-16 18:40 ` Dan Williams
2021-09-17 12:53 ` Christoph Hellwig
2021-09-17 15:27 ` Darrick J. Wong
2021-09-17 20:21 ` Dan Williams
2021-09-18 0:07 ` Darrick J. Wong [this message]
2021-09-17 19:37 ` Dan Williams
2021-09-23 20:48 ` Jane Chu
2021-09-23 20:55 ` Jane Chu
2021-09-23 21:42 ` Dan Williams