From: Jane Chu <jane.chu@oracle.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>,
Vishal L Verma <vishal.l.verma@intel.com>,
Dave Jiang <dave.jiang@intel.com>,
"Weiny, Ira" <ira.weiny@intel.com>,
Al Viro <viro@zeniv.linux.org.uk>,
Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
Linux NVDIMM <nvdimm@lists.linux.dev>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH 0/3] dax: clear poison on the fly along pwrite
Date: Thu, 23 Sep 2021 13:55:40 -0700
Message-ID: <324444b0-6121-d14c-a59f-7689bb206f58@oracle.com>
In-Reply-To: <20210915161510.GA34830@magnolia>

On 9/15/2021 9:15 AM, Darrick J. Wong wrote:
> On Wed, Sep 15, 2021 at 12:22:05AM -0700, Jane Chu wrote:
>> Hi, Dan,
>>
>> On 9/14/2021 9:44 PM, Dan Williams wrote:
>>> On Tue, Sep 14, 2021 at 4:32 PM Jane Chu <jane.chu@oracle.com> wrote:
>>>>
>>>> If pwrite(2) encounters poison in a pmem range, it fails with EIO.
>>>> This is unnecessary if the hardware is capable of clearing the poison.
>>>>
>>>> Not all dax backend hardware can clear poison on the fly, but dax
>>>> backed by Intel DCPMEM can, and it is desirable to use that capability
>>>> to, first, speed up repair and, second, maintain backend continuity
>>>> instead of fragmenting it in search of clean blocks.
>>>>
>>>> Jane Chu (3):
>>>> dax: introduce dax_operation dax_clear_poison
>>>
>>> The problem with new dax operations is that they need to be plumbed
>>> not only through fsdax and pmem, but also through device-mapper.
>>>
>>> In this case I think we're already covered by dax_zero_page_range().
>>> That will ultimately trigger pmem_clear_poison() and it is routed
>>> through device-mapper properly.
>>>
>>> Can you clarify why the existing dax_zero_page_range() is not sufficient?
>>
>> fallocate ZERO_RANGE is a functionality in its own right: applied to
>> dax, it should zero out the media range. So one may argue that it is
>> part of the block operations, and not something explicitly aimed at
>> clearing poison.
>
> Yeah, Christoph suggested that we make the clearing operation explicit
> in a related thread a few weeks ago:
> https://lore.kernel.org/linux-fsdevel/YRtnlPERHfMZ23Tr@infradead.org/
>
> I like Jane's patchset far better than the one that I sent, because it
> doesn't require a block device wrapper for the pmem, and it enables us
> to tell application writers that they can handle media errors by
> pwrite()ing the bad region, just like they do for nvme and spinners.
>
>> I'm also thinking about the MOVDIR64B instruction and how it
>> might be used to clear poison on the fly with a single 'store'.
>> Of course, that means we need to figure out how to narrow down the
>> error blast radius first.
>
> That was one of the advantages of Shiyang Ruan's NAKed patchset to
> enable byte-granularity media errors to pass upwards through the stack
> back to the filesystem, which could then tell applications exactly what
> they lost.
>
> I want to get back to that, though if Dan won't withdraw the NAK then I
> don't know how to move forward...
>
>> With respect to plumbing through device-mapper, I thought about that,
>> and wasn't sure. I mean, the clear-poison work will eventually fall on
>> the pmem driver, so how does that play out through the DM layers?
>
> Each of the dm drivers has to add their own ->clear_poison operation
> that remaps the incoming (sector, len) parameters as appropriate for
> that device and then calls the lower device's ->clear_poison with the
> translated parameters.
>
> This (AFAICT) has already been done for dax_zero_page_range, so I sense
> that Dan is trying to save you a bunch of code plumbing work by nudging
> you towards doing s/dax_clear_poison/dax_zero_page_range/ to this series
> and then you only need patches 2-3.

Thanks, Darrick, for the explanation!
I don't mind adding DM layer support; it sounds straightforward.
I also like your latest patch and am wondering if the clear_poison API
is still of value.

thanks,
-jane
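
For illustration only, the remapping Darrick describes above (each layer
translating the incoming (sector, len) and then calling the lower device)
might be modeled in user-space C roughly as below. `struct toy_dev` and
`toy_clear_poison()` are hypothetical names invented for this sketch, not
the kernel's dax or device-mapper API, and the single `start` offset
assumes a dm-linear-style target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one layer in a stacked device; not a kernel type. */
struct toy_dev {
	struct toy_dev *lower;   /* NULL for the bottom (pmem-like) device */
	uint64_t start;          /* where this layer begins on the lower device, in sectors */
	uint64_t len;            /* size of this layer, in sectors */
	uint64_t cleared_sector; /* bottom device only: last range it was asked to clear */
	uint64_t cleared_len;
};

/*
 * Remap (sector, len) through each layer until the bottom device is reached.
 * Returns 0 on success, -1 if the range does not fit in some layer.
 * Assumption: a simple linear mapping; real targets (stripe, etc.) differ.
 */
int toy_clear_poison(struct toy_dev *dev, uint64_t sector, uint64_t len)
{
	assert(dev != NULL);

	/* Reject ranges that run past the end of this layer. */
	if (sector > dev->len || len > dev->len - sector)
		return -1;

	if (dev->lower == NULL) {
		/* Bottom layer: stand-in for the driver actually clearing poison. */
		dev->cleared_sector = sector;
		dev->cleared_len = len;
		return 0;
	}

	/* Translate into the lower device's address space and pass it down. */
	return toy_clear_poison(dev->lower, sector + dev->start, len);
}
```

With a 200-sector linear layer starting at sector 100 of the bottom
device, a request for (8, 16) on the upper layer reaches the bottom
device as (108, 16), and a request that runs past the end of the upper
layer is rejected before it is passed down.
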
>
>> BTW, our customer doesn't care about creating a dax volume through DM, so.
>
> They might not care, but anything going upstream should work in the
> general case.
>
> --D
>
>> thanks!
>> -jane
>>
>>
>>>
>>>> dax: introduce dax_clear_poison to dax pwrite operation
>>>> libnvdimm/pmem: Provide pmem_dax_clear_poison for dax operation
>>>>
>>>> drivers/dax/super.c | 13 +++++++++++++
>>>> drivers/nvdimm/pmem.c | 17 +++++++++++++++++
>>>> fs/dax.c | 9 +++++++++
>>>> include/linux/dax.h | 6 ++++++
>>>> 4 files changed, 45 insertions(+)
>>>>
>>>> --
>>>> 2.18.4
>>>>
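
As a rough sketch of the application-side repair discussed in this thread
(handling a media error by pwrite()ing the bad region), assuming the
kernel and hardware clear the poison on write as the series proposes:
`rewrite_range()` is a hypothetical helper name, and zero-filling stands
in for whatever data the application would actually restore.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: overwrite [off, off + len) of fd with zeros using
 * pwrite(2). Returns 0 on success or a negative errno value, e.g. -EIO if
 * the write still fails because the poison could not be cleared.
 */
int rewrite_range(int fd, off_t off, size_t len)
{
	assert(off >= 0);

	if (len == 0)
		return 0;

	char *buf = calloc(1, len); /* zero-filled replacement data */
	if (buf == NULL)
		return -ENOMEM;

	size_t done = 0;
	int ret = 0;

	while (done < len) {
		ssize_t n = pwrite(fd, buf + done, len - done, off + (off_t)done);
		if (n < 0) {
			if (errno == EINTR)
				continue; /* interrupted: retry the same chunk */
			ret = -errno;
			break;
		}
		if (n == 0) {
			ret = -EIO; /* no progress: give up rather than spin */
			break;
		}
		done += (size_t)n; /* short write: continue with the remainder */
	}

	free(buf);
	return ret;
}
```

An application that learned of a bad range (for example from a failed
read) could call `rewrite_range(fd, bad_off, bad_len)` and treat a
non-zero return as "repair failed". Whether the write actually clears
poison depends on the filesystem, kernel, and hardware in use.
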

Thread overview: 24+ messages
2021-09-14 23:31 [PATCH 0/3] dax: clear poison on the fly along pwrite Jane Chu
2021-09-14 23:31 ` [PATCH 1/3] dax: introduce dax_operation dax_clear_poison Jane Chu
2021-11-04 17:53 ` Christoph Hellwig
2021-09-14 23:31 ` [PATCH 2/3] dax: introduce dax_clear_poison to dax pwrite operation Jane Chu
2021-11-04 17:53 ` Christoph Hellwig
2021-09-14 23:31 ` [PATCH 2/3] dax: introduce dax clear poison to page aligned " Jane Chu
2021-09-14 23:31 ` [PATCH 3/3] libnvdimm/pmem: Provide pmem_dax_clear_poison for dax operation Jane Chu
2021-11-04 17:55 ` Christoph Hellwig
2021-11-04 20:27 ` Jane Chu
2021-09-15 4:44 ` [PATCH 0/3] dax: clear poison on the fly along pwrite Dan Williams
2021-09-15 7:22 ` Jane Chu
2021-09-15 16:15 ` Darrick J. Wong
2021-09-15 20:27 ` Dan Williams
2021-09-16 0:05 ` Darrick J. Wong
2021-09-16 7:11 ` Christoph Hellwig
2021-09-16 18:40 ` Dan Williams
2021-09-17 12:53 ` Christoph Hellwig
2021-09-17 15:27 ` Darrick J. Wong
2021-09-17 20:21 ` Dan Williams
2021-09-18 0:07 ` Darrick J. Wong
2021-09-17 19:37 ` Dan Williams
2021-09-23 20:48 ` Jane Chu
2021-09-23 20:55 ` Jane Chu [this message]
2021-09-23 21:42 ` Dan Williams