From: Dan Williams <dan.j.williams@intel.com>
To: Jane Chu <jane.chu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	"Darrick J. Wong" <djwong@kernel.org>,
	 "david@fromorbit.com" <david@fromorbit.com>,
	"vishal.l.verma@intel.com" <vishal.l.verma@intel.com>,
	 "dave.jiang@intel.com" <dave.jiang@intel.com>,
	"agk@redhat.com" <agk@redhat.com>,
	 "snitzer@redhat.com" <snitzer@redhat.com>,
	"dm-devel@redhat.com" <dm-devel@redhat.com>,
	 "ira.weiny@intel.com" <ira.weiny@intel.com>,
	"willy@infradead.org" <willy@infradead.org>,
	 "vgoyal@redhat.com" <vgoyal@redhat.com>,
	 "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	 "nvdimm@lists.linux.dev" <nvdimm@lists.linux.dev>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	 "linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>
Subject: Re: [dm-devel] [PATCH 0/6] dax poison recovery with RWF_RECOVERY_DATA flag
Date: Thu, 4 Nov 2021 12:00:12 -0700
Message-ID: <CAPcyv4hJjcy2TnOv-Y5=MUMHeDdN-BCH4d0xC-pFGcHXEU_ZEw@mail.gmail.com>
In-Reply-To: <6d21ece1-0201-54f2-ec5a-ae2f873d46a3@oracle.com>

On Thu, Nov 4, 2021 at 11:34 AM Jane Chu <jane.chu@oracle.com> wrote:
>
> Thanks for the enlightening discussion here, it's so helpful!
>
> Please allow me to recap what I've caught up on so far:
>
> 1. recovery writes happen at page granularity, because the poisoned
>     page is set not-present (NP) to prevent undesirable prefetching;
> 2. a single interface performs 3 tasks:
>       { clear-poison, update error-list, write }
>     such as an API in the pmem driver.
>     For CPUs that support MOVDIR64B, the 'clear-poison' and 'write'
>     tasks can be combined (this would need something different from the
>     existing _copy_mcsafe, though) and 'update error-list' follows
>     closely behind;
>     for CPUs that rely on a firmware call to clear poison, the existing
>     pmem_clear_poison() can be used, followed by the 'write' task.
> 3. if the user doesn't pass the RWF_RECOVERY_DATA flag, dax recovery
>     would be automatic for a write if the range is page aligned;
>     otherwise, the write fails with EIO as usual.
>     Also, this assumes the user hasn't punched out the poisoned page;
>     if they have, poison repair becomes a lot more complicated.
> 4. it is desirable to fetch as much data as possible from a poisoned range.
>
> If this understanding is in the right direction, then I'd like to
> propose the following changes to
>    dax_direct_access(), dax_copy_to/from_iter(), pmem_copy_to/from_iter(),
>    the dm layer copy_to/from_iter, and dax_iomap_iter().
>
> 1. dax_iomap_iter() relies on dax_direct_access() to decide whether there
>     is likely a media error: if the API without DAX_F_RECOVERY returns
>     -EIO, then switch to the recovery read/write code.  In the recovery
>     code, supply DAX_F_RECOVERY to dax_direct_access() in order to obtain
>     'kaddr', and then call dax_copy_to/from_iter() with DAX_F_RECOVERY.

I like it. It allows for an atomic write+clear implementation on
capable platforms and coordinates with potentially unmapped pages. The
best of both worlds from the dax_clear_poison() proposal and my "take
a fault and do a slow-path copy".
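For illustration only, a minimal user-space sketch of item 1 combined with the page-alignment rule from recap item 3: try direct access without the recovery flag, and on -EIO fall back to a recovery path that only accepts a full, page-aligned overwrite. Everything here is hypothetical: sketch_direct_access() stands in for dax_direct_access(), SKETCH_F_RECOVERY for DAX_F_RECOVERY, and a single 4 KiB page with a poison bit stands in for the pmem media. It is not the real kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE  4096UL /* assumption: 4 KiB pages */
#define SKETCH_F_RECOVERY 0x1    /* hypothetical stand-in for DAX_F_RECOVERY */

/* Hypothetical model of one pmem page: its data plus a poison bit. */
struct sketch_page {
	unsigned char data[SKETCH_PAGE_SIZE];
	bool poisoned;
};

/* Stand-in for dax_direct_access(): refuse a poisoned page unless the
 * caller explicitly asks for recovery access. */
static int sketch_direct_access(struct sketch_page *pg, int flags,
				unsigned char **kaddr)
{
	if (pg->poisoned && !(flags & SKETCH_F_RECOVERY))
		return -EIO;
	*kaddr = pg->data;
	return 0;
}

/* Stand-in for the write side of dax_iomap_iter(). */
static int sketch_dax_write(struct sketch_page *pg, size_t off,
			    const void *buf, size_t len)
{
	unsigned char *kaddr;
	int rc;

	assert(off + len <= SKETCH_PAGE_SIZE);
	rc = sketch_direct_access(pg, 0, &kaddr);
	if (rc == 0) {
		memcpy(kaddr + off, buf, len); /* fast path: no poison */
		return 0;
	}
	if (rc != -EIO)
		return rc;
	/* Recovery path: only a full, page-aligned overwrite may clear poison. */
	if (off != 0 || len != SKETCH_PAGE_SIZE)
		return -EIO;
	rc = sketch_direct_access(pg, SKETCH_F_RECOVERY, &kaddr);
	if (rc)
		return rc;
	memcpy(kaddr, buf, len);
	pg->poisoned = false; /* models the clear-poison step */
	return 0;
}
```

A sub-page write to the poisoned page still fails with -EIO; a whole-page write goes through the recovery path and leaves the page clean.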

> 2. the _copy_to/from_iter implementation would be largely the same
>     as in my recent patch, but some changes in Christoph's
>     'dax-devirtualize' may be kept, such as DAX_F_VIRTUAL; virtual
>     devices obviously don't have the ability to clear poison, so there's
>     no need to complicate them.  This also means that not every endpoint
>     dax device has to provide dax_op.copy_to/from_iter; they may use the
>     default.

Did I miss this series or are you talking about this one?
https://lore.kernel.org/all/20211018044054.1779424-1-hch@lst.de/
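What item 2 describes maps onto the usual "optional op with a generic fallback" pattern. A hedged sketch follows, with hypothetical names throughout (struct sketch_dax_ops, SKETCH_DEV_VIRTUAL, SKETCH_REQ_RECOVERY, sketch_dax_copy_from()). It does not reproduce the dax_operations layout from that series, and returning -EOPNOTSUPP for a recovery request on a virtual device is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_DEV_VIRTUAL  0x1UL /* hypothetical: device cannot clear poison */
#define SKETCH_REQ_RECOVERY 0x1   /* hypothetical per-call recovery request */

struct sketch_dax_dev;

/* Hypothetical ops table: copy_from may be NULL. */
struct sketch_dax_ops {
	size_t (*copy_from)(struct sketch_dax_dev *dev, void *dst,
			    const void *src, size_t len, int flags);
};

struct sketch_dax_dev {
	unsigned long flags;
	const struct sketch_dax_ops *ops;
};

/* Generic copy for endpoints without their own op; it does not clear poison. */
static size_t sketch_default_copy(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
	return len;
}

/* Returns bytes copied, or a negative errno. */
static long sketch_dax_copy_from(struct sketch_dax_dev *dev, void *dst,
				 const void *src, size_t len, int flags)
{
	assert(dev != NULL);
	/* Virtual devices cannot clear poison, so refuse recovery requests. */
	if ((flags & SKETCH_REQ_RECOVERY) && (dev->flags & SKETCH_DEV_VIRTUAL))
		return -EOPNOTSUPP;
	if (dev->ops && dev->ops->copy_from)
		return (long)dev->ops->copy_from(dev, dst, src, len, flags);
	return (long)sketch_default_copy(dst, src, len);
}
```

The design point is that only endpoints that can actually repair media need to fill in the op. Everyone else gets the plain copy.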

> I'm not sure about nova and others: if they use a 'write' path other
> than iomap, does that mean there will be a need for a new set of
> dax_op for their read/write?

No, they're out-of-tree; they'll adjust to the same interface that xfs
and ext4 are using when/if they go upstream.

> the 3-in-1 binding would always be
> required though. Maybe that'll be an ongoing discussion?

Yeah, let's cross that bridge when we come to it.
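For reference, the 3-in-1 binding from recap item 2 amounts to one entry point that never lets the three steps drift apart. Below is a minimal sketch. It assumes poison is tracked per 512-byte sector and uses hypothetical stand-ins: struct sketch_errlist plays the pmem error list, and sketch_clear_poison() is a stub for either a firmware call or a MOVDIR64B-style store. It is not pmem_clear_poison() or the real badblocks API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_SECTOR  512UL /* assumption: poison tracked per 512-byte sector */
#define SKETCH_MAX_BAD 16

/* Hypothetical stand-in for the pmem error list. */
struct sketch_errlist {
	unsigned long sector[SKETCH_MAX_BAD];
	size_t count;
};

/* Drop one sector from the error list (order is not preserved). */
static bool sketch_errlist_remove(struct sketch_errlist *el, unsigned long s)
{
	for (size_t i = 0; i < el->count; i++) {
		if (el->sector[i] == s) {
			el->sector[i] = el->sector[--el->count];
			return true;
		}
	}
	return false;
}

/* Hypothetical stub for the platform clear-poison step; always succeeds. */
static int sketch_clear_poison(unsigned long sector)
{
	(void)sector;
	return 0;
}

/* Single entry point: clear poison, write, then update the error list,
 * sector by sector, so no caller can perform one step without the others. */
static int sketch_recovery_write(unsigned char *media, struct sketch_errlist *el,
				 unsigned long first_sector, const void *buf,
				 size_t nr_sectors)
{
	assert(el->count <= SKETCH_MAX_BAD);
	for (size_t i = 0; i < nr_sectors; i++) {
		unsigned long s = first_sector + i;
		int rc = sketch_clear_poison(s);

		if (rc)
			return rc;
		memcpy(media + s * SKETCH_SECTOR,
		       (const unsigned char *)buf + i * SKETCH_SECTOR, SKETCH_SECTOR);
		sketch_errlist_remove(el, s);
	}
	return 0;
}
```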

> Comments? Suggestions?

It sounds great to me!
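In its simplest form, recap item 4 (fetch as much data as possible from a poisoned range) could be a short read that stops at the first poisoned sector instead of failing the whole request. The sketch below assumes 512-byte sector granularity and a hypothetical flat list of bad sectors. It does not model finer-grained salvage within a sector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_SECTOR 512UL /* assumption: 512-byte sector granularity */

/* Hypothetical lookup in a flat list of bad sector numbers. */
static bool sketch_is_bad(const unsigned long *bad, size_t nbad, unsigned long s)
{
	for (size_t i = 0; i < nbad; i++)
		if (bad[i] == s)
			return true;
	return false;
}

/* Copy sectors until the first poisoned one and return the bytes copied
 * (a short read); the poisoned sector itself is never touched. */
static size_t sketch_read_salvage(const unsigned char *media, unsigned long first,
				  size_t nr_sectors, const unsigned long *bad,
				  size_t nbad, unsigned char *out)
{
	size_t i;

	assert(media != NULL && out != NULL);
	for (i = 0; i < nr_sectors; i++) {
		if (sketch_is_bad(bad, nbad, first + i))
			break;
		memcpy(out + i * SKETCH_SECTOR, media + (first + i) * SKETCH_SECTOR,
		       SKETCH_SECTOR);
	}
	return i * SKETCH_SECTOR;
}
```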
