From: Christoph Hellwig <hch@infradead.org>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Jane Chu <jane.chu@oracle.com>,
	"david@fromorbit.com" <david@fromorbit.com>,
	"vishal.l.verma@intel.com" <vishal.l.verma@intel.com>,
	"dave.jiang@intel.com" <dave.jiang@intel.com>,
	"agk@redhat.com" <agk@redhat.com>,
	"snitzer@redhat.com" <snitzer@redhat.com>,
	"dm-devel@redhat.com" <dm-devel@redhat.com>,
	"ira.weiny@intel.com" <ira.weiny@intel.com>,
	"willy@infradead.org" <willy@infradead.org>,
	"vgoyal@redhat.com" <vgoyal@redhat.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"nvdimm@lists.linux.dev" <nvdimm@lists.linux.dev>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>
Subject: Re: [dm-devel] [PATCH 0/6] dax poison recovery with RWF_RECOVERY_DATA flag
Date: Thu, 4 Nov 2021 10:43:23 -0700
Message-ID: <YYQbu6dOCVB7yS02@infradead.org>
In-Reply-To: <CAPcyv4jKHH7H+PmcsGDxsWA5CS_U3USHM8cT1MhoLk72fa9z8Q@mail.gmail.com>

On Thu, Nov 04, 2021 at 09:24:15AM -0700, Dan Williams wrote:
> No, the big difference with every other modern storage device is
> access to byte-addressable storage. Storage devices get to "cheat"
> with guaranteed minimum 512-byte accesses. So you can arrange for
> writes to always be large enough to scrub the ECC bits along with the
> data. For PMEM and byte-granularity DAX accesses the "sector size" is
> a cacheline and it needed a new CPU instruction before software could
> atomically update data + ECC. Otherwise, with sub-cacheline accesses,
> a RMW cycle can't always be avoided. Such a cycle pulls poison from
> the device on the read and pushes it back out to the media on the
> cacheline writeback.

Indeed.  The fake byte addressability is the problem, and the
fix is to not do that, at least on the second attempt.
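
As a rough illustration of the read-modify-write problem described in the
quote above, here is a toy Python model. It is not code from this thread or
from the kernel: the ToyPmem class, its method names, and the 64-byte line
size are all assumptions made for the sketch. It only shows that a
sub-cacheline store leaves poison in place, while a whole-line store
replaces data and ECC together.

```python
CACHELINE = 64  # bytes; assumed line size for this toy model


class PoisonError(Exception):
    """Raised when a load consumes a poisoned cacheline."""


class ToyPmem:
    """Hypothetical model of media that tracks poison per cacheline."""

    def __init__(self, nlines):
        self.data = bytearray(nlines * CACHELINE)
        self.poisoned = set()  # indices of lines whose ECC no longer matches

    def load_line(self, line):
        if line in self.poisoned:
            raise PoisonError(f"poison consumed in line {line}")
        start = line * CACHELINE
        return bytes(self.data[start:start + CACHELINE])

    def store_partial(self, line, offset, buf):
        # Sub-cacheline store: the rest of the line comes from the old
        # (poisoned) contents, so the write-back pushes the poison back out.
        assert offset + len(buf) <= CACHELINE
        start = line * CACHELINE + offset
        self.data[start:start + len(buf)] = buf
        # Note: self.poisoned is deliberately left untouched here.

    def store_full_line(self, line, buf):
        # Whole-line store: data and ECC are regenerated together, so no
        # read of the old contents is needed and the poison goes away.
        assert len(buf) == CACHELINE
        start = line * CACHELINE
        self.data[start:start + CACHELINE] = buf
        self.poisoned.discard(line)


pmem = ToyPmem(4)
pmem.poisoned.add(1)                       # pretend line 1 took a media error
pmem.store_partial(1, 8, b"abcd")          # byte-granular write: still poisoned
still_bad = 1 in pmem.poisoned
pmem.store_full_line(1, bytes(CACHELINE))  # full-line write scrubs the line
print(still_bad, 1 in pmem.poisoned)
```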

> I don't understand what overprovisioning has to do with better error
> management? No other storage device has seen fit to be as transparent
> with communicating the error list and offering ways to proactively
> scrub it. Dave and Darrick rightly saw this and said "hey, the FS
> could do a much better job for the user if it knew about this error
> list". So I don't get what this argument about spare blocks has to do
> with what XFS wants? I.e. an rmap facility to communicate files that
> have been clobbered by cosmic rays and other calamities.

Well, the answer for other interfaces (at least at the gold-plated
cost option) is internal CRCs so strong that user-visible bits clobbered
by cosmic rays don't realistically happen.  But it is a problem with the
cheaper ones, and at least SCSI and NVMe offer the error list through
the Get LBA Status command (and I bet ATA too, but I haven't looked into
that).  Oddly enough there has never been much interest from the
fs community in those.

> > So far, out of the low-intrusiveness options, Jane's previous series
> > to automatically retry after calling a clear_poison operation seems
> > like the best idea.  We just need to also think about what
> > we want to do for direct users of ->direct_access that do not use
> > the mcsafe iov_iter helpers.
> 
> Those exist? Even dm-writecache uses copy_mc_to_kernel().

I'm sorry, I completely missed that it had been added.  And it's
been in for a whole year.
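
For readers following along, the "retry after clear_poison" idea quoted
above could look roughly like the toy Python sketch below. Everything in it
is hypothetical: ToyDaxDevice, write_with_retry, and the 512-byte clearing
granularity are assumptions for illustration, not the interface from Jane's
series or any kernel API. The first attempt is a plain byte-granular write.
Only if that hits poison does the caller fall back to a block-granular
clear and retry once.

```python
BLOCK = 512  # assumed poison-clearing granularity for this toy model


class PoisonError(Exception):
    """Raised when an access touches a block on the bad list."""


class ToyDaxDevice:
    """Hypothetical device with a per-block bad list."""

    def __init__(self, nblocks):
        self.data = bytearray(nblocks * BLOCK)
        self.bad = set()  # indices of poisoned blocks

    def _blocks(self, offset, length):
        # Indices of every block touched by [offset, offset + length).
        return range(offset // BLOCK, (offset + length - 1) // BLOCK + 1)

    def write(self, offset, buf):
        # Byte-granular fast path: fails if any touched block is poisoned.
        if any(b in self.bad for b in self._blocks(offset, len(buf))):
            raise PoisonError(f"poison at or near offset {offset}")
        self.data[offset:offset + len(buf)] = buf

    def clear_poison(self, offset, length):
        # Block-granular slow path: the old contents of a bad block are
        # already lost, so zero the whole block and drop it from the list.
        for b in self._blocks(offset, length):
            if b in self.bad:
                self.data[b * BLOCK:(b + 1) * BLOCK] = bytes(BLOCK)
                self.bad.discard(b)


def write_with_retry(dev, offset, buf):
    """Try the fast path; on poison, clear and retry once.

    Returns True if the slow path was needed.
    """
    try:
        dev.write(offset, buf)
        return False
    except PoisonError:
        dev.clear_poison(offset, len(buf))
        dev.write(offset, buf)
        return True


dev = ToyDaxDevice(4)
dev.bad.add(1)                                   # pretend block 1 is poisoned
needed_retry = write_with_retry(dev, 600, b"hi")  # offset 600 is in block 1
print(needed_retry, sorted(dev.bad))
```

Note that the sketch only covers callers that go through one write helper.
Direct users of ->direct_access that bypass such a helper would not get the
retry, which is the open question raised in the quoted text.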
