All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christoph Hellwig <hch@infradead.org>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Jane Chu <jane.chu@oracle.com>,
	"david@fromorbit.com" <david@fromorbit.com>,
	"vishal.l.verma@intel.com" <vishal.l.verma@intel.com>,
	"dave.jiang@intel.com" <dave.jiang@intel.com>,
	"agk@redhat.com" <agk@redhat.com>,
	"snitzer@redhat.com" <snitzer@redhat.com>,
	"dm-devel@redhat.com" <dm-devel@redhat.com>,
	"ira.weiny@intel.com" <ira.weiny@intel.com>,
	"willy@infradead.org" <willy@infradead.org>,
	"vgoyal@redhat.com" <vgoyal@redhat.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"nvdimm@lists.linux.dev" <nvdimm@lists.linux.dev>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>
Subject: Re: [dm-devel] [PATCH 0/6] dax poison recovery with RWF_RECOVERY_DATA flag
Date: Wed, 3 Nov 2021 09:58:28 -0700	[thread overview]
Message-ID: <YYK/tGfpG0CnVIO4@infradead.org> (raw)
In-Reply-To: <CAPcyv4j8snuGpy=z6BAXogQkP5HmTbqzd6e22qyERoNBvFKROw@mail.gmail.com>

On Tue, Nov 02, 2021 at 12:57:10PM -0700, Dan Williams wrote:
> This goes back to one of the original DAX concerns of wanting a kernel
> library for coordinating PMEM mmap I/O vs leaving userspace to wrap
> PMEM semantics on top of a DAX mapping. The problem is that mmap-I/O
> has this error-handling-API issue whether it is a DAX mapping or not.

Semantics of writes through shared mmaps are a nightmare.  Agreed,
including agreeing that this is neither new nor pmem specific.  But
it also has absolutely nothing to do with the new RWF_ flag.

> CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE implies that processes will
> receive SIGBUS + BUS_MCEERR_A{R,O} when memory failure is signalled
> and then rely on readv(2)/writev(2) to recover. Do you see a readily
> available way to improve upon that model without CPU instruction
> changes? Even with CPU instructions changes, do you think it could
> improve much upon the model of interrupting the process when a load
> instruction aborts?

The "only" think we need is something like the exception table we
use in the kernel for the uaccess helpers (and the new _nofault
kernel access helper).  But I suspect refitting that into userspace
environments is probably non-trivial.

> I do agree with you that DAX needs to separate itself from block, but
> I don't think it follows that DAX also needs to separate itself from
> readv/writev for when a kernel slow-path needs to get involved because
> mmap I/O (just CPU instructions) does not have the proper semantics.
> Even if you got one of the ARCH_SUPPORTS_MEMORY_FAILURE to implement
> those semantics in new / augmented CPU instructions you will likely
> not get all of them to move and certainly not in any near term
> timeframe, so the kernel path will be around indefinitely.

I think you misunderstood me.  I don't think pmem needs to be
decoupled from the read/write path.  But I'm very skeptical of adding
a new flag to the common read/write path for the special workaround
that a plain old write will not actually clear errors unlike every
other store interfac.

> Meanwhile, I think RWF_RECOVER_DATA is generically useful for other
> storage besides PMEM and helps storage-drivers do better than large
> blast radius "I/O error" completions with no other recourse.

How?

WARNING: multiple messages have this Message-ID (diff)
From: Christoph Hellwig <hch@infradead.org>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Jane Chu <jane.chu@oracle.com>,
	"nvdimm@lists.linux.dev" <nvdimm@lists.linux.dev>,
	"dave.jiang@intel.com" <dave.jiang@intel.com>,
	"snitzer@redhat.com" <snitzer@redhat.com>,
	"Darrick J. Wong" <djwong@kernel.org>,
	"david@fromorbit.com" <david@fromorbit.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"willy@infradead.org" <willy@infradead.org>,
	Christoph Hellwig <hch@infradead.org>,
	"dm-devel@redhat.com" <dm-devel@redhat.com>,
	"vgoyal@redhat.com" <vgoyal@redhat.com>,
	"vishal.l.verma@intel.com" <vishal.l.verma@intel.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"ira.weiny@intel.com" <ira.weiny@intel.com>,
	"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
	"agk@redhat.com" <agk@redhat.com>
Subject: Re: [dm-devel] [PATCH 0/6] dax poison recovery with RWF_RECOVERY_DATA flag
Date: Wed, 3 Nov 2021 09:58:28 -0700	[thread overview]
Message-ID: <YYK/tGfpG0CnVIO4@infradead.org> (raw)
In-Reply-To: <CAPcyv4j8snuGpy=z6BAXogQkP5HmTbqzd6e22qyERoNBvFKROw@mail.gmail.com>

On Tue, Nov 02, 2021 at 12:57:10PM -0700, Dan Williams wrote:
> This goes back to one of the original DAX concerns of wanting a kernel
> library for coordinating PMEM mmap I/O vs leaving userspace to wrap
> PMEM semantics on top of a DAX mapping. The problem is that mmap-I/O
> has this error-handling-API issue whether it is a DAX mapping or not.

Semantics of writes through shared mmaps are a nightmare.  Agreed,
including agreeing that this is neither new nor pmem specific.  But
it also has absolutely nothing to do with the new RWF_ flag.

> CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE implies that processes will
> receive SIGBUS + BUS_MCEERR_A{R,O} when memory failure is signalled
> and then rely on readv(2)/writev(2) to recover. Do you see a readily
> available way to improve upon that model without CPU instruction
> changes? Even with CPU instructions changes, do you think it could
> improve much upon the model of interrupting the process when a load
> instruction aborts?

The "only" think we need is something like the exception table we
use in the kernel for the uaccess helpers (and the new _nofault
kernel access helper).  But I suspect refitting that into userspace
environments is probably non-trivial.

> I do agree with you that DAX needs to separate itself from block, but
> I don't think it follows that DAX also needs to separate itself from
> readv/writev for when a kernel slow-path needs to get involved because
> mmap I/O (just CPU instructions) does not have the proper semantics.
> Even if you got one of the ARCH_SUPPORTS_MEMORY_FAILURE to implement
> those semantics in new / augmented CPU instructions you will likely
> not get all of them to move and certainly not in any near term
> timeframe, so the kernel path will be around indefinitely.

I think you misunderstood me.  I don't think pmem needs to be
decoupled from the read/write path.  But I'm very skeptical of adding
a new flag to the common read/write path for the special workaround
that a plain old write will not actually clear errors unlike every
other store interfac.

> Meanwhile, I think RWF_RECOVER_DATA is generically useful for other
> storage besides PMEM and helps storage-drivers do better than large
> blast radius "I/O error" completions with no other recourse.

How?

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


  reply	other threads:[~2021-11-03 16:58 UTC|newest]

Thread overview: 129+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-21  0:10 [PATCH 0/6] dax poison recovery with RWF_RECOVERY_DATA flag Jane Chu
2021-10-21  0:10 ` [dm-devel] " Jane Chu
2021-10-21  0:10 ` [PATCH 1/6] dax: introduce RWF_RECOVERY_DATA flag to preadv2() and pwritev2() Jane Chu
2021-10-21  0:10   ` [dm-devel] " Jane Chu
2021-10-21  0:10 ` [PATCH 2/6] dax: prepare dax_direct_access() API with DAXDEV_F_RECOVERY flag Jane Chu
2021-10-21  0:10   ` [dm-devel] " Jane Chu
2021-10-21 11:20   ` Christoph Hellwig
2021-10-21 11:20     ` [dm-devel] " Christoph Hellwig
2021-10-21 18:19     ` Jane Chu
2021-10-21 18:19       ` [dm-devel] " Jane Chu
2021-10-21  0:10 ` [PATCH 3/6] pmem: pmem_dax_direct_access() to honor the " Jane Chu
2021-10-21  0:10   ` [dm-devel] " Jane Chu
2021-10-21 11:23   ` Christoph Hellwig
2021-10-21 11:23     ` [dm-devel] " Christoph Hellwig
2021-10-21 18:24     ` Jane Chu
2021-10-21 18:24       ` [dm-devel] " Jane Chu
2021-10-21  0:10 ` [PATCH 4/6] dm,dax,pmem: prepare dax_copy_to/from_iter() APIs with DAXDEV_F_RECOVERY Jane Chu
2021-10-21  0:10   ` [dm-devel] [PATCH 4/6] dm, dax, pmem: " Jane Chu
2021-10-21 11:27   ` [PATCH 4/6] dm,dax,pmem: " Christoph Hellwig
2021-10-21 11:27     ` [dm-devel] [PATCH 4/6] dm, dax, pmem: " Christoph Hellwig
2021-10-22  0:49     ` [PATCH 4/6] dm,dax,pmem: " Jane Chu
2021-10-22  0:49       ` [dm-devel] [PATCH 4/6] dm, dax, pmem: " Jane Chu
2021-10-22  1:41       ` correction: Re: [PATCH 4/6] dm,dax,pmem: " Jane Chu
2021-10-22  1:41         ` [dm-devel] correction: Re: [PATCH 4/6] dm, dax, pmem: " Jane Chu
2021-10-22  5:33       ` [PATCH 4/6] dm,dax,pmem: " Christoph Hellwig
2021-10-22  5:33         ` [dm-devel] [PATCH 4/6] dm, dax, pmem: " Christoph Hellwig
2021-10-22 20:30         ` [PATCH 4/6] dm,dax,pmem: " Jane Chu
2021-10-22 20:30           ` [dm-devel] [PATCH 4/6] dm, dax, pmem: " Jane Chu
2021-10-21  0:10 ` [PATCH 5/6] dax,pmem: Add data recovery feature to pmem_copy_to/from_iter() Jane Chu
2021-10-21  0:10   ` [dm-devel] [PATCH 5/6] dax, pmem: " Jane Chu
2021-10-21 11:28   ` [PATCH 5/6] dax,pmem: " Christoph Hellwig
2021-10-21 11:28     ` [dm-devel] [PATCH 5/6] dax, pmem: " Christoph Hellwig
2021-10-22  0:58     ` [PATCH 5/6] dax,pmem: " Jane Chu
2021-10-22  0:58       ` [dm-devel] [PATCH 5/6] dax, pmem: " Jane Chu
2021-10-22  8:03   ` kernel test robot
2021-10-22  8:03     ` kernel test robot
2021-10-26 10:21   ` [PATCH 5/6] dax,pmem: " kernel test robot
2021-10-26 10:21     ` [PATCH 5/6] dax, pmem: " kernel test robot
2021-10-26 10:21     ` [dm-devel] " kernel test robot
2021-10-21  0:10 ` [PATCH 6/6] dm: Ensure dm honors DAXDEV_F_RECOVERY flag on dax only Jane Chu
2021-10-21  0:10   ` [dm-devel] " Jane Chu
2021-10-21 11:31 ` [dm-devel] [PATCH 0/6] dax poison recovery with RWF_RECOVERY_DATA flag Christoph Hellwig
2021-10-21 11:31   ` Christoph Hellwig
2021-10-22  1:37   ` Jane Chu
2021-10-22  1:37     ` Jane Chu
2021-10-22  1:58     ` Darrick J. Wong
2021-10-22  1:58       ` Darrick J. Wong
2021-10-22  5:38       ` Christoph Hellwig
2021-10-22  5:38         ` Christoph Hellwig
2021-10-22  5:36     ` Christoph Hellwig
2021-10-22  5:36       ` Christoph Hellwig
2021-10-22 20:52       ` Jane Chu
2021-10-22 20:52         ` Jane Chu
2021-10-27  6:49         ` Christoph Hellwig
2021-10-27  6:49           ` Christoph Hellwig
2021-10-28  0:24           ` Darrick J. Wong
2021-10-28  0:24             ` Darrick J. Wong
2021-10-28 22:59             ` Dave Chinner
2021-10-28 22:59               ` Dave Chinner
2021-10-29 11:46               ` Pavel Begunkov
2021-10-29 11:46                 ` Pavel Begunkov
2021-10-29 16:57                 ` Darrick J. Wong
2021-10-29 16:57                   ` Darrick J. Wong
2021-10-29 19:23                   ` Pavel Begunkov
2021-10-29 19:23                     ` Pavel Begunkov
2021-10-29 20:08                     ` Darrick J. Wong
2021-10-29 20:08                       ` Darrick J. Wong
2021-10-31 13:27                       ` Pavel Begunkov
2021-10-31 13:27                         ` Pavel Begunkov
2021-10-29 18:53                 ` Jane Chu
2021-10-29 18:53                   ` Jane Chu
2021-10-29 22:32                 ` Dave Chinner
2021-10-29 22:32                   ` Dave Chinner
2021-10-31 13:19                   ` Pavel Begunkov
2021-10-31 13:19                     ` Pavel Begunkov
2021-11-01  2:31                     ` Matthew Wilcox
2021-11-01  2:31                       ` Matthew Wilcox
2021-11-02  6:18             ` Christoph Hellwig
2021-11-02  6:18               ` Christoph Hellwig
2021-11-02 19:57               ` Dan Williams
2021-11-02 19:57                 ` Dan Williams
2021-11-03 16:58                 ` Christoph Hellwig [this message]
2021-11-03 16:58                   ` Christoph Hellwig
2021-11-03 20:33                   ` Dan Williams
2021-11-03 20:33                     ` Dan Williams
2021-11-04  8:30                     ` Christoph Hellwig
2021-11-04  8:30                       ` Christoph Hellwig
2021-11-04 12:29                       ` Matthew Wilcox
2021-11-04 12:29                         ` Matthew Wilcox
2021-11-04 16:24                       ` Dan Williams
2021-11-04 16:24                         ` Dan Williams
2021-11-04 17:43                         ` Christoph Hellwig
2021-11-04 17:43                           ` Christoph Hellwig
2021-11-04 17:50                           ` Dan Williams
2021-11-04 17:50                             ` Dan Williams
2021-11-04 18:05                           ` Matthew Wilcox
2021-11-04 18:05                             ` Matthew Wilcox
2021-11-04 18:33                         ` Jane Chu
2021-11-04 18:33                           ` Jane Chu
2021-11-04 19:00                           ` Dan Williams
2021-11-04 19:00                             ` Dan Williams
2021-11-04 20:27                             ` Jane Chu
2021-11-04 20:27                               ` Jane Chu
2021-11-05  0:46                               ` Dan Williams
2021-11-05  0:46                                 ` Dan Williams
2021-11-05  1:35                                 ` Dan Williams
2021-11-05  1:35                                   ` Dan Williams
2021-11-05  5:56                             ` Christoph Hellwig
2021-11-05  5:56                               ` Christoph Hellwig
2021-11-03 18:09               ` Jane Chu
2021-11-03 18:09                 ` Jane Chu
2021-11-04  6:21                 ` Dan Williams
2021-11-04  6:21                   ` Dan Williams
2021-11-04  8:36                   ` Christoph Hellwig
2021-11-04  8:36                     ` Christoph Hellwig
2021-11-04 16:08                     ` Dan Williams
2021-11-04 16:08                       ` Dan Williams
2021-11-04 17:46                       ` Christoph Hellwig
2021-11-04 17:46                         ` Christoph Hellwig
2021-11-04  8:21                 ` Christoph Hellwig
2021-11-04  8:21                   ` Christoph Hellwig
2021-11-02 16:12             ` Dan Williams
2021-11-02 16:12               ` Dan Williams
2021-11-02 16:03           ` Dan Williams
2021-11-02 16:03             ` Dan Williams
2021-11-03 16:53             ` Christoph Hellwig
2021-11-03 16:53               ` Christoph Hellwig
2021-11-06  7:41             ` Lukas Straub
2021-11-06  7:41               ` Lukas Straub

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YYK/tGfpG0CnVIO4@infradead.org \
    --to=hch@infradead.org \
    --cc=agk@redhat.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=dm-devel@redhat.com \
    --cc=ira.weiny@intel.com \
    --cc=jane.chu@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=nvdimm@lists.linux.dev \
    --cc=snitzer@redhat.com \
    --cc=vgoyal@redhat.com \
    --cc=vishal.l.verma@intel.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.