From: Dan Williams <dan.j.williams@intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	Christoph Hellwig <hch@infradead.org>,
	device-mapper development <dm-devel@redhat.com>
Subject: Re: [PATCH v5 2/8] drivers/pmem: Allow pmem_clear_poison() to accept arbitrary offset and len
Date: Thu, 27 Feb 2020 19:28:41 -0800
Message-ID: <CAPcyv4i2vjUGrwaRsjp1-L0wFf0a00e46F-SUbocQBkiQ3M1kg@mail.gmail.com>
In-Reply-To: <20200228013054.GO10737@dread.disaster.area>

On Thu, Feb 27, 2020 at 5:31 PM Dave Chinner <david@fromorbit.com> wrote:
> On Wed, Feb 26, 2020 at 08:19:37PM -0800, Dan Williams wrote:
[..]
> > So you want the FS to have error handling for just pmem errors or
> > all memory errors?
>
> Just pmem errors in the specific range the filesystem manages - we
> really only care about storage errors because those are the only
> errors the filesystem is responsible for handling correctly.
>
> Somebody else can worry about errors that hit page cache pages -
> page cache pages require mapping/index pointers on each page anyway,
> so a generic mechanism for handling those errors can be built into
> the page cache. And if the error is in general kernel memory, then
> it's game over for the entire kernel at that point, not just the
> filesystem.
>
> > And you want this to be done without the mm core using
> > page->index to identify what to unmap when the error happens?
>
> Isn't that exactly what I just said? We get the page address that
> failed, the daxdev can turn that into a sector address and call into
> the filesystem with a {sector, len, errno} tuple. We then do a
> reverse mapping lookup on {sector, len} to find all the owners of
> that range in the filesystem. If it's file data, that owner record
> gives us the inode and the offset into the file, which then gives us
> a {mapping, index} tuple.
> Further, the filesystem reverse map is aware that its blocks can be
> shared by multiple owners, so it will have a record for every inode
> and file offset that maps to that page. Hence we can simply iterate
> the reverse map and do that whacky collect_procs/kill_procs dance
> for every {mapping, index} pair that references the bad range.
>
> Nothing ever needs to be stored on the struct page...

Ok, so fs/dax.c needs to coordinate with mm/memory-failure.c to say
"don't perform generic memory-error handling, there's an fs that owns
this daxdev and wants to disposition errors". The fs/dax.c
infrastructure that sets up ->index and ->mapping would still need to
be there for ext4 until it's ready to take on the same responsibility.
Last I checked, the ext4 reverse-mapping implementation was not as
capable as XFS's. This goes back to why the initial design avoided
relying on reverse-mapping facilities that are not universally
available / stable, and opted instead for extending the generic
mm/memory-failure.c implementation.

If XFS optionally supplants mm/memory-failure.c, I would expect we'd
want to do better than the "whacky collect_procs/kill_procs"
implementation and let applications register for a notification
format better than BUS_MCEERR_AO signals.

> > Memory error scanning is not universally available on all pmem
> > implementations, so FS will need to subscribe for memory-error
> > handling events.
>
> No. Filesystems interact with the underlying device abstraction, not
> the physical storage that lies behind that device abstraction. The
> filesystem should not interface with pmem directly in any way (all
> direct accesses are hidden inside fs/dax.c!), nor should it care
> about the quirks of the pmem implementation it is sitting on. That's
> what the daxdev abstraction is supposed to hide from the users of
> the pmem.

I wasn't proposing that XFS have a machine-check handler; I was trying
to get to the next level of detail of the async notification interface
to the fs.
> IOWs, the daxdev infrastructure subscribes to the memory-error event
> subsystem, and calls out to the filesystem when an error in a page in
> the daxdev is reported. The filesystem only needs to know the
> {sector, len, errno} tuple related to the error; it is the device's
> responsibility to handle the physical mechanics of listening,
> filtering and translating MCEs to a format the filesystem
> understands....
>
> Another reason it should be provided by the daxdev as a {sector,
> len, errno} tuple is that it also allows non-dax block devices to
> implement similar error notifications and provide filesystems with
> exactly the same information, so the filesystem can start
> auto-recovery processes....

The driver layer already has a 'struct badblocks' that both pmem and
md use; there's just no current way for the FS to get a reference to
it. However, my hope was to get away from a sector-based interface,
because in the daxdev case the error granularity is already smaller
than a sector, as compared to a bdev. A daxdev-native error record
should be a daxdev-relative byte offset, not a sector. If the fs wants
to align the blast radius of the error to sectors or fs-blocks, that's
its choice, but I don't think the driver interface should preclude
sub-sector error handling. Essentially, I don't want to add more bdev
semantics to fs/dax.c while Vivek is busy removing them.