From: Dan Williams <dan.j.williams@intel.com>
To: akpm@linux-foundation.org
Cc: jack@suse.cz, linux-nvdimm@lists.01.org, Benjamin Herrenschmidt <benh@kernel.crashing.org>, Heiko Carstens <heiko.carstens@de.ibm.com>, linux-xfs@vger.kernel.org, Martin Schwidefsky <schwidefsky@de.ibm.com>, Paul Mackerras <paulus@samba.org>, Michael Ellerman <mpe@ellerman.id.au>, linux-fsdevel@vger.kernel.org, hch@lst.de, Gerald Schaefer <gerald.schaefer@de.ibm.com>
Subject: [PATCH v4 04/18] dax: require 'struct page' by default for filesystem dax
Date: Sat, 23 Dec 2017 16:56:22 -0800
Message-ID: <151407698249.38751.17338746909239708376.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <151407695916.38751.2866053440557472361.stgit@dwillia2-desk3.amr.corp.intel.com>

If a dax buffer from a device that does not map pages is passed to read(2) or write(2) as a target for direct-I/O, it triggers SIGBUS. If gdb attempts to examine the contents of a dax buffer from a device that does not map pages, it triggers SIGBUS. If fork(2) is called on a process with a dax mapping from a device that does not map pages, it triggers SIGBUS. 'struct page' is required; otherwise, several kernel code paths break in surprising ways. Disable filesystem-dax on devices that do not map pages.

In addition to needing pfn_to_page() to be valid, we also require devmap pages. We need this to detect dax pages in the get_user_pages_fast() path and so that we can stop managing the VM_MIXEDMAP flag. For DAX drivers that have not supported get_user_pages() to date, we allow them to opt in to supporting DAX with the CONFIG_FS_DAX_LIMITED configuration option, which requires ->direct_access() to return pfn_t_special() pfns. This leaves DAX support in brd disabled and scheduled for removal.

Note that when the initial dax support was being merged a few years back, there was concern that struct page was unsuitable for use with next-generation persistent memory devices. The theoretical concern was that struct page access, being such a hotly used data structure in the kernel, would lead to media wear-out. While that was a reasonably conservative starting position, it has not held true in practice. We have long since committed to using devm_memremap_pages() to support higher-order kernel functionality that needs get_user_pages() and pfn_to_page().

Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/powerpc/platforms/Kconfig |    1 +
 drivers/dax/super.c            |   10 ++++++++++
 drivers/s390/block/Kconfig     |    1 +
 fs/Kconfig                     |    7 +++++++
 4 files changed, 19 insertions(+)

diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index 5a96a2763e4a..2ce89b42a9f4 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -297,6 +297,7 @@ config AXON_RAM
 	tristate "Axon DDR2 memory device driver"
 	depends on PPC_IBM_CELL_BLADE && BLOCK
 	select DAX
+	select FS_DAX_LIMITED
 	default m
 	help
 	  It registers one block device per Axon's DDR2 memory bank found
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 3ec804672601..473af694ad1c 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -15,6 +15,7 @@
 #include <linux/mount.h>
 #include <linux/magic.h>
 #include <linux/genhd.h>
+#include <linux/pfn_t.h>
 #include <linux/cdev.h>
 #include <linux/hash.h>
 #include <linux/slab.h>
@@ -123,6 +124,15 @@ int __bdev_dax_supported(struct super_block *sb, int blocksize)
 		return len < 0 ? len : -EIO;
 	}
 
+	if ((IS_ENABLED(CONFIG_FS_DAX_LIMITED) && pfn_t_special(pfn))
+			|| pfn_t_devmap(pfn))
+		/* pass */;
+	else {
+		pr_debug("VFS (%s): error: dax support not enabled\n",
+				sb->s_id);
+		return -EOPNOTSUPP;
+	}
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(__bdev_dax_supported);
diff --git a/drivers/s390/block/Kconfig b/drivers/s390/block/Kconfig
index 31f014b57bfc..594ae5fc8e9d 100644
--- a/drivers/s390/block/Kconfig
+++ b/drivers/s390/block/Kconfig
@@ -15,6 +15,7 @@ config BLK_DEV_XPRAM
 config DCSSBLK
 	def_tristate m
 	select DAX
+	select FS_DAX_LIMITED
 	prompt "DCSSBLK support"
 	depends on S390 && BLOCK
 	help
diff --git a/fs/Kconfig b/fs/Kconfig
index 7aee6d699fd6..b40128bf6d1a 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -58,6 +58,13 @@ config FS_DAX_PMD
 	depends on ZONE_DEVICE
 	depends on TRANSPARENT_HUGEPAGE
 
+# Selected by DAX drivers that do not expect filesystem DAX to support
+# get_user_pages() of DAX mappings. I.e. "limited" indicates no support
+# for fork() of processes with MAP_SHARED mappings or support for
+# direct-I/O to a DAX mapping.
+config FS_DAX_LIMITED
+	bool
+
 endif # BLOCK
 
 # Posix ACL utility routines