From: Dan Williams <dan.j.williams@intel.com>
To: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>,
	Matthew Wilcox <mawilcox@microsoft.com>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	linux-block@vger.kernel.org,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [RFC PATCH 10/17] block: introduce bdev_dax_direct_access()
Date: Mon, 30 Jan 2017 10:16:29 -0800
Message-ID: <CAPcyv4iARmZQSBybJ1iJhwkndVxSe62rk4hjP1T9prBuOqVQ-A@mail.gmail.com>
In-Reply-To: <20170130123226.GD9043@lst.de>

On Mon, Jan 30, 2017 at 4:32 AM, Christoph Hellwig <hch@lst.de> wrote:
> On Sat, Jan 28, 2017 at 12:36:58AM -0800, Dan Williams wrote:
>> Provide a replacement for bdev_direct_access() that uses
>> dax_operations.direct_access() instead of
>> block_device_operations.direct_access(). Once all consumers of the old
>> api have been converted bdev_direct_access() will be deleted.
>>
>> Given that block device partitioning decisions can cause dax page
>> alignment constraints to be violated we still need to validate the
>> block_device before calling the dax ->direct_access method.
>>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> ---
>>  block/Kconfig          |  1 +
>>  drivers/dax/super.c    | 33 +++++++++++++++++++++++++++++++++
>>  fs/block_dev.c         | 28 ++++++++++++++++++++++++++++
>>  include/linux/blkdev.h |  3 +++
>>  include/linux/dax.h    |  2 ++
>>  5 files changed, 67 insertions(+)
>>
>> diff --git a/block/Kconfig b/block/Kconfig
>> index 8bf114a3858a..9be785173280 100644
>> --- a/block/Kconfig
>> +++ b/block/Kconfig
>> @@ -6,6 +6,7 @@ menuconfig BLOCK
>>  	default y
>>  	select SBITMAP
>>  	select SRCU
>> +	select DAX
>>  	help
>>  	  Provide block layer support for the kernel.
>>
>> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
>> index eb844ffea3cf..ab5b082df5dd 100644
>> --- a/drivers/dax/super.c
>> +++ b/drivers/dax/super.c
>> @@ -65,6 +65,39 @@ struct dax_inode {
>>  	const struct dax_operations *ops;
>>  };
>>
>> +long dax_direct_access(struct dax_inode *dax_inode, phys_addr_t dev_addr,
>> +		void **kaddr, pfn_t *pfn, long size)
>> +{
>> +	long avail;
>> +
>> +	/*
>> +	 * The device driver is allowed to sleep, in order to make the
>> +	 * memory directly accessible.
>> +	 */
>> +	might_sleep();
>> +
>> +	if (!dax_inode)
>> +		return -EOPNOTSUPP;
>> +
>> +	if (!dax_inode_alive(dax_inode))
>> +		return -ENXIO;
>> +
>> +	if (size < 0)
>> +		return size;
>> +
>> +	if (dev_addr % PAGE_SIZE)
>> +		return -EINVAL;
>> +
>> +	avail = dax_inode->ops->direct_access(dax_inode, dev_addr, kaddr, pfn,
>> +			size);
>> +	if (!avail)
>> +		return -ERANGE;
>> +	if (avail > 0 && avail & ~PAGE_MASK)
>> +		return -ENXIO;
>> +	return min(avail, size);
>> +}
>> +EXPORT_SYMBOL_GPL(dax_direct_access);
>> +
>>  bool dax_inode_alive(struct dax_inode *dax_inode)
>>  {
>>  	lockdep_assert_held(&dax_srcu);
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index edb1d2b16b8f..bf4b51a3a412 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -18,6 +18,7 @@
>>  #include <linux/module.h>
>>  #include <linux/blkpg.h>
>>  #include <linux/magic.h>
>> +#include <linux/dax.h>
>>  #include <linux/buffer_head.h>
>>  #include <linux/swap.h>
>>  #include <linux/pagevec.h>
>> @@ -763,6 +764,33 @@ long bdev_direct_access(struct block_device *bdev, struct blk_dax_ctl *dax)
>>  EXPORT_SYMBOL_GPL(bdev_direct_access);
>>
>>  /**
>> + * bdev_dax_direct_access() - bdev-sector to pfn_t and kernel virtual address
>> + * @bdev: host block device for @dax_inode
>> + * @dax_inode: interface data and operations for a memory device
>> + * @dax: control and output parameters for ->direct_access
>> + *
>> + * Return: negative errno if an error occurs, otherwise the number of bytes
>> + * accessible at this address.
>> + *
>> + * Locking: must be called with dax_read_lock() held
>> + */
>> +long bdev_dax_direct_access(struct block_device *bdev,
>> +		struct dax_inode *dax_inode, struct blk_dax_ctl *dax)
>> +{
>> +	sector_t sector = dax->sector;
>> +
>> +	if (!blk_queue_dax(bdev->bd_queue))
>> +		return -EOPNOTSUPP;
>
> I don't think this should take a bdev - the caller should know if
> it has a dax_inode.  Also if you touch this anyway can we kill
> the annoying struct blk_dax_ctl calling convention?  Passing the
> four arguments explicitly is just a lot more readable and understandable.

Ok, now that dax_map_atomic() is gone, it's much easier to remove
struct blk_dax_ctl. We can also move the partition alignment checks to
be a one-time check at bdev_dax_capable() time and kill
bdev_dax_direct_access() in favor of calling dax_direct_access()
directly.

>> +	if ((sector + DIV_ROUND_UP(dax->size, 512))
>> +			> part_nr_sects_read(bdev->bd_part))
>> +		return -ERANGE;
>> +	sector += get_start_sect(bdev);
>> +	return dax_direct_access(dax_inode, sector * 512, &dax->addr,
>> +			&dax->pfn, dax->size);
>
> And please switch to using bytes as the granularity given that we're
> dealing with byte addressable memory.

dax_direct_access() does take a byte-aligned physical address, but it
needs to be at least page-aligned since we are returning a pfn_t...
Hmm, perhaps the input should be a raw page frame number. We could
drop one of the arguments by making the current 'pfn_t *' parameter an
in/out parameter.
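For illustration only, and not taken from the patch series: a small
userspace sketch of the two ideas above, namely validating partition
alignment once up front and then translating a page-aligned,
partition-relative byte offset into a device page frame number on each
access. Every name here (sketch_part_dax_aligned(),
sketch_part_off_to_pfn(), the SKETCH_* constants) is hypothetical, the
4096-byte page size is an assumption standing in for PAGE_SIZE, and the
-1 return stands in for the errno values the kernel code would use.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512ULL   /* matches the "sector * 512" in the quoted patch */
#define SKETCH_PAGE_SIZE   4096ULL  /* assumption: stands in for the kernel's PAGE_SIZE */

/*
 * Hypothetical one-time check: a partition is usable for dax only if
 * both its start and its length are page-aligned when expressed in bytes.
 */
static bool sketch_part_dax_aligned(uint64_t start_sect, uint64_t nr_sects)
{
	uint64_t start = start_sect * SKETCH_SECTOR_SIZE;
	uint64_t len = nr_sects * SKETCH_SECTOR_SIZE;

	return start % SKETCH_PAGE_SIZE == 0 && len % SKETCH_PAGE_SIZE == 0;
}

/*
 * Hypothetical per-access translation: turn a partition-relative byte
 * offset into a device-relative page frame number. Returns 0 on
 * success, -1 if the offset is not page-aligned or the request runs
 * past the end of the partition.
 */
static int sketch_part_off_to_pfn(uint64_t start_sect, uint64_t nr_sects,
				  uint64_t off, uint64_t size, uint64_t *pfn)
{
	uint64_t part_bytes = nr_sects * SKETCH_SECTOR_SIZE;

	if (off % SKETCH_PAGE_SIZE)
		return -1;
	/* Written this way so that off + size cannot overflow. */
	if (off > part_bytes || size > part_bytes - off)
		return -1;

	*pfn = (start_sect * SKETCH_SECTOR_SIZE + off) / SKETCH_PAGE_SIZE;
	return 0;
}
```

With the alignment check done once, the per-access path only needs the
offset alignment and range tests, which is roughly what would remain if
callers invoked dax_direct_access() directly.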