From: Dan Williams <dan.j.williams@intel.com>
To: axboe@fb.com
Cc: jack@suse.cz, linux-nvdimm@lists.01.org, david@fromorbit.com,
	linux-kernel@vger.kernel.org, hch@lst.de,
	Jeff Moyer <jmoyer@redhat.com>, willy@linux.intel.com,
	ross.zwisler@linux.intel.com, akpm@linux-foundation.org
Subject: [PATCH 5/5] block: enable dax for raw block devices
Date: Thu, 22 Oct 2015 02:42:11 -0400
Message-ID: <20151022064211.12700.77105.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <20151022064142.12700.11849.stgit@dwillia2-desk3.amr.corp.intel.com>

If an application wants exclusive access to all of the persistent memory
provided by an NVDIMM namespace it can use this raw-block-dax facility
to forgo establishing a filesystem.  This capability is targeted
primarily to hypervisors wanting to provision persistent memory for
guests.

Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/block_dev.c |   54 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 3255dcec96b4..c27cd1a21a13 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1687,13 +1687,65 @@ static const struct address_space_operations def_blk_aops = {
 	.is_dirty_writeback = buffer_check_dirty_writeback,
 };
 
+#ifdef CONFIG_FS_DAX
+/*
+ * In the raw block case we do not need to contend with truncation nor
+ * unwritten file extents.  Without those concerns there is no need for
+ * additional locking beyond the mmap_sem context that these routines
+ * are already executing under.
+ *
+ * Note, there is no protection if the block device is dynamically
+ * resized (partition grow/shrink) during a fault. A stable block device
+ * size is already not enforced in the blkdev_direct_IO path.
+ *
+ * For DAX, it is the responsibility of the block device driver to
+ * ensure the whole-disk device size is stable while requests are in
+ * flight.
+ *
+ * Finally, these paths do not synchronize against freezing
+ * (sb_start_pagefault(), etc...) since bdev_sops does not support
+ * freezing.
+ */
+static int blkdev_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	return __dax_fault(vma, vmf, blkdev_get_block, NULL);
+}
+
+static int blkdev_dax_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
+		pmd_t *pmd, unsigned int flags)
+{
+	return __dax_pmd_fault(vma, addr, pmd, flags, blkdev_get_block, NULL);
+}
+
+static const struct vm_operations_struct blkdev_dax_vm_ops = {
+	.page_mkwrite	= blkdev_dax_fault,
+	.fault		= blkdev_dax_fault,
+	.pmd_fault	= blkdev_dax_pmd_fault,
+};
+
+static int blkdev_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct inode *bd_inode = file_bd_inode(file);
+
+	if (!IS_DAX(bd_inode))
+		return generic_file_mmap(file, vma);
+
+	file_accessed(file);
+	vma->vm_ops = &blkdev_dax_vm_ops;
+	vma->vm_flags |= VM_MIXEDMAP | VM_HUGEPAGE;
+	return 0;
+}
+#else
+#define blkdev_mmap generic_file_mmap
+#endif
+
 const struct file_operations def_blk_fops = {
 	.open		= blkdev_open,
 	.release	= blkdev_close,
 	.llseek		= block_llseek,
 	.read_iter	= blkdev_read_iter,
 	.write_iter	= blkdev_write_iter,
-	.mmap		= generic_file_mmap,
+	.mmap		= blkdev_mmap,
 	.fsync		= blkdev_fsync,
 	.unlocked_ioctl	= block_ioctl,
 #ifdef CONFIG_COMPAT
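
For illustration only, here is a minimal user-space sketch of how an
application might map a whole block device with mmap(2), which is the
path that reaches blkdev_mmap() above. It is not part of the patch. The
helper name map_whole_device() and the device path used in the note
below are hypothetical, and whether the mapping is actually served by
the DAX fault handlers depends on IS_DAX() being true for the bdev
inode, which this sketch neither checks nor guarantees. Flush and
persistence semantics are out of scope here.

```c
/* User-space sketch (assumption: not from the patch series). */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open 'path' read-write and map its full length
 * with MAP_SHARED. Returns the mapping, or MAP_FAILED on any error.
 * On success *len_out receives the mapped length in bytes.
 */
static void *map_whole_device(const char *path, size_t *len_out)
{
	int fd = open(path, O_RDWR);
	if (fd < 0)
		return MAP_FAILED;

	/* Seeking to the end yields the size of a block device or file. */
	off_t size = lseek(fd, 0, SEEK_END);
	if (size <= 0) {
		close(fd);
		return MAP_FAILED;
	}

	void *addr = mmap(NULL, (size_t)size, PROT_READ | PROT_WRITE,
			  MAP_SHARED, fd, 0);

	/* The mapping remains valid after the descriptor is closed. */
	close(fd);

	if (addr != MAP_FAILED)
		*len_out = (size_t)size;
	return addr;
}
```

A caller would pass a device node such as /dev/pmem0 (a hypothetical
path) and later release the range with munmap(addr, len). The same
helper works on a regular file, which makes it easy to exercise without
NVDIMM hardware.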