* [RFC PATCH 00/17] introduce a dax_inode for dax_operations
From: Dan Williams @ 2017-01-28 8:36 UTC
To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch
Recently there was an effort to introduce dax_operations to unwind the
abuse of the user-copy api in the pmem api [1]. Christoph noted that we
should not add new block-dax operations as it is further abuse of struct
block_device [2].
The ->direct_access() method in block_device_operations was an expedient
way to get the filesystem-dax capability bootstrapped. However, native
persistent memory filesystems can forgo the block layer and mount
directly on a provider of dax services, a dax inode.
For the time being, since current dax-capable filesystems are block
based, we need a facility to look up this dax object via the
block-device name. If this approach looks reasonable, I'll follow up by
reworking the proposed ->copy_from_iter(), ->flush(), and ->clear() dax
operations into this new scheme.
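To make the lookup idea concrete, here is a minimal userspace sketch of
finding a dax object by its block-device name. It is not code from this
series: struct dax_obj, dax_obj_publish(), and dax_obj_by_host() are
hypothetical names, and a fixed-size table stands in for whatever data
structure the kernel side ends up using.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model; these names do not come from the series. */
struct dax_obj {
	const char *host;	/* block-device name, e.g. "pmem0" */
	void *private;		/* driver private data */
};

#define MAX_DAX_OBJS 8
static struct dax_obj *dax_table[MAX_DAX_OBJS];

/* Driver side: publish a dax object under its block-device name. */
static int dax_obj_publish(struct dax_obj *obj)
{
	for (size_t i = 0; i < MAX_DAX_OBJS; i++) {
		if (!dax_table[i]) {
			dax_table[i] = obj;
			return 0;
		}
	}
	return -1;	/* table full */
}

/* Filesystem side: find the dax object for a block-device name. */
static struct dax_obj *dax_obj_by_host(const char *host)
{
	for (size_t i = 0; i < MAX_DAX_OBJS; i++) {
		if (dax_table[i] && strcmp(dax_table[i]->host, host) == 0)
			return dax_table[i];
	}
	return NULL;	/* no dax provider registered under that name */
}
```

A linear scan keeps the sketch short. The point is only that the
filesystem needs nothing but the name to reach the dax object, so
struct block_device does not have to carry any dax methods.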
These patches survive a run of the libnvdimm unit tests, but I have not
tested the non-libnvdimm dax drivers.
[1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008586.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008638.html
---
Dan Williams (17):
dax: refactor dax-fs into a generic provider of dax inodes
dax: convert dax_inode locking to srcu
dax: add a facility to lookup a dax inode by 'host' device name
dax: introduce dax_operations
pmem: add dax_operations support
axon_ram: add dax_operations support
brd: add dax_operations support
dcssblk: add dax_operations support
block: kill bdev_dax_capable()
block: introduce bdev_dax_direct_access()
dm: add dax_operations support (producer)
dm: add dax_operations support (consumer)
fs: update mount_bdev() to lookup dax infrastructure
ext2, ext4, xfs: retrieve dax_inode through iomap operations
Revert "block: use DAX for partition table reads"
fs, dax: convert filesystem-dax to bdev_dax_direct_access
block: remove block_device_operations.direct_access and related infrastructure
arch/powerpc/platforms/Kconfig | 1
arch/powerpc/sysdev/axonram.c | 37 +++
block/Kconfig | 1
block/partition-generic.c | 17 --
drivers/Makefile | 2
drivers/block/Kconfig | 1
drivers/block/brd.c | 48 +++-
drivers/dax/Kconfig | 9 +
drivers/dax/Makefile | 5
drivers/dax/dax.h | 19 +-
drivers/dax/device-dax.h | 25 ++
drivers/dax/device.c | 257 ++++-------------------
drivers/dax/pmem.c | 2
drivers/dax/super.c | 434 +++++++++++++++++++++++++++++++++++++++
drivers/md/Kconfig | 1
drivers/md/dm-core.h | 3
drivers/md/dm-linear.c | 15 +
drivers/md/dm-snap.c | 8 +
drivers/md/dm-stripe.c | 16 +
drivers/md/dm-table.c | 2
drivers/md/dm-target.c | 10 +
drivers/md/dm.c | 43 +++-
drivers/nvdimm/Kconfig | 1
drivers/nvdimm/pmem.c | 46 +++-
drivers/nvdimm/pmem.h | 7 -
drivers/s390/block/Kconfig | 1
drivers/s390/block/dcssblk.c | 41 +++-
fs/block_dev.c | 75 ++-----
fs/dax.c | 149 ++++++-------
fs/ext2/inode.c | 1
fs/ext4/inode.c | 1
fs/iomap.c | 3
fs/super.c | 32 +++
fs/xfs/xfs_aops.c | 13 +
fs/xfs/xfs_aops.h | 1
fs/xfs/xfs_buf.h | 1
fs/xfs/xfs_iomap.c | 1
fs/xfs/xfs_super.c | 3
include/linux/blkdev.h | 7 -
include/linux/dax.h | 29 ++-
include/linux/device-mapper.h | 16 +
include/linux/fs.h | 1
include/linux/iomap.h | 1
tools/testing/nvdimm/Kbuild | 6 -
tools/testing/nvdimm/pmem-dax.c | 12 -
45 files changed, 927 insertions(+), 477 deletions(-)
create mode 100644 drivers/dax/device-dax.h
rename drivers/dax/{dax.c => device.c} (74%)
create mode 100644 drivers/dax/super.c
* [RFC PATCH 01/17] dax: refactor dax-fs into a generic provider of dax inodes
From: Dan Williams @ 2017-01-28 8:36 UTC
To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch
We want dax capable drivers to be able to publish a set of dax
operations [1]. However, we do not want to further abuse block_devices
to advertise these operations. Instead we will attach these operations
to a dax inode and add a lookup mechanism to go from block device path
to a dax inode. A dax capable driver like pmem or brd is responsible for
registering a dax inode, alongside a block device, and then a dax
capable filesystem is responsible for retrieving the dax inode by path
name if it wants to call dax_operations.
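As a rough illustration of that division of labor, the userspace sketch
below attaches an operations table to a dax object and has the consumer
call through it. Everything here is an assumption made for illustration:
struct dax_node, struct dax_node_ops, and the direct_access() signature
are hypothetical, and the actual dax_operations are introduced by a
later patch in the series.

```c
#include <assert.h>
#include <stddef.h>

struct dax_node;

/* Hypothetical operations table published by a dax-capable driver. */
struct dax_node_ops {
	/*
	 * Translate a byte offset into a directly addressable pointer.
	 * Returns the number of bytes available at *kaddr, or -1.
	 */
	long (*direct_access)(struct dax_node *node, size_t off, void **kaddr);
};

/* Hypothetical stand-in for the dax inode. */
struct dax_node {
	const struct dax_node_ops *ops;
	void *private;	/* driver data, here the backing memory */
	size_t size;	/* size of the backing memory in bytes */
};

/* Driver side: a RAM-backed implementation, loosely in the spirit of brd. */
static long ram_direct_access(struct dax_node *node, size_t off, void **kaddr)
{
	if (off >= node->size)
		return -1;
	*kaddr = (char *)node->private + off;
	return (long)(node->size - off);
}

static const struct dax_node_ops ram_ops = {
	.direct_access = ram_direct_access,
};

/* Filesystem side: call through the ops table, never the block device. */
static long dax_node_direct_access(struct dax_node *node, size_t off,
				   void **kaddr)
{
	if (!node || !node->ops || !node->ops->direct_access)
		return -1;
	return node->ops->direct_access(node, off, kaddr);
}
```

The driver fills in the table once at registration time. The filesystem
only ever holds the dax object, so new operations can be added without
touching block_device_operations.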
For now, we refactor the dax pseudo-fs into a generic facility rather
than an implementation detail of the device-dax use case. A "dax
inode" is just an inode + dax infrastructure, and "Device DAX" is a
mapping service layered on top of that base inode. "Filesystem DAX" is
then a mapping service that layers a filesystem on top of the base dax
inode. Filesystem DAX goes through a block_device for now, but may go
directly to a dax inode in the future, or for new pmem-only filesystems.
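The "inode + dax infrastructure" layering relies on embedding the base
objects in a larger structure and recovering the container from a
pointer to either member. The sketch below models that in userspace.
struct base_inode and struct char_dev are simplified stand-ins for the
kernel's struct inode and struct cdev, the helper names are
hypothetical, and container_of() is redefined locally with offsetof().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins for struct inode and struct cdev. */
struct base_inode { unsigned long ino; };
struct char_dev { int minor; };

/* Anchor object: base inode plus dax infrastructure. */
struct dax_node {
	struct base_inode inode;	/* core vfs object */
	struct char_dev cdev;		/* optional character interface */
	void *private;			/* driver private data */
	bool alive;			/* cleared when the provider goes away */
};

/* Recover the anchor from its embedded inode. */
static struct dax_node *inode_to_node(struct base_inode *inode)
{
	return container_of(inode, struct dax_node, inode);
}

/* Recover the anchor from its embedded character device. */
static struct dax_node *cdev_to_node(struct char_dev *cdev)
{
	return container_of(cdev, struct dax_node, cdev);
}
```

Because both members live inside the same allocation, a service layered
on top (device-dax or filesystem-dax) can start from whichever pointer
it was handed and still reach the shared private data and liveness flag.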
[1]: https://lkml.org/lkml/2017/1/19/880
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/Makefile | 2
drivers/dax/Kconfig | 8 +
drivers/dax/Makefile | 5 +
drivers/dax/dax.h | 24 ++-
drivers/dax/device-dax.h | 25 +++
drivers/dax/device.c | 241 +++++----------------------------
drivers/dax/pmem.c | 2
drivers/dax/super.c | 310 +++++++++++++++++++++++++++++++++++++++++++
tools/testing/nvdimm/Kbuild | 6 -
9 files changed, 402 insertions(+), 221 deletions(-)
create mode 100644 drivers/dax/device-dax.h
rename drivers/dax/{dax.c => device.c} (75%)
create mode 100644 drivers/dax/super.c
diff --git a/drivers/Makefile b/drivers/Makefile
index 060026a02f59..17f42e4a6717 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -68,7 +68,7 @@ obj-$(CONFIG_PARPORT) += parport/
obj-$(CONFIG_NVM) += lightnvm/
obj-y += base/ block/ misc/ mfd/ nfc/
obj-$(CONFIG_LIBNVDIMM) += nvdimm/
-obj-$(CONFIG_DEV_DAX) += dax/
+obj-$(CONFIG_DAX) += dax/
obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf/
obj-$(CONFIG_NUBUS) += nubus/
obj-y += macintosh/
diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index 3e2ab3b14eea..39bcbf4c5e40 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -1,6 +1,11 @@
-menuconfig DEV_DAX
+menuconfig DAX
tristate "DAX: direct access to differentiated memory"
default m if NVDIMM_DAX
+
+if DAX
+
+config DEV_DAX
+ tristate "Device DAX: direct access mapping device"
depends on TRANSPARENT_HUGEPAGE
help
Support raw access to differentiated (persistence, bandwidth,
@@ -10,7 +15,6 @@ menuconfig DEV_DAX
baseline memory pool. Mappings of a /dev/daxX.Y device impose
restrictions that make the mapping behavior deterministic.
-if DEV_DAX
config DEV_DAX_PMEM
tristate "PMEM DAX: direct access to persistent memory"
diff --git a/drivers/dax/Makefile b/drivers/dax/Makefile
index 27c54e38478a..dc7422530462 100644
--- a/drivers/dax/Makefile
+++ b/drivers/dax/Makefile
@@ -1,4 +1,7 @@
-obj-$(CONFIG_DEV_DAX) += dax.o
+obj-$(CONFIG_DAX) += dax.o
+obj-$(CONFIG_DEV_DAX) += device_dax.o
obj-$(CONFIG_DEV_DAX_PMEM) += dax_pmem.o
+dax-y := super.o
dax_pmem-y := pmem.o
+device_dax-y := device.o
diff --git a/drivers/dax/dax.h b/drivers/dax/dax.h
index ddd829ab58c0..def061aa75f4 100644
--- a/drivers/dax/dax.h
+++ b/drivers/dax/dax.h
@@ -1,5 +1,5 @@
/*
- * Copyright(c) 2016 Intel Corporation. All rights reserved.
+ * Copyright(c) 2016 - 2017 Intel Corporation. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
@@ -12,14 +12,16 @@
*/
#ifndef __DAX_H__
#define __DAX_H__
-struct device;
-struct dax_dev;
-struct resource;
-struct dax_region;
-void dax_region_put(struct dax_region *dax_region);
-struct dax_region *alloc_dax_region(struct device *parent,
- int region_id, struct resource *res, unsigned int align,
- void *addr, unsigned long flags);
-struct dax_dev *devm_create_dax_dev(struct dax_region *dax_region,
- struct resource *res, int count);
+struct dax_inode;
+struct dax_inode *alloc_dax_inode(void *private);
+void put_dax_inode(struct dax_inode *dax_inode);
+bool dax_inode_alive(struct dax_inode *dax_inode);
+void kill_dax_inode(struct dax_inode *dax_inode);
+struct dax_inode *inode_to_dax_inode(struct inode *inode);
+struct inode *dax_inode_to_inode(struct dax_inode *dax_inode);
+void *dax_inode_get_private(struct dax_inode *dax_inode);
+int dax_inode_register(struct dax_inode *dax_inode,
+ const struct file_operations *fops, struct module *owner,
+ struct kobject *parent);
+void dax_inode_unregister(struct dax_inode *dax_inode);
#endif /* __DAX_H__ */
diff --git a/drivers/dax/device-dax.h b/drivers/dax/device-dax.h
new file mode 100644
index 000000000000..c9b7e9cc227e
--- /dev/null
+++ b/drivers/dax/device-dax.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#ifndef __DEVICE_DAX_H__
+#define __DEVICE_DAX_H__
+struct device;
+struct dax_dev;
+struct resource;
+struct dax_region;
+void dax_region_put(struct dax_region *dax_region);
+struct dax_region *alloc_dax_region(struct device *parent,
+ int region_id, struct resource *res, unsigned int align,
+ void *addr, unsigned long flags);
+struct dax_dev *devm_create_dax_dev(struct dax_region *dax_region,
+ struct resource *res, int count);
+#endif /* __DEVICE_DAX_H__ */
diff --git a/drivers/dax/dax.c b/drivers/dax/device.c
similarity index 75%
rename from drivers/dax/dax.c
rename to drivers/dax/device.c
index ed758b74ddf0..5b5572314929 100644
--- a/drivers/dax/dax.c
+++ b/drivers/dax/device.c
@@ -1,5 +1,5 @@
/*
- * Copyright(c) 2016 Intel Corporation. All rights reserved.
+ * Copyright(c) 2016 - 2017 Intel Corporation. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
@@ -13,25 +13,14 @@
#include <linux/pagemap.h>
#include <linux/module.h>
#include <linux/device.h>
-#include <linux/mount.h>
#include <linux/pfn_t.h>
-#include <linux/hash.h>
-#include <linux/cdev.h>
#include <linux/slab.h>
#include <linux/dax.h>
#include <linux/fs.h>
#include <linux/mm.h>
#include "dax.h"
-static dev_t dax_devt;
static struct class *dax_class;
-static DEFINE_IDA(dax_minor_ida);
-static int nr_dax = CONFIG_NR_DEV_DAX;
-module_param(nr_dax, int, S_IRUGO);
-static struct vfsmount *dax_mnt;
-static struct kmem_cache *dax_cache __read_mostly;
-static struct super_block *dax_superblock __read_mostly;
-MODULE_PARM_DESC(nr_dax, "max number of device-dax instances");
/**
* struct dax_region - mapping infrastructure for dax devices
@@ -57,19 +46,16 @@ struct dax_region {
/**
* struct dax_dev - subdivision of a dax region
* @region - parent region
- * @dev - device backing the character device
- * @cdev - core chardev data
- * @alive - !alive + rcu grace period == no new mappings can be established
+ * @dax_inode - core dax functionality
+ * @dev - device core
* @id - child id in the region
* @num_resources - number of physical address extents in this device
* @res - array of physical address ranges
*/
struct dax_dev {
struct dax_region *region;
- struct inode *inode;
+ struct dax_inode *dax_inode;
struct device dev;
- struct cdev cdev;
- bool alive;
int id;
int num_resources;
struct resource res[0];
@@ -142,117 +128,6 @@ static const struct attribute_group *dax_region_attribute_groups[] = {
NULL,
};
-static struct inode *dax_alloc_inode(struct super_block *sb)
-{
- return kmem_cache_alloc(dax_cache, GFP_KERNEL);
-}
-
-static void dax_i_callback(struct rcu_head *head)
-{
- struct inode *inode = container_of(head, struct inode, i_rcu);
-
- kmem_cache_free(dax_cache, inode);
-}
-
-static void dax_destroy_inode(struct inode *inode)
-{
- call_rcu(&inode->i_rcu, dax_i_callback);
-}
-
-static const struct super_operations dax_sops = {
- .statfs = simple_statfs,
- .alloc_inode = dax_alloc_inode,
- .destroy_inode = dax_destroy_inode,
- .drop_inode = generic_delete_inode,
-};
-
-static struct dentry *dax_mount(struct file_system_type *fs_type,
- int flags, const char *dev_name, void *data)
-{
- return mount_pseudo(fs_type, "dax:", &dax_sops, NULL, DAXFS_MAGIC);
-}
-
-static struct file_system_type dax_type = {
- .name = "dax",
- .mount = dax_mount,
- .kill_sb = kill_anon_super,
-};
-
-static int dax_test(struct inode *inode, void *data)
-{
- return inode->i_cdev == data;
-}
-
-static int dax_set(struct inode *inode, void *data)
-{
- inode->i_cdev = data;
- return 0;
-}
-
-static struct inode *dax_inode_get(struct cdev *cdev, dev_t devt)
-{
- struct inode *inode;
-
- inode = iget5_locked(dax_superblock, hash_32(devt + DAXFS_MAGIC, 31),
- dax_test, dax_set, cdev);
-
- if (!inode)
- return NULL;
-
- if (inode->i_state & I_NEW) {
- inode->i_mode = S_IFCHR;
- inode->i_flags = S_DAX;
- inode->i_rdev = devt;
- mapping_set_gfp_mask(&inode->i_data, GFP_USER);
- unlock_new_inode(inode);
- }
- return inode;
-}
-
-static void init_once(void *inode)
-{
- inode_init_once(inode);
-}
-
-static int dax_inode_init(void)
-{
- int rc;
-
- dax_cache = kmem_cache_create("dax_cache", sizeof(struct inode), 0,
- (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|SLAB_ACCOUNT),
- init_once);
- if (!dax_cache)
- return -ENOMEM;
-
- rc = register_filesystem(&dax_type);
- if (rc)
- goto err_register_fs;
-
- dax_mnt = kern_mount(&dax_type);
- if (IS_ERR(dax_mnt)) {
- rc = PTR_ERR(dax_mnt);
- goto err_mount;
- }
- dax_superblock = dax_mnt->mnt_sb;
-
- return 0;
-
- err_mount:
- unregister_filesystem(&dax_type);
- err_register_fs:
- kmem_cache_destroy(dax_cache);
-
- return rc;
-}
-
-static void dax_inode_exit(void)
-{
- kern_unmount(dax_mnt);
- unregister_filesystem(&dax_type);
- kmem_cache_destroy(dax_cache);
-}
-
static void dax_region_free(struct kref *kref)
{
struct dax_region *dax_region;
@@ -361,7 +236,7 @@ static int check_vma(struct dax_dev *dax_dev, struct vm_area_struct *vma,
struct device *dev = &dax_dev->dev;
unsigned long mask;
- if (!dax_dev->alive)
+ if (!dax_inode_alive(dax_dev->dax_inode))
return -ENXIO;
/* prevent private mappings from being established */
@@ -542,7 +417,13 @@ static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
dev_dbg(&dax_dev->dev, "%s\n", __func__);
+ /*
+ * We lock to check dax_inode liveness and will re-check at
+ * fault time.
+ */
+ rcu_read_lock();
rc = check_vma(dax_dev, vma, __func__);
+ rcu_read_unlock();
if (rc)
return rc;
@@ -588,12 +469,13 @@ static unsigned long dax_get_unmapped_area(struct file *filp,
static int dax_open(struct inode *inode, struct file *filp)
{
- struct dax_dev *dax_dev;
+ struct dax_inode *dax_inode = inode_to_dax_inode(inode);
+ struct inode *__dax_inode = dax_inode_to_inode(dax_inode);
+ struct dax_dev *dax_dev = dax_inode_get_private(dax_inode);
- dax_dev = container_of(inode->i_cdev, struct dax_dev, cdev);
dev_dbg(&dax_dev->dev, "%s\n", __func__);
- inode->i_mapping = dax_dev->inode->i_mapping;
- inode->i_mapping->host = dax_dev->inode;
+ inode->i_mapping = __dax_inode->i_mapping;
+ inode->i_mapping->host = __dax_inode;
filp->f_mapping = inode->i_mapping;
filp->private_data = dax_dev;
inode->i_flags = S_DAX;
@@ -622,32 +504,25 @@ static void dax_dev_release(struct device *dev)
{
struct dax_dev *dax_dev = to_dax_dev(dev);
struct dax_region *dax_region = dax_dev->region;
+ struct dax_inode *dax_inode = dax_dev->dax_inode;
ida_simple_remove(&dax_region->ida, dax_dev->id);
- ida_simple_remove(&dax_minor_ida, MINOR(dev->devt));
dax_region_put(dax_region);
- iput(dax_dev->inode);
+ put_dax_inode(dax_inode);
kfree(dax_dev);
}
static void unregister_dax_dev(void *dev)
{
struct dax_dev *dax_dev = to_dax_dev(dev);
- struct cdev *cdev = &dax_dev->cdev;
+ struct dax_inode *dax_inode = dax_dev->dax_inode;
+ struct inode *inode = dax_inode_to_inode(dax_inode);
dev_dbg(dev, "%s\n", __func__);
- /*
- * Note, rcu is not protecting the liveness of dax_dev, rcu is
- * ensuring that any fault handlers that might have seen
- * dax_dev->alive == true, have completed. Any fault handlers
- * that start after synchronize_rcu() has started will abort
- * upon seeing dax_dev->alive == false.
- */
- dax_dev->alive = false;
- synchronize_rcu();
- unmap_mapping_range(dax_dev->inode->i_mapping, 0, 0, 1);
- cdev_del(cdev);
+ kill_dax_inode(dax_inode);
+ unmap_mapping_range(inode->i_mapping, 0, 0, 1);
+ dax_inode_unregister(dax_inode);
device_unregister(dev);
}
@@ -655,11 +530,11 @@ struct dax_dev *devm_create_dax_dev(struct dax_region *dax_region,
struct resource *res, int count)
{
struct device *parent = dax_region->dev;
+ struct dax_inode *dax_inode;
struct dax_dev *dax_dev;
- int rc = 0, minor, i;
+ struct inode *inode;
struct device *dev;
- struct cdev *cdev;
- dev_t dev_t;
+ int rc = 0, i;
dax_dev = kzalloc(sizeof(*dax_dev) + sizeof(*res) * count, GFP_KERNEL);
if (!dax_dev)
@@ -685,38 +560,27 @@ struct dax_dev *devm_create_dax_dev(struct dax_region *dax_region,
goto err_id;
}
- minor = ida_simple_get(&dax_minor_ida, 0, 0, GFP_KERNEL);
- if (minor < 0) {
- rc = minor;
- goto err_minor;
- }
-
- dev_t = MKDEV(MAJOR(dax_devt), minor);
- dev = &dax_dev->dev;
- dax_dev->inode = dax_inode_get(&dax_dev->cdev, dev_t);
- if (!dax_dev->inode) {
- rc = -ENOMEM;
+ dax_inode = alloc_dax_inode(dax_dev);
+ if (!dax_inode)
goto err_inode;
- }
- /* device_initialize() so cdev can reference kobj parent */
+ /* initialize now so dax_inode_register() can reference dev->kobj */
+ dax_dev->dax_inode = dax_inode;
+ dev = &dax_dev->dev;
device_initialize(dev);
- cdev = &dax_dev->cdev;
- cdev_init(cdev, &dax_fops);
- cdev->owner = parent->driver->owner;
- cdev->kobj.parent = &dev->kobj;
- rc = cdev_add(&dax_dev->cdev, dev_t, 1);
+ rc = dax_inode_register(dax_inode, &dax_fops,
+ parent->driver->owner, &dev->kobj);
if (rc)
- goto err_cdev;
+ goto err_register;
/* from here on we're committed to teardown via dax_dev_release() */
dax_dev->num_resources = count;
- dax_dev->alive = true;
dax_dev->region = dax_region;
kref_get(&dax_region->kref);
- dev->devt = dev_t;
+ inode = dax_inode_to_inode(dax_inode);
+ dev->devt = inode->i_rdev;
dev->class = dax_class;
dev->parent = parent;
dev->groups = dax_attribute_groups;
@@ -734,11 +598,9 @@ struct dax_dev *devm_create_dax_dev(struct dax_region *dax_region,
return dax_dev;
- err_cdev:
- iput(dax_dev->inode);
+ err_register:
+ put_dax_inode(dax_inode);
err_inode:
- ida_simple_remove(&dax_minor_ida, minor);
- err_minor:
ida_simple_remove(&dax_region->ida, dax_dev->id);
err_id:
kfree(dax_dev);
@@ -749,38 +611,13 @@ EXPORT_SYMBOL_GPL(devm_create_dax_dev);
static int __init dax_init(void)
{
- int rc;
-
- rc = dax_inode_init();
- if (rc)
- return rc;
-
- nr_dax = max(nr_dax, 256);
- rc = alloc_chrdev_region(&dax_devt, 0, nr_dax, "dax");
- if (rc)
- goto err_chrdev;
-
dax_class = class_create(THIS_MODULE, "dax");
- if (IS_ERR(dax_class)) {
- rc = PTR_ERR(dax_class);
- goto err_class;
- }
-
- return 0;
-
- err_class:
- unregister_chrdev_region(dax_devt, nr_dax);
- err_chrdev:
- dax_inode_exit();
- return rc;
+ return PTR_ERR_OR_ZERO(dax_class);
}
static void __exit dax_exit(void)
{
class_destroy(dax_class);
- unregister_chrdev_region(dax_devt, nr_dax);
- ida_destroy(&dax_minor_ida);
- dax_inode_exit();
}
MODULE_AUTHOR("Intel Corporation");
diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c
index 033f49b31fdc..9c98b1dd24c1 100644
--- a/drivers/dax/pmem.c
+++ b/drivers/dax/pmem.c
@@ -16,7 +16,7 @@
#include <linux/pfn_t.h>
#include "../nvdimm/pfn.h"
#include "../nvdimm/nd.h"
-#include "dax.h"
+#include "device-dax.h"
struct dax_pmem {
struct device *dev;
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
new file mode 100644
index 000000000000..e6369b851619
--- /dev/null
+++ b/drivers/dax/super.c
@@ -0,0 +1,310 @@
+/*
+ * Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#include <linux/pagemap.h>
+#include <linux/module.h>
+#include <linux/mount.h>
+#include <linux/magic.h>
+#include <linux/cdev.h>
+#include <linux/hash.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+
+static int nr_dax = CONFIG_NR_DEV_DAX;
+module_param(nr_dax, int, S_IRUGO);
+MODULE_PARM_DESC(nr_dax, "max number of dax device instances");
+
+static dev_t dax_devt;
+static struct vfsmount *dax_mnt;
+static DEFINE_IDA(dax_minor_ida);
+static struct kmem_cache *dax_cache __read_mostly;
+static struct super_block *dax_superblock __read_mostly;
+
+/**
+ * struct dax_inode - anchor object for dax services
+ * @inode: core vfs
+ * @cdev: optional character interface for "device dax"
+ * @private: dax driver private data
+ * @alive: !alive + rcu grace period == no new operations / mappings
+ */
+struct dax_inode {
+ struct inode inode;
+ struct cdev cdev;
+ void *private;
+ bool alive;
+};
+
+bool dax_inode_alive(struct dax_inode *dax_inode)
+{
+ RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
+ "dax operations require rcu_read_lock()\n");
+ return dax_inode->alive;
+}
+EXPORT_SYMBOL_GPL(dax_inode_alive);
+
+/*
+ * Note, rcu is not protecting the liveness of dax_inode, rcu is
+ * ensuring that any fault handlers or operations that might have seen
+ * dax_inode_alive(), have completed. Any operations that start after
+ * synchronize_rcu() has run will abort upon seeing !dax_inode_alive().
+ */
+void kill_dax_inode(struct dax_inode *dax_inode)
+{
+ if (!dax_inode)
+ return;
+
+ dax_inode->alive = false;
+ synchronize_rcu();
+ dax_inode->private = NULL;
+}
+EXPORT_SYMBOL_GPL(kill_dax_inode);
+
+static struct inode *dax_alloc_inode(struct super_block *sb)
+{
+ struct dax_inode *dax_inode;
+
+ dax_inode = kmem_cache_alloc(dax_cache, GFP_KERNEL);
+ return &dax_inode->inode;
+}
+
+static struct dax_inode *to_dax_inode(struct inode *inode)
+{
+ return container_of(inode, struct dax_inode, inode);
+}
+
+static void dax_i_callback(struct rcu_head *head)
+{
+ struct inode *inode = container_of(head, struct inode, i_rcu);
+ struct dax_inode *dax_inode = to_dax_inode(inode);
+
+ ida_simple_remove(&dax_minor_ida, MINOR(inode->i_rdev));
+ kmem_cache_free(dax_cache, dax_inode);
+}
+
+static void dax_destroy_inode(struct inode *inode)
+{
+ struct dax_inode *dax_inode = to_dax_inode(inode);
+
+ WARN_ONCE(dax_inode->alive,
+ "kill_dax_inode() must be called before final iput()\n");
+ call_rcu(&inode->i_rcu, dax_i_callback);
+}
+
+static const struct super_operations dax_sops = {
+ .statfs = simple_statfs,
+ .alloc_inode = dax_alloc_inode,
+ .destroy_inode = dax_destroy_inode,
+ .drop_inode = generic_delete_inode,
+};
+
+static struct dentry *dax_mount(struct file_system_type *fs_type,
+ int flags, const char *dev_name, void *data)
+{
+ return mount_pseudo(fs_type, "dax:", &dax_sops, NULL, DAXFS_MAGIC);
+}
+
+static struct file_system_type dax_type = {
+ .name = "dax",
+ .mount = dax_mount,
+ .kill_sb = kill_anon_super,
+};
+
+static int dax_test(struct inode *inode, void *data)
+{
+ dev_t devt = *(dev_t *) data;
+
+ return inode->i_rdev == devt;
+}
+
+static int dax_set(struct inode *inode, void *data)
+{
+ dev_t devt = *(dev_t *) data;
+
+ inode->i_rdev = devt;
+ return 0;
+}
+
+static struct dax_inode *dax_inode_get(dev_t devt)
+{
+ struct dax_inode *dax_inode;
+ struct inode *inode;
+
+ inode = iget5_locked(dax_superblock, hash_32(devt + DAXFS_MAGIC, 31),
+ dax_test, dax_set, &devt);
+
+ if (!inode)
+ return NULL;
+
+ dax_inode = to_dax_inode(inode);
+ if (inode->i_state & I_NEW) {
+ dax_inode->alive = true;
+ inode->i_cdev = &dax_inode->cdev;
+ inode->i_mode = S_IFCHR;
+ inode->i_flags = S_DAX;
+ mapping_set_gfp_mask(&inode->i_data, GFP_USER);
+ unlock_new_inode(inode);
+ }
+
+ return dax_inode;
+}
+
+struct dax_inode *alloc_dax_inode(void *private)
+{
+ struct dax_inode *dax_inode;
+ dev_t devt;
+ int minor;
+
+ minor = ida_simple_get(&dax_minor_ida, 0, nr_dax, GFP_KERNEL);
+ if (minor < 0)
+ return NULL;
+
+ devt = MKDEV(MAJOR(dax_devt), minor);
+ dax_inode = dax_inode_get(devt);
+ if (!dax_inode)
+ goto err_inode;
+
+ dax_inode->private = private;
+ return dax_inode;
+
+ err_inode:
+ ida_simple_remove(&dax_minor_ida, minor);
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(alloc_dax_inode);
+
+void put_dax_inode(struct dax_inode *dax_inode)
+{
+ if (!dax_inode)
+ return;
+ iput(&dax_inode->inode);
+}
+EXPORT_SYMBOL_GPL(put_dax_inode);
+
+/**
+ * inode_to_dax_inode: convert a public inode into its dax_inode
+ * @inode: An inode with i_cdev pointing to a dax_inode
+ */
+struct dax_inode *inode_to_dax_inode(struct inode *inode)
+{
+ struct cdev *cdev = inode->i_cdev;
+
+ return container_of(cdev, struct dax_inode, cdev);
+}
+EXPORT_SYMBOL_GPL(inode_to_dax_inode);
+
+struct inode *dax_inode_to_inode(struct dax_inode *dax_inode)
+{
+ return &dax_inode->inode;
+}
+EXPORT_SYMBOL_GPL(dax_inode_to_inode);
+
+void *dax_inode_get_private(struct dax_inode *dax_inode)
+{
+ return dax_inode->private;
+}
+EXPORT_SYMBOL_GPL(dax_inode_get_private);
+
+int dax_inode_register(struct dax_inode *dax_inode,
+ const struct file_operations *fops, struct module *owner,
+ struct kobject *parent)
+{
+ struct cdev *cdev = &dax_inode->cdev;
+ struct inode *inode = &dax_inode->inode;
+
+ cdev_init(cdev, fops);
+ cdev->owner = owner;
+ cdev->kobj.parent = parent;
+ return cdev_add(cdev, inode->i_rdev, 1);
+}
+EXPORT_SYMBOL_GPL(dax_inode_register);
+
+void dax_inode_unregister(struct dax_inode *dax_inode)
+{
+ struct cdev *cdev = &dax_inode->cdev;
+
+ cdev_del(cdev);
+}
+EXPORT_SYMBOL_GPL(dax_inode_unregister);
+
+static void init_once(void *_dax_inode)
+{
+ struct dax_inode *dax_inode = _dax_inode;
+ struct inode *inode = &dax_inode->inode;
+
+ inode_init_once(inode);
+}
+
+static int dax_inode_init(void)
+{
+ int rc;
+
+ dax_cache = kmem_cache_create("dax_cache", sizeof(struct dax_inode), 0,
+ (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+ init_once);
+ if (!dax_cache)
+ return -ENOMEM;
+
+ rc = register_filesystem(&dax_type);
+ if (rc)
+ goto err_register_fs;
+
+ dax_mnt = kern_mount(&dax_type);
+ if (IS_ERR(dax_mnt)) {
+ rc = PTR_ERR(dax_mnt);
+ goto err_mount;
+ }
+ dax_superblock = dax_mnt->mnt_sb;
+
+ return 0;
+
+ err_mount:
+ unregister_filesystem(&dax_type);
+ err_register_fs:
+ kmem_cache_destroy(dax_cache);
+
+ return rc;
+}
+
+static void dax_inode_exit(void)
+{
+ kern_unmount(dax_mnt);
+ unregister_filesystem(&dax_type);
+ kmem_cache_destroy(dax_cache);
+}
+
+static int __init dax_fs_init(void)
+{
+ int rc;
+
+ rc = dax_inode_init();
+ if (rc)
+ return rc;
+
+ nr_dax = max(nr_dax, 256);
+ rc = alloc_chrdev_region(&dax_devt, 0, nr_dax, "dax");
+ if (rc)
+ dax_inode_exit();
+ return rc;
+}
+
+static void __exit dax_fs_exit(void)
+{
+ unregister_chrdev_region(dax_devt, nr_dax);
+ ida_destroy(&dax_minor_ida);
+ dax_inode_exit();
+}
+
+MODULE_AUTHOR("Intel Corporation");
+MODULE_LICENSE("GPL v2");
+subsys_initcall(dax_fs_init);
+module_exit(dax_fs_exit);
diff --git a/tools/testing/nvdimm/Kbuild b/tools/testing/nvdimm/Kbuild
index 405212be044a..a1ed891d239a 100644
--- a/tools/testing/nvdimm/Kbuild
+++ b/tools/testing/nvdimm/Kbuild
@@ -28,7 +28,7 @@ obj-$(CONFIG_ND_BTT) += nd_btt.o
obj-$(CONFIG_ND_BLK) += nd_blk.o
obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
obj-$(CONFIG_ACPI_NFIT) += nfit.o
-obj-$(CONFIG_DEV_DAX) += dax.o
+obj-$(CONFIG_DEV_DAX) += device_dax.o
obj-$(CONFIG_DEV_DAX_PMEM) += dax_pmem.o
nfit-y := $(ACPI_SRC)/core.o
@@ -48,8 +48,8 @@ nd_blk-y += config_check.o
nd_e820-y := $(NVDIMM_SRC)/e820.o
nd_e820-y += config_check.o
-dax-y := $(DAX_SRC)/dax.o
-dax-y += config_check.o
+device_dax-y := $(DAX_SRC)/device.o
+device_dax-y += config_check.o
dax_pmem-y := $(DAX_SRC)/pmem.o
dax_pmem-y += config_check.o
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
* [RFC PATCH 01/17] dax: refactor dax-fs into a generic provider of dax inodes
@ 2017-01-28 8:36 ` Dan Williams
0 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28 8:36 UTC (permalink / raw)
To: linux-nvdimm
Cc: snitzer, toshi.kani, mawilcox, linux-block, jmoyer,
linux-fsdevel, ross.zwisler, hch
We want dax capable drivers to be able to publish a set of dax
operations [1]. However, we do not want to further abuse block_devices
to advertise these operations. Instead we will attach these operations
to a dax inode and add a lookup mechanism to go from block device path
to a dax inode. A dax capable driver like pmem or brd is responsible for
registering a dax inode, alongside a block device, and then a dax
capable filesystem is responsible for retrieving the dax inode by path
name if it wants to call dax_operations.
For now, we refactor the dax pseudo-fs into a generic facility rather
than an implementation detail of the device-dax use case. A "dax inode"
is just an inode plus dax infrastructure, and "Device DAX" is a mapping
service layered on top of that base inode. "Filesystem DAX" is then a
mapping service that layers a filesystem on top of the base dax inode.
Filesystem DAX goes through a block_device for now, but may go directly
to a dax inode in the future, or for new pmem-only filesystems.
[1]: https://lkml.org/lkml/2017/1/19/880
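As a rough illustration of the calling order this patch expects from a
driver, the following userspace C model walks one dax inode through
allocate, register, kill, unregister, and final put. It is a sketch
only: every model_* name is hypothetical and merely mirrors
alloc_dax_inode(), dax_inode_register(), kill_dax_inode(),
dax_inode_unregister(), and put_dax_inode(). The reference count and
the assert() are assumptions standing in for the inode refcount and the
WARN_ONCE() in dax_destroy_inode().

```c
/* Userspace model of the dax_inode lifecycle; illustrative, not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct dax_inode in drivers/dax/super.c. */
struct model_dax_inode {
	int refs;        /* models the reference held on the embedded inode */
	bool registered; /* models cdev_add() / cdev_del() state */
	bool alive;      /* mirrors dax_inode->alive */
	void *private;   /* mirrors dax_inode->private */
};

/* Models alloc_dax_inode(): a new object starts alive with one reference. */
static struct model_dax_inode *model_alloc(void *private)
{
	struct model_dax_inode *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->refs = 1;
	d->alive = true;
	d->private = private;
	return d;
}

/* Models dax_inode_register(): publish the character device. */
static int model_register(struct model_dax_inode *d)
{
	d->registered = true;
	return 0;
}

/* Models kill_dax_inode(): mark dead, then clear the driver data. */
static void model_kill(struct model_dax_inode *d)
{
	if (!d)
		return;
	d->alive = false;
	/* the real code waits for in-flight readers at this point */
	d->private = NULL;
}

/* Models dax_inode_unregister(): withdraw the character device. */
static void model_unregister(struct model_dax_inode *d)
{
	d->registered = false;
}

/* Models put_dax_inode(): returns true when the last reference is dropped. */
static bool model_put(struct model_dax_inode *d)
{
	if (!d)
		return false;
	if (--d->refs > 0)
		return false;
	/* stands in for the WARN_ONCE() in dax_destroy_inode() */
	assert(!d->alive);
	free(d);
	return true;
}

/* Same order as unregister_dax_dev() followed by dax_dev_release(). */
static int model_lifecycle(void)
{
	int driver_data = 42;
	struct model_dax_inode *d = model_alloc(&driver_data);

	if (!d)
		return -1;
	if (model_register(d) != 0) {
		model_kill(d);
		model_put(d);
		return -1;
	}
	model_kill(d);       /* no new operations may start */
	model_unregister(d); /* remove the user-visible device */
	return model_put(d) ? 0 : -1;
}
```

In this model, model_lifecycle() returns 0 when the final put frees an
object that was killed first. Dropping the last reference without
calling model_kill() would trip the assert, which is the model's
counterpart to the warning in dax_destroy_inode().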
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/Makefile | 2
drivers/dax/Kconfig | 8 +
drivers/dax/Makefile | 5 +
drivers/dax/dax.h | 24 ++-
drivers/dax/device-dax.h | 25 +++
drivers/dax/device.c | 241 +++++----------------------------
drivers/dax/pmem.c | 2
drivers/dax/super.c | 310 +++++++++++++++++++++++++++++++++++++++++++
tools/testing/nvdimm/Kbuild | 6 -
9 files changed, 402 insertions(+), 221 deletions(-)
create mode 100644 drivers/dax/device-dax.h
rename drivers/dax/{dax.c => device.c} (75%)
create mode 100644 drivers/dax/super.c
diff --git a/drivers/Makefile b/drivers/Makefile
index 060026a02f59..17f42e4a6717 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -68,7 +68,7 @@ obj-$(CONFIG_PARPORT) += parport/
obj-$(CONFIG_NVM) += lightnvm/
obj-y += base/ block/ misc/ mfd/ nfc/
obj-$(CONFIG_LIBNVDIMM) += nvdimm/
-obj-$(CONFIG_DEV_DAX) += dax/
+obj-$(CONFIG_DAX) += dax/
obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf/
obj-$(CONFIG_NUBUS) += nubus/
obj-y += macintosh/
diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index 3e2ab3b14eea..39bcbf4c5e40 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -1,6 +1,11 @@
-menuconfig DEV_DAX
+menuconfig DAX
tristate "DAX: direct access to differentiated memory"
default m if NVDIMM_DAX
+
+if DAX
+
+config DEV_DAX
+ tristate "Device DAX: direct access mapping device"
depends on TRANSPARENT_HUGEPAGE
help
Support raw access to differentiated (persistence, bandwidth,
@@ -10,7 +15,6 @@ menuconfig DEV_DAX
baseline memory pool. Mappings of a /dev/daxX.Y device impose
restrictions that make the mapping behavior deterministic.
-if DEV_DAX
config DEV_DAX_PMEM
tristate "PMEM DAX: direct access to persistent memory"
diff --git a/drivers/dax/Makefile b/drivers/dax/Makefile
index 27c54e38478a..dc7422530462 100644
--- a/drivers/dax/Makefile
+++ b/drivers/dax/Makefile
@@ -1,4 +1,7 @@
-obj-$(CONFIG_DEV_DAX) += dax.o
+obj-$(CONFIG_DAX) += dax.o
+obj-$(CONFIG_DEV_DAX) += device_dax.o
obj-$(CONFIG_DEV_DAX_PMEM) += dax_pmem.o
+dax-y := super.o
dax_pmem-y := pmem.o
+device_dax-y := device.o
diff --git a/drivers/dax/dax.h b/drivers/dax/dax.h
index ddd829ab58c0..def061aa75f4 100644
--- a/drivers/dax/dax.h
+++ b/drivers/dax/dax.h
@@ -1,5 +1,5 @@
/*
- * Copyright(c) 2016 Intel Corporation. All rights reserved.
+ * Copyright(c) 2016 - 2017 Intel Corporation. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
@@ -12,14 +12,16 @@
*/
#ifndef __DAX_H__
#define __DAX_H__
-struct device;
-struct dax_dev;
-struct resource;
-struct dax_region;
-void dax_region_put(struct dax_region *dax_region);
-struct dax_region *alloc_dax_region(struct device *parent,
- int region_id, struct resource *res, unsigned int align,
- void *addr, unsigned long flags);
-struct dax_dev *devm_create_dax_dev(struct dax_region *dax_region,
- struct resource *res, int count);
+struct dax_inode;
+struct dax_inode *alloc_dax_inode(void *private);
+void put_dax_inode(struct dax_inode *dax_inode);
+bool dax_inode_alive(struct dax_inode *dax_inode);
+void kill_dax_inode(struct dax_inode *dax_inode);
+struct dax_inode *inode_to_dax_inode(struct inode *inode);
+struct inode *dax_inode_to_inode(struct dax_inode *dax_inode);
+void *dax_inode_get_private(struct dax_inode *dax_inode);
+int dax_inode_register(struct dax_inode *dax_inode,
+ const struct file_operations *fops, struct module *owner,
+ struct kobject *parent);
+void dax_inode_unregister(struct dax_inode *dax_inode);
#endif /* __DAX_H__ */
diff --git a/drivers/dax/device-dax.h b/drivers/dax/device-dax.h
new file mode 100644
index 000000000000..c9b7e9cc227e
--- /dev/null
+++ b/drivers/dax/device-dax.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#ifndef __DEVICE_DAX_H__
+#define __DEVICE_DAX_H__
+struct device;
+struct dax_dev;
+struct resource;
+struct dax_region;
+void dax_region_put(struct dax_region *dax_region);
+struct dax_region *alloc_dax_region(struct device *parent,
+ int region_id, struct resource *res, unsigned int align,
+ void *addr, unsigned long flags);
+struct dax_dev *devm_create_dax_dev(struct dax_region *dax_region,
+ struct resource *res, int count);
+#endif /* __DEVICE_DAX_H__ */
diff --git a/drivers/dax/dax.c b/drivers/dax/device.c
similarity index 75%
rename from drivers/dax/dax.c
rename to drivers/dax/device.c
index ed758b74ddf0..5b5572314929 100644
--- a/drivers/dax/dax.c
+++ b/drivers/dax/device.c
@@ -1,5 +1,5 @@
/*
- * Copyright(c) 2016 Intel Corporation. All rights reserved.
+ * Copyright(c) 2016 - 2017 Intel Corporation. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
@@ -13,25 +13,14 @@
#include <linux/pagemap.h>
#include <linux/module.h>
#include <linux/device.h>
-#include <linux/mount.h>
#include <linux/pfn_t.h>
-#include <linux/hash.h>
-#include <linux/cdev.h>
#include <linux/slab.h>
#include <linux/dax.h>
#include <linux/fs.h>
#include <linux/mm.h>
#include "dax.h"
-static dev_t dax_devt;
static struct class *dax_class;
-static DEFINE_IDA(dax_minor_ida);
-static int nr_dax = CONFIG_NR_DEV_DAX;
-module_param(nr_dax, int, S_IRUGO);
-static struct vfsmount *dax_mnt;
-static struct kmem_cache *dax_cache __read_mostly;
-static struct super_block *dax_superblock __read_mostly;
-MODULE_PARM_DESC(nr_dax, "max number of device-dax instances");
/**
* struct dax_region - mapping infrastructure for dax devices
@@ -57,19 +46,16 @@ struct dax_region {
/**
* struct dax_dev - subdivision of a dax region
* @region - parent region
- * @dev - device backing the character device
- * @cdev - core chardev data
- * @alive - !alive + rcu grace period == no new mappings can be established
+ * @dax_inode - core dax functionality
+ * @dev - device core
* @id - child id in the region
* @num_resources - number of physical address extents in this device
* @res - array of physical address ranges
*/
struct dax_dev {
struct dax_region *region;
- struct inode *inode;
+ struct dax_inode *dax_inode;
struct device dev;
- struct cdev cdev;
- bool alive;
int id;
int num_resources;
struct resource res[0];
@@ -142,117 +128,6 @@ static const struct attribute_group *dax_region_attribute_groups[] = {
NULL,
};
-static struct inode *dax_alloc_inode(struct super_block *sb)
-{
- return kmem_cache_alloc(dax_cache, GFP_KERNEL);
-}
-
-static void dax_i_callback(struct rcu_head *head)
-{
- struct inode *inode = container_of(head, struct inode, i_rcu);
-
- kmem_cache_free(dax_cache, inode);
-}
-
-static void dax_destroy_inode(struct inode *inode)
-{
- call_rcu(&inode->i_rcu, dax_i_callback);
-}
-
-static const struct super_operations dax_sops = {
- .statfs = simple_statfs,
- .alloc_inode = dax_alloc_inode,
- .destroy_inode = dax_destroy_inode,
- .drop_inode = generic_delete_inode,
-};
-
-static struct dentry *dax_mount(struct file_system_type *fs_type,
- int flags, const char *dev_name, void *data)
-{
- return mount_pseudo(fs_type, "dax:", &dax_sops, NULL, DAXFS_MAGIC);
-}
-
-static struct file_system_type dax_type = {
- .name = "dax",
- .mount = dax_mount,
- .kill_sb = kill_anon_super,
-};
-
-static int dax_test(struct inode *inode, void *data)
-{
- return inode->i_cdev == data;
-}
-
-static int dax_set(struct inode *inode, void *data)
-{
- inode->i_cdev = data;
- return 0;
-}
-
-static struct inode *dax_inode_get(struct cdev *cdev, dev_t devt)
-{
- struct inode *inode;
-
- inode = iget5_locked(dax_superblock, hash_32(devt + DAXFS_MAGIC, 31),
- dax_test, dax_set, cdev);
-
- if (!inode)
- return NULL;
-
- if (inode->i_state & I_NEW) {
- inode->i_mode = S_IFCHR;
- inode->i_flags = S_DAX;
- inode->i_rdev = devt;
- mapping_set_gfp_mask(&inode->i_data, GFP_USER);
- unlock_new_inode(inode);
- }
- return inode;
-}
-
-static void init_once(void *inode)
-{
- inode_init_once(inode);
-}
-
-static int dax_inode_init(void)
-{
- int rc;
-
- dax_cache = kmem_cache_create("dax_cache", sizeof(struct inode), 0,
- (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|SLAB_ACCOUNT),
- init_once);
- if (!dax_cache)
- return -ENOMEM;
-
- rc = register_filesystem(&dax_type);
- if (rc)
- goto err_register_fs;
-
- dax_mnt = kern_mount(&dax_type);
- if (IS_ERR(dax_mnt)) {
- rc = PTR_ERR(dax_mnt);
- goto err_mount;
- }
- dax_superblock = dax_mnt->mnt_sb;
-
- return 0;
-
- err_mount:
- unregister_filesystem(&dax_type);
- err_register_fs:
- kmem_cache_destroy(dax_cache);
-
- return rc;
-}
-
-static void dax_inode_exit(void)
-{
- kern_unmount(dax_mnt);
- unregister_filesystem(&dax_type);
- kmem_cache_destroy(dax_cache);
-}
-
static void dax_region_free(struct kref *kref)
{
struct dax_region *dax_region;
@@ -361,7 +236,7 @@ static int check_vma(struct dax_dev *dax_dev, struct vm_area_struct *vma,
struct device *dev = &dax_dev->dev;
unsigned long mask;
- if (!dax_dev->alive)
+ if (!dax_inode_alive(dax_dev->dax_inode))
return -ENXIO;
/* prevent private mappings from being established */
@@ -542,7 +417,13 @@ static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
dev_dbg(&dax_dev->dev, "%s\n", __func__);
+ /*
+ * We lock to check dax_inode liveness and will re-check at
+ * fault time.
+ */
+ rcu_read_lock();
rc = check_vma(dax_dev, vma, __func__);
+ rcu_read_unlock();
if (rc)
return rc;
@@ -588,12 +469,13 @@ static unsigned long dax_get_unmapped_area(struct file *filp,
static int dax_open(struct inode *inode, struct file *filp)
{
- struct dax_dev *dax_dev;
+ struct dax_inode *dax_inode = inode_to_dax_inode(inode);
+ struct inode *__dax_inode = dax_inode_to_inode(dax_inode);
+ struct dax_dev *dax_dev = dax_inode_get_private(dax_inode);
- dax_dev = container_of(inode->i_cdev, struct dax_dev, cdev);
dev_dbg(&dax_dev->dev, "%s\n", __func__);
- inode->i_mapping = dax_dev->inode->i_mapping;
- inode->i_mapping->host = dax_dev->inode;
+ inode->i_mapping = __dax_inode->i_mapping;
+ inode->i_mapping->host = __dax_inode;
filp->f_mapping = inode->i_mapping;
filp->private_data = dax_dev;
inode->i_flags = S_DAX;
@@ -622,32 +504,25 @@ static void dax_dev_release(struct device *dev)
{
struct dax_dev *dax_dev = to_dax_dev(dev);
struct dax_region *dax_region = dax_dev->region;
+ struct dax_inode *dax_inode = dax_dev->dax_inode;
ida_simple_remove(&dax_region->ida, dax_dev->id);
- ida_simple_remove(&dax_minor_ida, MINOR(dev->devt));
dax_region_put(dax_region);
- iput(dax_dev->inode);
+ put_dax_inode(dax_inode);
kfree(dax_dev);
}
static void unregister_dax_dev(void *dev)
{
struct dax_dev *dax_dev = to_dax_dev(dev);
- struct cdev *cdev = &dax_dev->cdev;
+ struct dax_inode *dax_inode = dax_dev->dax_inode;
+ struct inode *inode = dax_inode_to_inode(dax_inode);
dev_dbg(dev, "%s\n", __func__);
- /*
- * Note, rcu is not protecting the liveness of dax_dev, rcu is
- * ensuring that any fault handlers that might have seen
- * dax_dev->alive == true, have completed. Any fault handlers
- * that start after synchronize_rcu() has started will abort
- * upon seeing dax_dev->alive == false.
- */
- dax_dev->alive = false;
- synchronize_rcu();
- unmap_mapping_range(dax_dev->inode->i_mapping, 0, 0, 1);
- cdev_del(cdev);
+ kill_dax_inode(dax_inode);
+ unmap_mapping_range(inode->i_mapping, 0, 0, 1);
+ dax_inode_unregister(dax_inode);
device_unregister(dev);
}
@@ -655,11 +530,11 @@ struct dax_dev *devm_create_dax_dev(struct dax_region *dax_region,
struct resource *res, int count)
{
struct device *parent = dax_region->dev;
+ struct dax_inode *dax_inode;
struct dax_dev *dax_dev;
- int rc = 0, minor, i;
+ struct inode *inode;
struct device *dev;
- struct cdev *cdev;
- dev_t dev_t;
+ int rc = 0, i;
dax_dev = kzalloc(sizeof(*dax_dev) + sizeof(*res) * count, GFP_KERNEL);
if (!dax_dev)
@@ -685,38 +560,27 @@ struct dax_dev *devm_create_dax_dev(struct dax_region *dax_region,
goto err_id;
}
- minor = ida_simple_get(&dax_minor_ida, 0, 0, GFP_KERNEL);
- if (minor < 0) {
- rc = minor;
- goto err_minor;
- }
-
- dev_t = MKDEV(MAJOR(dax_devt), minor);
- dev = &dax_dev->dev;
- dax_dev->inode = dax_inode_get(&dax_dev->cdev, dev_t);
- if (!dax_dev->inode) {
- rc = -ENOMEM;
+ dax_inode = alloc_dax_inode(dax_dev);
+ if (!dax_inode)
goto err_inode;
- }
- /* device_initialize() so cdev can reference kobj parent */
+ /* initialize now so dax_inode_register() can reference dev->kobj */
+ dax_dev->dax_inode = dax_inode;
+ dev = &dax_dev->dev;
device_initialize(dev);
- cdev = &dax_dev->cdev;
- cdev_init(cdev, &dax_fops);
- cdev->owner = parent->driver->owner;
- cdev->kobj.parent = &dev->kobj;
- rc = cdev_add(&dax_dev->cdev, dev_t, 1);
+ rc = dax_inode_register(dax_inode, &dax_fops,
+ parent->driver->owner, &dev->kobj);
if (rc)
- goto err_cdev;
+ goto err_register;
/* from here on we're committed to teardown via dax_dev_release() */
dax_dev->num_resources = count;
- dax_dev->alive = true;
dax_dev->region = dax_region;
kref_get(&dax_region->kref);
- dev->devt = dev_t;
+ inode = dax_inode_to_inode(dax_inode);
+ dev->devt = inode->i_rdev;
dev->class = dax_class;
dev->parent = parent;
dev->groups = dax_attribute_groups;
@@ -734,11 +598,9 @@ struct dax_dev *devm_create_dax_dev(struct dax_region *dax_region,
return dax_dev;
- err_cdev:
- iput(dax_dev->inode);
+ err_register:
+ put_dax_inode(dax_inode);
err_inode:
- ida_simple_remove(&dax_minor_ida, minor);
- err_minor:
ida_simple_remove(&dax_region->ida, dax_dev->id);
err_id:
kfree(dax_dev);
@@ -749,38 +611,13 @@ EXPORT_SYMBOL_GPL(devm_create_dax_dev);
static int __init dax_init(void)
{
- int rc;
-
- rc = dax_inode_init();
- if (rc)
- return rc;
-
- nr_dax = max(nr_dax, 256);
- rc = alloc_chrdev_region(&dax_devt, 0, nr_dax, "dax");
- if (rc)
- goto err_chrdev;
-
dax_class = class_create(THIS_MODULE, "dax");
- if (IS_ERR(dax_class)) {
- rc = PTR_ERR(dax_class);
- goto err_class;
- }
-
- return 0;
-
- err_class:
- unregister_chrdev_region(dax_devt, nr_dax);
- err_chrdev:
- dax_inode_exit();
- return rc;
+ return PTR_ERR_OR_ZERO(dax_class);
}
static void __exit dax_exit(void)
{
class_destroy(dax_class);
- unregister_chrdev_region(dax_devt, nr_dax);
- ida_destroy(&dax_minor_ida);
- dax_inode_exit();
}
MODULE_AUTHOR("Intel Corporation");
diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c
index 033f49b31fdc..9c98b1dd24c1 100644
--- a/drivers/dax/pmem.c
+++ b/drivers/dax/pmem.c
@@ -16,7 +16,7 @@
#include <linux/pfn_t.h>
#include "../nvdimm/pfn.h"
#include "../nvdimm/nd.h"
-#include "dax.h"
+#include "device-dax.h"
struct dax_pmem {
struct device *dev;
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
new file mode 100644
index 000000000000..e6369b851619
--- /dev/null
+++ b/drivers/dax/super.c
@@ -0,0 +1,310 @@
+/*
+ * Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#include <linux/pagemap.h>
+#include <linux/module.h>
+#include <linux/mount.h>
+#include <linux/magic.h>
+#include <linux/cdev.h>
+#include <linux/hash.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+
+static int nr_dax = CONFIG_NR_DEV_DAX;
+module_param(nr_dax, int, S_IRUGO);
+MODULE_PARM_DESC(nr_dax, "max number of dax device instances");
+
+static dev_t dax_devt;
+static struct vfsmount *dax_mnt;
+static DEFINE_IDA(dax_minor_ida);
+static struct kmem_cache *dax_cache __read_mostly;
+static struct super_block *dax_superblock __read_mostly;
+
+/**
+ * struct dax_inode - anchor object for dax services
+ * @inode: core vfs
+ * @cdev: optional character interface for "device dax"
+ * @private: dax driver private data
+ * @alive: !alive + rcu grace period == no new operations / mappings
+ */
+struct dax_inode {
+ struct inode inode;
+ struct cdev cdev;
+ void *private;
+ bool alive;
+};
+
+bool dax_inode_alive(struct dax_inode *dax_inode)
+{
+ RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
+ "dax operations require rcu_read_lock()\n");
+ return dax_inode->alive;
+}
+EXPORT_SYMBOL_GPL(dax_inode_alive);
+
+/*
+ * Note, rcu is not protecting the liveness of dax_inode, rcu is
+ * ensuring that any fault handlers or operations that might have seen
+ * dax_inode_alive(), have completed. Any operations that start after
+ * synchronize_rcu() has run will abort upon seeing !dax_inode_alive().
+ */
+void kill_dax_inode(struct dax_inode *dax_inode)
+{
+ if (!dax_inode)
+ return;
+
+ dax_inode->alive = false;
+ synchronize_rcu();
+ dax_inode->private = NULL;
+}
+EXPORT_SYMBOL_GPL(kill_dax_inode);
+
+static struct inode *dax_alloc_inode(struct super_block *sb)
+{
+ struct dax_inode *dax_inode;
+
+ dax_inode = kmem_cache_alloc(dax_cache, GFP_KERNEL);
+ return &dax_inode->inode;
+}
+
+static struct dax_inode *to_dax_inode(struct inode *inode)
+{
+ return container_of(inode, struct dax_inode, inode);
+}
+
+static void dax_i_callback(struct rcu_head *head)
+{
+ struct inode *inode = container_of(head, struct inode, i_rcu);
+ struct dax_inode *dax_inode = to_dax_inode(inode);
+
+ ida_simple_remove(&dax_minor_ida, MINOR(inode->i_rdev));
+ kmem_cache_free(dax_cache, dax_inode);
+}
+
+static void dax_destroy_inode(struct inode *inode)
+{
+ struct dax_inode *dax_inode = to_dax_inode(inode);
+
+ WARN_ONCE(dax_inode->alive,
+ "kill_dax_inode() must be called before final iput()\n");
+ call_rcu(&inode->i_rcu, dax_i_callback);
+}
+
+static const struct super_operations dax_sops = {
+ .statfs = simple_statfs,
+ .alloc_inode = dax_alloc_inode,
+ .destroy_inode = dax_destroy_inode,
+ .drop_inode = generic_delete_inode,
+};
+
+static struct dentry *dax_mount(struct file_system_type *fs_type,
+ int flags, const char *dev_name, void *data)
+{
+ return mount_pseudo(fs_type, "dax:", &dax_sops, NULL, DAXFS_MAGIC);
+}
+
+static struct file_system_type dax_type = {
+ .name = "dax",
+ .mount = dax_mount,
+ .kill_sb = kill_anon_super,
+};
+
+static int dax_test(struct inode *inode, void *data)
+{
+ dev_t devt = *(dev_t *) data;
+
+ return inode->i_rdev == devt;
+}
+
+static int dax_set(struct inode *inode, void *data)
+{
+ dev_t devt = *(dev_t *) data;
+
+ inode->i_rdev = devt;
+ return 0;
+}
+
+static struct dax_inode *dax_inode_get(dev_t devt)
+{
+ struct dax_inode *dax_inode;
+ struct inode *inode;
+
+ inode = iget5_locked(dax_superblock, hash_32(devt + DAXFS_MAGIC, 31),
+ dax_test, dax_set, &devt);
+
+ if (!inode)
+ return NULL;
+
+ dax_inode = to_dax_inode(inode);
+ if (inode->i_state & I_NEW) {
+ dax_inode->alive = true;
+ inode->i_cdev = &dax_inode->cdev;
+ inode->i_mode = S_IFCHR;
+ inode->i_flags = S_DAX;
+ mapping_set_gfp_mask(&inode->i_data, GFP_USER);
+ unlock_new_inode(inode);
+ }
+
+ return dax_inode;
+}
+
+struct dax_inode *alloc_dax_inode(void *private)
+{
+ struct dax_inode *dax_inode;
+ dev_t devt;
+ int minor;
+
+ minor = ida_simple_get(&dax_minor_ida, 0, nr_dax, GFP_KERNEL);
+ if (minor < 0)
+ return NULL;
+
+ devt = MKDEV(MAJOR(dax_devt), minor);
+ dax_inode = dax_inode_get(devt);
+ if (!dax_inode)
+ goto err_inode;
+
+ dax_inode->private = private;
+ return dax_inode;
+
+ err_inode:
+ ida_simple_remove(&dax_minor_ida, minor);
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(alloc_dax_inode);
+
+void put_dax_inode(struct dax_inode *dax_inode)
+{
+ if (!dax_inode)
+ return;
+ iput(&dax_inode->inode);
+}
+EXPORT_SYMBOL_GPL(put_dax_inode);
+
+/**
+ * inode_to_dax_inode: convert a public inode into its dax_inode
+ * @inode: An inode with i_cdev pointing to a dax_inode
+ */
+struct dax_inode *inode_to_dax_inode(struct inode *inode)
+{
+ struct cdev *cdev = inode->i_cdev;
+
+ return container_of(cdev, struct dax_inode, cdev);
+}
+EXPORT_SYMBOL_GPL(inode_to_dax_inode);
+
+struct inode *dax_inode_to_inode(struct dax_inode *dax_inode)
+{
+ return &dax_inode->inode;
+}
+EXPORT_SYMBOL_GPL(dax_inode_to_inode);
+
+void *dax_inode_get_private(struct dax_inode *dax_inode)
+{
+ return dax_inode->private;
+}
+EXPORT_SYMBOL_GPL(dax_inode_get_private);
+
+int dax_inode_register(struct dax_inode *dax_inode,
+ const struct file_operations *fops, struct module *owner,
+ struct kobject *parent)
+{
+ struct cdev *cdev = &dax_inode->cdev;
+ struct inode *inode = &dax_inode->inode;
+
+ cdev_init(cdev, fops);
+ cdev->owner = owner;
+ cdev->kobj.parent = parent;
+ return cdev_add(cdev, inode->i_rdev, 1);
+}
+EXPORT_SYMBOL_GPL(dax_inode_register);
+
+void dax_inode_unregister(struct dax_inode *dax_inode)
+{
+ struct cdev *cdev = &dax_inode->cdev;
+
+ cdev_del(cdev);
+}
+EXPORT_SYMBOL_GPL(dax_inode_unregister);
+
+static void init_once(void *_dax_inode)
+{
+ struct dax_inode *dax_inode = _dax_inode;
+ struct inode *inode = &dax_inode->inode;
+
+ inode_init_once(inode);
+}
+
+static int dax_inode_init(void)
+{
+ int rc;
+
+ dax_cache = kmem_cache_create("dax_cache", sizeof(struct dax_inode), 0,
+ (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+ init_once);
+ if (!dax_cache)
+ return -ENOMEM;
+
+ rc = register_filesystem(&dax_type);
+ if (rc)
+ goto err_register_fs;
+
+ dax_mnt = kern_mount(&dax_type);
+ if (IS_ERR(dax_mnt)) {
+ rc = PTR_ERR(dax_mnt);
+ goto err_mount;
+ }
+ dax_superblock = dax_mnt->mnt_sb;
+
+ return 0;
+
+ err_mount:
+ unregister_filesystem(&dax_type);
+ err_register_fs:
+ kmem_cache_destroy(dax_cache);
+
+ return rc;
+}
+
+static void dax_inode_exit(void)
+{
+ kern_unmount(dax_mnt);
+ unregister_filesystem(&dax_type);
+ kmem_cache_destroy(dax_cache);
+}
+
+static int __init dax_fs_init(void)
+{
+ int rc;
+
+ rc = dax_inode_init();
+ if (rc)
+ return rc;
+
+ nr_dax = max(nr_dax, 256);
+ rc = alloc_chrdev_region(&dax_devt, 0, nr_dax, "dax");
+ if (rc)
+ dax_inode_exit();
+ return rc;
+}
+
+static void __exit dax_fs_exit(void)
+{
+ unregister_chrdev_region(dax_devt, nr_dax);
+ ida_destroy(&dax_minor_ida);
+ dax_inode_exit();
+}
+
+MODULE_AUTHOR("Intel Corporation");
+MODULE_LICENSE("GPL v2");
+subsys_initcall(dax_fs_init);
+module_exit(dax_fs_exit);
diff --git a/tools/testing/nvdimm/Kbuild b/tools/testing/nvdimm/Kbuild
index 405212be044a..a1ed891d239a 100644
--- a/tools/testing/nvdimm/Kbuild
+++ b/tools/testing/nvdimm/Kbuild
@@ -28,7 +28,7 @@ obj-$(CONFIG_ND_BTT) += nd_btt.o
obj-$(CONFIG_ND_BLK) += nd_blk.o
obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
obj-$(CONFIG_ACPI_NFIT) += nfit.o
-obj-$(CONFIG_DEV_DAX) += dax.o
+obj-$(CONFIG_DEV_DAX) += device_dax.o
obj-$(CONFIG_DEV_DAX_PMEM) += dax_pmem.o
nfit-y := $(ACPI_SRC)/core.o
@@ -48,8 +48,8 @@ nd_blk-y += config_check.o
nd_e820-y := $(NVDIMM_SRC)/e820.o
nd_e820-y += config_check.o
-dax-y := $(DAX_SRC)/dax.o
-dax-y += config_check.o
+device_dax-y := $(DAX_SRC)/device.o
+device_dax-y += config_check.o
dax_pmem-y := $(DAX_SRC)/pmem.o
dax_pmem-y += config_check.o
* [RFC PATCH 02/17] dax: convert dax_inode locking to srcu
2017-01-28 8:36 ` Dan Williams
@ 2017-01-28 8:36 ` Dan Williams
-1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28 8:36 UTC (permalink / raw)
To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch
In preparation for adding dax_operations that perform ->direct_access()
and user copy operations relative to a dax_inode, convert the existing
dax_inode locking from rcu to srcu. Some dax drivers need to sleep in
their ->direct_access() methods, and user copying may fault and sleep,
neither of which is allowed inside an rcu_read_lock() section.
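The pattern being converted can be modelled in userspace, as a sketch
under loud assumptions: an atomic reader count stands in for the
dax_srcu domain, and a yield loop stands in for synchronize_srcu(). All
model_* names are hypothetical, and this is not how the kernel
implements srcu. It only shows why a reader that observed alive == true
finishes before ->private is cleared.

```c
/* Userspace model of "alive flag + read-side section"; not kernel code. */
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the dax_srcu domain: count of active readers. */
static atomic_int model_readers;

struct model_dax_inode {
	atomic_bool alive; /* mirrors dax_inode->alive */
	void *private;     /* mirrors dax_inode->private */
};

/* Models dax_read_lock(); the returned token is unused in this model. */
static int model_read_lock(void)
{
	atomic_fetch_add(&model_readers, 1);
	return 0;
}

/* Models dax_read_unlock(); catches an unbalanced unlock. */
static void model_read_unlock(int id)
{
	int prev = atomic_fetch_sub(&model_readers, 1);

	(void)id;
	assert(prev > 0);
	(void)prev;
}

/* Models dax_inode_alive(); only meaningful inside a read-side section. */
static bool model_alive(struct model_dax_inode *d)
{
	return atomic_load(&d->alive);
}

/* Models kill_dax_inode(): flag, wait for readers, then drop driver data. */
static void model_kill(struct model_dax_inode *d)
{
	atomic_store(&d->alive, false);
	/*
	 * Simplified stand-in for synchronize_srcu(): wait until the
	 * reader count reaches zero. Readers may sleep, so yield.
	 */
	while (atomic_load(&model_readers) > 0)
		sched_yield();
	d->private = NULL;
}

/* A reader in the style of dax_dev_fault(): private data, or NULL if dead. */
static void *model_operation(struct model_dax_inode *d)
{
	void *result = NULL;
	int id = model_read_lock();

	if (model_alive(d))
		result = d->private;
	model_read_unlock(id);
	return result;
}
```

A reader that runs before model_kill() gets the driver data back; one
that starts afterwards sees the cleared flag and returns NULL. Unlike
rcu_read_lock(), nothing in the read-side section here forbids
sleeping, which is the property the conversion is after.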
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/dax/Kconfig | 1 +
drivers/dax/device.c | 18 +++++++++---------
drivers/dax/super.c | 20 ++++++++++++++++----
include/linux/dax.h | 3 +++
4 files changed, 29 insertions(+), 13 deletions(-)
diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index 39bcbf4c5e40..b7053eafd88e 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -1,5 +1,6 @@
menuconfig DAX
tristate "DAX: direct access to differentiated memory"
+ select SRCU
default m if NVDIMM_DAX
if DAX
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 5b5572314929..af06d0bfd6ea 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -333,16 +333,16 @@ static int __dax_dev_fault(struct dax_dev *dax_dev, struct vm_area_struct *vma,
static int dax_dev_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
- int rc;
+ int rc, id;
struct file *filp = vma->vm_file;
struct dax_dev *dax_dev = filp->private_data;
dev_dbg(&dax_dev->dev, "%s: %s: %s (%#lx - %#lx)\n", __func__,
current->comm, (vmf->flags & FAULT_FLAG_WRITE)
? "write" : "read", vma->vm_start, vma->vm_end);
- rcu_read_lock();
+ id = dax_read_lock();
rc = __dax_dev_fault(dax_dev, vma, vmf);
- rcu_read_unlock();
+ dax_read_unlock(id);
return rc;
}
@@ -390,7 +390,7 @@ static int __dax_dev_pmd_fault(struct dax_dev *dax_dev,
static int dax_dev_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
pmd_t *pmd, unsigned int flags)
{
- int rc;
+ int rc, id;
struct file *filp = vma->vm_file;
struct dax_dev *dax_dev = filp->private_data;
@@ -398,9 +398,9 @@ static int dax_dev_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
current->comm, (flags & FAULT_FLAG_WRITE)
? "write" : "read", vma->vm_start, vma->vm_end);
- rcu_read_lock();
+ id = dax_read_lock();
rc = __dax_dev_pmd_fault(dax_dev, vma, addr, pmd, flags);
- rcu_read_unlock();
+ dax_read_unlock(id);
return rc;
}
@@ -412,8 +412,8 @@ static const struct vm_operations_struct dax_dev_vm_ops = {
static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
{
+ int rc, id;
struct dax_dev *dax_dev = filp->private_data;
- int rc;
dev_dbg(&dax_dev->dev, "%s\n", __func__);
@@ -421,9 +421,9 @@ static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
* We lock to check dax_inode liveness and will re-check at
* fault time.
*/
- rcu_read_lock();
+ id = dax_read_lock();
rc = check_vma(dax_dev, vma, __func__);
- rcu_read_unlock();
+ dax_read_unlock(id);
if (rc)
return rc;
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e6369b851619..7c4dc97d53a8 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -24,11 +24,24 @@ module_param(nr_dax, int, S_IRUGO);
MODULE_PARM_DESC(nr_dax, "max number of dax device instances");
static dev_t dax_devt;
+DEFINE_STATIC_SRCU(dax_srcu);
static struct vfsmount *dax_mnt;
static DEFINE_IDA(dax_minor_ida);
static struct kmem_cache *dax_cache __read_mostly;
static struct super_block *dax_superblock __read_mostly;
+int dax_read_lock(void)
+{
+ return srcu_read_lock(&dax_srcu);
+}
+EXPORT_SYMBOL_GPL(dax_read_lock);
+
+void dax_read_unlock(int id)
+{
+ srcu_read_unlock(&dax_srcu, id);
+}
+EXPORT_SYMBOL_GPL(dax_read_unlock);
+
/**
* struct dax_inode - anchor object for dax services
* @inode: core vfs
@@ -45,8 +58,7 @@ struct dax_inode {
bool dax_inode_alive(struct dax_inode *dax_inode)
{
- RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
- "dax operations require rcu_read_lock()\n");
+ lockdep_assert_held(&dax_srcu);
return dax_inode->alive;
}
EXPORT_SYMBOL_GPL(dax_inode_alive);
@@ -55,7 +67,7 @@ EXPORT_SYMBOL_GPL(dax_inode_alive);
* Note, rcu is not protecting the liveness of dax_inode, rcu is
* ensuring that any fault handlers or operations that might have seen
* dax_inode_alive(), have completed. Any operations that start after
- * synchronize_rcu() has run will abort upon seeing !dax_inode_alive().
+ * synchronize_srcu() has run will abort upon seeing !dax_inode_alive().
*/
void kill_dax_inode(struct dax_inode *dax_inode)
{
@@ -63,7 +75,7 @@ void kill_dax_inode(struct dax_inode *dax_inode)
return;
dax_inode->alive = false;
- synchronize_rcu();
+ synchronize_srcu(&dax_srcu);
dax_inode->private = NULL;
}
EXPORT_SYMBOL_GPL(kill_dax_inode);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 24ad71173995..67002898d130 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -8,6 +8,9 @@
struct iomap_ops;
+int dax_read_lock(void);
+void dax_read_unlock(int id);
+
/*
* We use lowest available bit in exceptional entry for locking, one bit for
* the entry size (PMD) and two more to tell us if the entry is a huge zero
* [RFC PATCH 02/17] dax: convert dax_inode locking to srcu
@ 2017-01-28 8:36 ` Dan Williams
0 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28 8:36 UTC (permalink / raw)
To: linux-nvdimm
Cc: snitzer, toshi.kani, mawilcox, linux-block, jmoyer,
linux-fsdevel, ross.zwisler, hch
In preparation for adding dax_operations that perform ->direct_access()
and user copy operations relative to a dax_inode, convert the existing
dax_inode locking from rcu to srcu. Some dax drivers need to sleep in
their ->direct_access() methods, and user copying may fault and sleep,
neither of which is allowed inside an rcu_read_lock() section.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/dax/Kconfig | 1 +
drivers/dax/device.c | 18 +++++++++---------
drivers/dax/super.c | 20 ++++++++++++++++----
include/linux/dax.h | 3 +++
4 files changed, 29 insertions(+), 13 deletions(-)
diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index 39bcbf4c5e40..b7053eafd88e 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -1,5 +1,6 @@
menuconfig DAX
tristate "DAX: direct access to differentiated memory"
+ select SRCU
default m if NVDIMM_DAX
if DAX
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 5b5572314929..af06d0bfd6ea 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -333,16 +333,16 @@ static int __dax_dev_fault(struct dax_dev *dax_dev, struct vm_area_struct *vma,
static int dax_dev_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
- int rc;
+ int rc, id;
struct file *filp = vma->vm_file;
struct dax_dev *dax_dev = filp->private_data;
dev_dbg(&dax_dev->dev, "%s: %s: %s (%#lx - %#lx)\n", __func__,
current->comm, (vmf->flags & FAULT_FLAG_WRITE)
? "write" : "read", vma->vm_start, vma->vm_end);
- rcu_read_lock();
+ id = dax_read_lock();
rc = __dax_dev_fault(dax_dev, vma, vmf);
- rcu_read_unlock();
+ dax_read_unlock(id);
return rc;
}
@@ -390,7 +390,7 @@ static int __dax_dev_pmd_fault(struct dax_dev *dax_dev,
static int dax_dev_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
pmd_t *pmd, unsigned int flags)
{
- int rc;
+ int rc, id;
struct file *filp = vma->vm_file;
struct dax_dev *dax_dev = filp->private_data;
@@ -398,9 +398,9 @@ static int dax_dev_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
current->comm, (flags & FAULT_FLAG_WRITE)
? "write" : "read", vma->vm_start, vma->vm_end);
- rcu_read_lock();
+ id = dax_read_lock();
rc = __dax_dev_pmd_fault(dax_dev, vma, addr, pmd, flags);
- rcu_read_unlock();
+ dax_read_unlock(id);
return rc;
}
@@ -412,8 +412,8 @@ static const struct vm_operations_struct dax_dev_vm_ops = {
static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
{
+ int rc, id;
struct dax_dev *dax_dev = filp->private_data;
- int rc;
dev_dbg(&dax_dev->dev, "%s\n", __func__);
@@ -421,9 +421,9 @@ static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
* We lock to check dax_inode liveness and will re-check at
* fault time.
*/
- rcu_read_lock();
+ id = dax_read_lock();
rc = check_vma(dax_dev, vma, __func__);
- rcu_read_unlock();
+ dax_read_unlock(id);
if (rc)
return rc;
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e6369b851619..7c4dc97d53a8 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -24,11 +24,24 @@ module_param(nr_dax, int, S_IRUGO);
MODULE_PARM_DESC(nr_dax, "max number of dax device instances");
static dev_t dax_devt;
+DEFINE_STATIC_SRCU(dax_srcu);
static struct vfsmount *dax_mnt;
static DEFINE_IDA(dax_minor_ida);
static struct kmem_cache *dax_cache __read_mostly;
static struct super_block *dax_superblock __read_mostly;
+int dax_read_lock(void)
+{
+ return srcu_read_lock(&dax_srcu);
+}
+EXPORT_SYMBOL_GPL(dax_read_lock);
+
+void dax_read_unlock(int id)
+{
+ srcu_read_unlock(&dax_srcu, id);
+}
+EXPORT_SYMBOL_GPL(dax_read_unlock);
+
/**
* struct dax_inode - anchor object for dax services
* @inode: core vfs
@@ -45,8 +58,7 @@ struct dax_inode {
bool dax_inode_alive(struct dax_inode *dax_inode)
{
- RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
- "dax operations require rcu_read_lock()\n");
+ lockdep_assert_held(&dax_srcu);
return dax_inode->alive;
}
EXPORT_SYMBOL_GPL(dax_inode_alive);
@@ -55,7 +67,7 @@ EXPORT_SYMBOL_GPL(dax_inode_alive);
* Note, rcu is not protecting the liveness of dax_inode, rcu is
* ensuring that any fault handlers or operations that might have seen
* dax_inode_alive(), have completed. Any operations that start after
- * synchronize_rcu() has run will abort upon seeing !dax_inode_alive().
+ * synchronize_srcu() has run will abort upon seeing !dax_inode_alive().
*/
void kill_dax_inode(struct dax_inode *dax_inode)
{
@@ -63,7 +75,7 @@ void kill_dax_inode(struct dax_inode *dax_inode)
return;
dax_inode->alive = false;
- synchronize_rcu();
+ synchronize_srcu(&dax_srcu);
dax_inode->private = NULL;
}
EXPORT_SYMBOL_GPL(kill_dax_inode);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 24ad71173995..67002898d130 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -8,6 +8,9 @@
struct iomap_ops;
+int dax_read_lock(void);
+void dax_read_unlock(int id);
+
/*
* We use lowest available bit in exceptional entry for locking, one bit for
* the entry size (PMD) and two more to tell us if the entry is a huge zero
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [RFC PATCH 03/17] dax: add a facility to lookup a dax inode by 'host' device name
2017-01-28 8:36 ` Dan Williams
@ 2017-01-28 8:36 ` Dan Williams
-1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28 8:36 UTC (permalink / raw)
To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch
For the current block_device based filesystem-dax path, we need a way
to look up the dax_inode associated with a block_device. Add a 'host'
property to a dax_inode that can be used for this purpose. It is a
free-form string, but for a dax_inode associated with a block device
it is the bdev name.
This is a band-aid until filesystems are able to mount on a dax-inode
directly.
We use a hash list since blkdev_writepages() will need to use this
interface to issue dax_writeback_mapping_range().
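As a rough illustration of the lookup described above, here is a
userspace sketch of a name-keyed hash list. It is not the kernel code:
host_entry, host_table, host_hash(), host_add(), host_lookup() and
HOST_HASH_SIZE are hypothetical names, a djb2 string hash stands in for
hashlen_string(), and the spinlock, srcu read lock and igrab() reference
taken by dax_get_by_host() are left out.

```c
/* Userspace model of a name-keyed hash list; not the kernel implementation. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HOST_HASH_SIZE 64 /* hypothetical bucket count */

struct host_entry {
	const char *host;        /* lookup key, e.g. a block device name */
	int alive;               /* cleared when the provider goes away */
	struct host_entry *next; /* singly linked bucket chain */
};

static struct host_entry *host_table[HOST_HASH_SIZE];

/* djb2 string hash, standing in for the kernel's hashlen_string() */
unsigned int host_hash(const char *host)
{
	unsigned long h = 5381;

	while (*host)
		h = h * 33 + (unsigned char)*host++;
	return (unsigned int)(h % HOST_HASH_SIZE);
}

/* Register an entry under its host name; unnamed entries are not hashed. */
void host_add(struct host_entry *e)
{
	unsigned int bucket;

	assert(e != NULL);
	if (!e->host)
		return;
	bucket = host_hash(e->host);
	e->next = host_table[bucket];
	host_table[bucket] = e;
}

/* Return the live entry registered under @host, or NULL if there is none. */
struct host_entry *host_lookup(const char *host)
{
	struct host_entry *e;

	if (!host)
		return NULL;
	for (e = host_table[host_hash(host)]; e; e = e->next)
		if (e->alive && strcmp(host, e->host) == 0)
			return e;
	return NULL;
}
```

A caller would register an entry under a name such as "pmem0" and later
call host_lookup("pmem0"). An entry whose alive flag has been cleared is
skipped, mirroring the dax_inode_alive() check in the patch.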
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/dax/dax.h | 2 +
drivers/dax/device.c | 2 +
drivers/dax/super.c | 79 +++++++++++++++++++++++++++++++++++++++++++++++++-
include/linux/dax.h | 1 +
4 files changed, 80 insertions(+), 4 deletions(-)
diff --git a/drivers/dax/dax.h b/drivers/dax/dax.h
index def061aa75f4..f33c16ed2ec6 100644
--- a/drivers/dax/dax.h
+++ b/drivers/dax/dax.h
@@ -13,7 +13,7 @@
#ifndef __DAX_H__
#define __DAX_H__
struct dax_inode;
-struct dax_inode *alloc_dax_inode(void *private);
+struct dax_inode *alloc_dax_inode(void *private, const char *host);
void put_dax_inode(struct dax_inode *dax_inode);
bool dax_inode_alive(struct dax_inode *dax_inode);
void kill_dax_inode(struct dax_inode *dax_inode);
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index af06d0bfd6ea..6d0a3241a608 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -560,7 +560,7 @@ struct dax_dev *devm_create_dax_dev(struct dax_region *dax_region,
goto err_id;
}
- dax_inode = alloc_dax_inode(dax_dev);
+ dax_inode = alloc_dax_inode(dax_dev, NULL);
if (!dax_inode)
goto err_inode;
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 7c4dc97d53a8..7ac048f94b2b 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -30,6 +30,10 @@ static DEFINE_IDA(dax_minor_ida);
static struct kmem_cache *dax_cache __read_mostly;
static struct super_block *dax_superblock __read_mostly;
+#define DAX_HASH_SIZE (PAGE_SIZE / sizeof(struct hlist_head))
+static struct hlist_head dax_host_list[DAX_HASH_SIZE];
+static DEFINE_SPINLOCK(dax_host_lock);
+
int dax_read_lock(void)
{
return srcu_read_lock(&dax_srcu);
@@ -46,12 +50,15 @@ EXPORT_SYMBOL_GPL(dax_read_unlock);
* struct dax_inode - anchor object for dax services
* @inode: core vfs
* @cdev: optional character interface for "device dax"
+ * @host: optional name for lookups where the device path is not available
* @private: dax driver private data
* @alive: !alive + rcu grace period == no new operations / mappings
*/
struct dax_inode {
+ struct hlist_node list;
struct inode inode;
struct cdev cdev;
+ const char *host;
void *private;
bool alive;
};
@@ -63,6 +70,11 @@ bool dax_inode_alive(struct dax_inode *dax_inode)
}
EXPORT_SYMBOL_GPL(dax_inode_alive);
+static int dax_host_hash(const char *host)
+{
+ return hashlen_hash(hashlen_string("DAX", host)) % DAX_HASH_SIZE;
+}
+
/*
* Note, rcu is not protecting the liveness of dax_inode, rcu is
* ensuring that any fault handlers or operations that might have seen
@@ -75,6 +87,12 @@ void kill_dax_inode(struct dax_inode *dax_inode)
return;
dax_inode->alive = false;
+
+ spin_lock(&dax_host_lock);
+ if (!hlist_unhashed(&dax_inode->list))
+ hlist_del_init(&dax_inode->list);
+ spin_unlock(&dax_host_lock);
+
synchronize_srcu(&dax_srcu);
dax_inode->private = NULL;
}
@@ -98,6 +116,8 @@ static void dax_i_callback(struct rcu_head *head)
struct inode *inode = container_of(head, struct inode, i_rcu);
struct dax_inode *dax_inode = to_dax_inode(inode);
+ kfree(dax_inode->host);
+ dax_inode->host = NULL;
ida_simple_remove(&dax_minor_ida, MINOR(inode->i_rdev));
kmem_cache_free(dax_cache, dax_inode);
}
@@ -169,26 +189,49 @@ static struct dax_inode *dax_inode_get(dev_t devt)
return dax_inode;
}
-struct dax_inode *alloc_dax_inode(void *private)
+static void dax_add_host(struct dax_inode *dax_inode, const char *host)
+{
+ int hash;
+
+ INIT_HLIST_NODE(&dax_inode->list);
+ if (!host)
+ return;
+
+ dax_inode->host = host;
+ hash = dax_host_hash(host);
+ spin_lock(&dax_host_lock);
+ hlist_add_head(&dax_inode->list, &dax_host_list[hash]);
+ spin_unlock(&dax_host_lock);
+}
+
+struct dax_inode *alloc_dax_inode(void *private, const char *__host)
{
struct dax_inode *dax_inode;
+ const char *host;
dev_t devt;
int minor;
+ host = kstrdup(__host, GFP_KERNEL);
+ if (__host && !host)
+ return NULL;
+
minor = ida_simple_get(&dax_minor_ida, 0, nr_dax, GFP_KERNEL);
if (minor < 0)
- return NULL;
+ goto err_minor;
devt = MKDEV(MAJOR(dax_devt), minor);
dax_inode = dax_inode_get(devt);
if (!dax_inode)
goto err_inode;
+ dax_add_host(dax_inode, host);
dax_inode->private = private;
return dax_inode;
err_inode:
ida_simple_remove(&dax_minor_ida, minor);
+ err_minor:
+ kfree(host);
return NULL;
}
EXPORT_SYMBOL_GPL(alloc_dax_inode);
@@ -202,6 +245,38 @@ void put_dax_inode(struct dax_inode *dax_inode)
EXPORT_SYMBOL_GPL(put_dax_inode);
/**
+ * dax_get_by_host() - temporary lookup mechanism for filesystem-dax
+ * @host: alternate name for the inode registered by a dax driver
+ */
+struct dax_inode *dax_get_by_host(const char *host)
+{
+ struct dax_inode *dax_inode, *found = NULL;
+ int hash, id;
+
+ if (!host)
+ return NULL;
+
+ hash = dax_host_hash(host);
+
+ id = dax_read_lock();
+ spin_lock(&dax_host_lock);
+ hlist_for_each_entry(dax_inode, &dax_host_list[hash], list) {
+ if (!dax_inode_alive(dax_inode)
+ || strcmp(host, dax_inode->host) != 0)
+ continue;
+
+ if (igrab(&dax_inode->inode))
+ found = dax_inode;
+ break;
+ }
+ spin_unlock(&dax_host_lock);
+ dax_read_unlock(id);
+
+ return found;
+}
+EXPORT_SYMBOL_GPL(dax_get_by_host);
+
+/**
* inode_to_dax_inode: convert a public inode into its dax_inode
* @inode: An inode with i_cdev pointing to a dax_inode
*/
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 67002898d130..8fe19230e118 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -10,6 +10,7 @@ struct iomap_ops;
int dax_read_lock(void);
void dax_read_unlock(int id);
+struct dax_inode *dax_get_by_host(const char *host);
/*
* We use lowest available bit in exceptional entry for locking, one bit for
* [RFC PATCH 04/17] dax: introduce dax_operations
2017-01-28 8:36 ` Dan Williams
@ 2017-01-28 8:36 ` Dan Williams
-1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28 8:36 UTC (permalink / raw)
To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch
Track a set of dax_operations per dax_inode that can be set at
alloc_dax_inode() time. These operations will be used to stop the abuse
of block_device_operations for communicating dax capabilities to
filesystems. They will also be used to replace the "pmem api" and move
pmem-specific cache maintenance, and other dax-driver-specific
filesystem-dax operations, to dax inode methods. In particular, this
allows us to replace the abuse of __copy_user_nocache(), via
memcpy_to_pmem(), with a driver-specific operation.
This is a standalone introduction of the operations. Follow-on patches
convert each dax-driver and teach fs/dax.c to use ->direct_access() from
dax_operations instead of block_device_operations.
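To make the shape of a per-object operations table concrete, here is a
small userspace sketch. It only models the pattern: struct provider,
struct provider_ops, struct ram_dev, ram_direct_access() and
provider_direct_access() are hypothetical names, the pfn_t output
argument is dropped, and -1 stands in for a real error code.

```c
/* Userspace model of an ops table bound at allocation time; not kernel code. */
#include <assert.h>
#include <stddef.h>

struct provider;

/* Operations a memory provider may supply; a NULL table means "none". */
struct provider_ops {
	long (*direct_access)(struct provider *p, unsigned long long dev_addr,
			      void **kaddr, long size);
};

struct provider {
	void *private;                  /* driver data, opaque to callers */
	const struct provider_ops *ops; /* fixed when the provider is created */
};

/* Example driver: a flat in-memory buffer. */
struct ram_dev {
	char *base;
	long size;
};

long ram_direct_access(struct provider *p, unsigned long long dev_addr,
		       void **kaddr, long size)
{
	struct ram_dev *ram = p->private;

	(void)size; /* the requested length is advisory in this sketch */
	if (dev_addr >= (unsigned long long)ram->size)
		return -1;
	*kaddr = ram->base + dev_addr;
	return ram->size - (long)dev_addr; /* bytes available from dev_addr */
}

const struct provider_ops ram_ops = { .direct_access = ram_direct_access };

/* Generic entry point: dispatch through ops, refusing providers without them. */
long provider_direct_access(struct provider *p, unsigned long long dev_addr,
			    void **kaddr, long size)
{
	assert(p != NULL);
	if (!p->ops || !p->ops->direct_access)
		return -1;
	return p->ops->direct_access(p, dev_addr, kaddr, size);
}
```

A provider created with a NULL ops pointer, as device-dax does in this
patch, simply has no ->direct_access() path for filesystems to call.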
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/dax/dax.h | 4 +++-
drivers/dax/device.c | 6 +++++-
drivers/dax/super.c | 6 +++++-
include/linux/dax.h | 5 +++++
4 files changed, 18 insertions(+), 3 deletions(-)
diff --git a/drivers/dax/dax.h b/drivers/dax/dax.h
index f33c16ed2ec6..aeb1d49aafb8 100644
--- a/drivers/dax/dax.h
+++ b/drivers/dax/dax.h
@@ -13,7 +13,9 @@
#ifndef __DAX_H__
#define __DAX_H__
struct dax_inode;
-struct dax_inode *alloc_dax_inode(void *private, const char *host);
+struct dax_operations;
+struct dax_inode *alloc_dax_inode(void *private, const char *host,
+ const struct dax_operations *ops);
void put_dax_inode(struct dax_inode *dax_inode);
bool dax_inode_alive(struct dax_inode *dax_inode);
void kill_dax_inode(struct dax_inode *dax_inode);
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 6d0a3241a608..c3d9405ec285 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -560,7 +560,11 @@ struct dax_dev *devm_create_dax_dev(struct dax_region *dax_region,
goto err_id;
}
- dax_inode = alloc_dax_inode(dax_dev, NULL);
+ /*
+ * No 'host' or dax_operations since there is no access to this
+ * device outside of mmap of the resulting character device.
+ */
+ dax_inode = alloc_dax_inode(dax_dev, NULL, NULL);
if (!dax_inode)
goto err_inode;
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 7ac048f94b2b..eb844ffea3cf 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -17,6 +17,7 @@
#include <linux/cdev.h>
#include <linux/hash.h>
#include <linux/slab.h>
+#include <linux/dax.h>
#include <linux/fs.h>
static int nr_dax = CONFIG_NR_DEV_DAX;
@@ -61,6 +62,7 @@ struct dax_inode {
const char *host;
void *private;
bool alive;
+ const struct dax_operations *ops;
};
bool dax_inode_alive(struct dax_inode *dax_inode)
@@ -204,7 +206,8 @@ static void dax_add_host(struct dax_inode *dax_inode, const char *host)
spin_unlock(&dax_host_lock);
}
-struct dax_inode *alloc_dax_inode(void *private, const char *__host)
+struct dax_inode *alloc_dax_inode(void *private, const char *__host,
+ const struct dax_operations *ops)
{
struct dax_inode *dax_inode;
const char *host;
@@ -225,6 +228,7 @@ struct dax_inode *alloc_dax_inode(void *private, const char *__host)
goto err_inode;
dax_add_host(dax_inode, host);
+ dax_inode->ops = ops;
dax_inode->private = private;
return dax_inode;
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 8fe19230e118..def9a9d118c9 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -7,6 +7,11 @@
#include <asm/pgtable.h>
struct iomap_ops;
+struct dax_inode;
+struct dax_operations {
+ long (*direct_access)(struct dax_inode *, phys_addr_t, void **,
+ pfn_t *, long);
+};
int dax_read_lock(void);
void dax_read_unlock(int id);
* [RFC PATCH 05/17] pmem: add dax_operations support
2017-01-28 8:36 ` Dan Williams
@ 2017-01-28 8:36 ` Dan Williams
-1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28 8:36 UTC (permalink / raw)
To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch
Set up a dax_inode to have the same lifetime as the pmem block device and
add a ->direct_access() method that is equivalent to
pmem_direct_access(). Once fs/dax.c has been converted to use
dax_operations the old pmem_direct_access() will be removed.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/dax/dax.h | 7 -----
drivers/nvdimm/Kconfig | 1 +
drivers/nvdimm/pmem.c | 55 +++++++++++++++++++++++++++++++--------
drivers/nvdimm/pmem.h | 7 ++++-
include/linux/dax.h | 6 ++++
tools/testing/nvdimm/pmem-dax.c | 12 ++++-----
6 files changed, 61 insertions(+), 27 deletions(-)
diff --git a/drivers/dax/dax.h b/drivers/dax/dax.h
index aeb1d49aafb8..b4c686d2d446 100644
--- a/drivers/dax/dax.h
+++ b/drivers/dax/dax.h
@@ -13,15 +13,8 @@
#ifndef __DAX_H__
#define __DAX_H__
struct dax_inode;
-struct dax_operations;
-struct dax_inode *alloc_dax_inode(void *private, const char *host,
- const struct dax_operations *ops);
-void put_dax_inode(struct dax_inode *dax_inode);
-bool dax_inode_alive(struct dax_inode *dax_inode);
-void kill_dax_inode(struct dax_inode *dax_inode);
struct dax_inode *inode_to_dax_inode(struct inode *inode);
struct inode *dax_inode_to_inode(struct dax_inode *dax_inode);
-void *dax_inode_get_private(struct dax_inode *dax_inode);
int dax_inode_register(struct dax_inode *dax_inode,
const struct file_operations *fops, struct module *owner,
struct kobject *parent);
diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index 59e750183b7f..5bdd499b5f4f 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -20,6 +20,7 @@ if LIBNVDIMM
config BLK_DEV_PMEM
tristate "PMEM: Persistent memory block device support"
default LIBNVDIMM
+ select DAX
select ND_BTT if BTT
select ND_PFN if NVDIMM_PFN
help
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 5b536be5a12e..d3d7de645e20 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -28,6 +28,7 @@
#include <linux/pfn_t.h>
#include <linux/slab.h>
#include <linux/pmem.h>
+#include <linux/dax.h>
#include <linux/nd.h>
#include "pmem.h"
#include "pfn.h"
@@ -199,13 +200,12 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
}
/* see "strong" declaration in tools/testing/nvdimm/pmem-dax.c */
-__weak long pmem_direct_access(struct block_device *bdev, sector_t sector,
- void **kaddr, pfn_t *pfn, long size)
+__weak long __pmem_direct_access(struct pmem_device *pmem, phys_addr_t dev_addr,
+ void **kaddr, pfn_t *pfn, long size)
{
- struct pmem_device *pmem = bdev->bd_queue->queuedata;
- resource_size_t offset = sector * 512 + pmem->data_offset;
+ resource_size_t offset = dev_addr + pmem->data_offset;
- if (unlikely(is_bad_pmem(&pmem->bb, sector, size)))
+ if (unlikely(is_bad_pmem(&pmem->bb, dev_addr / 512, size)))
return -EIO;
*kaddr = pmem->virt_addr + offset;
*pfn = phys_to_pfn_t(pmem->phys_addr + offset, pmem->pfn_flags);
@@ -219,22 +219,46 @@ __weak long pmem_direct_access(struct block_device *bdev, sector_t sector,
return pmem->size - pmem->pfn_pad - offset;
}
+static long pmem_blk_direct_access(struct block_device *bdev, sector_t sector,
+ void **kaddr, pfn_t *pfn, long size)
+{
+ struct pmem_device *pmem = bdev->bd_queue->queuedata;
+
+ return __pmem_direct_access(pmem, sector * 512, kaddr, pfn, size);
+}
+
static const struct block_device_operations pmem_fops = {
.owner = THIS_MODULE,
.rw_page = pmem_rw_page,
- .direct_access = pmem_direct_access,
+ .direct_access = pmem_blk_direct_access,
.revalidate_disk = nvdimm_revalidate_disk,
};
+static long pmem_dax_direct_access(struct dax_inode *dax_inode,
+ phys_addr_t dev_addr, void **kaddr, pfn_t *pfn, long size)
+{
+ struct pmem_device *pmem = dax_inode_get_private(dax_inode);
+
+ return __pmem_direct_access(pmem, dev_addr, kaddr, pfn, size);
+}
+
+static const struct dax_operations pmem_dax_ops = {
+ .direct_access = pmem_dax_direct_access,
+};
+
static void pmem_release_queue(void *q)
{
blk_cleanup_queue(q);
}
-static void pmem_release_disk(void *disk)
+static void pmem_release_disk(void *__pmem)
{
- del_gendisk(disk);
- put_disk(disk);
+ struct pmem_device *pmem = __pmem;
+
+ kill_dax_inode(pmem->dax_inode);
+ put_dax_inode(pmem->dax_inode);
+ del_gendisk(pmem->disk);
+ put_disk(pmem->disk);
}
static int pmem_attach_disk(struct device *dev,
@@ -245,6 +269,7 @@ static int pmem_attach_disk(struct device *dev,
struct vmem_altmap __altmap, *altmap = NULL;
struct resource *res = &nsio->res;
struct nd_pfn *nd_pfn = NULL;
+ struct dax_inode *dax_inode;
int nid = dev_to_node(dev);
struct nd_pfn_sb *pfn_sb;
struct pmem_device *pmem;
@@ -325,6 +350,7 @@ static int pmem_attach_disk(struct device *dev,
disk = alloc_disk_node(0, nid);
if (!disk)
return -ENOMEM;
+ pmem->disk = disk;
disk->fops = &pmem_fops;
disk->queue = q;
@@ -336,9 +362,16 @@ static int pmem_attach_disk(struct device *dev,
return -ENOMEM;
nvdimm_badblocks_populate(nd_region, &pmem->bb, res);
disk->bb = &pmem->bb;
- device_add_disk(dev, disk);
- if (devm_add_action_or_reset(dev, pmem_release_disk, disk))
+ dax_inode = alloc_dax_inode(pmem, disk->disk_name, &pmem_dax_ops);
+ if (!dax_inode) {
+ put_disk(disk);
+ return -ENOMEM;
+ }
+ pmem->dax_inode = dax_inode;
+
+ device_add_disk(dev, disk);
+ if (devm_add_action_or_reset(dev, pmem_release_disk, pmem))
return -ENOMEM;
revalidate_disk(disk);
diff --git a/drivers/nvdimm/pmem.h b/drivers/nvdimm/pmem.h
index b4ee4f71b4a1..a26ade213eb5 100644
--- a/drivers/nvdimm/pmem.h
+++ b/drivers/nvdimm/pmem.h
@@ -5,8 +5,6 @@
#include <linux/pfn_t.h>
#include <linux/fs.h>
-long pmem_direct_access(struct block_device *bdev, sector_t sector,
- void **kaddr, pfn_t *pfn, long size);
/* this definition is in it's own header for tools/testing/nvdimm to consume */
struct pmem_device {
/* One contiguous memory region per device */
@@ -20,5 +18,10 @@ struct pmem_device {
/* trim size when namespace capacity has been section aligned */
u32 pfn_pad;
struct badblocks bb;
+ struct dax_inode *dax_inode;
+ struct gendisk *disk;
};
+
+long __pmem_direct_access(struct pmem_device *pmem, phys_addr_t dev_addr,
+ void **kaddr, pfn_t *pfn, long size);
#endif /* __NVDIMM_PMEM_H__ */
diff --git a/include/linux/dax.h b/include/linux/dax.h
index def9a9d118c9..5aa620e8e5a2 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -16,6 +16,12 @@ struct dax_operations {
int dax_read_lock(void);
void dax_read_unlock(int id);
struct dax_inode *dax_get_by_host(const char *host);
+struct dax_inode *alloc_dax_inode(void *private, const char *host,
+ const struct dax_operations *ops);
+void *dax_inode_get_private(struct dax_inode *dax_inode);
+void put_dax_inode(struct dax_inode *dax_inode);
+bool dax_inode_alive(struct dax_inode *dax_inode);
+void kill_dax_inode(struct dax_inode *dax_inode);
/*
* We use lowest available bit in exceptional entry for locking, one bit for
diff --git a/tools/testing/nvdimm/pmem-dax.c b/tools/testing/nvdimm/pmem-dax.c
index c9b8c48f85fc..2c93836c169e 100644
--- a/tools/testing/nvdimm/pmem-dax.c
+++ b/tools/testing/nvdimm/pmem-dax.c
@@ -15,13 +15,12 @@
#include <pmem.h>
#include <nd.h>
-long pmem_direct_access(struct block_device *bdev, sector_t sector,
+long __pmem_direct_access(struct pmem_device *pmem, phys_addr_t dev_addr,
void **kaddr, pfn_t *pfn, long size)
{
- struct pmem_device *pmem = bdev->bd_queue->queuedata;
- resource_size_t offset = sector * 512 + pmem->data_offset;
+ resource_size_t offset = dev_addr + pmem->data_offset;
- if (unlikely(is_bad_pmem(&pmem->bb, sector, size)))
+ if (unlikely(is_bad_pmem(&pmem->bb, dev_addr / 512, size)))
return -EIO;
/*
@@ -34,9 +33,8 @@ long pmem_direct_access(struct block_device *bdev, sector_t sector,
*kaddr = pmem->virt_addr + offset;
page = vmalloc_to_page(pmem->virt_addr + offset);
*pfn = page_to_pfn_t(page);
- dev_dbg_ratelimited(disk_to_dev(bdev->bd_disk)->parent,
- "%s: sector: %#llx pfn: %#lx\n", __func__,
- (unsigned long long) sector, page_to_pfn(page));
+ pr_debug_ratelimited("%s: pmem: %p dev_addr: %pa pfn: %#lx\n",
+ __func__, pmem, &dev_addr, page_to_pfn(page));
return PAGE_SIZE;
}
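
The address conversion these hunks rely on (the block-layer entry point passes `sector * 512` down, and the bad-block check converts back with `dev_addr / 512`) can be tried out in userspace. This is only a sketch under assumptions: `SECTOR_SIZE`, `sector_to_dev_addr()`, `dev_addr_to_sector()` and `dev_addr_is_sector_aligned()` are hypothetical names that do not appear in the patch, and `uint64_t` stands in for the kernel's `sector_t` and `phys_addr_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 512-byte sectors, matching the literal 512 in the patch. */
#define SECTOR_SIZE 512u

/* Block-layer view -> byte offset into the device. */
static inline uint64_t sector_to_dev_addr(uint64_t sector)
{
	/* Guard added by this sketch: the multiplication must not overflow. */
	assert(sector <= UINT64_MAX / SECTOR_SIZE);
	return sector * SECTOR_SIZE;
}

/* Byte offset -> sector; integer division rounds down. */
static inline uint64_t dev_addr_to_sector(uint64_t dev_addr)
{
	return dev_addr / SECTOR_SIZE;
}

/* Hypothetical helper: true when dev_addr sits on a sector boundary. */
static inline int dev_addr_is_sector_aligned(uint64_t dev_addr)
{
	return dev_addr % SECTOR_SIZE == 0;
}
```

A sector number survives the round trip unchanged, while a byte offset that is not a multiple of 512 is rounded down to the sector that contains it.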
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [RFC PATCH 06/17] axon_ram: add dax_operations support
2017-01-28 8:36 ` Dan Williams
@ 2017-01-28 8:36 ` Dan Williams
-1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28 8:36 UTC (permalink / raw)
To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch
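Set up a dax_inode with the same lifetime as the axon_ram block device
and add a dax_operations ->direct_access() method that is equivalent to
the block-device method axon_ram_direct_access(). Both now call a shared
helper, __axon_ram_direct_access(), and the block-device method is
renamed to axon_ram_blk_direct_access(). Once fs/dax.c has been
converted to use dax_operations, the block-device method will be
removed.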
---
arch/powerpc/platforms/Kconfig | 1 +
arch/powerpc/sysdev/axonram.c | 46 +++++++++++++++++++++++++++++++++++-----
2 files changed, 41 insertions(+), 6 deletions(-)
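
The structure this patch uses, one shared helper that works on a byte offset plus a thin wrapper per entry point, can be sketched in userspace as follows. It is a simplified illustration and not the driver code: `struct mock_bank`, `mock_direct_access()`, `mock_blk_direct_access()`, `mock_dax_direct_access()` and `MOCK_SECTOR_SHIFT` are hypothetical stand-ins, the pfn output is left out, and the range check is an addition of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_SECTOR_SHIFT 9	/* assumption: 512-byte sectors */

/* Hypothetical stand-in for a driver's per-device state. */
struct mock_bank {
	unsigned char *base;	/* start of the mapped memory */
	size_t size;		/* bytes available from base */
};

/* Shared helper: byte offset in, address and remaining length out. */
static long mock_direct_access(struct mock_bank *bank, uint64_t offset,
		void **kaddr)
{
	assert(bank && kaddr);
	if (offset >= bank->size)
		return -1;	/* sketch-only check for out-of-range offsets */
	*kaddr = bank->base + offset;
	return (long)(bank->size - offset);
}

/* Block-style entry point: converts a sector number to a byte offset. */
static long mock_blk_direct_access(struct mock_bank *bank, uint64_t sector,
		void **kaddr)
{
	return mock_direct_access(bank, sector << MOCK_SECTOR_SHIFT, kaddr);
}

/* dax-style entry point: already byte-addressed, so it passes through. */
static long mock_dax_direct_access(struct mock_bank *bank, uint64_t dev_addr,
		void **kaddr)
{
	return mock_direct_access(bank, dev_addr, kaddr);
}
```

With this split, sector 2 through the block-style wrapper and byte offset 1024 through the dax-style wrapper resolve to the same address and the same remaining length, which is the equivalence the commit message asks for.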
diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index 7e3a2ebba29b..33244e3d9375 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -284,6 +284,7 @@ config CPM2
config AXON_RAM
tristate "Axon DDR2 memory device driver"
depends on PPC_IBM_CELL_BLADE && BLOCK
+ select DAX
default m
help
It registers one block device per Axon's DDR2 memory bank found
diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index ada29eaed6e2..4e1f58187726 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -25,6 +25,7 @@
#include <linux/bio.h>
#include <linux/blkdev.h>
+#include <linux/dax.h>
#include <linux/device.h>
#include <linux/errno.h>
#include <linux/fs.h>
@@ -62,6 +63,7 @@ static int azfs_major, azfs_minor;
struct axon_ram_bank {
struct platform_device *device;
struct gendisk *disk;
+ struct dax_inode *dax_inode;
unsigned int irq_id;
unsigned long ph_addr;
unsigned long io_addr;
@@ -137,25 +139,45 @@ axon_ram_make_request(struct request_queue *queue, struct bio *bio)
return BLK_QC_T_NONE;
}
+static long
+__axon_ram_direct_access(struct axon_ram_bank *bank, phys_addr_t offset,
+ void **kaddr, pfn_t *pfn, long size)
+{
+ *kaddr = (void *) bank->io_addr + offset;
+ *pfn = phys_to_pfn_t(bank->ph_addr + offset, PFN_DEV);
+ return bank->size - offset;
+}
+
/**
* axon_ram_direct_access - direct_access() method for block device
* @device, @sector, @data: see block_device_operations method
*/
static long
-axon_ram_direct_access(struct block_device *device, sector_t sector,
+axon_ram_blk_direct_access(struct block_device *device, sector_t sector,
void **kaddr, pfn_t *pfn, long size)
{
struct axon_ram_bank *bank = device->bd_disk->private_data;
- loff_t offset = (loff_t)sector << AXON_RAM_SECTOR_SHIFT;
- *kaddr = (void *) bank->io_addr + offset;
- *pfn = phys_to_pfn_t(bank->ph_addr + offset, PFN_DEV);
- return bank->size - offset;
+ return __axon_ram_direct_access(bank, sector << AXON_RAM_SECTOR_SHIFT,
+ kaddr, pfn, size);
}
static const struct block_device_operations axon_ram_devops = {
.owner = THIS_MODULE,
- .direct_access = axon_ram_direct_access
+ .direct_access = axon_ram_blk_direct_access
+};
+
+static long
+axon_ram_dax_direct_access(struct dax_inode *dax_inode, phys_addr_t dev_addr,
+ void **kaddr, pfn_t *pfn, long size)
+{
+ struct axon_ram_bank *bank = dax_inode_get_private(dax_inode);
+
+ return __axon_ram_direct_access(bank, dev_addr, kaddr, pfn, size);
+}
+
+static const struct dax_operations axon_ram_dax_ops = {
+ .direct_access = axon_ram_dax_direct_access,
};
/**
@@ -219,6 +241,7 @@ static int axon_ram_probe(struct platform_device *device)
goto failed;
}
+
bank->disk->major = azfs_major;
bank->disk->first_minor = azfs_minor;
bank->disk->fops = &axon_ram_devops;
@@ -227,6 +250,11 @@ static int axon_ram_probe(struct platform_device *device)
sprintf(bank->disk->disk_name, "%s%d",
AXON_RAM_DEVICE_NAME, axon_ram_bank_id);
+ bank->dax_inode = alloc_dax_inode(bank, bank->disk->disk_name,
+ &axon_ram_dax_ops);
+ if (!bank->dax_inode)
+ goto failed;
+
bank->disk->queue = blk_alloc_queue(GFP_KERNEL);
if (bank->disk->queue == NULL) {
dev_err(&device->dev, "Cannot register disk queue\n");
@@ -276,6 +304,10 @@ static int axon_ram_probe(struct platform_device *device)
bank->disk->disk_name);
del_gendisk(bank->disk);
}
+ if (bank->dax_inode) {
+ kill_dax_inode(bank->dax_inode);
+ put_dax_inode(bank->dax_inode);
+ }
device->dev.platform_data = NULL;
if (bank->io_addr != 0)
iounmap((void __iomem *) bank->io_addr);
@@ -298,6 +330,8 @@ axon_ram_remove(struct platform_device *device)
device_remove_file(&device->dev, &dev_attr_ecc);
free_irq(bank->irq_id, device);
+ kill_dax_inode(bank->dax_inode);
+ put_dax_inode(bank->dax_inode);
del_gendisk(bank->disk);
iounmap((void __iomem *) bank->io_addr);
kfree(bank);
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [RFC PATCH 07/17] brd: add dax_operations support
2017-01-28 8:36 ` Dan Williams
@ 2017-01-28 8:36 ` Dan Williams
-1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28 8:36 UTC (permalink / raw)
To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch
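Set up a dax_inode with the same lifetime as the brd block device and
add a dax_operations ->direct_access() method that is equivalent to the
block-device method brd_direct_access(). Both now call a shared helper,
__brd_direct_access(), and the block-device method becomes
brd_blk_direct_access(). Once fs/dax.c has been converted to use
dax_operations, the block-device method will be removed.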
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/block/Kconfig | 1 +
drivers/block/brd.c | 57 +++++++++++++++++++++++++++++++++++++++++--------
2 files changed, 49 insertions(+), 9 deletions(-)
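
The lifetime rule in the commit message (the dax_inode is created together with the block device and torn down together with it) can be illustrated with a small userspace mock. Everything here is hypothetical: `struct mock_dax`, `mock_alloc_dax()`, `mock_kill_dax()`, `mock_put_dax()` and `struct mock_dev` only imitate the shape of the calls used in this series, and the real reference counting and locking are not modelled. The sketch stores the allocated object in the device structure so that teardown can find it, and lets kill and put accept NULL so that an error path can call them unconditionally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int mock_live_objects;	/* mock objects allocated and not yet freed */

/* Hypothetical stand-in for a refcounted dax object. */
struct mock_dax {
	int refcount;
	bool alive;
	void *private;
};

static struct mock_dax *mock_alloc_dax(void *private)
{
	struct mock_dax *dax = calloc(1, sizeof(*dax));

	if (!dax)
		return NULL;
	dax->refcount = 1;
	dax->alive = true;
	dax->private = private;
	mock_live_objects++;
	return dax;
}

/* Mark the object dead so new users can be refused; tolerates NULL. */
static void mock_kill_dax(struct mock_dax *dax)
{
	if (dax)
		dax->alive = false;
}

/* Drop one reference and free on the last one; tolerates NULL. */
static void mock_put_dax(struct mock_dax *dax)
{
	if (!dax)
		return;
	assert(dax->refcount > 0);
	if (--dax->refcount == 0) {
		mock_live_objects--;
		free(dax);
	}
}

/* Hypothetical device that owns one dax object for its whole lifetime. */
struct mock_dev {
	struct mock_dax *dax;
};

static int mock_dev_setup(struct mock_dev *dev)
{
	struct mock_dax *dax = mock_alloc_dax(dev);

	if (!dax)
		return -1;
	dev->dax = dax;	/* remember it so teardown can release it */
	return 0;
}

static void mock_dev_teardown(struct mock_dev *dev)
{
	mock_kill_dax(dev->dax);
	mock_put_dax(dev->dax);
	dev->dax = NULL;
}
```

Calling `mock_dev_setup()` followed by `mock_dev_teardown()` leaves no live objects behind, which is the property the kill/put pairs in the driver's removal path are meant to provide.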
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index 223ff2fcae7e..604b51a884b6 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -337,6 +337,7 @@ config BLK_DEV_SX8
config BLK_DEV_RAM
tristate "RAM block device support"
+ select DAX if BLK_DEV_RAM_DAX
---help---
Saying Y here will allow you to use a portion of your RAM memory as
a block device, so that you can make file systems on it, read and
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 3adc32a3153b..1279df4dc07c 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -21,6 +21,7 @@
#include <linux/slab.h>
#ifdef CONFIG_BLK_DEV_RAM_DAX
#include <linux/pfn_t.h>
+#include <linux/dax.h>
#endif
#include <linux/uaccess.h>
@@ -41,6 +42,9 @@ struct brd_device {
struct request_queue *brd_queue;
struct gendisk *brd_disk;
+#ifdef CONFIG_BLK_DEV_RAM_DAX
+ struct dax_inode *dax_inode;
+#endif
struct list_head brd_list;
/*
@@ -375,15 +379,14 @@ static int brd_rw_page(struct block_device *bdev, sector_t sector,
}
#ifdef CONFIG_BLK_DEV_RAM_DAX
-static long brd_direct_access(struct block_device *bdev, sector_t sector,
+static long __brd_direct_access(struct brd_device *brd, phys_addr_t dev_addr,
void **kaddr, pfn_t *pfn, long size)
{
- struct brd_device *brd = bdev->bd_disk->private_data;
struct page *page;
if (!brd)
return -ENODEV;
- page = brd_insert_page(brd, sector);
+ page = brd_insert_page(brd, dev_addr / 512);
if (!page)
return -ENOSPC;
*kaddr = page_address(page);
@@ -391,14 +394,34 @@ static long brd_direct_access(struct block_device *bdev, sector_t sector,
return PAGE_SIZE;
}
+
+static long brd_blk_direct_access(struct block_device *bdev, sector_t sector,
+ void **kaddr, pfn_t *pfn, long size)
+{
+ struct brd_device *brd = bdev->bd_disk->private_data;
+
+ return __brd_direct_access(brd, sector * 512, kaddr, pfn, size);
+}
+
+static long brd_dax_direct_access(struct dax_inode *dax_inode,
+ phys_addr_t dev_addr, void **kaddr, pfn_t *pfn, long size)
+{
+ struct brd_device *brd = dax_inode_get_private(dax_inode);
+
+ return __brd_direct_access(brd, dev_addr, kaddr, pfn, size);
+}
+
+static const struct dax_operations brd_dax_ops = {
+ .direct_access = brd_dax_direct_access,
+};
#else
-#define brd_direct_access NULL
+#define brd_blk_direct_access NULL
#endif
static const struct block_device_operations brd_fops = {
.owner = THIS_MODULE,
.rw_page = brd_rw_page,
- .direct_access = brd_direct_access,
+ .direct_access = brd_blk_direct_access,
};
/*
@@ -441,7 +464,9 @@ static struct brd_device *brd_alloc(int i)
{
struct brd_device *brd;
struct gendisk *disk;
-
+#ifdef CONFIG_BLK_DEV_RAM_DAX
+ struct dax_inode *dax_inode;
+#endif
brd = kzalloc(sizeof(*brd), GFP_KERNEL);
if (!brd)
goto out;
@@ -469,9 +494,6 @@ static struct brd_device *brd_alloc(int i)
blk_queue_max_discard_sectors(brd->brd_queue, UINT_MAX);
brd->brd_queue->limits.discard_zeroes_data = 1;
queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, brd->brd_queue);
-#ifdef CONFIG_BLK_DEV_RAM_DAX
- queue_flag_set_unlocked(QUEUE_FLAG_DAX, brd->brd_queue);
-#endif
disk = brd->brd_disk = alloc_disk(max_part);
if (!disk)
goto out_free_queue;
@@ -484,8 +506,21 @@ static struct brd_device *brd_alloc(int i)
sprintf(disk->disk_name, "ram%d", i);
set_capacity(disk, rd_size * 2);
+#ifdef CONFIG_BLK_DEV_RAM_DAX
+ queue_flag_set_unlocked(QUEUE_FLAG_DAX, brd->brd_queue);
+ dax_inode = alloc_dax_inode(brd, disk->disk_name, &brd_dax_ops);
+ if (!dax_inode)
+ goto out_free_inode;
+ brd->dax_inode = dax_inode;
+#endif
+
return brd;
+#ifdef CONFIG_BLK_DEV_RAM_DAX
+out_free_inode:
+ kill_dax_inode(dax_inode);
+ put_dax_inode(dax_inode);
+#endif
out_free_queue:
blk_cleanup_queue(brd->brd_queue);
out_free_dev:
@@ -525,6 +560,10 @@ static struct brd_device *brd_init_one(int i, bool *new)
static void brd_del_one(struct brd_device *brd)
{
list_del(&brd->brd_list);
+#ifdef CONFIG_BLK_DEV_RAM_DAX
+ kill_dax_inode(brd->dax_inode);
+ put_dax_inode(brd->dax_inode);
+#endif
del_gendisk(brd->brd_disk);
brd_free(brd);
}
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [RFC PATCH 08/17] dcssblk: add dax_operations support
2017-01-28 8:36 ` Dan Williams
@ 2017-01-28 8:36 ` Dan Williams
-1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28 8:36 UTC (permalink / raw)
To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch
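Set up a dax_inode with the same lifetime as the dcssblk block device
and add a dax_operations ->direct_access() method that is equivalent to
the block-device method dcssblk_direct_access(). Both now call a shared
helper, __dcssblk_direct_access(), and the block-device method is
renamed to dcssblk_blk_direct_access(). Once fs/dax.c has been converted
to use dax_operations, the block-device method will be removed.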
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/s390/block/Kconfig | 1 +
drivers/s390/block/dcssblk.c | 53 +++++++++++++++++++++++++++++++++++-------
2 files changed, 45 insertions(+), 9 deletions(-)
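
The arithmetic in the helper this patch adds (the mapped address is `start + offset`, the page frame number is that address shifted down by the page size, and the return value is the number of bytes left in the device) can be tried out in userspace. This is a sketch under assumptions: `MOCK_PAGE_SHIFT` is fixed at 12 (4 KiB pages), `struct mock_window` and `mock_window_access()` are hypothetical names, and the range check is an addition of the sketch that is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical device window; its size is taken as end - start. */
struct mock_window {
	uint64_t start;
	uint64_t end;
};

/*
 * Translate a byte offset into an address and a page frame number and
 * return the bytes remaining from that offset, or -1 when out of range.
 */
static int64_t mock_window_access(const struct mock_window *w, uint64_t offset,
		uint64_t *addr, uint64_t *pfn)
{
	uint64_t dev_sz;

	assert(w && addr && pfn && w->end >= w->start);
	dev_sz = w->end - w->start;
	if (offset >= dev_sz)
		return -1;	/* sketch-only check */
	*addr = w->start + offset;
	*pfn = *addr >> MOCK_PAGE_SHIFT;	/* round down to a page frame */
	return (int64_t)(dev_sz - offset);
}
```

For a window starting at 0x100000 with a size of 0x4000, offset 0x1200 maps to address 0x101200 in page frame 0x101, with 0x2e00 bytes remaining.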
diff --git a/drivers/s390/block/Kconfig b/drivers/s390/block/Kconfig
index 4a3b62326183..0acb8c2f9475 100644
--- a/drivers/s390/block/Kconfig
+++ b/drivers/s390/block/Kconfig
@@ -14,6 +14,7 @@ config BLK_DEV_XPRAM
config DCSSBLK
def_tristate m
+ select DAX
prompt "DCSSBLK support"
depends on S390 && BLOCK
help
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 9d66b4fb174b..67b0885b4d12 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -18,6 +18,7 @@
#include <linux/interrupt.h>
#include <linux/platform_device.h>
#include <linux/pfn_t.h>
+#include <linux/dax.h>
#include <asm/extmem.h>
#include <asm/io.h>
@@ -30,8 +31,10 @@ static int dcssblk_open(struct block_device *bdev, fmode_t mode);
static void dcssblk_release(struct gendisk *disk, fmode_t mode);
static blk_qc_t dcssblk_make_request(struct request_queue *q,
struct bio *bio);
-static long dcssblk_direct_access(struct block_device *bdev, sector_t secnum,
+static long dcssblk_blk_direct_access(struct block_device *bdev, sector_t secnum,
void **kaddr, pfn_t *pfn, long size);
+static long dcssblk_dax_direct_access(struct dax_inode *dax_inode,
+ phys_addr_t dev_addr, void **kaddr, pfn_t *pfn, long size);
static char dcssblk_segments[DCSSBLK_PARM_LEN] = "\0";
@@ -40,7 +43,11 @@ static const struct block_device_operations dcssblk_devops = {
.owner = THIS_MODULE,
.open = dcssblk_open,
.release = dcssblk_release,
- .direct_access = dcssblk_direct_access,
+ .direct_access = dcssblk_blk_direct_access,
+};
+
+static const struct dax_operations dcssblk_dax_ops = {
+ .direct_access = dcssblk_dax_direct_access,
};
struct dcssblk_dev_info {
@@ -57,6 +64,7 @@ struct dcssblk_dev_info {
struct request_queue *dcssblk_queue;
int num_of_segments;
struct list_head seg_list;
+ struct dax_inode *dax_inode;
};
struct segment_info {
@@ -389,6 +397,8 @@ dcssblk_shared_store(struct device *dev, struct device_attribute *attr, const ch
}
list_del(&dev_info->lh);
+ kill_dax_inode(dev_info->dax_inode);
+ put_dax_inode(dev_info->dax_inode);
del_gendisk(dev_info->gd);
blk_cleanup_queue(dev_info->dcssblk_queue);
dev_info->gd->queue = NULL;
@@ -525,6 +535,7 @@ dcssblk_add_store(struct device *dev, struct device_attribute *attr, const char
int rc, i, j, num_of_segments;
struct dcssblk_dev_info *dev_info;
struct segment_info *seg_info, *temp;
+ struct dax_inode *dax_inode;
char *local_buf;
unsigned long seg_byte_size;
@@ -654,6 +665,11 @@ dcssblk_add_store(struct device *dev, struct device_attribute *attr, const char
if (rc)
goto put_dev;
+ dax_inode = alloc_dax_inode(dev_info, dev_info->gd->disk_name,
+ &dcssblk_dax_ops);
+ if (!dax_inode)
+ goto put_dev;
+
get_device(&dev_info->dev);
device_add_disk(&dev_info->dev, dev_info->gd);
@@ -752,6 +768,8 @@ dcssblk_remove_store(struct device *dev, struct device_attribute *attr, const ch
}
list_del(&dev_info->lh);
+ kill_dax_inode(dev_info->dax_inode);
+ put_dax_inode(dev_info->dax_inode);
del_gendisk(dev_info->gd);
blk_cleanup_queue(dev_info->dcssblk_queue);
dev_info->gd->queue = NULL;
@@ -883,21 +901,38 @@ dcssblk_make_request(struct request_queue *q, struct bio *bio)
}
static long
-dcssblk_direct_access (struct block_device *bdev, sector_t secnum,
+__dcssblk_direct_access(struct dcssblk_dev_info *dev_info, phys_addr_t offset,
+ void **kaddr, pfn_t *pfn, long size)
+{
+ unsigned long dev_sz;
+
+ dev_sz = dev_info->end - dev_info->start;
+ *kaddr = (void *) dev_info->start + offset;
+ *pfn = __pfn_to_pfn_t(PFN_DOWN(dev_info->start + offset), PFN_DEV);
+
+ return dev_sz - offset;
+}
+
+static long
+dcssblk_blk_direct_access(struct block_device *bdev, sector_t secnum,
void **kaddr, pfn_t *pfn, long size)
{
struct dcssblk_dev_info *dev_info;
- unsigned long offset, dev_sz;
dev_info = bdev->bd_disk->private_data;
if (!dev_info)
return -ENODEV;
- dev_sz = dev_info->end - dev_info->start;
- offset = secnum * 512;
- *kaddr = (void *) dev_info->start + offset;
- *pfn = __pfn_to_pfn_t(PFN_DOWN(dev_info->start + offset), PFN_DEV);
+ return __dcssblk_direct_access(dev_info, secnum * 512, kaddr, pfn,
+ size);
+}
- return dev_sz - offset;
+static long
+dcssblk_dax_direct_access(struct dax_inode *dax_inode, phys_addr_t dev_addr,
+ void **kaddr, pfn_t *pfn, long size)
+{
+ struct dcssblk_dev_info *dev_info = dax_inode_get_private(dax_inode);
+
+ return __dcssblk_direct_access(dev_info, dev_addr, kaddr, pfn, size);
}
static void
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [RFC PATCH 09/17] block: kill bdev_dax_capable()
2017-01-28 8:36 ` Dan Williams
@ 2017-01-28 8:36 ` Dan Williams
-1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28 8:36 UTC (permalink / raw)
To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch
This is leftover dead code that has since been replaced by
bdev_dax_supported().
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
fs/block_dev.c | 24 ------------------------
include/linux/blkdev.h | 1 -
2 files changed, 25 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 601b71b76d7f..edb1d2b16b8f 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -807,30 +807,6 @@ int bdev_dax_supported(struct super_block *sb, int blocksize)
}
EXPORT_SYMBOL_GPL(bdev_dax_supported);
-/**
- * bdev_dax_capable() - Return if the raw device is capable for dax
- * @bdev: The device for raw block device access
- */
-bool bdev_dax_capable(struct block_device *bdev)
-{
- struct blk_dax_ctl dax = {
- .size = PAGE_SIZE,
- };
-
- if (!IS_ENABLED(CONFIG_FS_DAX))
- return false;
-
- dax.sector = 0;
- if (bdev_direct_access(bdev, &dax) < 0)
- return false;
-
- dax.sector = bdev->bd_part->nr_sects - (PAGE_SIZE / 512);
- if (bdev_direct_access(bdev, &dax) < 0)
- return false;
-
- return true;
-}
-
/*
* pseudo-fs
*/
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3c0ff78b1219..5e7706f7d533 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1904,7 +1904,6 @@ extern int bdev_write_page(struct block_device *, sector_t, struct page *,
struct writeback_control *);
extern long bdev_direct_access(struct block_device *, struct blk_dax_ctl *);
extern int bdev_dax_supported(struct super_block *, int);
-extern bool bdev_dax_capable(struct block_device *);
#else /* CONFIG_BLOCK */
struct block_device;
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [RFC PATCH 10/17] block: introduce bdev_dax_direct_access()
2017-01-28 8:36 ` Dan Williams
@ 2017-01-28 8:36 ` Dan Williams
-1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28 8:36 UTC (permalink / raw)
To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch
Provide a replacement for bdev_direct_access() that uses
dax_operations.direct_access() instead of
block_device_operations.direct_access(). Once all consumers of the old
API have been converted, bdev_direct_access() will be deleted.
Given that block device partitioning decisions can cause dax page
alignment constraints to be violated, we still need to validate the
block_device before calling the dax ->direct_access() method.
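
The validation described above can be sketched in userspace C. This is
only an illustration of the two checks (partition bounds, then page
alignment of the resulting device offset), not the kernel code itself:
struct sketch_part, sketch_bdev_to_dev_addr() and the SKETCH_* constants
are hypothetical names, and a 4096-byte page size is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512u
#define SKETCH_PAGE_SIZE   4096u	/* assumed page size for this sketch */

/* Hypothetical stand-in for the partition geometry of a block_device. */
struct sketch_part {
	uint64_t start_sect;	/* first sector of the partition on the device */
	uint64_t nr_sects;	/* partition length in sectors */
};

/*
 * Translate a partition-relative sector into a device byte offset.
 * Returns 0 and stores the offset in *dev_addr, or a negative errno:
 *   -ERANGE if the request runs past the end of the partition,
 *   -EINVAL if the resulting device offset is not page aligned
 *           (or, as a simplification of this sketch, if size < 0).
 */
static int sketch_bdev_to_dev_addr(const struct sketch_part *part,
				   uint64_t sector, long size,
				   uint64_t *dev_addr)
{
	uint64_t nsect, addr;

	assert(part && dev_addr);

	if (size < 0)
		return -EINVAL;

	/* Round the byte count up to whole sectors before the bounds check. */
	nsect = ((uint64_t)size + SKETCH_SECTOR_SIZE - 1) / SKETCH_SECTOR_SIZE;
	if (sector + nsect > part->nr_sects)
		return -ERANGE;

	/* A partition that starts off a page boundary shows up here. */
	addr = (sector + part->start_sect) * SKETCH_SECTOR_SIZE;
	if (addr % SKETCH_PAGE_SIZE)
		return -EINVAL;

	*dev_addr = addr;
	return 0;
}
```

With a partition starting at sector 8, sector 0 maps to byte offset 4096
and passes both checks. The same request against a partition starting at
sector 1 fails the alignment check, which is the case the paragraph above
is concerned with.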
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
block/Kconfig | 1 +
drivers/dax/super.c | 33 +++++++++++++++++++++++++++++++++
fs/block_dev.c | 28 ++++++++++++++++++++++++++++
include/linux/blkdev.h | 3 +++
include/linux/dax.h | 2 ++
5 files changed, 67 insertions(+)
diff --git a/block/Kconfig b/block/Kconfig
index 8bf114a3858a..9be785173280 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -6,6 +6,7 @@ menuconfig BLOCK
default y
select SBITMAP
select SRCU
+ select DAX
help
Provide block layer support for the kernel.
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index eb844ffea3cf..ab5b082df5dd 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -65,6 +65,39 @@ struct dax_inode {
const struct dax_operations *ops;
};
+long dax_direct_access(struct dax_inode *dax_inode, phys_addr_t dev_addr,
+ void **kaddr, pfn_t *pfn, long size)
+{
+ long avail;
+
+ /*
+ * The device driver is allowed to sleep, in order to make the
+ * memory directly accessible.
+ */
+ might_sleep();
+
+ if (!dax_inode)
+ return -EOPNOTSUPP;
+
+ if (!dax_inode_alive(dax_inode))
+ return -ENXIO;
+
+ if (size < 0)
+ return size;
+
+ if (dev_addr % PAGE_SIZE)
+ return -EINVAL;
+
+ avail = dax_inode->ops->direct_access(dax_inode, dev_addr, kaddr, pfn,
+ size);
+ if (!avail)
+ return -ERANGE;
+ if (avail > 0 && avail & ~PAGE_MASK)
+ return -ENXIO;
+ return min(avail, size);
+}
+EXPORT_SYMBOL_GPL(dax_direct_access);
+
bool dax_inode_alive(struct dax_inode *dax_inode)
{
lockdep_assert_held(&dax_srcu);
diff --git a/fs/block_dev.c b/fs/block_dev.c
index edb1d2b16b8f..bf4b51a3a412 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -18,6 +18,7 @@
#include <linux/module.h>
#include <linux/blkpg.h>
#include <linux/magic.h>
+#include <linux/dax.h>
#include <linux/buffer_head.h>
#include <linux/swap.h>
#include <linux/pagevec.h>
@@ -763,6 +764,33 @@ long bdev_direct_access(struct block_device *bdev, struct blk_dax_ctl *dax)
EXPORT_SYMBOL_GPL(bdev_direct_access);
/**
+ * bdev_dax_direct_access() - bdev-sector to pfn_t and kernel virtual address
+ * @bdev: host block device for @dax_inode
+ * @dax_inode: interface data and operations for a memory device
+ * @dax: control and output parameters for ->direct_access
+ *
+ * Return: negative errno if an error occurs, otherwise the number of bytes
+ * accessible at this address.
+ *
+ * Locking: must be called with dax_read_lock() held
+ */
+long bdev_dax_direct_access(struct block_device *bdev,
+ struct dax_inode *dax_inode, struct blk_dax_ctl *dax)
+{
+ sector_t sector = dax->sector;
+
+ if (!blk_queue_dax(bdev->bd_queue))
+ return -EOPNOTSUPP;
+ if ((sector + DIV_ROUND_UP(dax->size, 512))
+ > part_nr_sects_read(bdev->bd_part))
+ return -ERANGE;
+ sector += get_start_sect(bdev);
+ return dax_direct_access(dax_inode, sector * 512, &dax->addr,
+ &dax->pfn, dax->size);
+}
+EXPORT_SYMBOL_GPL(bdev_dax_direct_access);
+
+/**
* bdev_dax_supported() - Check if the device supports dax for filesystem
* @sb: The superblock of the device
* @blocksize: The block size of the device
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5e7706f7d533..3b3c5ce376fd 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1903,6 +1903,9 @@ extern int bdev_read_page(struct block_device *, sector_t, struct page *);
extern int bdev_write_page(struct block_device *, sector_t, struct page *,
struct writeback_control *);
extern long bdev_direct_access(struct block_device *, struct blk_dax_ctl *);
+struct dax_inode;
+extern long bdev_dax_direct_access(struct block_device *bdev,
+ struct dax_inode *dax_inode, struct blk_dax_ctl *dax);
extern int bdev_dax_supported(struct super_block *, int);
#else /* CONFIG_BLOCK */
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 5aa620e8e5a2..2ef8e18e2587 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -22,6 +22,8 @@ void *dax_inode_get_private(struct dax_inode *dax_inode);
void put_dax_inode(struct dax_inode *dax_inode);
bool dax_inode_alive(struct dax_inode *dax_inode);
void kill_dax_inode(struct dax_inode *dax_inode);
+long dax_direct_access(struct dax_inode *dax_inode, phys_addr_t dev_addr,
+ void **kaddr, pfn_t *pfn, long size);
/*
* We use lowest available bit in exceptional entry for locking, one bit for
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [RFC PATCH 11/17] dm: add dax_operations support (producer)
2017-01-28 8:36 ` Dan Williams
@ 2017-01-28 8:37 ` Dan Williams
-1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28 8:37 UTC (permalink / raw)
To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch
Set up a dax_inode to have the same lifetime as the dm block device and
add a ->direct_access() method that is equivalent to
dm_blk_direct_access(). Once fs/dax.c has been converted to use
dax_operations, the old dm_blk_direct_access() will be removed.
This enabling is only for the top-level dm representation to upper
layers. Subsequent patches are needed to convert the bottom layer
interface to backing devices.
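
The pattern of keeping the old sector-based entry point alongside the new
byte-addressed one, with both delegating to a shared helper, can be
sketched as follows. This is a simplified userspace model under stated
assumptions: struct sketch_dev and the sketch_*() functions are
hypothetical, and the helper only reports how many bytes are available
instead of producing a kernel address and pfn.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_SECTOR_SHIFT 9	/* 512-byte sectors */

/* Hypothetical device model: a flat range of 'size' bytes. */
struct sketch_dev {
	uint64_t size;
};

/* Shared helper: bytes available at dev_addr, capped at the request size. */
static long sketch_direct_access(const struct sketch_dev *dev,
				 uint64_t dev_addr, long size)
{
	uint64_t avail;

	assert(dev);

	if (size < 0 || dev_addr >= dev->size)
		return -ERANGE;

	avail = dev->size - dev_addr;
	return avail < (uint64_t)size ? (long)avail : size;
}

/* Legacy entry point: addressed by sector, converted to a byte offset. */
static long sketch_blk_direct_access(const struct sketch_dev *dev,
				     uint64_t sector, long size)
{
	return sketch_direct_access(dev, sector << SKETCH_SECTOR_SHIFT, size);
}

/* New entry point: already addressed by byte offset. */
static long sketch_dax_direct_access(const struct sketch_dev *dev,
				     uint64_t dev_addr, long size)
{
	return sketch_direct_access(dev, dev_addr, size);
}
```

Because both wrappers funnel into one helper, sector 8 through the legacy
path and byte offset 4096 through the new path yield the same result,
which is what lets the old method be removed later without changing
behavior.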
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/md/Kconfig | 1 +
drivers/md/dm-core.h | 3 +++
drivers/md/dm.c | 42 +++++++++++++++++++++++++++++++++++++++---
3 files changed, 43 insertions(+), 3 deletions(-)
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index b7767da50c26..1de8372d9459 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -200,6 +200,7 @@ config BLK_DEV_DM_BUILTIN
config BLK_DEV_DM
tristate "Device mapper support"
select BLK_DEV_DM_BUILTIN
+ select DAX
---help---
Device-mapper is a low level volume manager. It works by allowing
people to specify mappings for ranges of logical sectors. Various
diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index 40ceba1fe8be..f6eb8d8db646 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -24,6 +24,8 @@ struct dm_kobject_holder {
struct completion completion;
};
+struct dax_inode;
+
/*
* DM core internal structure that used directly by dm.c and dm-rq.c
* DM targets must _not_ deference a mapped_device to directly access its members!
@@ -58,6 +60,7 @@ struct mapped_device {
struct target_type *immutable_target_type;
struct gendisk *disk;
+ struct dax_inode *dax_inode;
char name[16];
void *interface_ptr;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index db934b1dba9d..1b3d9253e92c 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -15,6 +15,7 @@
#include <linux/blkpg.h>
#include <linux/bio.h>
#include <linux/mempool.h>
+#include <linux/dax.h>
#include <linux/slab.h>
#include <linux/idr.h>
#include <linux/hdreg.h>
@@ -905,10 +906,10 @@ int dm_set_target_max_io_len(struct dm_target *ti, sector_t len)
}
EXPORT_SYMBOL_GPL(dm_set_target_max_io_len);
-static long dm_blk_direct_access(struct block_device *bdev, sector_t sector,
- void **kaddr, pfn_t *pfn, long size)
+static long __dm_direct_access(struct mapped_device *md, phys_addr_t dev_addr,
+ void **kaddr, pfn_t *pfn, long size)
{
- struct mapped_device *md = bdev->bd_disk->private_data;
+ sector_t sector = dev_addr >> SECTOR_SHIFT;
struct dm_table *map;
struct dm_target *ti;
int srcu_idx;
@@ -932,6 +933,23 @@ static long dm_blk_direct_access(struct block_device *bdev, sector_t sector,
return min(ret, size);
}
+static long dm_blk_direct_access(struct block_device *bdev, sector_t sector,
+ void **kaddr, pfn_t *pfn, long size)
+{
+ struct mapped_device *md = bdev->bd_disk->private_data;
+
+ return __dm_direct_access(md, sector << SECTOR_SHIFT, kaddr, pfn, size);
+}
+
+static long dm_dax_direct_access(struct dax_inode *dax_inode,
+ phys_addr_t dev_addr, void **kaddr, pfn_t *pfn,
+ long size)
+{
+ struct mapped_device *md = dax_inode_get_private(dax_inode);
+
+ return __dm_direct_access(md, dev_addr, kaddr, pfn, size);
+}
+
/*
* A target may call dm_accept_partial_bio only from the map routine. It is
* allowed for all bio types except REQ_PREFLUSH.
@@ -1376,6 +1394,7 @@ static int next_free_minor(int *minor)
}
static const struct block_device_operations dm_blk_dops;
+static const struct dax_operations dm_dax_ops;
static void dm_wq_work(struct work_struct *work);
@@ -1423,6 +1442,12 @@ static void cleanup_mapped_device(struct mapped_device *md)
if (md->bs)
bioset_free(md->bs);
+ if (md->dax_inode) {
+ kill_dax_inode(md->dax_inode);
+ put_dax_inode(md->dax_inode);
+ md->dax_inode = NULL;
+ }
+
if (md->disk) {
spin_lock(&_minor_lock);
md->disk->private_data = NULL;
@@ -1450,6 +1475,7 @@ static void cleanup_mapped_device(struct mapped_device *md)
static struct mapped_device *alloc_dev(int minor)
{
int r, numa_node_id = dm_get_numa_node();
+ struct dax_inode *dax_inode;
struct mapped_device *md;
void *old_md;
@@ -1514,6 +1540,12 @@ static struct mapped_device *alloc_dev(int minor)
md->disk->queue = md->queue;
md->disk->private_data = md;
sprintf(md->disk->disk_name, "dm-%d", minor);
+
+ dax_inode = alloc_dax_inode(md, md->disk->disk_name, &dm_dax_ops);
+ if (!dax_inode)
+ goto bad;
+ md->dax_inode = dax_inode;
+
add_disk(md->disk);
format_dev_t(md->name, MKDEV(_major, minor));
@@ -2735,6 +2767,10 @@ static const struct block_device_operations dm_blk_dops = {
.owner = THIS_MODULE
};
+static const struct dax_operations dm_dax_ops = {
+ .direct_access = dm_dax_direct_access,
+};
+
/*
* module hooks
*/
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [RFC PATCH 12/17] dm: add dax_operations support (consumer)
2017-01-28 8:36 ` Dan Williams
@ 2017-01-28 8:37 ` Dan Williams
-1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28 8:37 UTC (permalink / raw)
To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch
Arrange for dm to look up the dax services available from member
devices. Update the dax-capable targets, linear and stripe, to route dax
operations to the underlying device.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
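Note: a minimal userspace sketch of the dispatch that the dm.c hunk below
adds to __dm_direct_access(), for readers who want to see the control flow
in isolation. Everything prefixed toy_ is hypothetical and is not a kernel
symbol; the callbacks return the address they were handed instead of a
mapped length, and the kernel's min(ret, size) clamp is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors */

/* Hypothetical stand-in for struct target_type: both callbacks are optional. */
struct toy_target {
	long (*direct_access)(uint64_t sector, long size);	 /* legacy, sector based */
	long (*dax_direct_access)(uint64_t dev_addr, long size); /* new, byte based */
};

/* Toy callbacks: echo the address they received so the path is observable. */
static long toy_legacy_cb(uint64_t sector, long size)
{
	(void)size;
	return (long)sector;
}

static long toy_dax_cb(uint64_t dev_addr, long size)
{
	(void)size;
	return (long)dev_addr;
}

/*
 * Mirrors the if / else-if in the patch: the block-device entry point
 * passes blk = true and prefers the legacy callback, the dax entry point
 * passes blk = false and only ever uses the dax callback.
 */
static long toy_dispatch(const struct toy_target *t, uint64_t dev_addr,
			 long size, bool blk)
{
	uint64_t sector = dev_addr >> SECTOR_SHIFT;
	long ret = -EIO;

	assert(size >= 0);
	if (blk && t->direct_access)
		ret = t->direct_access(sector, size);
	else if (t->dax_direct_access)
		ret = t->dax_direct_access(dev_addr, size);
	return ret;
}
```

One consequence of the else-if that is easy to miss: a block-path call on a
target that has no legacy callback falls through to the dax callback, and a
target with neither callback yields -EIO.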
drivers/md/dm-linear.c | 24 ++++++++++++++++++++++++
drivers/md/dm-snap.c | 12 ++++++++++++
drivers/md/dm-stripe.c | 30 ++++++++++++++++++++++++++++++
drivers/md/dm-target.c | 11 +++++++++++
drivers/md/dm.c | 16 ++++++++++++----
include/linux/device-mapper.h | 7 +++++++
6 files changed, 96 insertions(+), 4 deletions(-)
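Note: the linear and stripe targets below both turn the byte address they
receive back into a sector before remapping it onto a member device. The
following is a simplified model of that arithmetic, assuming
sector-aligned addresses and a plain divide/modulo stripe layout; the
toy_ names are hypothetical and the real linear_map_sector() and
stripe_map_sector() helpers may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors */

/* Hypothetical linear target: maps [begin, ...) onto the member at 'start'. */
struct toy_linear {
	uint64_t begin;	/* first sector of the target in the dm device */
	uint64_t start;	/* first sector used on the member device */
};

static uint64_t toy_linear_map(const struct toy_linear *lc, uint64_t dev_addr)
{
	uint64_t sector = dev_addr >> SECTOR_SHIFT;

	return lc->start + (sector - lc->begin);
}

/* Hypothetical stripe set: chunks are dealt round-robin across members. */
struct toy_stripe_set {
	uint64_t begin;			/* first sector of the target */
	uint32_t stripes;		/* number of member devices */
	uint32_t chunk_sectors;		/* chunk size in sectors */
	const uint64_t *physical_start;	/* per-member starting sector */
};

/* Returns the sector on the chosen member and stores its index in *stripe. */
static uint64_t toy_stripe_map(const struct toy_stripe_set *sc,
			       uint64_t dev_addr, uint32_t *stripe)
{
	uint64_t sector = (dev_addr >> SECTOR_SHIFT) - sc->begin;
	uint64_t chunk = sector / sc->chunk_sectors;
	uint64_t offset = sector % sc->chunk_sectors;

	assert(sc->stripes > 0 && sc->chunk_sectors > 0);
	*stripe = (uint32_t)(chunk % sc->stripes);
	chunk /= sc->stripes;
	return sc->physical_start[*stripe] + chunk * sc->chunk_sectors + offset;
}
```

With two members starting at sectors 100 and 200 and 8-sector chunks, byte
address 6144 (sector 12) lands in chunk 1, which belongs to member 1 at
sector 204.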
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 4788b0b989a9..e91ca8089333 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -159,6 +159,29 @@ static long linear_direct_access(struct dm_target *ti, sector_t sector,
return ret;
}
+static long linear_dax_direct_access(struct dm_target *ti, phys_addr_t dev_addr,
+ void **kaddr, pfn_t *pfn, long size)
+{
+ struct linear_c *lc = ti->private;
+ struct block_device *bdev = lc->dev->bdev;
+ struct dax_inode *dax_inode = lc->dev->dax_inode;
+ struct blk_dax_ctl dax = {
+ .sector = linear_map_sector(ti, dev_addr >> SECTOR_SHIFT),
+ .size = size,
+ };
+ long ret;
+
+ ret = bdev_dax_direct_access(bdev, dax_inode, &dax);
+ *kaddr = dax.addr;
+ *pfn = dax.pfn;
+
+ return ret;
+}
+
+static const struct dm_dax_operations linear_dax_ops = {
+ .dm_direct_access = linear_dax_direct_access,
+};
+
static struct target_type linear_target = {
.name = "linear",
.version = {1, 3, 0},
@@ -170,6 +193,7 @@ static struct target_type linear_target = {
.prepare_ioctl = linear_prepare_ioctl,
.iterate_devices = linear_iterate_devices,
.direct_access = linear_direct_access,
+ .dax_ops = &linear_dax_ops,
};
int __init dm_linear_init(void)
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index c65feeada864..1990e3bd6958 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -2309,6 +2309,13 @@ static long origin_direct_access(struct dm_target *ti, sector_t sector,
return -EIO;
}
+static long origin_dax_direct_access(struct dm_target *ti, phys_addr_t dev_addr,
+ void **kaddr, pfn_t *pfn, long size)
+{
+ DMWARN("device does not support dax.");
+ return -EIO;
+}
+
/*
* Set the target "max_io_len" field to the minimum of all the snapshots'
* chunk sizes.
@@ -2357,6 +2364,10 @@ static int origin_iterate_devices(struct dm_target *ti,
return fn(ti, o->dev, 0, ti->len, data);
}
+static const struct dm_dax_operations origin_dax_ops = {
+ .dm_direct_access = origin_dax_direct_access,
+};
+
static struct target_type origin_target = {
.name = "snapshot-origin",
.version = {1, 9, 0},
@@ -2369,6 +2380,7 @@ static struct target_type origin_target = {
.status = origin_status,
.iterate_devices = origin_iterate_devices,
.direct_access = origin_direct_access,
+ .dax_ops = &origin_dax_ops,
};
static struct target_type snapshot_target = {
diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index 28193a57bf47..47fb56a6184a 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -331,6 +331,31 @@ static long stripe_direct_access(struct dm_target *ti, sector_t sector,
return ret;
}
+static long stripe_dax_direct_access(struct dm_target *ti, phys_addr_t dev_addr,
+ void **kaddr, pfn_t *pfn, long size)
+{
+ struct stripe_c *sc = ti->private;
+ uint32_t stripe;
+ struct block_device *bdev;
+ struct dax_inode *dax_inode;
+ struct blk_dax_ctl dax = {
+ .size = size,
+ };
+ long ret;
+
+ stripe_map_sector(sc, dev_addr >> SECTOR_SHIFT, &stripe, &dax.sector);
+
+ dax.sector += sc->stripe[stripe].physical_start;
+ bdev = sc->stripe[stripe].dev->bdev;
+ dax_inode = sc->stripe[stripe].dev->dax_inode;
+
+ ret = bdev_dax_direct_access(bdev, dax_inode, &dax);
+ *kaddr = dax.addr;
+ *pfn = dax.pfn;
+
+ return ret;
+}
+
/*
* Stripe status:
*
@@ -437,6 +462,10 @@ static void stripe_io_hints(struct dm_target *ti,
blk_limits_io_opt(limits, chunk_size * sc->stripes);
}
+static const struct dm_dax_operations stripe_dax_ops = {
+ .dm_direct_access = stripe_dax_direct_access,
+};
+
static struct target_type stripe_target = {
.name = "striped",
.version = {1, 6, 0},
@@ -449,6 +478,7 @@ static struct target_type stripe_target = {
.iterate_devices = stripe_iterate_devices,
.io_hints = stripe_io_hints,
.direct_access = stripe_direct_access,
+ .dax_ops = &stripe_dax_ops,
};
int __init dm_stripe_init(void)
diff --git a/drivers/md/dm-target.c b/drivers/md/dm-target.c
index 710ae28fd618..ab072f53cf24 100644
--- a/drivers/md/dm-target.c
+++ b/drivers/md/dm-target.c
@@ -154,6 +154,16 @@ static long io_err_direct_access(struct dm_target *ti, sector_t sector,
return -EIO;
}
+static long io_err_dax_direct_access(struct dm_target *ti, phys_addr_t dev_addr,
+ void **kaddr, pfn_t *pfn, long size)
+{
+ return -EIO;
+}
+
+static const struct dm_dax_operations err_dax_ops = {
+ .dm_direct_access = io_err_dax_direct_access,
+};
+
static struct target_type error_target = {
.name = "error",
.version = {1, 5, 0},
@@ -165,6 +175,7 @@ static struct target_type error_target = {
.clone_and_map_rq = io_err_clone_and_map_rq,
.release_clone_rq = io_err_release_clone_rq,
.direct_access = io_err_direct_access,
+ .dax_ops = &err_dax_ops,
};
int __init dm_target_init(void)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 1b3d9253e92c..5c5eeda0eb0a 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -627,6 +627,7 @@ static int open_table_device(struct table_device *td, dev_t dev,
}
td->dm_dev.bdev = bdev;
+ td->dm_dev.dax_inode = dax_get_by_host(bdev->bd_disk->disk_name);
return 0;
}
@@ -640,7 +641,9 @@ static void close_table_device(struct table_device *td, struct mapped_device *md
bd_unlink_disk_holder(td->dm_dev.bdev, dm_disk(md));
blkdev_put(td->dm_dev.bdev, td->dm_dev.mode | FMODE_EXCL);
+ put_dax_inode(td->dm_dev.dax_inode);
td->dm_dev.bdev = NULL;
+ td->dm_dev.dax_inode = NULL;
}
static struct table_device *find_table_device(struct list_head *l, dev_t dev,
@@ -907,7 +910,7 @@ int dm_set_target_max_io_len(struct dm_target *ti, sector_t len)
EXPORT_SYMBOL_GPL(dm_set_target_max_io_len);
static long __dm_direct_access(struct mapped_device *md, phys_addr_t dev_addr,
- void **kaddr, pfn_t *pfn, long size)
+ void **kaddr, pfn_t *pfn, long size, bool blk)
{
sector_t sector = dev_addr >> SECTOR_SHIFT;
struct dm_table *map;
@@ -926,8 +929,11 @@ static long __dm_direct_access(struct mapped_device *md, phys_addr_t dev_addr,
len = max_io_len(sector, ti) << SECTOR_SHIFT;
size = min(len, size);
- if (ti->type->direct_access)
+ if (blk && ti->type->direct_access)
ret = ti->type->direct_access(ti, sector, kaddr, pfn, size);
+ else if (ti->type->dax_ops)
+ ret = ti->type->dax_ops->dm_direct_access(ti, dev_addr, kaddr,
+ pfn, size);
out:
dm_put_live_table(md, srcu_idx);
return min(ret, size);
@@ -938,7 +944,8 @@ static long dm_blk_direct_access(struct block_device *bdev, sector_t sector,
{
struct mapped_device *md = bdev->bd_disk->private_data;
- return __dm_direct_access(md, sector << SECTOR_SHIFT, kaddr, pfn, size);
+ return __dm_direct_access(md, sector << SECTOR_SHIFT, kaddr, pfn, size,
+ true);
}
static long dm_dax_direct_access(struct dax_inode *dax_inode,
@@ -947,7 +954,8 @@ static long dm_dax_direct_access(struct dax_inode *dax_inode,
{
struct mapped_device *md = dax_inode_get_private(dax_inode);
- return __dm_direct_access(md, dev_addr, kaddr, pfn, size);
+ return __dm_direct_access(md, dev_addr, kaddr, pfn, size,
+ false);
}
/*
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index ef7962e84444..1b64f412bb45 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -137,12 +137,18 @@ void dm_error(const char *message);
struct dm_dev {
struct block_device *bdev;
+ struct dax_inode *dax_inode;
fmode_t mode;
char name[16];
};
dev_t dm_get_dev_t(const char *path);
+struct dm_dax_operations {
+ long (*dm_direct_access)(struct dm_target *ti, phys_addr_t dev_addr,
+ void **kaddr, pfn_t *pfn, long size);
+};
+
/*
* Constructors should call these functions to ensure destination devices
* are opened/closed correctly.
@@ -180,6 +186,7 @@ struct target_type {
dm_iterate_devices_fn iterate_devices;
dm_io_hints_fn io_hints;
dm_direct_access_fn direct_access;
+ const struct dm_dax_operations *dax_ops;
/* For internal device-mapper use. */
struct list_head list;
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [RFC PATCH 13/17] fs: update mount_bdev() to lookup dax infrastructure
2017-01-28 8:36 ` Dan Williams
@ 2017-01-28 8:37 ` Dan Williams
-1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28 8:37 UTC (permalink / raw)
To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch
This is in preparation for removing the ->direct_access() method from
block_device_operations.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
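Note: sget() hands a single void * to its test and set callbacks, which is
why the fs/super.c hunk below bundles the block device and the dax inode
into one struct. A minimal userspace sketch of that pattern follows; every
toy_ name is hypothetical and toy_sget() is only a stand-in for the real
superblock lookup.

```c
#include <assert.h>
#include <stddef.h>

struct toy_bdev { int dev; };
struct toy_dax { int id; };

struct toy_super {
	struct toy_bdev *s_bdev;
	struct toy_dax *s_dax;
};

/* Both pointers travel through the single opaque callback argument. */
struct toy_mount_data {
	struct toy_bdev *bdev;
	struct toy_dax *dax;
};

/* Match on the block device only, as test_bdev_super() does in the patch. */
static int toy_test_super(struct toy_super *s, void *data)
{
	struct toy_mount_data *mb = data;

	return s->s_bdev == mb->bdev;
}

/* Initialise a fresh superblock from the bundled pointers. */
static int toy_set_super(struct toy_super *s, void *data)
{
	struct toy_mount_data *mb = data;

	s->s_bdev = mb->bdev;
	s->s_dax = mb->dax;
	return 0;
}

/* Stand-in for sget(): reuse 'existing' if it matches, else set up 'fresh'. */
static struct toy_super *toy_sget(struct toy_super *existing,
				  struct toy_super *fresh,
				  int (*test)(struct toy_super *, void *),
				  int (*set)(struct toy_super *, void *),
				  void *data)
{
	if (existing && test(existing, data))
		return existing;
	if (set(fresh, data))
		return NULL;
	return fresh;
}
```

Because the match is on the block device alone, a second mount of the same
device finds the existing superblock and its already recorded dax pointer.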
fs/block_dev.c | 6 ++++--
fs/super.c | 32 +++++++++++++++++++++++++++++---
include/linux/fs.h | 1 +
3 files changed, 34 insertions(+), 5 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index bf4b51a3a412..a73f2388c515 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -806,14 +806,16 @@ int bdev_dax_supported(struct super_block *sb, int blocksize)
.sector = 0,
.size = PAGE_SIZE,
};
- int err;
+ int err, id;
if (blocksize != PAGE_SIZE) {
vfs_msg(sb, KERN_ERR, "error: unsupported blocksize for dax");
return -EINVAL;
}
- err = bdev_direct_access(sb->s_bdev, &dax);
+ id = dax_read_lock();
+ err = bdev_dax_direct_access(sb->s_bdev, sb->s_dax, &dax);
+ dax_read_unlock(id);
if (err < 0) {
switch (err) {
case -EOPNOTSUPP:
diff --git a/fs/super.c b/fs/super.c
index ea662b0e5e78..5e64d11c46c1 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -26,6 +26,7 @@
#include <linux/mount.h>
#include <linux/security.h>
#include <linux/writeback.h> /* for the emergency remount stuff */
+#include <linux/dax.h>
#include <linux/idr.h>
#include <linux/mutex.h>
#include <linux/backing-dev.h>
@@ -1038,9 +1039,17 @@ struct dentry *mount_ns(struct file_system_type *fs_type,
EXPORT_SYMBOL(mount_ns);
#ifdef CONFIG_BLOCK
+struct mount_bdev_data {
+ struct block_device *bdev;
+ struct dax_inode *dax_inode;
+};
+
static int set_bdev_super(struct super_block *s, void *data)
{
- s->s_bdev = data;
+ struct mount_bdev_data *mb_data = data;
+
+ s->s_bdev = mb_data->bdev;
+ s->s_dax = mb_data->dax_inode;
s->s_dev = s->s_bdev->bd_dev;
/*
@@ -1053,14 +1062,18 @@ static int set_bdev_super(struct super_block *s, void *data)
static int test_bdev_super(struct super_block *s, void *data)
{
- return (void *)s->s_bdev == data;
+ struct mount_bdev_data *mb_data = data;
+
+ return s->s_bdev == mb_data->bdev;
}
struct dentry *mount_bdev(struct file_system_type *fs_type,
int flags, const char *dev_name, void *data,
int (*fill_super)(struct super_block *, void *, int))
{
+ struct mount_bdev_data mb_data;
struct block_device *bdev;
+ struct dax_inode *dax_inode;
struct super_block *s;
fmode_t mode = FMODE_READ | FMODE_EXCL;
int error = 0;
@@ -1072,6 +1085,11 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
if (IS_ERR(bdev))
return ERR_CAST(bdev);
+ if (IS_ENABLED(CONFIG_FS_DAX))
+ dax_inode = dax_get_by_host(bdev->bd_disk->disk_name);
+ else
+ dax_inode = NULL;
+
/*
* once the super is inserted into the list by sget, s_umount
* will protect the lockfs code from trying to start a snapshot
@@ -1083,8 +1101,13 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
error = -EBUSY;
goto error_bdev;
}
+
+ mb_data = (struct mount_bdev_data) {
+ .bdev = bdev,
+ .dax_inode = dax_inode,
+ };
s = sget(fs_type, test_bdev_super, set_bdev_super, flags | MS_NOSEC,
- bdev);
+ &mb_data);
mutex_unlock(&bdev->bd_fsfreeze_mutex);
if (IS_ERR(s))
goto error_s;
@@ -1126,6 +1149,7 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
error = PTR_ERR(s);
error_bdev:
blkdev_put(bdev, mode);
+ put_dax_inode(dax_inode);
error:
return ERR_PTR(error);
}
@@ -1133,6 +1157,7 @@ EXPORT_SYMBOL(mount_bdev);
void kill_block_super(struct super_block *sb)
{
+ struct dax_inode *dax_inode = sb->s_dax;
struct block_device *bdev = sb->s_bdev;
fmode_t mode = sb->s_mode;
@@ -1141,6 +1166,7 @@ void kill_block_super(struct super_block *sb)
sync_blockdev(bdev);
WARN_ON_ONCE(!(mode & FMODE_EXCL));
blkdev_put(bdev, mode | FMODE_EXCL);
+ put_dax_inode(dax_inode);
}
EXPORT_SYMBOL(kill_block_super);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c930cbc19342..fdad43169146 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1313,6 +1313,7 @@ struct super_block {
struct hlist_bl_head s_anon; /* anonymous dentries for (nfs) exporting */
struct list_head s_mounts; /* list of mounts; _not_ for fs use */
struct block_device *s_bdev;
+ struct dax_inode *s_dax;
struct backing_dev_info *s_bdi;
struct mtd_info *s_mtd;
struct hlist_node s_instances;
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [RFC PATCH 14/17] ext2, ext4, xfs: retrieve dax_inode through iomap operations
2017-01-28 8:36 ` Dan Williams
@ 2017-01-28 8:37 ` Dan Williams
-1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28 8:37 UTC (permalink / raw)
To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch
In preparation for converting fs/dax.c to use bdev_dax_direct_access()
instead of bdev_direct_access(), add the plumbing to retrieve the
dax_inode determined at mount through ->iomap_begin.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
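Note: a small userspace sketch of the plumbing this patch adds, assuming a
consumer that treats a NULL dax_inode as "no dax service for this mapping".
The toy_ names are hypothetical. The sketch folds the two behaviours in the
diff into one helper: ext2 and ext4 copy the superblock's pointer
unconditionally, while xfs returns NULL for realtime inodes.

```c
#include <assert.h>
#include <stddef.h>

struct toy_dax { int id; };

struct toy_super {
	struct toy_dax *s_dax;	/* recorded once, at mount time */
};

struct toy_inode {
	struct toy_super *i_sb;
	int realtime;		/* models XFS_IS_REALTIME_INODE() */
};

struct toy_iomap {
	struct toy_dax *dax_inode;
};

/* Per-mapping hook: copy the mount-time pointer into the mapping. */
static void toy_iomap_begin(const struct toy_inode *inode,
			    struct toy_iomap *iomap)
{
	iomap->dax_inode = inode->realtime ? NULL : inode->i_sb->s_dax;
}

/* Assumed consumer-side check; the real fs/dax.c conversion is not shown here. */
static int toy_iomap_has_dax(const struct toy_iomap *iomap)
{
	return iomap->dax_inode != NULL;
}
```

The point of the design is that the lookup by device name happens once at
mount, and each mapping request afterwards only copies a pointer.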
fs/ext2/inode.c | 1 +
fs/ext4/inode.c | 1 +
fs/xfs/xfs_aops.c | 13 +++++++++++++
fs/xfs/xfs_aops.h | 1 +
fs/xfs/xfs_buf.h | 1 +
fs/xfs/xfs_iomap.c | 1 +
fs/xfs/xfs_super.c | 3 +++
include/linux/iomap.h | 1 +
8 files changed, 22 insertions(+)
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index f073bfca694b..c83f84748ec9 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -813,6 +813,7 @@ static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
iomap->flags = 0;
iomap->bdev = inode->i_sb->s_bdev;
+ iomap->dax_inode = inode->i_sb->s_dax;
iomap->offset = (u64)first_block << blkbits;
if (ret == 0) {
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 88d57af1b516..ae6fa6a78d0d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3344,6 +3344,7 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
iomap->flags = 0;
iomap->bdev = inode->i_sb->s_bdev;
+ iomap->dax_inode = inode->i_sb->s_dax;
iomap->offset = first_block << blkbits;
if (ret == 0) {
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 631e7c0e0a29..7d22938a4d8b 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -80,6 +80,19 @@ xfs_find_bdev_for_inode(
return mp->m_ddev_targp->bt_bdev;
}
+struct dax_inode *
+xfs_find_dax_for_inode(
+ struct inode *inode)
+{
+ struct xfs_inode *ip = XFS_I(inode);
+ struct xfs_mount *mp = ip->i_mount;
+
+ if (XFS_IS_REALTIME_INODE(ip))
+ return NULL;
+ else
+ return mp->m_ddev_targp->bt_dax;
+}
+
/*
* We're now finished for good with this page. Update the page state via the
* associated buffer_heads, paying attention to the start and end offsets that
diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
index cc174ec6c2fd..e5b65f436acf 100644
--- a/fs/xfs/xfs_aops.h
+++ b/fs/xfs/xfs_aops.h
@@ -59,5 +59,6 @@ int xfs_setfilesize(struct xfs_inode *ip, xfs_off_t offset, size_t size);
extern void xfs_count_page_state(struct page *, int *, int *);
extern struct block_device *xfs_find_bdev_for_inode(struct inode *);
+extern struct dax_inode *xfs_find_dax_for_inode(struct inode *);
#endif /* __XFS_AOPS_H__ */
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 8a9d3a9599f0..1ff83f398649 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -109,6 +109,7 @@ typedef unsigned int xfs_buf_flags_t;
typedef struct xfs_buftarg {
dev_t bt_dev;
struct block_device *bt_bdev;
+ struct dax_inode *bt_dax;
struct backing_dev_info *bt_bdi;
struct xfs_mount *bt_mount;
unsigned int bt_meta_sectorsize;
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 0d147428971e..1d08bd2433d5 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -69,6 +69,7 @@ xfs_bmbt_to_iomap(
iomap->offset = XFS_FSB_TO_B(mp, imap->br_startoff);
iomap->length = XFS_FSB_TO_B(mp, imap->br_blockcount);
iomap->bdev = xfs_find_bdev_for_inode(VFS_I(ip));
+ iomap->dax_inode = xfs_find_dax_for_inode(VFS_I(ip));
}
xfs_extlen_t
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index eecbaac08eba..1a99013a0701 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -774,6 +774,9 @@ xfs_open_devices(
if (!mp->m_ddev_targp)
goto out_close_rtdev;
+ /* associate dax inode for filesystem-dax */
+ mp->m_ddev_targp->bt_dax = mp->m_super->s_dax;
+
if (rtdev) {
mp->m_rtdev_targp = xfs_alloc_buftarg(mp, rtdev);
if (!mp->m_rtdev_targp)
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index a4c94b86401e..01e265e7cf55 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -41,6 +41,7 @@ struct iomap {
u16 type; /* type of mapping */
u16 flags; /* flags for mapping */
struct block_device *bdev; /* block device for I/O */
+ struct dax_inode *dax_inode; /* dax_inode for dax operations */
};
/*
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [RFC PATCH 15/17] Revert "block: use DAX for partition table reads"
2017-01-28 8:36 ` Dan Williams
@ 2017-01-28 8:37 ` Dan Williams
-1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28 8:37 UTC (permalink / raw)
To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch
commit d1a5f2b4d8a1 ("block: use DAX for partition table reads") was
part of a stalled effort to allow dax mappings of block devices. Since
then the device-dax mechanism has filled the role of dax-mapping static
device ranges.
Now that we are moving ->direct_access() from a block_device operation
to a dax_inode operation, keeping this read path would require block
devices to map and carry their own dax_inode reference.
Unless and until we decide to revive dax mapping of raw block devices
through the dax_inode scheme, there is no need to carry
read_dax_sector(). Removing it in turn allows bdev_direct_access() to
be removed. This revert should have been included in commit
223757016837 ("block_dev: remove DAX leftovers").
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
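Not for the changelog: a small user-space sketch of the sector
arithmetic that read_dev_sector() is left with after this revert. The
page index mirrors the n >> (PAGE_SHIFT-9) expression in the diff, and
the aligned-sector helper mirrors the rounding done by the removed
read_dax_sector(). The ex_* names, the 4 KiB page size, and the
offset-in-page helper are assumptions made for illustration only, not
kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: 4 KiB pages and 512-byte sectors. */
#define EX_PAGE_SHIFT   12
#define EX_SECTOR_SHIFT 9
#define EX_SECTORS_PER_PAGE (1ULL << (EX_PAGE_SHIFT - EX_SECTOR_SHIFT))

/* Page-cache index of the page that holds sector n. */
static inline uint64_t ex_sector_to_pgoff(uint64_t n)
{
	return n >> (EX_PAGE_SHIFT - EX_SECTOR_SHIFT);
}

/* Byte offset of sector n inside that page (hypothetical helper). */
static inline uint32_t ex_sector_offset_in_page(uint64_t n)
{
	return (uint32_t)((n & (EX_SECTORS_PER_PAGE - 1)) << EX_SECTOR_SHIFT);
}

/* First sector of the page that holds sector n. */
static inline uint64_t ex_page_aligned_sector(uint64_t n)
{
	return n & ~(EX_SECTORS_PER_PAGE - 1);
}
```

With these assumed sizes, sectors 0 through 7 share page index 0 and
sector 8 starts page index 1, so a partition-table read of any of the
first eight sectors is served from the same page-cache page.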
block/partition-generic.c | 17 ++---------------
fs/dax.c | 20 --------------------
include/linux/dax.h | 6 ------
3 files changed, 2 insertions(+), 41 deletions(-)
diff --git a/block/partition-generic.c b/block/partition-generic.c
index 7afb9907821f..5dfac337b0f2 100644
--- a/block/partition-generic.c
+++ b/block/partition-generic.c
@@ -16,7 +16,6 @@
#include <linux/kmod.h>
#include <linux/ctype.h>
#include <linux/genhd.h>
-#include <linux/dax.h>
#include <linux/blktrace_api.h>
#include "partitions/check.h"
@@ -631,24 +630,12 @@ int invalidate_partitions(struct gendisk *disk, struct block_device *bdev)
return 0;
}
-static struct page *read_pagecache_sector(struct block_device *bdev, sector_t n)
-{
- struct address_space *mapping = bdev->bd_inode->i_mapping;
-
- return read_mapping_page(mapping, (pgoff_t)(n >> (PAGE_SHIFT-9)),
- NULL);
-}
-
unsigned char *read_dev_sector(struct block_device *bdev, sector_t n, Sector *p)
{
+ struct address_space *mapping = bdev->bd_inode->i_mapping;
struct page *page;
- /* don't populate page cache for dax capable devices */
- if (IS_DAX(bdev->bd_inode))
- page = read_dax_sector(bdev, n);
- else
- page = read_pagecache_sector(bdev, n);
-
+ page = read_mapping_page(mapping, (pgoff_t)(n >> (PAGE_SHIFT-9)), NULL);
if (!IS_ERR(page)) {
if (PageError(page))
goto fail;
diff --git a/fs/dax.c b/fs/dax.c
index ddcddfeaa03b..a990211c8a3d 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -97,26 +97,6 @@ static int dax_is_empty_entry(void *entry)
return (unsigned long)entry & RADIX_DAX_EMPTY;
}
-struct page *read_dax_sector(struct block_device *bdev, sector_t n)
-{
- struct page *page = alloc_pages(GFP_KERNEL, 0);
- struct blk_dax_ctl dax = {
- .size = PAGE_SIZE,
- .sector = n & ~((((int) PAGE_SIZE) / 512) - 1),
- };
- long rc;
-
- if (!page)
- return ERR_PTR(-ENOMEM);
-
- rc = dax_map_atomic(bdev, &dax);
- if (rc < 0)
- return ERR_PTR(rc);
- memcpy_from_pmem(page_address(page), dax.addr, PAGE_SIZE);
- dax_unmap_atomic(bdev, &dax);
- return page;
-}
-
/*
* DAX radix tree locking
*/
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 2ef8e18e2587..10b742af3d56 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -65,15 +65,9 @@ void dax_wake_mapping_entry_waiter(struct address_space *mapping,
pgoff_t index, void *entry, bool wake_all);
#ifdef CONFIG_FS_DAX
-struct page *read_dax_sector(struct block_device *bdev, sector_t n);
int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
unsigned int offset, unsigned int length);
#else
-static inline struct page *read_dax_sector(struct block_device *bdev,
- sector_t n)
-{
- return ERR_PTR(-ENXIO);
-}
static inline int __dax_zero_page_range(struct block_device *bdev,
sector_t sector, unsigned int offset, unsigned int length)
{
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [RFC PATCH 16/17] fs, dax: convert filesystem-dax to bdev_dax_direct_access
2017-01-28 8:36 ` Dan Williams
@ 2017-01-28 8:37 ` Dan Williams
-1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28 8:37 UTC (permalink / raw)
To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch
Now that a dax_inode is plumbed through all dax-capable drivers we can
switch from block_device_operations to dax_operations for invoking
->direct_access.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
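Not for the changelog: a user-space sketch of the calling convention
that the converted call sites follow. Take the read-side lock, call
direct access, drop the lock on every exit path, and return the error
code from the call itself rather than decoding it from an ERR_PTR in
dax.addr. Everything prefixed ex_ is a hypothetical stand-in written
for this note; the reader counter only models the lock/unlock pairing
and is not how the kernel-side locking is implemented.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dax provider; NOT the kernel structure. */
struct ex_dax_inode {
	int alive;	/* cleared when the provider goes away */
	void *base;	/* start of the directly addressable range */
	long size;	/* length of that range in bytes */
};

static int ex_readers;	/* count of open read-side sections */

static inline int ex_dax_read_lock(void)
{
	return ex_readers++;
}

static inline void ex_dax_read_unlock(int id)
{
	(void)id;
	ex_readers--;
}

/* Returns the number of bytes available at off, or a negative errno. */
static inline long ex_direct_access(struct ex_dax_inode *d, long off,
				    void **kaddr)
{
	if (!d || !d->alive)
		return -ENXIO;
	if (off < 0 || off >= d->size)
		return -ERANGE;
	*kaddr = (char *)d->base + off;
	return d->size - off;
}

/* Zero len bytes at off; the lock is dropped on every exit path. */
static inline int ex_zero_range(struct ex_dax_inode *d, long off, long len)
{
	void *kaddr;
	long avail;
	int id;

	id = ex_dax_read_lock();
	avail = ex_direct_access(d, off, &kaddr);
	if (avail < 0) {
		ex_dax_read_unlock(id);
		return (int)avail;
	}
	if (avail < len) {
		ex_dax_read_unlock(id);
		return -EIO;
	}
	memset(kaddr, 0, (size_t)len);
	ex_dax_read_unlock(id);
	return 0;
}
```

The point of the shape is that the mapped address is only dereferenced
while the read-side section is open, which is what each converted
helper in fs/dax.c does around its use of dax.addr.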
fs/dax.c | 143 +++++++++++++++++++++++++++------------------------
fs/iomap.c | 3 +
include/linux/dax.h | 6 +-
3 files changed, 82 insertions(+), 70 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index a990211c8a3d..07b36a26db06 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -51,32 +51,6 @@ static int __init init_dax_wait_table(void)
}
fs_initcall(init_dax_wait_table);
-static long dax_map_atomic(struct block_device *bdev, struct blk_dax_ctl *dax)
-{
- struct request_queue *q = bdev->bd_queue;
- long rc = -EIO;
-
- dax->addr = ERR_PTR(-EIO);
- if (blk_queue_enter(q, true) != 0)
- return rc;
-
- rc = bdev_direct_access(bdev, dax);
- if (rc < 0) {
- dax->addr = ERR_PTR(rc);
- blk_queue_exit(q);
- return rc;
- }
- return rc;
-}
-
-static void dax_unmap_atomic(struct block_device *bdev,
- const struct blk_dax_ctl *dax)
-{
- if (IS_ERR(dax->addr))
- return;
- blk_queue_exit(bdev->bd_queue);
-}
-
static int dax_is_pmd_entry(void *entry)
{
return (unsigned long)entry & RADIX_DAX_PMD;
@@ -549,21 +523,28 @@ static int dax_load_hole(struct address_space *mapping, void **entry,
return ret;
}
-static int copy_user_dax(struct block_device *bdev, sector_t sector, size_t size,
- struct page *to, unsigned long vaddr)
+static int copy_user_dax(struct block_device *bdev, struct dax_inode *dax_inode,
+ sector_t sector, size_t size, struct page *to,
+ unsigned long vaddr)
{
struct blk_dax_ctl dax = {
.sector = sector,
.size = size,
};
void *vto;
+ long rc;
+ int id;
- if (dax_map_atomic(bdev, &dax) < 0)
- return PTR_ERR(dax.addr);
+ id = dax_read_lock();
+ rc = bdev_dax_direct_access(bdev, dax_inode, &dax);
+ if (rc < 0) {
+ dax_read_unlock(id);
+ return rc;
+ }
vto = kmap_atomic(to);
copy_user_page(vto, (void __force *)dax.addr, vaddr, to);
kunmap_atomic(vto);
- dax_unmap_atomic(bdev, &dax);
+ dax_read_unlock(id);
return 0;
}
@@ -731,12 +712,13 @@ static void dax_mapping_entry_mkclean(struct address_space *mapping,
}
static int dax_writeback_one(struct block_device *bdev,
- struct address_space *mapping, pgoff_t index, void *entry)
+ struct dax_inode *dax_inode, struct address_space *mapping,
+ pgoff_t index, void *entry)
{
struct radix_tree_root *page_tree = &mapping->page_tree;
struct blk_dax_ctl dax;
void *entry2, **slot;
- int ret = 0;
+ int ret = 0, id;
/*
* A page got tagged dirty in DAX mapping? Something is seriously
@@ -789,18 +771,20 @@ static int dax_writeback_one(struct block_device *bdev,
dax.size = PAGE_SIZE << dax_radix_order(entry);
/*
- * We cannot hold tree_lock while calling dax_map_atomic() because it
- * eventually calls cond_resched().
+ * bdev_dax_direct_access() may sleep, so cannot hold tree_lock
+ * over its invocation.
*/
- ret = dax_map_atomic(bdev, &dax);
+ id = dax_read_lock();
+ ret = bdev_dax_direct_access(bdev, dax_inode, &dax);
if (ret < 0) {
+ dax_read_unlock(id);
put_locked_mapping_entry(mapping, index, entry);
return ret;
}
if (WARN_ON_ONCE(ret < dax.size)) {
ret = -EIO;
- goto unmap;
+ goto dax_unlock;
}
dax_mapping_entry_mkclean(mapping, index, pfn_t_to_pfn(dax.pfn));
@@ -814,8 +798,8 @@ static int dax_writeback_one(struct block_device *bdev,
spin_lock_irq(&mapping->tree_lock);
radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_DIRTY);
spin_unlock_irq(&mapping->tree_lock);
- unmap:
- dax_unmap_atomic(bdev, &dax);
+ dax_unlock:
+ dax_read_unlock(id);
put_locked_mapping_entry(mapping, index, entry);
return ret;
@@ -836,6 +820,7 @@ int dax_writeback_mapping_range(struct address_space *mapping,
struct inode *inode = mapping->host;
pgoff_t start_index, end_index;
pgoff_t indices[PAGEVEC_SIZE];
+ struct dax_inode *dax_inode;
struct pagevec pvec;
bool done = false;
int i, ret = 0;
@@ -846,6 +831,10 @@ int dax_writeback_mapping_range(struct address_space *mapping,
if (!mapping->nrexceptional || wbc->sync_mode != WB_SYNC_ALL)
return 0;
+ dax_inode = dax_get_by_host(bdev->bd_disk->disk_name);
+ if (!dax_inode)
+ return -EIO;
+
start_index = wbc->range_start >> PAGE_SHIFT;
end_index = wbc->range_end >> PAGE_SHIFT;
@@ -866,19 +855,23 @@ int dax_writeback_mapping_range(struct address_space *mapping,
break;
}
- ret = dax_writeback_one(bdev, mapping, indices[i],
- pvec.pages[i]);
- if (ret < 0)
+ ret = dax_writeback_one(bdev, dax_inode, mapping,
+ indices[i], pvec.pages[i]);
+ if (ret < 0) {
+ put_dax_inode(dax_inode);
return ret;
+ }
}
}
+ put_dax_inode(dax_inode);
return 0;
}
EXPORT_SYMBOL_GPL(dax_writeback_mapping_range);
static int dax_insert_mapping(struct address_space *mapping,
- struct block_device *bdev, sector_t sector, size_t size,
- void **entryp, struct vm_area_struct *vma, struct vm_fault *vmf)
+ struct block_device *bdev, struct dax_inode *dax_inode,
+ sector_t sector, size_t size, void **entryp,
+ struct vm_area_struct *vma, struct vm_fault *vmf)
{
unsigned long vaddr = vmf->address;
struct blk_dax_ctl dax = {
@@ -887,10 +880,15 @@ static int dax_insert_mapping(struct address_space *mapping,
};
void *ret;
void *entry = *entryp;
+ int id, rc;
- if (dax_map_atomic(bdev, &dax) < 0)
- return PTR_ERR(dax.addr);
- dax_unmap_atomic(bdev, &dax);
+ id = dax_read_lock();
+ rc = bdev_dax_direct_access(bdev, dax_inode, &dax);
+ if (rc < 0) {
+ dax_read_unlock(id);
+ return rc;
+ }
+ dax_read_unlock(id);
ret = dax_insert_mapping_entry(mapping, vmf, entry, dax.sector, 0);
if (IS_ERR(ret))
@@ -947,7 +945,8 @@ static bool dax_range_is_aligned(struct block_device *bdev,
return true;
}
-int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
+int __dax_zero_page_range(struct block_device *bdev,
+ struct dax_inode *dax_inode, sector_t sector,
unsigned int offset, unsigned int length)
{
struct blk_dax_ctl dax = {
@@ -961,10 +960,16 @@ int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
return blkdev_issue_zeroout(bdev, start_sector,
length >> 9, GFP_NOFS, true);
} else {
- if (dax_map_atomic(bdev, &dax) < 0)
- return PTR_ERR(dax.addr);
+ int rc, id;
+
+ id = dax_read_lock();
+ rc = bdev_dax_direct_access(bdev, dax_inode, &dax);
+ if (rc < 0) {
+ dax_read_unlock(id);
+ return rc;
+ }
clear_pmem(dax.addr + offset, length);
- dax_unmap_atomic(bdev, &dax);
+ dax_read_unlock(id);
}
return 0;
}
@@ -983,6 +988,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
struct iov_iter *iter = data;
loff_t end = pos + length, done = 0;
ssize_t ret = 0;
+ int id;
if (iov_iter_rw(iter) == READ) {
end = min(end, i_size_read(inode));
@@ -1007,6 +1013,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
(end - 1) >> PAGE_SHIFT);
}
+ id = dax_read_lock();
while (pos < end) {
unsigned offset = pos & (PAGE_SIZE - 1);
struct blk_dax_ctl dax = { 0 };
@@ -1014,7 +1021,8 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
dax.sector = dax_iomap_sector(iomap, pos);
dax.size = (length + offset + PAGE_SIZE - 1) & PAGE_MASK;
- map_len = dax_map_atomic(iomap->bdev, &dax);
+ map_len = bdev_dax_direct_access(iomap->bdev, iomap->dax_inode,
+ &dax);
if (map_len < 0) {
ret = map_len;
break;
@@ -1029,7 +1037,6 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
map_len = copy_from_iter_pmem(dax.addr, map_len, iter);
else
map_len = copy_to_iter(dax.addr, map_len, iter);
- dax_unmap_atomic(iomap->bdev, &dax);
if (map_len <= 0) {
ret = map_len ? map_len : -EFAULT;
break;
@@ -1039,6 +1046,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
length -= map_len;
done += map_len;
}
+ dax_read_unlock(id);
return done ? done : ret;
}
@@ -1151,8 +1159,8 @@ int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
clear_user_highpage(vmf->cow_page, vaddr);
break;
case IOMAP_MAPPED:
- error = copy_user_dax(iomap.bdev, sector, PAGE_SIZE,
- vmf->cow_page, vaddr);
+ error = copy_user_dax(iomap.bdev, iomap.dax_inode,
+ sector, PAGE_SIZE, vmf->cow_page, vaddr);
break;
default:
WARN_ON_ONCE(1);
@@ -1177,8 +1185,8 @@ int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
major = VM_FAULT_MAJOR;
}
- error = dax_insert_mapping(mapping, iomap.bdev, sector,
- PAGE_SIZE, &entry, vma, vmf);
+ error = dax_insert_mapping(mapping, iomap.bdev, iomap.dax_inode,
+ sector, PAGE_SIZE, &entry, vma, vmf);
/* -EBUSY is fine, somebody else faulted on the same PTE */
if (error == -EBUSY)
error = 0;
@@ -1231,23 +1239,24 @@ static int dax_pmd_insert_mapping(struct vm_area_struct *vma, pmd_t *pmd,
{
struct address_space *mapping = vma->vm_file->f_mapping;
struct block_device *bdev = iomap->bdev;
+ struct dax_inode *dax_inode = iomap->dax_inode;
struct blk_dax_ctl dax = {
.sector = dax_iomap_sector(iomap, pos),
.size = PMD_SIZE,
};
- long length = dax_map_atomic(bdev, &dax);
+ long length;
void *ret;
+ int id;
- if (length < 0) /* dax_map_atomic() failed */
- return VM_FAULT_FALLBACK;
+ id = dax_read_lock();
+ length = bdev_dax_direct_access(bdev, dax_inode, &dax);
if (length < PMD_SIZE)
- goto unmap_fallback;
+ goto unlock_fallback;
if (pfn_t_to_pfn(dax.pfn) & PG_PMD_COLOUR)
- goto unmap_fallback;
+ goto unlock_fallback;
if (!pfn_t_devmap(dax.pfn))
- goto unmap_fallback;
-
- dax_unmap_atomic(bdev, &dax);
+ goto unlock_fallback;
+ dax_read_unlock(id);
ret = dax_insert_mapping_entry(mapping, vmf, *entryp, dax.sector,
RADIX_DAX_PMD);
@@ -1257,8 +1266,8 @@ static int dax_pmd_insert_mapping(struct vm_area_struct *vma, pmd_t *pmd,
return vmf_insert_pfn_pmd(vma, address, pmd, dax.pfn, write);
- unmap_fallback:
- dax_unmap_atomic(bdev, &dax);
+ unlock_fallback:
+ dax_read_unlock(id);
return VM_FAULT_FALLBACK;
}
diff --git a/fs/iomap.c b/fs/iomap.c
index 354a123f170e..279d18cc1cb6 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -355,7 +355,8 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
sector_t sector = iomap->blkno +
(((pos & ~(PAGE_SIZE - 1)) - iomap->offset) >> 9);
- return __dax_zero_page_range(iomap->bdev, sector, offset, bytes);
+ return __dax_zero_page_range(iomap->bdev, iomap->dax_inode, sector,
+ offset, bytes);
}
static loff_t
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 10b742af3d56..b8e8e7896452 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -65,11 +65,13 @@ void dax_wake_mapping_entry_waiter(struct address_space *mapping,
pgoff_t index, void *entry, bool wake_all);
#ifdef CONFIG_FS_DAX
-int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
+int __dax_zero_page_range(struct block_device *bdev,
+ struct dax_inode *dax_inode, sector_t sector,
unsigned int offset, unsigned int length);
#else
static inline int __dax_zero_page_range(struct block_device *bdev,
- sector_t sector, unsigned int offset, unsigned int length)
+ struct dax_inode *dax_inode, sector_t sector,
+ unsigned int offset, unsigned int length)
{
return -ENXIO;
}
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [RFC PATCH 16/17] fs, dax: convert filesystem-dax to bdev_dax_direct_access
@ 2017-01-28 8:37 ` Dan Williams
0 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28 8:37 UTC (permalink / raw)
To: linux-nvdimm
Cc: snitzer, toshi.kani, mawilcox, linux-block, jmoyer,
linux-fsdevel, ross.zwisler, hch
Now that a dax_inode is plumbed through all dax-capable drivers we can
switch from block_device_operations to dax_operations for invoking
->direct_access.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
fs/dax.c | 143 +++++++++++++++++++++++++++------------------------
fs/iomap.c | 3 +
include/linux/dax.h | 6 +-
3 files changed, 82 insertions(+), 70 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index a990211c8a3d..07b36a26db06 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -51,32 +51,6 @@ static int __init init_dax_wait_table(void)
}
fs_initcall(init_dax_wait_table);
-static long dax_map_atomic(struct block_device *bdev, struct blk_dax_ctl *dax)
-{
- struct request_queue *q = bdev->bd_queue;
- long rc = -EIO;
-
- dax->addr = ERR_PTR(-EIO);
- if (blk_queue_enter(q, true) != 0)
- return rc;
-
- rc = bdev_direct_access(bdev, dax);
- if (rc < 0) {
- dax->addr = ERR_PTR(rc);
- blk_queue_exit(q);
- return rc;
- }
- return rc;
-}
-
-static void dax_unmap_atomic(struct block_device *bdev,
- const struct blk_dax_ctl *dax)
-{
- if (IS_ERR(dax->addr))
- return;
- blk_queue_exit(bdev->bd_queue);
-}
-
static int dax_is_pmd_entry(void *entry)
{
return (unsigned long)entry & RADIX_DAX_PMD;
@@ -549,21 +523,28 @@ static int dax_load_hole(struct address_space *mapping, void **entry,
return ret;
}
-static int copy_user_dax(struct block_device *bdev, sector_t sector, size_t size,
- struct page *to, unsigned long vaddr)
+static int copy_user_dax(struct block_device *bdev, struct dax_inode *dax_inode,
+ sector_t sector, size_t size, struct page *to,
+ unsigned long vaddr)
{
struct blk_dax_ctl dax = {
.sector = sector,
.size = size,
};
void *vto;
+ long rc;
+ int id;
- if (dax_map_atomic(bdev, &dax) < 0)
- return PTR_ERR(dax.addr);
+ id = dax_read_lock();
+ rc = bdev_dax_direct_access(bdev, dax_inode, &dax);
+ if (rc < 0) {
+ dax_read_unlock(id);
+ return rc;
+ }
vto = kmap_atomic(to);
copy_user_page(vto, (void __force *)dax.addr, vaddr, to);
kunmap_atomic(vto);
- dax_unmap_atomic(bdev, &dax);
+ dax_read_unlock(id);
return 0;
}
@@ -731,12 +712,13 @@ static void dax_mapping_entry_mkclean(struct address_space *mapping,
}
static int dax_writeback_one(struct block_device *bdev,
- struct address_space *mapping, pgoff_t index, void *entry)
+ struct dax_inode *dax_inode, struct address_space *mapping,
+ pgoff_t index, void *entry)
{
struct radix_tree_root *page_tree = &mapping->page_tree;
struct blk_dax_ctl dax;
void *entry2, **slot;
- int ret = 0;
+ int ret = 0, id;
/*
* A page got tagged dirty in DAX mapping? Something is seriously
@@ -789,18 +771,20 @@ static int dax_writeback_one(struct block_device *bdev,
dax.size = PAGE_SIZE << dax_radix_order(entry);
/*
- * We cannot hold tree_lock while calling dax_map_atomic() because it
- * eventually calls cond_resched().
+ * bdev_dax_direct_access() may sleep, so cannot hold tree_lock
+ * over its invocation.
*/
- ret = dax_map_atomic(bdev, &dax);
+ id = dax_read_lock();
+ ret = bdev_dax_direct_access(bdev, dax_inode, &dax);
if (ret < 0) {
+ dax_read_unlock(id);
put_locked_mapping_entry(mapping, index, entry);
return ret;
}
if (WARN_ON_ONCE(ret < dax.size)) {
ret = -EIO;
- goto unmap;
+ goto dax_unlock;
}
dax_mapping_entry_mkclean(mapping, index, pfn_t_to_pfn(dax.pfn));
@@ -814,8 +798,8 @@ static int dax_writeback_one(struct block_device *bdev,
spin_lock_irq(&mapping->tree_lock);
radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_DIRTY);
spin_unlock_irq(&mapping->tree_lock);
- unmap:
- dax_unmap_atomic(bdev, &dax);
+ dax_unlock:
+ dax_read_unlock(id);
put_locked_mapping_entry(mapping, index, entry);
return ret;
@@ -836,6 +820,7 @@ int dax_writeback_mapping_range(struct address_space *mapping,
struct inode *inode = mapping->host;
pgoff_t start_index, end_index;
pgoff_t indices[PAGEVEC_SIZE];
+ struct dax_inode *dax_inode;
struct pagevec pvec;
bool done = false;
int i, ret = 0;
@@ -846,6 +831,10 @@ int dax_writeback_mapping_range(struct address_space *mapping,
if (!mapping->nrexceptional || wbc->sync_mode != WB_SYNC_ALL)
return 0;
+ dax_inode = dax_get_by_host(bdev->bd_disk->disk_name);
+ if (!dax_inode)
+ return -EIO;
+
start_index = wbc->range_start >> PAGE_SHIFT;
end_index = wbc->range_end >> PAGE_SHIFT;
@@ -866,19 +855,23 @@ int dax_writeback_mapping_range(struct address_space *mapping,
break;
}
- ret = dax_writeback_one(bdev, mapping, indices[i],
- pvec.pages[i]);
- if (ret < 0)
+ ret = dax_writeback_one(bdev, dax_inode, mapping,
+ indices[i], pvec.pages[i]);
+ if (ret < 0) {
+ put_dax_inode(dax_inode);
return ret;
+ }
}
}
+ put_dax_inode(dax_inode);
return 0;
}
EXPORT_SYMBOL_GPL(dax_writeback_mapping_range);
static int dax_insert_mapping(struct address_space *mapping,
- struct block_device *bdev, sector_t sector, size_t size,
- void **entryp, struct vm_area_struct *vma, struct vm_fault *vmf)
+ struct block_device *bdev, struct dax_inode *dax_inode,
+ sector_t sector, size_t size, void **entryp,
+ struct vm_area_struct *vma, struct vm_fault *vmf)
{
unsigned long vaddr = vmf->address;
struct blk_dax_ctl dax = {
@@ -887,10 +880,15 @@ static int dax_insert_mapping(struct address_space *mapping,
};
void *ret;
void *entry = *entryp;
+ int id, rc;
- if (dax_map_atomic(bdev, &dax) < 0)
- return PTR_ERR(dax.addr);
- dax_unmap_atomic(bdev, &dax);
+ id = dax_read_lock();
+ rc = bdev_dax_direct_access(bdev, dax_inode, &dax);
+ if (rc < 0) {
+ dax_read_unlock(id);
+ return rc;
+ }
+ dax_read_unlock(id);
ret = dax_insert_mapping_entry(mapping, vmf, entry, dax.sector, 0);
if (IS_ERR(ret))
@@ -947,7 +945,8 @@ static bool dax_range_is_aligned(struct block_device *bdev,
return true;
}
-int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
+int __dax_zero_page_range(struct block_device *bdev,
+ struct dax_inode *dax_inode, sector_t sector,
unsigned int offset, unsigned int length)
{
struct blk_dax_ctl dax = {
@@ -961,10 +960,16 @@ int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
return blkdev_issue_zeroout(bdev, start_sector,
length >> 9, GFP_NOFS, true);
} else {
- if (dax_map_atomic(bdev, &dax) < 0)
- return PTR_ERR(dax.addr);
+ int rc, id;
+
+ id = dax_read_lock();
+ rc = bdev_dax_direct_access(bdev, dax_inode, &dax);
+ if (rc < 0) {
+ dax_read_unlock(id);
+ return rc;
+ }
clear_pmem(dax.addr + offset, length);
- dax_unmap_atomic(bdev, &dax);
+ dax_read_unlock(id);
}
return 0;
}
@@ -983,6 +988,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
struct iov_iter *iter = data;
loff_t end = pos + length, done = 0;
ssize_t ret = 0;
+ int id;
if (iov_iter_rw(iter) == READ) {
end = min(end, i_size_read(inode));
@@ -1007,6 +1013,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
(end - 1) >> PAGE_SHIFT);
}
+ id = dax_read_lock();
while (pos < end) {
unsigned offset = pos & (PAGE_SIZE - 1);
struct blk_dax_ctl dax = { 0 };
@@ -1014,7 +1021,8 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
dax.sector = dax_iomap_sector(iomap, pos);
dax.size = (length + offset + PAGE_SIZE - 1) & PAGE_MASK;
- map_len = dax_map_atomic(iomap->bdev, &dax);
+ map_len = bdev_dax_direct_access(iomap->bdev, iomap->dax_inode,
+ &dax);
if (map_len < 0) {
ret = map_len;
break;
@@ -1029,7 +1037,6 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
map_len = copy_from_iter_pmem(dax.addr, map_len, iter);
else
map_len = copy_to_iter(dax.addr, map_len, iter);
- dax_unmap_atomic(iomap->bdev, &dax);
if (map_len <= 0) {
ret = map_len ? map_len : -EFAULT;
break;
@@ -1039,6 +1046,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
length -= map_len;
done += map_len;
}
+ dax_read_unlock(id);
return done ? done : ret;
}
@@ -1151,8 +1159,8 @@ int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
clear_user_highpage(vmf->cow_page, vaddr);
break;
case IOMAP_MAPPED:
- error = copy_user_dax(iomap.bdev, sector, PAGE_SIZE,
- vmf->cow_page, vaddr);
+ error = copy_user_dax(iomap.bdev, iomap.dax_inode,
+ sector, PAGE_SIZE, vmf->cow_page, vaddr);
break;
default:
WARN_ON_ONCE(1);
@@ -1177,8 +1185,8 @@ int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
major = VM_FAULT_MAJOR;
}
- error = dax_insert_mapping(mapping, iomap.bdev, sector,
- PAGE_SIZE, &entry, vma, vmf);
+ error = dax_insert_mapping(mapping, iomap.bdev, iomap.dax_inode,
+ sector, PAGE_SIZE, &entry, vma, vmf);
/* -EBUSY is fine, somebody else faulted on the same PTE */
if (error == -EBUSY)
error = 0;
@@ -1231,23 +1239,24 @@ static int dax_pmd_insert_mapping(struct vm_area_struct *vma, pmd_t *pmd,
{
struct address_space *mapping = vma->vm_file->f_mapping;
struct block_device *bdev = iomap->bdev;
+ struct dax_inode *dax_inode = iomap->dax_inode;
struct blk_dax_ctl dax = {
.sector = dax_iomap_sector(iomap, pos),
.size = PMD_SIZE,
};
- long length = dax_map_atomic(bdev, &dax);
+ long length;
void *ret;
+ int id;
- if (length < 0) /* dax_map_atomic() failed */
- return VM_FAULT_FALLBACK;
+ id = dax_read_lock();
+ length = bdev_dax_direct_access(bdev, dax_inode, &dax);
if (length < PMD_SIZE)
- goto unmap_fallback;
+ goto unlock_fallback;
if (pfn_t_to_pfn(dax.pfn) & PG_PMD_COLOUR)
- goto unmap_fallback;
+ goto unlock_fallback;
if (!pfn_t_devmap(dax.pfn))
- goto unmap_fallback;
-
- dax_unmap_atomic(bdev, &dax);
+ goto unlock_fallback;
+ dax_read_unlock(id);
ret = dax_insert_mapping_entry(mapping, vmf, *entryp, dax.sector,
RADIX_DAX_PMD);
@@ -1257,8 +1266,8 @@ static int dax_pmd_insert_mapping(struct vm_area_struct *vma, pmd_t *pmd,
return vmf_insert_pfn_pmd(vma, address, pmd, dax.pfn, write);
- unmap_fallback:
- dax_unmap_atomic(bdev, &dax);
+ unlock_fallback:
+ dax_read_unlock(id);
return VM_FAULT_FALLBACK;
}
diff --git a/fs/iomap.c b/fs/iomap.c
index 354a123f170e..279d18cc1cb6 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -355,7 +355,8 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
sector_t sector = iomap->blkno +
(((pos & ~(PAGE_SIZE - 1)) - iomap->offset) >> 9);
- return __dax_zero_page_range(iomap->bdev, sector, offset, bytes);
+ return __dax_zero_page_range(iomap->bdev, iomap->dax_inode, sector,
+ offset, bytes);
}
static loff_t
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 10b742af3d56..b8e8e7896452 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -65,11 +65,13 @@ void dax_wake_mapping_entry_waiter(struct address_space *mapping,
pgoff_t index, void *entry, bool wake_all);
#ifdef CONFIG_FS_DAX
-int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
+int __dax_zero_page_range(struct block_device *bdev,
+ struct dax_inode *dax_inode, sector_t sector,
unsigned int offset, unsigned int length);
#else
static inline int __dax_zero_page_range(struct block_device *bdev,
- sector_t sector, unsigned int offset, unsigned int length)
+ struct dax_inode *dax_inode, sector_t sector,
+ unsigned int offset, unsigned int length)
{
return -ENXIO;
}
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [RFC PATCH 17/17] block: remove block_device_operations.direct_access and related infrastructure
2017-01-28 8:36 ` Dan Williams
@ 2017-01-28 8:37 ` Dan Williams
-1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28 8:37 UTC (permalink / raw)
To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch
Now that all the producers and consumers of dax interfaces have been
converted to using dax_operations on a dax_inode, remove the block
device direct_access enabling.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
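Note (not part of the patch): the block-device wrappers removed below
only translated a 512-byte sector number into a byte offset before
calling the same per-driver helper that the dax_operations entry points
use. A minimal user-space sketch of that relationship follows. Every
model_* name is hypothetical, and the -ERANGE bounds check is an
assumption made for illustration rather than a copy of any driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SECTOR_SHIFT 9	/* 512-byte sectors, as in "sector * 512" */

/* Hypothetical stand-in for a driver's private data; not a kernel type. */
struct model_dev {
	uint64_t size;	/* bytes of directly addressable memory */
};

/*
 * Models a shared helper in the style of __axon_ram_direct_access():
 * it takes a byte offset into the device and returns the number of
 * bytes addressable from that offset. The range check is an assumption.
 */
static long model_direct_access(const struct model_dev *dev, uint64_t dev_addr)
{
	assert(dev != NULL);
	if (dev_addr >= dev->size)
		return -ERANGE;
	return (long)(dev->size - dev_addr);
}

/* Models a removed block-device entry point: sector number in. */
static long model_blk_direct_access(const struct model_dev *dev,
				    uint64_t sector)
{
	return model_direct_access(dev, sector << MODEL_SECTOR_SHIFT);
}

/* Models a remaining dax_operations entry point: byte address in. */
static long model_dax_direct_access(const struct model_dev *dev,
				    uint64_t dev_addr)
{
	return model_direct_access(dev, dev_addr);
}
```

In this model, sector 8 and byte address 4096 reach the helper with the
same offset, which is why a caller that already has a byte address
loses nothing when the sector-based wrappers go away.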
arch/powerpc/sysdev/axonram.c | 15 --------------
drivers/block/brd.c | 11 ----------
drivers/md/dm-linear.c | 19 -----------------
drivers/md/dm-snap.c | 8 -------
drivers/md/dm-stripe.c | 24 ----------------------
drivers/md/dm-table.c | 2 +-
drivers/md/dm-target.c | 7 ------
drivers/md/dm.c | 19 +++--------------
drivers/nvdimm/pmem.c | 9 --------
drivers/s390/block/dcssblk.c | 16 ---------------
fs/block_dev.c | 45 -----------------------------------------
include/linux/blkdev.h | 3 ---
include/linux/device-mapper.h | 9 --------
13 files changed, 4 insertions(+), 183 deletions(-)
diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index 4e1f58187726..1337b5829980 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -148,23 +148,8 @@ __axon_ram_direct_access(struct axon_ram_bank *bank, phys_addr_t offset,
return bank->size - offset;
}
-/**
- * axon_ram_direct_access - direct_access() method for block device
- * @device, @sector, @data: see block_device_operations method
- */
-static long
-axon_ram_blk_direct_access(struct block_device *device, sector_t sector,
- void **kaddr, pfn_t *pfn, long size)
-{
- struct axon_ram_bank *bank = device->bd_disk->private_data;
-
- return __axon_ram_direct_access(bank, sector << AXON_RAM_SECTOR_SHIFT,
- kaddr, pfn, size);
-}
-
static const struct block_device_operations axon_ram_devops = {
.owner = THIS_MODULE,
- .direct_access = axon_ram_blk_direct_access
};
static long
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 1279df4dc07c..52a1259f8ded 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -395,14 +395,6 @@ static long __brd_direct_access(struct brd_device *brd, phys_addr_t dev_addr,
return PAGE_SIZE;
}
-static long brd_blk_direct_access(struct block_device *bdev, sector_t sector,
- void **kaddr, pfn_t *pfn, long size)
-{
- struct brd_device *brd = bdev->bd_disk->private_data;
-
- return __brd_direct_access(brd, sector * 512, kaddr, pfn, size);
-}
-
static long brd_dax_direct_access(struct dax_inode *dax_inode,
phys_addr_t dev_addr, void **kaddr, pfn_t *pfn, long size)
{
@@ -414,14 +406,11 @@ static long brd_dax_direct_access(struct dax_inode *dax_inode,
static const struct dax_operations brd_dax_ops = {
.direct_access = brd_dax_direct_access,
};
-#else
-#define brd_blk_direct_access NULL
#endif
static const struct block_device_operations brd_fops = {
.owner = THIS_MODULE,
.rw_page = brd_rw_page,
- .direct_access = brd_blk_direct_access,
};
/*
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index e91ca8089333..7ec2a8eb8a14 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -141,24 +141,6 @@ static int linear_iterate_devices(struct dm_target *ti,
return fn(ti, lc->dev, lc->start, ti->len, data);
}
-static long linear_direct_access(struct dm_target *ti, sector_t sector,
- void **kaddr, pfn_t *pfn, long size)
-{
- struct linear_c *lc = ti->private;
- struct block_device *bdev = lc->dev->bdev;
- struct blk_dax_ctl dax = {
- .sector = linear_map_sector(ti, sector),
- .size = size,
- };
- long ret;
-
- ret = bdev_direct_access(bdev, &dax);
- *kaddr = dax.addr;
- *pfn = dax.pfn;
-
- return ret;
-}
-
static long linear_dax_direct_access(struct dm_target *ti, phys_addr_t dev_addr,
void **kaddr, pfn_t *pfn, long size)
{
@@ -192,7 +174,6 @@ static struct target_type linear_target = {
.status = linear_status,
.prepare_ioctl = linear_prepare_ioctl,
.iterate_devices = linear_iterate_devices,
- .direct_access = linear_direct_access,
.dax_ops = &linear_dax_ops,
};
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index 1990e3bd6958..1d9407633bb5 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -2302,13 +2302,6 @@ static int origin_map(struct dm_target *ti, struct bio *bio)
return do_origin(o->dev, bio);
}
-static long origin_direct_access(struct dm_target *ti, sector_t sector,
- void **kaddr, pfn_t *pfn, long size)
-{
- DMWARN("device does not support dax.");
- return -EIO;
-}
-
static long origin_dax_direct_access(struct dm_target *ti, phys_addr_t dev_addr,
void **kaddr, pfn_t *pfn, long size)
{
@@ -2379,7 +2372,6 @@ static struct target_type origin_target = {
.postsuspend = origin_postsuspend,
.status = origin_status,
.iterate_devices = origin_iterate_devices,
- .direct_access = origin_direct_access,
.dax_ops = &origin_dax_ops,
};
diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index 47fb56a6184a..229b2c543902 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -308,29 +308,6 @@ static int stripe_map(struct dm_target *ti, struct bio *bio)
return DM_MAPIO_REMAPPED;
}
-static long stripe_direct_access(struct dm_target *ti, sector_t sector,
- void **kaddr, pfn_t *pfn, long size)
-{
- struct stripe_c *sc = ti->private;
- uint32_t stripe;
- struct block_device *bdev;
- struct blk_dax_ctl dax = {
- .size = size,
- };
- long ret;
-
- stripe_map_sector(sc, sector, &stripe, &dax.sector);
-
- dax.sector += sc->stripe[stripe].physical_start;
- bdev = sc->stripe[stripe].dev->bdev;
-
- ret = bdev_direct_access(bdev, &dax);
- *kaddr = dax.addr;
- *pfn = dax.pfn;
-
- return ret;
-}
-
static long stripe_dax_direct_access(struct dm_target *ti, phys_addr_t dev_addr,
void **kaddr, pfn_t *pfn, long size)
{
@@ -477,7 +454,6 @@ static struct target_type stripe_target = {
.status = stripe_status,
.iterate_devices = stripe_iterate_devices,
.io_hints = stripe_io_hints,
- .direct_access = stripe_direct_access,
.dax_ops = &stripe_dax_ops,
};
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 3ad16d9c9d5a..cd23be26384c 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -856,7 +856,7 @@ static bool dm_table_supports_dax(struct dm_table *t)
while (i < dm_table_get_num_targets(t)) {
ti = dm_table_get_target(t, i++);
- if (!ti->type->direct_access)
+ if (!ti->type->dax_ops)
return false;
if (!ti->type->iterate_devices ||
diff --git a/drivers/md/dm-target.c b/drivers/md/dm-target.c
index ab072f53cf24..c3f55df90157 100644
--- a/drivers/md/dm-target.c
+++ b/drivers/md/dm-target.c
@@ -148,12 +148,6 @@ static void io_err_release_clone_rq(struct request *clone)
{
}
-static long io_err_direct_access(struct dm_target *ti, sector_t sector,
- void **kaddr, pfn_t *pfn, long size)
-{
- return -EIO;
-}
-
static long io_err_dax_direct_access(struct dm_target *ti, phys_addr_t dev_addr,
void **kaddr, pfn_t *pfn, long size)
{
@@ -174,7 +168,6 @@ static struct target_type error_target = {
.map_rq = io_err_map_rq,
.clone_and_map_rq = io_err_clone_and_map_rq,
.release_clone_rq = io_err_release_clone_rq,
- .direct_access = io_err_direct_access,
.dax_ops = &err_dax_ops,
};
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 5c5eeda0eb0a..497fb8adc660 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -910,7 +910,7 @@ int dm_set_target_max_io_len(struct dm_target *ti, sector_t len)
EXPORT_SYMBOL_GPL(dm_set_target_max_io_len);
static long __dm_direct_access(struct mapped_device *md, phys_addr_t dev_addr,
- void **kaddr, pfn_t *pfn, long size, bool blk)
+ void **kaddr, pfn_t *pfn, long size)
{
sector_t sector = dev_addr >> SECTOR_SHIFT;
struct dm_table *map;
@@ -929,9 +929,7 @@ static long __dm_direct_access(struct mapped_device *md, phys_addr_t dev_addr,
len = max_io_len(sector, ti) << SECTOR_SHIFT;
size = min(len, size);
- if (blk && ti->type->direct_access)
- ret = ti->type->direct_access(ti, sector, kaddr, pfn, size);
- else if (ti->type->dax_ops)
+ if (ti->type->dax_ops)
ret = ti->type->dax_ops->dm_direct_access(ti, dev_addr, kaddr,
pfn, size);
out:
@@ -939,23 +937,13 @@ static long __dm_direct_access(struct mapped_device *md, phys_addr_t dev_addr,
return min(ret, size);
}
-static long dm_blk_direct_access(struct block_device *bdev, sector_t sector,
- void **kaddr, pfn_t *pfn, long size)
-{
- struct mapped_device *md = bdev->bd_disk->private_data;
-
- return __dm_direct_access(md, sector << SECTOR_SHIFT, kaddr, pfn, size,
- true);
-}
-
static long dm_dax_direct_access(struct dax_inode *dax_inode,
phys_addr_t dev_addr, void **kaddr, pfn_t *pfn,
long size)
{
struct mapped_device *md = dax_inode_get_private(dax_inode);
- return __dm_direct_access(md, dev_addr, kaddr, pfn, size,
- false);
+ return __dm_direct_access(md, dev_addr, kaddr, pfn, size);
}
/*
@@ -2769,7 +2757,6 @@ static const struct block_device_operations dm_blk_dops = {
.open = dm_blk_open,
.release = dm_blk_close,
.ioctl = dm_blk_ioctl,
- .direct_access = dm_blk_direct_access,
.getgeo = dm_blk_getgeo,
.pr_ops = &dm_pr_ops,
.owner = THIS_MODULE
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index d3d7de645e20..41781f853396 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -219,18 +219,9 @@ __weak long __pmem_direct_access(struct pmem_device *pmem, phys_addr_t dev_addr,
return pmem->size - pmem->pfn_pad - offset;
}
-static long pmem_blk_direct_access(struct block_device *bdev, sector_t sector,
- void **kaddr, pfn_t *pfn, long size)
-{
- struct pmem_device *pmem = bdev->bd_queue->queuedata;
-
- return __pmem_direct_access(pmem, sector * 512, kaddr, pfn, size);
-}
-
static const struct block_device_operations pmem_fops = {
.owner = THIS_MODULE,
.rw_page = pmem_rw_page,
- .direct_access = pmem_blk_direct_access,
.revalidate_disk = nvdimm_revalidate_disk,
};
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 67b0885b4d12..03140c93dbd1 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -31,8 +31,6 @@ static int dcssblk_open(struct block_device *bdev, fmode_t mode);
static void dcssblk_release(struct gendisk *disk, fmode_t mode);
static blk_qc_t dcssblk_make_request(struct request_queue *q,
struct bio *bio);
-static long dcssblk_blk_direct_access(struct block_device *bdev, sector_t secnum,
- void **kaddr, pfn_t *pfn, long size);
static long dcssblk_dax_direct_access(struct dax_inode *dax_inode,
phys_addr_t dev_addr, void **kaddr, pfn_t *pfn, long size);
@@ -43,7 +41,6 @@ static const struct block_device_operations dcssblk_devops = {
.owner = THIS_MODULE,
.open = dcssblk_open,
.release = dcssblk_release,
- .direct_access = dcssblk_blk_direct_access,
};
static const struct dax_operations dcssblk_dax_ops = {
@@ -914,19 +911,6 @@ __dcssblk_direct_access(struct dcssblk_dev_info *dev_info, phys_addr_t offset,
}
static long
-dcssblk_blk_direct_access(struct block_device *bdev, sector_t secnum,
- void **kaddr, pfn_t *pfn, long size)
-{
- struct dcssblk_dev_info *dev_info;
-
- dev_info = bdev->bd_disk->private_data;
- if (!dev_info)
- return -ENODEV;
- return __dcssblk_direct_access(dev_info, secnum * 512, kaddr, pfn,
- size);
-}
-
-static long
dcssblk_dax_direct_access(struct dax_inode *dax_inode, phys_addr_t dev_addr,
void **kaddr, pfn_t *pfn, long size)
{
diff --git a/fs/block_dev.c b/fs/block_dev.c
index a73f2388c515..ba0252736950 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -719,51 +719,6 @@ int bdev_write_page(struct block_device *bdev, sector_t sector,
EXPORT_SYMBOL_GPL(bdev_write_page);
/**
- * bdev_direct_access() - Get the address for directly-accessibly memory
- * @bdev: The device containing the memory
- * @dax: control and output parameters for ->direct_access
- *
- * If a block device is made up of directly addressable memory, this function
- * will tell the caller the PFN and the address of the memory. The address
- * may be directly dereferenced within the kernel without the need to call
- * ioremap(), kmap() or similar. The PFN is suitable for inserting into
- * page tables.
- *
- * Return: negative errno if an error occurs, otherwise the number of bytes
- * accessible at this address.
- */
-long bdev_direct_access(struct block_device *bdev, struct blk_dax_ctl *dax)
-{
- sector_t sector = dax->sector;
- long avail, size = dax->size;
- const struct block_device_operations *ops = bdev->bd_disk->fops;
-
- /*
- * The device driver is allowed to sleep, in order to make the
- * memory directly accessible.
- */
- might_sleep();
-
- if (size < 0)
- return size;
- if (!blk_queue_dax(bdev_get_queue(bdev)) || !ops->direct_access)
- return -EOPNOTSUPP;
- if ((sector + DIV_ROUND_UP(size, 512)) >
- part_nr_sects_read(bdev->bd_part))
- return -ERANGE;
- sector += get_start_sect(bdev);
- if (sector % (PAGE_SIZE / 512))
- return -EINVAL;
- avail = ops->direct_access(bdev, sector, &dax->addr, &dax->pfn, size);
- if (!avail)
- return -ERANGE;
- if (avail > 0 && avail & ~PAGE_MASK)
- return -ENXIO;
- return min(avail, size);
-}
-EXPORT_SYMBOL_GPL(bdev_direct_access);
-
-/**
* bdev_dax_direct_access() - bdev-sector to pfn_t and kernel virtual address
* @bdev: host block device for @dax_inode
* @dax_inode: interface data and operations for a memory device
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3b3c5ce376fd..bb87390a29b1 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1882,8 +1882,6 @@ struct block_device_operations {
int (*rw_page)(struct block_device *, sector_t, struct page *, bool);
int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
- long (*direct_access)(struct block_device *, sector_t, void **, pfn_t *,
- long);
unsigned int (*check_events) (struct gendisk *disk,
unsigned int clearing);
/* ->media_changed() is DEPRECATED, use ->check_events() instead */
@@ -1902,7 +1900,6 @@ extern int __blkdev_driver_ioctl(struct block_device *, fmode_t, unsigned int,
extern int bdev_read_page(struct block_device *, sector_t, struct page *);
extern int bdev_write_page(struct block_device *, sector_t, struct page *,
struct writeback_control *);
-extern long bdev_direct_access(struct block_device *, struct blk_dax_ctl *);
struct dax_inode;
extern long bdev_dax_direct_access(struct block_device *bdev,
struct dax_inode *dax_inode, struct blk_dax_ctl *dax);
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 1b64f412bb45..6e8762f093d3 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -125,14 +125,6 @@ typedef void (*dm_io_hints_fn) (struct dm_target *ti,
*/
typedef int (*dm_busy_fn) (struct dm_target *ti);
-/*
- * Returns:
- * < 0 : error
- * >= 0 : the number of bytes accessible at the address
- */
-typedef long (*dm_direct_access_fn) (struct dm_target *ti, sector_t sector,
- void **kaddr, pfn_t *pfn, long size);
-
void dm_error(const char *message);
struct dm_dev {
@@ -185,7 +177,6 @@ struct target_type {
dm_busy_fn busy;
dm_iterate_devices_fn iterate_devices;
dm_io_hints_fn io_hints;
- dm_direct_access_fn direct_access;
const struct dm_dax_operations *dax_ops;
/* For internal device-mapper use. */
^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [RFC PATCH 13/17] fs: update mount_bdev() to lookup dax infrastructure
2017-01-28 8:37 ` Dan Williams
(?)
@ 2017-01-30 12:26 ` Christoph Hellwig
2017-01-30 18:29 ` Dan Williams
-1 siblings, 1 reply; 55+ messages in thread
From: Christoph Hellwig @ 2017-01-30 12:26 UTC (permalink / raw)
To: Dan Williams
Cc: linux-nvdimm, snitzer, toshi.kani, mawilcox, linux-block, jmoyer,
linux-fsdevel, ross.zwisler, hch
On Sat, Jan 28, 2017 at 12:37:14AM -0800, Dan Williams wrote:
> This is in preparation for removing the ->direct_access() method from
> block_device_operations.
I don't think mount_bdev has any business knowing about DAX.
Just call dax_get_by_host manually from the affected file systems for
now, and in the future we can have a pure-DAX mount_dax helper.
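A minimal userspace sketch of that suggestion, assuming dax_get_by_host()
takes the host block-device name and returns NULL when nothing is
registered. The struct layouts, the s_dax field, the registry array, and
example_fill_super() are hypothetical stand-ins, not the kernel's
definitions:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; the real types live in the kernel. */
struct dax_inode {
	const char *host;
};

struct super_block {
	const char *bdev_name;		/* e.g. "pmem0" */
	struct dax_inode *s_dax;	/* hypothetical field, NULL without dax */
};

/* Hypothetical registry of dax providers, keyed by host device name. */
static struct dax_inode dax_registry[] = { { "pmem0" }, { "pmem1" } };

/* Assumed shape: look up a dax object by the host block-device name. */
static struct dax_inode *dax_get_by_host(const char *host)
{
	size_t i;

	for (i = 0; i < sizeof(dax_registry) / sizeof(dax_registry[0]); i++)
		if (strcmp(dax_registry[i].host, host) == 0)
			return &dax_registry[i];
	return NULL;
}

/* The filesystem, not mount_bdev(), decides whether it wants dax. */
static int example_fill_super(struct super_block *sb, int want_dax)
{
	sb->s_dax = NULL;
	if (!want_dax)
		return 0;
	sb->s_dax = dax_get_by_host(sb->bdev_name);
	return sb->s_dax ? 0 : -EOPNOTSUPP;
}
```

With this shape mount_bdev() stays unaware of DAX, and only filesystems
that were asked for dax pay for the lookup.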
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC PATCH 01/17] dax: refactor dax-fs into a generic provider of dax inodes
2017-01-28 8:36 ` Dan Williams
(?)
@ 2017-01-30 12:28 ` Christoph Hellwig
2017-01-30 17:12 ` Dan Williams
-1 siblings, 1 reply; 55+ messages in thread
From: Christoph Hellwig @ 2017-01-30 12:28 UTC (permalink / raw)
To: Dan Williams
Cc: linux-nvdimm, snitzer, toshi.kani, mawilcox, linux-block, jmoyer,
linux-fsdevel, ross.zwisler, hch
I really don't like the dax_inode name. Why not something like
dax_device or dax_region?
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC PATCH 10/17] block: introduce bdev_dax_direct_access()
2017-01-28 8:36 ` Dan Williams
(?)
@ 2017-01-30 12:32 ` Christoph Hellwig
2017-01-30 18:16 ` Dan Williams
-1 siblings, 1 reply; 55+ messages in thread
From: Christoph Hellwig @ 2017-01-30 12:32 UTC (permalink / raw)
To: Dan Williams
Cc: linux-nvdimm, snitzer, toshi.kani, mawilcox, linux-block, jmoyer,
linux-fsdevel, ross.zwisler, hch
On Sat, Jan 28, 2017 at 12:36:58AM -0800, Dan Williams wrote:
> Provide a replacement for bdev_direct_access() that uses
> dax_operations.direct_access() instead of
> block_device_operations.direct_access(). Once all consumers of the old
> api have been converted bdev_direct_access() will be deleted.
>
> Given that block device partitioning decisions can cause dax page
> alignment constraints to be violated we still need to validate the
> block_device before calling the dax ->direct_access method.
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
> block/Kconfig | 1 +
> drivers/dax/super.c | 33 +++++++++++++++++++++++++++++++++
> fs/block_dev.c | 28 ++++++++++++++++++++++++++++
> include/linux/blkdev.h | 3 +++
> include/linux/dax.h | 2 ++
> 5 files changed, 67 insertions(+)
>
> diff --git a/block/Kconfig b/block/Kconfig
> index 8bf114a3858a..9be785173280 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -6,6 +6,7 @@ menuconfig BLOCK
> default y
> select SBITMAP
> select SRCU
> + select DAX
> help
> Provide block layer support for the kernel.
>
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index eb844ffea3cf..ab5b082df5dd 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -65,6 +65,39 @@ struct dax_inode {
> const struct dax_operations *ops;
> };
>
> +long dax_direct_access(struct dax_inode *dax_inode, phys_addr_t dev_addr,
> + void **kaddr, pfn_t *pfn, long size)
> +{
> + long avail;
> +
> + /*
> + * The device driver is allowed to sleep, in order to make the
> + * memory directly accessible.
> + */
> + might_sleep();
> +
> + if (!dax_inode)
> + return -EOPNOTSUPP;
> +
> + if (!dax_inode_alive(dax_inode))
> + return -ENXIO;
> +
> + if (size < 0)
> + return size;
> +
> + if (dev_addr % PAGE_SIZE)
> + return -EINVAL;
> +
> + avail = dax_inode->ops->direct_access(dax_inode, dev_addr, kaddr, pfn,
> + size);
> + if (!avail)
> + return -ERANGE;
> + if (avail > 0 && avail & ~PAGE_MASK)
> + return -ENXIO;
> + return min(avail, size);
> +}
> +EXPORT_SYMBOL_GPL(dax_direct_access);
> +
> bool dax_inode_alive(struct dax_inode *dax_inode)
> {
> lockdep_assert_held(&dax_srcu);
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index edb1d2b16b8f..bf4b51a3a412 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -18,6 +18,7 @@
> #include <linux/module.h>
> #include <linux/blkpg.h>
> #include <linux/magic.h>
> +#include <linux/dax.h>
> #include <linux/buffer_head.h>
> #include <linux/swap.h>
> #include <linux/pagevec.h>
> @@ -763,6 +764,33 @@ long bdev_direct_access(struct block_device *bdev, struct blk_dax_ctl *dax)
> EXPORT_SYMBOL_GPL(bdev_direct_access);
>
> /**
> + * bdev_dax_direct_access() - bdev-sector to pfn_t and kernel virtual address
> + * @bdev: host block device for @dax_inode
> + * @dax_inode: interface data and operations for a memory device
> + * @dax: control and output parameters for ->direct_access
> + *
> + * Return: negative errno if an error occurs, otherwise the number of bytes
> + * accessible at this address.
> + *
> + * Locking: must be called with dax_read_lock() held
> + */
> +long bdev_dax_direct_access(struct block_device *bdev,
> + struct dax_inode *dax_inode, struct blk_dax_ctl *dax)
> +{
> + sector_t sector = dax->sector;
> +
> + if (!blk_queue_dax(bdev->bd_queue))
> + return -EOPNOTSUPP;
I don't think this should take a bdev - the caller should know if
it has a dax_inode. Also if you touch this anyway can we kill
the annoying struct blk_dax_ctl calling convention? Passing the
four arguments explicitly is just a lot more readable and understandable.
> + if ((sector + DIV_ROUND_UP(dax->size, 512))
> + > part_nr_sects_read(bdev->bd_part))
> + return -ERANGE;
> + sector += get_start_sect(bdev);
> + return dax_direct_access(dax_inode, sector * 512, &dax->addr,
> + &dax->pfn, dax->size);
And please switch to using bytes as the granularity given that we're
dealing with byte addressable memory.
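As a rough userspace sketch of what byte granularity could look like for
the partition range check quoted above: struct part, SECTOR_SHIFT, and
part_byte_offset() are hypothetical names, and the overflow-safe
comparison is one possible way to write the bound check, not what the
patch does:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors, as in the quoted code */

/* Hypothetical stand-in for the partition geometry kept by the bdev. */
struct part {
	uint64_t start_sect;
	uint64_t nr_sects;
};

/*
 * Translate a partition-relative byte range into a device-relative byte
 * offset. Returns -ERANGE when [off, off + len) leaves the partition.
 */
static int64_t part_byte_offset(const struct part *p, uint64_t off,
		uint64_t len)
{
	uint64_t part_bytes = p->nr_sects << SECTOR_SHIFT;

	/* Written so that off + len cannot overflow. */
	if (off > part_bytes || len > part_bytes - off)
		return -ERANGE;
	return (int64_t)((p->start_sect << SECTOR_SHIFT) + off);
}
```

The caller then never sees sectors; only the partition bookkeeping does.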
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC PATCH 01/17] dax: refactor dax-fs into a generic provider of dax inodes
2017-01-30 12:28 ` Christoph Hellwig
@ 2017-01-30 17:12 ` Dan Williams
0 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-30 17:12 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Mike Snitzer, Matthew Wilcox, linux-nvdimm, linux-block, linux-fsdevel
On Mon, Jan 30, 2017 at 4:28 AM, Christoph Hellwig <hch@lst.de> wrote:
> I really don't like the dax_inode name. Why not something like
> dax_device or dax_region?
Fair enough, I'll switch struct dax_inode to dax_device and switch the
existing struct dax_dev to dax_info.
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC PATCH 10/17] block: introduce bdev_dax_direct_access()
2017-01-30 12:32 ` Christoph Hellwig
@ 2017-01-30 18:16 ` Dan Williams
0 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-30 18:16 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Mike Snitzer, Matthew Wilcox, linux-nvdimm, linux-block, linux-fsdevel
On Mon, Jan 30, 2017 at 4:32 AM, Christoph Hellwig <hch@lst.de> wrote:
> On Sat, Jan 28, 2017 at 12:36:58AM -0800, Dan Williams wrote:
>> Provide a replacement for bdev_direct_access() that uses
>> dax_operations.direct_access() instead of
>> block_device_operations.direct_access(). Once all consumers of the old
>> api have been converted bdev_direct_access() will be deleted.
>>
>> Given that block device partitioning decisions can cause dax page
>> alignment constraints to be violated we still need to validate the
>> block_device before calling the dax ->direct_access method.
>>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> ---
>> block/Kconfig | 1 +
>> drivers/dax/super.c | 33 +++++++++++++++++++++++++++++++++
>> fs/block_dev.c | 28 ++++++++++++++++++++++++++++
>> include/linux/blkdev.h | 3 +++
>> include/linux/dax.h | 2 ++
>> 5 files changed, 67 insertions(+)
>>
>> diff --git a/block/Kconfig b/block/Kconfig
>> index 8bf114a3858a..9be785173280 100644
>> --- a/block/Kconfig
>> +++ b/block/Kconfig
>> @@ -6,6 +6,7 @@ menuconfig BLOCK
>> default y
>> select SBITMAP
>> select SRCU
>> + select DAX
>> help
>> Provide block layer support for the kernel.
>>
>> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
>> index eb844ffea3cf..ab5b082df5dd 100644
>> --- a/drivers/dax/super.c
>> +++ b/drivers/dax/super.c
>> @@ -65,6 +65,39 @@ struct dax_inode {
>> const struct dax_operations *ops;
>> };
>>
>> +long dax_direct_access(struct dax_inode *dax_inode, phys_addr_t dev_addr,
>> + void **kaddr, pfn_t *pfn, long size)
>> +{
>> + long avail;
>> +
>> + /*
>> + * The device driver is allowed to sleep, in order to make the
>> + * memory directly accessible.
>> + */
>> + might_sleep();
>> +
>> + if (!dax_inode)
>> + return -EOPNOTSUPP;
>> +
>> + if (!dax_inode_alive(dax_inode))
>> + return -ENXIO;
>> +
>> + if (size < 0)
>> + return size;
>> +
>> + if (dev_addr % PAGE_SIZE)
>> + return -EINVAL;
>> +
>> + avail = dax_inode->ops->direct_access(dax_inode, dev_addr, kaddr, pfn,
>> + size);
>> + if (!avail)
>> + return -ERANGE;
>> + if (avail > 0 && avail & ~PAGE_MASK)
>> + return -ENXIO;
>> + return min(avail, size);
>> +}
>> +EXPORT_SYMBOL_GPL(dax_direct_access);
>> +
>> bool dax_inode_alive(struct dax_inode *dax_inode)
>> {
>> lockdep_assert_held(&dax_srcu);
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index edb1d2b16b8f..bf4b51a3a412 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -18,6 +18,7 @@
>> #include <linux/module.h>
>> #include <linux/blkpg.h>
>> #include <linux/magic.h>
>> +#include <linux/dax.h>
>> #include <linux/buffer_head.h>
>> #include <linux/swap.h>
>> #include <linux/pagevec.h>
>> @@ -763,6 +764,33 @@ long bdev_direct_access(struct block_device *bdev, struct blk_dax_ctl *dax)
>> EXPORT_SYMBOL_GPL(bdev_direct_access);
>>
>> /**
>> + * bdev_dax_direct_access() - bdev-sector to pfn_t and kernel virtual address
>> + * @bdev: host block device for @dax_inode
>> + * @dax_inode: interface data and operations for a memory device
>> + * @dax: control and output parameters for ->direct_access
>> + *
>> + * Return: negative errno if an error occurs, otherwise the number of bytes
>> + * accessible at this address.
>> + *
>> + * Locking: must be called with dax_read_lock() held
>> + */
>> +long bdev_dax_direct_access(struct block_device *bdev,
>> + struct dax_inode *dax_inode, struct blk_dax_ctl *dax)
>> +{
>> + sector_t sector = dax->sector;
>> +
>> + if (!blk_queue_dax(bdev->bd_queue))
>> + return -EOPNOTSUPP;
>
> I don't think this should take a bdev - the caller should know if
> it has a dax_inode. Also if you touch this anyway can we kill
> the annoying struct blk_dax_ctl calling convention? Passing the
> four arguments explicitly is just a lot more readable and understandable.
Ok, now that dax_map_atomic() is gone, it's much easier to remove
struct blk_dax_ctl.
We can also move the partition alignment checks to be a one-time check
at bdev_dax_capable() time and kill bdev_dax_direct_access() in favor
of calling dax_direct_access() directly.
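A minimal sketch of such a one-time check, in userspace C.
part_dax_aligned() and the SKETCH_ constants are hypothetical, the page
size is fixed at 4096 for illustration, and requiring the partition
length to be page aligned as well as its start is an assumption, not
something the quoted per-call check does:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE   4096u	/* assumed page size */
#define SKETCH_SECTOR_SIZE 512u

/*
 * Validate the partition geometry once, at capability-check time, so
 * the per-access path no longer has to re-check alignment.
 */
static bool part_dax_aligned(uint64_t start_sect, uint64_t nr_sects)
{
	uint64_t sects_per_page = SKETCH_PAGE_SIZE / SKETCH_SECTOR_SIZE;

	return start_sect % sects_per_page == 0 &&
	       nr_sects % sects_per_page == 0;
}
```

A partition starting at sector 63 fails this check, while one starting
at sector 2048 with a page-multiple length passes.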
>> + if ((sector + DIV_ROUND_UP(dax->size, 512))
>> + > part_nr_sects_read(bdev->bd_part))
>> + return -ERANGE;
>> + sector += get_start_sect(bdev);
>> + return dax_direct_access(dax_inode, sector * 512, &dax->addr,
>> + &dax->pfn, dax->size);
>
> And please switch to using bytes as the granularity given that we're
> dealing with byte addressable memory.
dax_direct_access() does take a byte aligned physical address, but it
needs to be at least page aligned since we are returning a pfn_t...
Hmm, perhaps the input should be raw page frame number. We could
reduce one of the arguments by making the current 'pfn_t *' parameter
an in/out-parameter.
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC PATCH 13/17] fs: update mount_bdev() to lookup dax infrastructure
2017-01-30 12:26 ` Christoph Hellwig
@ 2017-01-30 18:29 ` Dan Williams
0 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-30 18:29 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Mike Snitzer, Matthew Wilcox, linux-nvdimm, linux-block, linux-fsdevel
On Mon, Jan 30, 2017 at 4:26 AM, Christoph Hellwig <hch@lst.de> wrote:
> On Sat, Jan 28, 2017 at 12:37:14AM -0800, Dan Williams wrote:
>> This is in preparation for removing the ->direct_access() method from
>> block_device_operations.
>
> I don't think mount_bdev has any business knowing about DAX.
> Just call dax_get_by_host manually from the affected file systems for
> now, and in the future we can have a pure-DAX mount_dax helper.
Ok, since we already need dax_get_by_host() in the blkdev_writepages()
path I can sprinkle a few more of those calls and leave mount_bdev
alone.
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC PATCH 13/17] fs: update mount_bdev() to lookup dax infrastructure
2017-01-30 18:29 ` Dan Williams
@ 2017-02-01 8:08 ` Christoph Hellwig
-1 siblings, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2017-02-01 8:08 UTC (permalink / raw)
To: Dan Williams
Cc: Mike Snitzer, Matthew Wilcox, linux-nvdimm, linux-block,
linux-fsdevel, Christoph Hellwig
On Mon, Jan 30, 2017 at 10:29:12AM -0800, Dan Williams wrote:
> On Mon, Jan 30, 2017 at 4:26 AM, Christoph Hellwig <hch@lst.de> wrote:
> > On Sat, Jan 28, 2017 at 12:37:14AM -0800, Dan Williams wrote:
> >> This is in preparation for removing the ->direct_access() method from
> >> block_device_operations.
> >
> > I don't think mount_bdev has any business knowing about DAX.
> > Just call dax_get_by_host manually from the affected file systems for
> > now, and in the future we can have a pure-DAX mount_dax helper.
>
> Ok, since we already need dax_get_by_host() in the blkdev_writepages()
> path I can sprinkle a few more of those calls and leave mount_bdev
> alone.
Huh? I thought we stopped using DAX I/O for the block device nodes
a while ago?
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC PATCH 10/17] block: introduce bdev_dax_direct_access()
2017-01-30 18:16 ` Dan Williams
@ 2017-02-01 8:10 ` Christoph Hellwig
-1 siblings, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2017-02-01 8:10 UTC (permalink / raw)
To: Dan Williams
Cc: Mike Snitzer, Matthew Wilcox, linux-nvdimm, linux-block,
linux-fsdevel, Christoph Hellwig
On Mon, Jan 30, 2017 at 10:16:29AM -0800, Dan Williams wrote:
> Ok, now that dax_map_atomic() is gone, it's much easier to remove
> struct blk_dax_ctl.
>
> We can also move the partition alignment checks to be a one-time check
> at bdev_dax_capable() time and kill bdev_dax_direct_access() in favor
> of calling dax_direct_access() directly.
Yes, please.
> >> + if ((sector + DIV_ROUND_UP(dax->size, 512))
> >> + > part_nr_sects_read(bdev->bd_part))
> >> + return -ERANGE;
> >> + sector += get_start_sect(bdev);
> >> + return dax_direct_access(dax_inode, sector * 512, &dax->addr,
> >> + &dax->pfn, dax->size);
> >
> > And please switch to using bytes as the granularity given that we're
> > dealing with byte addressable memory.
>
> dax_direct_access() does take a byte aligned physical address, but it
> needs to be at least page aligned since we are returning a pfn_t...
>
> Hmm, perhaps the input should be raw page frame number. We could
> reduce one of the arguments by making the current 'pfn_t *' parameter
> an in/out-parameter.
In/Out parameters are always a bit problematic in terms of API clarity.
And updating a device-relative address with an absolute physical one
sounds like an odd API for sure.
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC PATCH 13/17] fs: update mount_bdev() to lookup dax infrastructure
2017-02-01 8:08 ` Christoph Hellwig
@ 2017-02-01 9:16 ` Dan Williams
-1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-02-01 9:16 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Mike Snitzer, Matthew Wilcox, linux-nvdimm, linux-block, linux-fsdevel
On Wed, Feb 1, 2017 at 12:08 AM, Christoph Hellwig <hch@lst.de> wrote:
> On Mon, Jan 30, 2017 at 10:29:12AM -0800, Dan Williams wrote:
>> On Mon, Jan 30, 2017 at 4:26 AM, Christoph Hellwig <hch@lst.de> wrote:
>> > On Sat, Jan 28, 2017 at 12:37:14AM -0800, Dan Williams wrote:
>> >> This is in preparation for removing the ->direct_access() method from
>> >> block_device_operations.
>> >
>> > I don't think mount_bdev has any business knowing about DAX.
>> > Just call dax_get_by_host manually from the affected file systems for
>> > now, and in the future we can have a pure-DAX mount_dax helper.
>>
>> Ok, since we already need dax_get_by_host() in the blkdev_writepages()
>> path I can sprinkle a few more of those calls and leave mount_bdev
>> alone.
>
> Huh? I thought we stopped using DAX I/O for the block device nodes
> a while ago?
Oh, yeah, you're right. The blkdev_writepages() call to
dax_writeback_mapping_range() is likely leftover dead code. I'll clean
it up.
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC PATCH 10/17] block: introduce bdev_dax_direct_access()
2017-02-01 8:10 ` Christoph Hellwig
@ 2017-02-01 9:21 ` Dan Williams
-1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-02-01 9:21 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Mike Snitzer, Matthew Wilcox, linux-nvdimm, linux-block, linux-fsdevel
On Wed, Feb 1, 2017 at 12:10 AM, Christoph Hellwig <hch@lst.de> wrote:
> On Mon, Jan 30, 2017 at 10:16:29AM -0800, Dan Williams wrote:
>> Ok, now that dax_map_atomic() is gone, it's much easier to remove
>> struct blk_dax_ctl.
>>
>> We can also move the partition alignment checks to be a one-time check
>> at bdev_dax_capable() time and kill bdev_dax_direct_access() in favor
>> of calling dax_direct_access() directly.
>
> Yes, please.
>
>> >> + if ((sector + DIV_ROUND_UP(dax->size, 512))
>> >> + > part_nr_sects_read(bdev->bd_part))
>> >> + return -ERANGE;
>> >> + sector += get_start_sect(bdev);
>> >> + return dax_direct_access(dax_inode, sector * 512, &dax->addr,
>> >> + &dax->pfn, dax->size);
>> >
>> > And please switch to using bytes as the granularity given that we're
>> > dealing with byte addressable memory.
>>
>> dax_direct_access() does take a byte aligned physical address, but it
>> needs to be at least page aligned since we are returning a pfn_t...
>>
>> Hmm, perhaps the input should be raw page frame number. We could
>> reduce one of the arguments by making the current 'pfn_t *' parameter
>> an in/out-parameter.
>
> In/Out parameters are always a bit problematic in terms of API clarity.
> And updating a device-relative address with an absolute physical one
> sounds like an odd API for sure.
Yes, it does, and I thought better of it shortly after sending that. How about:
long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
unsigned long nr_pages, void **kaddr, pfn_t *pfn)
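To make the proposed calling convention concrete, here is a userspace
mock. Only the prototype comes from the mail above; the struct
dax_device layout, the pgoff_t and pfn_t stand-ins, SKETCH_PAGE_SHIFT,
and the return convention (number of pages available, capped at
nr_pages, or a negative errno) are assumptions for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12	/* assumed 4096-byte pages */

/* Hypothetical stand-ins for the kernel types in the prototype. */
typedef unsigned long pgoff_t;
typedef struct { unsigned long val; } pfn_t;

struct dax_device {
	char *base;		/* kernel address of device page 0 */
	unsigned long base_pfn;	/* page frame number of device page 0 */
	unsigned long nr_pages;	/* device size in pages */
};

/*
 * The input is a device-relative page offset, the outputs are a kernel
 * address and an absolute pfn. Nothing is an in/out parameter.
 */
static long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
		unsigned long nr_pages, void **kaddr, pfn_t *pfn)
{
	unsigned long avail;

	assert(kaddr && pfn);
	if (!dax_dev)
		return -EOPNOTSUPP;
	if (pgoff >= dax_dev->nr_pages)
		return -ERANGE;

	*kaddr = dax_dev->base + ((size_t)pgoff << SKETCH_PAGE_SHIFT);
	pfn->val = dax_dev->base_pfn + pgoff;

	avail = dax_dev->nr_pages - pgoff;
	return (long)(avail < nr_pages ? avail : nr_pages);
}
```

Because the offset is already in pages, the page-alignment test on the
input address from the RFC version has nothing left to check.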
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC PATCH 10/17] block: introduce bdev_dax_direct_access()
2017-02-01 9:21 ` Dan Williams
@ 2017-02-01 9:28 ` Christoph Hellwig
-1 siblings, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2017-02-01 9:28 UTC (permalink / raw)
To: Dan Williams
Cc: Mike Snitzer, Matthew Wilcox, linux-nvdimm, linux-block,
linux-fsdevel, Christoph Hellwig
On Wed, Feb 01, 2017 at 01:21:40AM -0800, Dan Williams wrote:
> > In/Out parameters are always a bit problematic in terms of API clarity.
> > And updating a device-relative address with an absolute physical one
> > sounds like an odd API for sure.
>
> Yes, it does, and I thought better of it shortly after sending that. How about:
>
>     long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
>                     unsigned long nr_pages, void **kaddr, pfn_t *pfn)

Yes, that looks good to me.
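
With a page-granular pgoff, a caller that still holds a partition-relative sector has to convert it, and the partition alignment test can run once up front instead of on every access. A rough sketch under stated assumptions: both helper names are hypothetical, and the 512-byte sector and 4 KiB page sizes are assumed rather than taken from the patches.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SHIFT	 9	/* 512-byte sectors */
#define PAGE_SHIFT	 12	/* assumption: 4 KiB pages */
#define SECTORS_PER_PAGE (1u << (PAGE_SHIFT - SECTOR_SHIFT))

/*
 * Hypothetical one-time check: a partition can only be addressed in whole
 * pages if both its start and its length are page aligned.
 */
static bool part_is_page_aligned(uint64_t start_sect, uint64_t nr_sects)
{
	return (start_sect % SECTORS_PER_PAGE) == 0 &&
	       (nr_sects % SECTORS_PER_PAGE) == 0;
}

/*
 * Hypothetical helper: convert a partition-relative sector into a
 * device-relative page offset.  Returns 0 on success, -ERANGE if the
 * sector lies outside the partition, or -EINVAL if the resulting device
 * sector does not fall on a page boundary.
 */
static int part_sector_to_pgoff(uint64_t start_sect, uint64_t nr_sects,
		uint64_t sector, uint64_t *pgoff)
{
	uint64_t dev_sector;

	assert(pgoff != NULL);

	if (sector >= nr_sects)
		return -ERANGE;

	dev_sector = start_sect + sector;
	if (dev_sector % SECTORS_PER_PAGE)
		return -EINVAL;

	*pgoff = dev_sector >> (PAGE_SHIFT - SECTOR_SHIFT);
	return 0;
}
```

For example, with a partition starting at sector 2048, partition-relative sector 16 maps to device sector 2064, which is page offset 258.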
^ permalink raw reply [flat|nested] 55+ messages in thread