* [RFC PATCH 00/17] introduce a dax_inode for dax_operations
@ 2017-01-28  8:36 ` Dan Williams
  0 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28  8:36 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch

Recently there was an effort to introduce dax_operations to unwind the
abuse of the user-copy api in the pmem api [1]. Christoph noted that we
should not add new block-dax operations as it is further abuse of struct
block_device [2].

The ->direct_access() method in block_device_operations was an expedient
way to get the filesystem-dax capability bootstrapped. Looking forward,
however, native persistent memory filesystems can forgo the block layer
and mount directly on a provider of dax services, a dax inode.

For the time being, since current dax capable filesystems are block
based, we need a facility to look up this dax object via the
block-device name. If this approach looks reasonable I'll follow up with
reworking the proposed ->copy_from_iter(), ->flush(), and ->clear() dax
operations into this new scheme.

These patches survive a run of the libnvdimm unit tests, but I have not
tested the non-libnvdimm dax drivers.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008586.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008638.html

---

Dan Williams (17):
      dax: refactor dax-fs into a generic provider of dax inodes
      dax: convert dax_inode locking to srcu
      dax: add a facility to lookup a dax inode by 'host' device name
      dax: introduce dax_operations
      pmem: add dax_operations support
      axon_ram: add dax_operations support
      brd: add dax_operations support
      dcssblk: add dax_operations support
      block: kill bdev_dax_capable()
      block: introduce bdev_dax_direct_access()
      dm: add dax_operations support (producer)
      dm: add dax_operations support (consumer)
      fs: update mount_bdev() to lookup dax infrastructure
      ext2, ext4, xfs: retrieve dax_inode through iomap operations
      Revert "block: use DAX for partition table reads"
      fs, dax: convert filesystem-dax to bdev_dax_direct_access
      block: remove block_device_operations.direct_access and related infrastructure


 arch/powerpc/platforms/Kconfig  |    1 
 arch/powerpc/sysdev/axonram.c   |   37 +++
 block/Kconfig                   |    1 
 block/partition-generic.c       |   17 --
 drivers/Makefile                |    2 
 drivers/block/Kconfig           |    1 
 drivers/block/brd.c             |   48 +++-
 drivers/dax/Kconfig             |    9 +
 drivers/dax/Makefile            |    5 
 drivers/dax/dax.h               |   19 +-
 drivers/dax/device-dax.h        |   25 ++
 drivers/dax/device.c            |  257 ++++-------------------
 drivers/dax/pmem.c              |    2 
 drivers/dax/super.c             |  434 +++++++++++++++++++++++++++++++++++++++
 drivers/md/Kconfig              |    1 
 drivers/md/dm-core.h            |    3 
 drivers/md/dm-linear.c          |   15 +
 drivers/md/dm-snap.c            |    8 +
 drivers/md/dm-stripe.c          |   16 +
 drivers/md/dm-table.c           |    2 
 drivers/md/dm-target.c          |   10 +
 drivers/md/dm.c                 |   43 +++-
 drivers/nvdimm/Kconfig          |    1 
 drivers/nvdimm/pmem.c           |   46 +++-
 drivers/nvdimm/pmem.h           |    7 -
 drivers/s390/block/Kconfig      |    1 
 drivers/s390/block/dcssblk.c    |   41 +++-
 fs/block_dev.c                  |   75 ++-----
 fs/dax.c                        |  149 ++++++-------
 fs/ext2/inode.c                 |    1 
 fs/ext4/inode.c                 |    1 
 fs/iomap.c                      |    3 
 fs/super.c                      |   32 +++
 fs/xfs/xfs_aops.c               |   13 +
 fs/xfs/xfs_aops.h               |    1 
 fs/xfs/xfs_buf.h                |    1 
 fs/xfs/xfs_iomap.c              |    1 
 fs/xfs/xfs_super.c              |    3 
 include/linux/blkdev.h          |    7 -
 include/linux/dax.h             |   29 ++-
 include/linux/device-mapper.h   |   16 +
 include/linux/fs.h              |    1 
 include/linux/iomap.h           |    1 
 tools/testing/nvdimm/Kbuild     |    6 -
 tools/testing/nvdimm/pmem-dax.c |   12 -
 45 files changed, 927 insertions(+), 477 deletions(-)
 create mode 100644 drivers/dax/device-dax.h
 rename drivers/dax/{dax.c => device.c} (74%)
 create mode 100644 drivers/dax/super.c

* [RFC PATCH 01/17] dax: refactor dax-fs into a generic provider of dax inodes
  2017-01-28  8:36 ` Dan Williams
@ 2017-01-28  8:36   ` Dan Williams
  -1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28  8:36 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch

We want dax capable drivers to be able to publish a set of dax
operations [1]. However, we do not want to further abuse block_devices
to advertise these operations. Instead we will attach these operations
to a dax inode and add a lookup mechanism to go from block device path
to a dax inode. A dax capable driver like pmem or brd is responsible for
registering a dax inode, alongside a block device, and then a dax
capable filesystem is responsible for retrieving the dax inode by path
name if it wants to call dax_operations.
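
The publish / look-up split described above can be sketched in plain
user-space C. This is only an illustration under stated assumptions, not
the interface this series adds: dax_publish(), dax_lookup(), the 'host'
and 'next' fields, and the direct_access() signature are all
hypothetical stand-ins, and a real implementation would need locking and
reference counting.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096UL

/* Hypothetical operations table; the real signature may differ. */
struct dax_operations {
	long (*direct_access)(void *private, unsigned long pgoff, void **kaddr);
};

/* Hypothetical user-space stand-in for the kernel's dax inode. */
struct dax_inode {
	const char *host;                 /* block device name, e.g. "pmem0" */
	const struct dax_operations *ops; /* published by the driver */
	void *private;                    /* driver private data */
	struct dax_inode *next;           /* registry linkage */
};

static struct dax_inode *dax_registry;

/* Driver side: publish a dax inode alongside the block device. */
void dax_publish(struct dax_inode *di)
{
	di->next = dax_registry;
	dax_registry = di;
}

/* Filesystem side: find the dax inode for a block device name. */
struct dax_inode *dax_lookup(const char *host)
{
	struct dax_inode *di;

	for (di = dax_registry; di; di = di->next)
		if (strcmp(di->host, host) == 0)
			return di;
	return NULL;
}

/* Toy driver op: translate a page offset into an address in 'private'. */
static long demo_direct_access(void *private, unsigned long pgoff, void **kaddr)
{
	*kaddr = (char *)private + pgoff * DEMO_PAGE_SIZE;
	return DEMO_PAGE_SIZE;
}

static const struct dax_operations demo_ops = {
	.direct_access = demo_direct_access,
};

/* Walk through publish, lookup by name, and a call through the ops. */
int dax_lookup_demo(void)
{
	static char backing[2 * DEMO_PAGE_SIZE];
	static struct dax_inode pmem0 = {
		.host = "pmem0", .ops = &demo_ops, .private = backing,
	};
	struct dax_inode *found;
	void *kaddr = NULL;

	if (!dax_lookup("pmem0"))
		dax_publish(&pmem0);

	found = dax_lookup("pmem0");
	assert(found == &pmem0);
	assert(dax_lookup("sda") == NULL);
	assert(found->ops->direct_access(found->private, 1, &kaddr) ==
			(long)DEMO_PAGE_SIZE);
	assert(kaddr == (void *)(backing + DEMO_PAGE_SIZE));
	return 0;
}
```

Calling dax_lookup_demo() from a main() exercises both sides: the
"driver" publishes under the name "pmem0", and the "filesystem" reaches
the operations without going through a block_device.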

For now, we refactor the dax pseudo-fs to be a generic facility rather
than an implementation detail of the device-dax use case. A "dax inode"
is just an inode plus dax infrastructure, and "Device DAX" is a mapping
service layered on top of that base inode. "Filesystem DAX" is then a
mapping service that layers a filesystem on top of the base dax inode.
Filesystem DAX goes through a block_device for now, but may go directly
to a dax inode in the future, or for new pmem-only filesystems.
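
As a rough user-space model of that layering: the base object embeds the
inode and cdev, and a consumer such as Device DAX hangs its own state
off the private pointer. The struct inode and struct cdev below are
simplified stand-ins (assumptions for illustration, not the kernel
definitions), and dax_layering_demo() is a hypothetical helper; only the
container_of() walk and the private-data accessor mirror what the patch
does.

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins for the kernel's struct cdev and struct inode. */
struct cdev {
	int unused;
};

struct inode {
	unsigned int i_rdev;
	struct cdev *i_cdev;
};

/* Base object: same member layout the patch gives struct dax_inode. */
struct dax_inode {
	struct inode inode;
	struct cdev cdev;
	void *private;
	int alive;
};

/* "Device DAX" layers its own state on top of the base dax inode. */
struct dax_dev {
	struct dax_inode *dax_inode;
	int id;
};

/* Public inode -> dax_inode, via the embedded cdev. */
struct dax_inode *inode_to_dax_inode(struct inode *inode)
{
	struct cdev *cdev = inode->i_cdev;

	return container_of(cdev, struct dax_inode, cdev);
}

void *dax_inode_get_private(struct dax_inode *dax_inode)
{
	return dax_inode->private;
}

/* Model of the open() path: recover the layered object from an inode. */
int dax_layering_demo(void)
{
	struct dax_inode base = { .alive = 1 };
	struct dax_dev dev = { .dax_inode = &base, .id = 7 };
	/* the inode the character device open() would be handed */
	struct inode public_inode = { .i_cdev = &base.cdev };
	struct dax_dev *found;

	base.private = &dev;

	assert(inode_to_dax_inode(&public_inode) == &base);
	found = dax_inode_get_private(inode_to_dax_inode(&public_inode));
	assert(found == &dev);
	assert(found->id == 7);
	return 0;
}
```

The base object knows nothing about struct dax_dev, which is what would
let a filesystem-dax consumer attach a different private object to the
same kind of dax inode.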

[1]: https://lkml.org/lkml/2017/1/19/880

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/Makefile            |    2 
 drivers/dax/Kconfig         |    8 +
 drivers/dax/Makefile        |    5 +
 drivers/dax/dax.h           |   24 ++-
 drivers/dax/device-dax.h    |   25 +++
 drivers/dax/device.c        |  241 +++++----------------------------
 drivers/dax/pmem.c          |    2 
 drivers/dax/super.c         |  310 +++++++++++++++++++++++++++++++++++++++++++
 tools/testing/nvdimm/Kbuild |    6 -
 9 files changed, 402 insertions(+), 221 deletions(-)
 create mode 100644 drivers/dax/device-dax.h
 rename drivers/dax/{dax.c => device.c} (75%)
 create mode 100644 drivers/dax/super.c

diff --git a/drivers/Makefile b/drivers/Makefile
index 060026a02f59..17f42e4a6717 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -68,7 +68,7 @@ obj-$(CONFIG_PARPORT)		+= parport/
 obj-$(CONFIG_NVM)		+= lightnvm/
 obj-y				+= base/ block/ misc/ mfd/ nfc/
 obj-$(CONFIG_LIBNVDIMM)		+= nvdimm/
-obj-$(CONFIG_DEV_DAX)		+= dax/
+obj-$(CONFIG_DAX)		+= dax/
 obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf/
 obj-$(CONFIG_NUBUS)		+= nubus/
 obj-y				+= macintosh/
diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index 3e2ab3b14eea..39bcbf4c5e40 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -1,6 +1,11 @@
-menuconfig DEV_DAX
+menuconfig DAX
 	tristate "DAX: direct access to differentiated memory"
 	default m if NVDIMM_DAX
+
+if DAX
+
+config DEV_DAX
+	tristate "Device DAX: direct access mapping device"
 	depends on TRANSPARENT_HUGEPAGE
 	help
 	  Support raw access to differentiated (persistence, bandwidth,
@@ -10,7 +15,6 @@ menuconfig DEV_DAX
 	  baseline memory pool.  Mappings of a /dev/daxX.Y device impose
 	  restrictions that make the mapping behavior deterministic.
 
-if DEV_DAX
 
 config DEV_DAX_PMEM
 	tristate "PMEM DAX: direct access to persistent memory"
diff --git a/drivers/dax/Makefile b/drivers/dax/Makefile
index 27c54e38478a..dc7422530462 100644
--- a/drivers/dax/Makefile
+++ b/drivers/dax/Makefile
@@ -1,4 +1,7 @@
-obj-$(CONFIG_DEV_DAX) += dax.o
+obj-$(CONFIG_DAX) += dax.o
+obj-$(CONFIG_DEV_DAX) += device_dax.o
 obj-$(CONFIG_DEV_DAX_PMEM) += dax_pmem.o
 
+dax-y := super.o
 dax_pmem-y := pmem.o
+device_dax-y := device.o
diff --git a/drivers/dax/dax.h b/drivers/dax/dax.h
index ddd829ab58c0..def061aa75f4 100644
--- a/drivers/dax/dax.h
+++ b/drivers/dax/dax.h
@@ -1,5 +1,5 @@
 /*
- * Copyright(c) 2016 Intel Corporation. All rights reserved.
+ * Copyright(c) 2016 - 2017 Intel Corporation. All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of version 2 of the GNU General Public License as
@@ -12,14 +12,16 @@
  */
 #ifndef __DAX_H__
 #define __DAX_H__
-struct device;
-struct dax_dev;
-struct resource;
-struct dax_region;
-void dax_region_put(struct dax_region *dax_region);
-struct dax_region *alloc_dax_region(struct device *parent,
-		int region_id, struct resource *res, unsigned int align,
-		void *addr, unsigned long flags);
-struct dax_dev *devm_create_dax_dev(struct dax_region *dax_region,
-		struct resource *res, int count);
+struct dax_inode;
+struct dax_inode *alloc_dax_inode(void *private);
+void put_dax_inode(struct dax_inode *dax_inode);
+bool dax_inode_alive(struct dax_inode *dax_inode);
+void kill_dax_inode(struct dax_inode *dax_inode);
+struct dax_inode *inode_to_dax_inode(struct inode *inode);
+struct inode *dax_inode_to_inode(struct dax_inode *dax_inode);
+void *dax_inode_get_private(struct dax_inode *dax_inode);
+int dax_inode_register(struct dax_inode *dax_inode,
+		const struct file_operations *fops, struct module *owner,
+		struct kobject *parent);
+void dax_inode_unregister(struct dax_inode *dax_inode);
 #endif /* __DAX_H__ */
diff --git a/drivers/dax/device-dax.h b/drivers/dax/device-dax.h
new file mode 100644
index 000000000000..c9b7e9cc227e
--- /dev/null
+++ b/drivers/dax/device-dax.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+#ifndef __DEVICE_DAX_H__
+#define __DEVICE_DAX_H__
+struct device;
+struct dax_dev;
+struct resource;
+struct dax_region;
+void dax_region_put(struct dax_region *dax_region);
+struct dax_region *alloc_dax_region(struct device *parent,
+		int region_id, struct resource *res, unsigned int align,
+		void *addr, unsigned long flags);
+struct dax_dev *devm_create_dax_dev(struct dax_region *dax_region,
+		struct resource *res, int count);
+#endif /* __DEVICE_DAX_H__ */
diff --git a/drivers/dax/dax.c b/drivers/dax/device.c
similarity index 75%
rename from drivers/dax/dax.c
rename to drivers/dax/device.c
index ed758b74ddf0..5b5572314929 100644
--- a/drivers/dax/dax.c
+++ b/drivers/dax/device.c
@@ -1,5 +1,5 @@
 /*
- * Copyright(c) 2016 Intel Corporation. All rights reserved.
+ * Copyright(c) 2016 - 2017 Intel Corporation. All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of version 2 of the GNU General Public License as
@@ -13,25 +13,14 @@
 #include <linux/pagemap.h>
 #include <linux/module.h>
 #include <linux/device.h>
-#include <linux/mount.h>
 #include <linux/pfn_t.h>
-#include <linux/hash.h>
-#include <linux/cdev.h>
 #include <linux/slab.h>
 #include <linux/dax.h>
 #include <linux/fs.h>
 #include <linux/mm.h>
 #include "dax.h"
 
-static dev_t dax_devt;
 static struct class *dax_class;
-static DEFINE_IDA(dax_minor_ida);
-static int nr_dax = CONFIG_NR_DEV_DAX;
-module_param(nr_dax, int, S_IRUGO);
-static struct vfsmount *dax_mnt;
-static struct kmem_cache *dax_cache __read_mostly;
-static struct super_block *dax_superblock __read_mostly;
-MODULE_PARM_DESC(nr_dax, "max number of device-dax instances");
 
 /**
  * struct dax_region - mapping infrastructure for dax devices
@@ -57,19 +46,16 @@ struct dax_region {
 /**
  * struct dax_dev - subdivision of a dax region
  * @region - parent region
- * @dev - device backing the character device
- * @cdev - core chardev data
- * @alive - !alive + rcu grace period == no new mappings can be established
+ * @dax_inode - core dax functionality
+ * @dev - device core
  * @id - child id in the region
  * @num_resources - number of physical address extents in this device
  * @res - array of physical address ranges
  */
 struct dax_dev {
 	struct dax_region *region;
-	struct inode *inode;
+	struct dax_inode *dax_inode;
 	struct device dev;
-	struct cdev cdev;
-	bool alive;
 	int id;
 	int num_resources;
 	struct resource res[0];
@@ -142,117 +128,6 @@ static const struct attribute_group *dax_region_attribute_groups[] = {
 	NULL,
 };
 
-static struct inode *dax_alloc_inode(struct super_block *sb)
-{
-	return kmem_cache_alloc(dax_cache, GFP_KERNEL);
-}
-
-static void dax_i_callback(struct rcu_head *head)
-{
-	struct inode *inode = container_of(head, struct inode, i_rcu);
-
-	kmem_cache_free(dax_cache, inode);
-}
-
-static void dax_destroy_inode(struct inode *inode)
-{
-	call_rcu(&inode->i_rcu, dax_i_callback);
-}
-
-static const struct super_operations dax_sops = {
-	.statfs = simple_statfs,
-	.alloc_inode = dax_alloc_inode,
-	.destroy_inode = dax_destroy_inode,
-	.drop_inode = generic_delete_inode,
-};
-
-static struct dentry *dax_mount(struct file_system_type *fs_type,
-		int flags, const char *dev_name, void *data)
-{
-	return mount_pseudo(fs_type, "dax:", &dax_sops, NULL, DAXFS_MAGIC);
-}
-
-static struct file_system_type dax_type = {
-	.name = "dax",
-	.mount = dax_mount,
-	.kill_sb = kill_anon_super,
-};
-
-static int dax_test(struct inode *inode, void *data)
-{
-	return inode->i_cdev == data;
-}
-
-static int dax_set(struct inode *inode, void *data)
-{
-	inode->i_cdev = data;
-	return 0;
-}
-
-static struct inode *dax_inode_get(struct cdev *cdev, dev_t devt)
-{
-	struct inode *inode;
-
-	inode = iget5_locked(dax_superblock, hash_32(devt + DAXFS_MAGIC, 31),
-			dax_test, dax_set, cdev);
-
-	if (!inode)
-		return NULL;
-
-	if (inode->i_state & I_NEW) {
-		inode->i_mode = S_IFCHR;
-		inode->i_flags = S_DAX;
-		inode->i_rdev = devt;
-		mapping_set_gfp_mask(&inode->i_data, GFP_USER);
-		unlock_new_inode(inode);
-	}
-	return inode;
-}
-
-static void init_once(void *inode)
-{
-	inode_init_once(inode);
-}
-
-static int dax_inode_init(void)
-{
-	int rc;
-
-	dax_cache = kmem_cache_create("dax_cache", sizeof(struct inode), 0,
-			(SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
-			 SLAB_MEM_SPREAD|SLAB_ACCOUNT),
-			init_once);
-	if (!dax_cache)
-		return -ENOMEM;
-
-	rc = register_filesystem(&dax_type);
-	if (rc)
-		goto err_register_fs;
-
-	dax_mnt = kern_mount(&dax_type);
-	if (IS_ERR(dax_mnt)) {
-		rc = PTR_ERR(dax_mnt);
-		goto err_mount;
-	}
-	dax_superblock = dax_mnt->mnt_sb;
-
-	return 0;
-
- err_mount:
-	unregister_filesystem(&dax_type);
- err_register_fs:
-	kmem_cache_destroy(dax_cache);
-
-	return rc;
-}
-
-static void dax_inode_exit(void)
-{
-	kern_unmount(dax_mnt);
-	unregister_filesystem(&dax_type);
-	kmem_cache_destroy(dax_cache);
-}
-
 static void dax_region_free(struct kref *kref)
 {
 	struct dax_region *dax_region;
@@ -361,7 +236,7 @@ static int check_vma(struct dax_dev *dax_dev, struct vm_area_struct *vma,
 	struct device *dev = &dax_dev->dev;
 	unsigned long mask;
 
-	if (!dax_dev->alive)
+	if (!dax_inode_alive(dax_dev->dax_inode))
 		return -ENXIO;
 
 	/* prevent private mappings from being established */
@@ -542,7 +417,13 @@ static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
 
 	dev_dbg(&dax_dev->dev, "%s\n", __func__);
 
+	/*
+	 * We lock to check dax_inode liveness and will re-check at
+	 * fault time.
+	 */
+	rcu_read_lock();
 	rc = check_vma(dax_dev, vma, __func__);
+	rcu_read_unlock();
 	if (rc)
 		return rc;
 
@@ -588,12 +469,13 @@ static unsigned long dax_get_unmapped_area(struct file *filp,
 
 static int dax_open(struct inode *inode, struct file *filp)
 {
-	struct dax_dev *dax_dev;
+	struct dax_inode *dax_inode = inode_to_dax_inode(inode);
+	struct inode *__dax_inode = dax_inode_to_inode(dax_inode);
+	struct dax_dev *dax_dev = dax_inode_get_private(dax_inode);
 
-	dax_dev = container_of(inode->i_cdev, struct dax_dev, cdev);
 	dev_dbg(&dax_dev->dev, "%s\n", __func__);
-	inode->i_mapping = dax_dev->inode->i_mapping;
-	inode->i_mapping->host = dax_dev->inode;
+	inode->i_mapping = __dax_inode->i_mapping;
+	inode->i_mapping->host = __dax_inode;
 	filp->f_mapping = inode->i_mapping;
 	filp->private_data = dax_dev;
 	inode->i_flags = S_DAX;
@@ -622,32 +504,25 @@ static void dax_dev_release(struct device *dev)
 {
 	struct dax_dev *dax_dev = to_dax_dev(dev);
 	struct dax_region *dax_region = dax_dev->region;
+	struct dax_inode *dax_inode = dax_dev->dax_inode;
 
 	ida_simple_remove(&dax_region->ida, dax_dev->id);
-	ida_simple_remove(&dax_minor_ida, MINOR(dev->devt));
 	dax_region_put(dax_region);
-	iput(dax_dev->inode);
+	put_dax_inode(dax_inode);
 	kfree(dax_dev);
 }
 
 static void unregister_dax_dev(void *dev)
 {
 	struct dax_dev *dax_dev = to_dax_dev(dev);
-	struct cdev *cdev = &dax_dev->cdev;
+	struct dax_inode *dax_inode = dax_dev->dax_inode;
+	struct inode *inode = dax_inode_to_inode(dax_inode);
 
 	dev_dbg(dev, "%s\n", __func__);
 
-	/*
-	 * Note, rcu is not protecting the liveness of dax_dev, rcu is
-	 * ensuring that any fault handlers that might have seen
-	 * dax_dev->alive == true, have completed.  Any fault handlers
-	 * that start after synchronize_rcu() has started will abort
-	 * upon seeing dax_dev->alive == false.
-	 */
-	dax_dev->alive = false;
-	synchronize_rcu();
-	unmap_mapping_range(dax_dev->inode->i_mapping, 0, 0, 1);
-	cdev_del(cdev);
+	kill_dax_inode(dax_inode);
+	unmap_mapping_range(inode->i_mapping, 0, 0, 1);
+	dax_inode_unregister(dax_inode);
 	device_unregister(dev);
 }
 
@@ -655,11 +530,11 @@ struct dax_dev *devm_create_dax_dev(struct dax_region *dax_region,
 		struct resource *res, int count)
 {
 	struct device *parent = dax_region->dev;
+	struct dax_inode *dax_inode;
 	struct dax_dev *dax_dev;
-	int rc = 0, minor, i;
+	struct inode *inode;
 	struct device *dev;
-	struct cdev *cdev;
-	dev_t dev_t;
+	int rc = 0, i;
 
 	dax_dev = kzalloc(sizeof(*dax_dev) + sizeof(*res) * count, GFP_KERNEL);
 	if (!dax_dev)
@@ -685,38 +560,27 @@ struct dax_dev *devm_create_dax_dev(struct dax_region *dax_region,
 		goto err_id;
 	}
 
-	minor = ida_simple_get(&dax_minor_ida, 0, 0, GFP_KERNEL);
-	if (minor < 0) {
-		rc = minor;
-		goto err_minor;
-	}
-
-	dev_t = MKDEV(MAJOR(dax_devt), minor);
-	dev = &dax_dev->dev;
-	dax_dev->inode = dax_inode_get(&dax_dev->cdev, dev_t);
-	if (!dax_dev->inode) {
-		rc = -ENOMEM;
+	dax_inode = alloc_dax_inode(dax_dev);
+	if (!dax_inode)
 		goto err_inode;
-	}
 
-	/* device_initialize() so cdev can reference kobj parent */
+	/* initialize now so dax_inode_register() can reference dev->kobj */
+	dax_dev->dax_inode = dax_inode;
+	dev = &dax_dev->dev;
 	device_initialize(dev);
 
-	cdev = &dax_dev->cdev;
-	cdev_init(cdev, &dax_fops);
-	cdev->owner = parent->driver->owner;
-	cdev->kobj.parent = &dev->kobj;
-	rc = cdev_add(&dax_dev->cdev, dev_t, 1);
+	rc = dax_inode_register(dax_inode, &dax_fops,
+			parent->driver->owner, &dev->kobj);
 	if (rc)
-		goto err_cdev;
+		goto err_register;
 
 	/* from here on we're committed to teardown via dax_dev_release() */
 	dax_dev->num_resources = count;
-	dax_dev->alive = true;
 	dax_dev->region = dax_region;
 	kref_get(&dax_region->kref);
 
-	dev->devt = dev_t;
+	inode = dax_inode_to_inode(dax_inode);
+	dev->devt = inode->i_rdev;
 	dev->class = dax_class;
 	dev->parent = parent;
 	dev->groups = dax_attribute_groups;
@@ -734,11 +598,9 @@ struct dax_dev *devm_create_dax_dev(struct dax_region *dax_region,
 
 	return dax_dev;
 
- err_cdev:
-	iput(dax_dev->inode);
+ err_register:
+	put_dax_inode(dax_inode);
  err_inode:
-	ida_simple_remove(&dax_minor_ida, minor);
- err_minor:
 	ida_simple_remove(&dax_region->ida, dax_dev->id);
  err_id:
 	kfree(dax_dev);
@@ -749,38 +611,13 @@ EXPORT_SYMBOL_GPL(devm_create_dax_dev);
 
 static int __init dax_init(void)
 {
-	int rc;
-
-	rc = dax_inode_init();
-	if (rc)
-		return rc;
-
-	nr_dax = max(nr_dax, 256);
-	rc = alloc_chrdev_region(&dax_devt, 0, nr_dax, "dax");
-	if (rc)
-		goto err_chrdev;
-
 	dax_class = class_create(THIS_MODULE, "dax");
-	if (IS_ERR(dax_class)) {
-		rc = PTR_ERR(dax_class);
-		goto err_class;
-	}
-
-	return 0;
-
- err_class:
-	unregister_chrdev_region(dax_devt, nr_dax);
- err_chrdev:
-	dax_inode_exit();
-	return rc;
+	return PTR_ERR_OR_ZERO(dax_class);
 }
 
 static void __exit dax_exit(void)
 {
 	class_destroy(dax_class);
-	unregister_chrdev_region(dax_devt, nr_dax);
-	ida_destroy(&dax_minor_ida);
-	dax_inode_exit();
 }
 
 MODULE_AUTHOR("Intel Corporation");
diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c
index 033f49b31fdc..9c98b1dd24c1 100644
--- a/drivers/dax/pmem.c
+++ b/drivers/dax/pmem.c
@@ -16,7 +16,7 @@
 #include <linux/pfn_t.h>
 #include "../nvdimm/pfn.h"
 #include "../nvdimm/nd.h"
-#include "dax.h"
+#include "device-dax.h"
 
 struct dax_pmem {
 	struct device *dev;
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
new file mode 100644
index 000000000000..e6369b851619
--- /dev/null
+++ b/drivers/dax/super.c
@@ -0,0 +1,310 @@
+/*
+ * Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+#include <linux/pagemap.h>
+#include <linux/module.h>
+#include <linux/mount.h>
+#include <linux/magic.h>
+#include <linux/cdev.h>
+#include <linux/hash.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+
+static int nr_dax = CONFIG_NR_DEV_DAX;
+module_param(nr_dax, int, S_IRUGO);
+MODULE_PARM_DESC(nr_dax, "max number of dax device instances");
+
+static dev_t dax_devt;
+static struct vfsmount *dax_mnt;
+static DEFINE_IDA(dax_minor_ida);
+static struct kmem_cache *dax_cache __read_mostly;
+static struct super_block *dax_superblock __read_mostly;
+
+/**
+ * struct dax_inode - anchor object for dax services
+ * @inode: core vfs
+ * @cdev: optional character interface for "device dax"
+ * @private: dax driver private data
+ * @alive: !alive + rcu grace period == no new operations / mappings
+ */
+struct dax_inode {
+	struct inode inode;
+	struct cdev cdev;
+	void *private;
+	bool alive;
+};
+
+bool dax_inode_alive(struct dax_inode *dax_inode)
+{
+	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
+			"dax operations require rcu_read_lock()\n");
+	return dax_inode->alive;
+}
+EXPORT_SYMBOL_GPL(dax_inode_alive);
+
+/*
+ * Note, rcu is not protecting the liveness of dax_inode, rcu is
+ * ensuring that any fault handlers or operations that might have seen
+ * dax_inode_alive(), have completed.  Any operations that start after
+ * synchronize_rcu() has run will abort upon seeing !dax_inode_alive().
+ */
+void kill_dax_inode(struct dax_inode *dax_inode)
+{
+	if (!dax_inode)
+		return;
+
+	dax_inode->alive = false;
+	synchronize_rcu();
+	dax_inode->private = NULL;
+}
+EXPORT_SYMBOL_GPL(kill_dax_inode);
+
+static struct inode *dax_alloc_inode(struct super_block *sb)
+{
+	struct dax_inode *dax_inode;
+
+	dax_inode = kmem_cache_alloc(dax_cache, GFP_KERNEL);
+	return &dax_inode->inode;
+}
+
+static struct dax_inode *to_dax_inode(struct inode *inode)
+{
+	return container_of(inode, struct dax_inode, inode);
+}
+
+static void dax_i_callback(struct rcu_head *head)
+{
+	struct inode *inode = container_of(head, struct inode, i_rcu);
+	struct dax_inode *dax_inode = to_dax_inode(inode);
+
+	ida_simple_remove(&dax_minor_ida, MINOR(inode->i_rdev));
+	kmem_cache_free(dax_cache, dax_inode);
+}
+
+static void dax_destroy_inode(struct inode *inode)
+{
+	struct dax_inode *dax_inode = to_dax_inode(inode);
+
+	WARN_ONCE(dax_inode->alive,
+			"kill_dax_inode() must be called before final iput()\n");
+	call_rcu(&inode->i_rcu, dax_i_callback);
+}
+
+static const struct super_operations dax_sops = {
+	.statfs = simple_statfs,
+	.alloc_inode = dax_alloc_inode,
+	.destroy_inode = dax_destroy_inode,
+	.drop_inode = generic_delete_inode,
+};
+
+static struct dentry *dax_mount(struct file_system_type *fs_type,
+		int flags, const char *dev_name, void *data)
+{
+	return mount_pseudo(fs_type, "dax:", &dax_sops, NULL, DAXFS_MAGIC);
+}
+
+static struct file_system_type dax_type = {
+	.name = "dax",
+	.mount = dax_mount,
+	.kill_sb = kill_anon_super,
+};
+
+static int dax_test(struct inode *inode, void *data)
+{
+	dev_t devt = *(dev_t *) data;
+
+	return inode->i_rdev == devt;
+}
+
+static int dax_set(struct inode *inode, void *data)
+{
+	dev_t devt = *(dev_t *) data;
+
+	inode->i_rdev = devt;
+	return 0;
+}
+
+static struct dax_inode *dax_inode_get(dev_t devt)
+{
+	struct dax_inode *dax_inode;
+	struct inode *inode;
+
+	inode = iget5_locked(dax_superblock, hash_32(devt + DAXFS_MAGIC, 31),
+			dax_test, dax_set, &devt);
+
+	if (!inode)
+		return NULL;
+
+	dax_inode = to_dax_inode(inode);
+	if (inode->i_state & I_NEW) {
+		dax_inode->alive = true;
+		inode->i_cdev = &dax_inode->cdev;
+		inode->i_mode = S_IFCHR;
+		inode->i_flags = S_DAX;
+		mapping_set_gfp_mask(&inode->i_data, GFP_USER);
+		unlock_new_inode(inode);
+	}
+
+	return dax_inode;
+}
+
+struct dax_inode *alloc_dax_inode(void *private)
+{
+	struct dax_inode *dax_inode;
+	dev_t devt;
+	int minor;
+
+	minor = ida_simple_get(&dax_minor_ida, 0, nr_dax, GFP_KERNEL);
+	if (minor < 0)
+		return NULL;
+
+	devt = MKDEV(MAJOR(dax_devt), minor);
+	dax_inode = dax_inode_get(devt);
+	if (!dax_inode)
+		goto err_inode;
+
+	dax_inode->private = private;
+	return dax_inode;
+
+ err_inode:
+	ida_simple_remove(&dax_minor_ida, minor);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(alloc_dax_inode);
+
+void put_dax_inode(struct dax_inode *dax_inode)
+{
+	if (!dax_inode)
+		return;
+	iput(&dax_inode->inode);
+}
+EXPORT_SYMBOL_GPL(put_dax_inode);
+
+/**
+ * inode_to_dax_inode: convert a public inode into its dax_inode
+ * @inode: An inode with i_cdev pointing to a dax_inode
+ */
+struct dax_inode *inode_to_dax_inode(struct inode *inode)
+{
+	struct cdev *cdev = inode->i_cdev;
+
+	return container_of(cdev, struct dax_inode, cdev);
+}
+EXPORT_SYMBOL_GPL(inode_to_dax_inode);
+
+struct inode *dax_inode_to_inode(struct dax_inode *dax_inode)
+{
+	return &dax_inode->inode;
+}
+EXPORT_SYMBOL_GPL(dax_inode_to_inode);
+
+void *dax_inode_get_private(struct dax_inode *dax_inode)
+{
+	return dax_inode->private;
+}
+EXPORT_SYMBOL_GPL(dax_inode_get_private);
+
+int dax_inode_register(struct dax_inode *dax_inode,
+		const struct file_operations *fops, struct module *owner,
+		struct kobject *parent)
+{
+	struct cdev *cdev = &dax_inode->cdev;
+	struct inode *inode = &dax_inode->inode;
+
+	cdev_init(cdev, fops);
+	cdev->owner = owner;
+	cdev->kobj.parent = parent;
+	return cdev_add(cdev, inode->i_rdev, 1);
+}
+EXPORT_SYMBOL_GPL(dax_inode_register);
+
+void dax_inode_unregister(struct dax_inode *dax_inode)
+{
+	struct cdev *cdev = &dax_inode->cdev;
+
+	cdev_del(cdev);
+}
+EXPORT_SYMBOL_GPL(dax_inode_unregister);
+
+static void init_once(void *_dax_inode)
+{
+	struct dax_inode *dax_inode = _dax_inode;
+	struct inode *inode = &dax_inode->inode;
+
+	inode_init_once(inode);
+}
+
+static int dax_inode_init(void)
+{
+	int rc;
+
+	dax_cache = kmem_cache_create("dax_cache", sizeof(struct dax_inode), 0,
+			(SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
+			 SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+			init_once);
+	if (!dax_cache)
+		return -ENOMEM;
+
+	rc = register_filesystem(&dax_type);
+	if (rc)
+		goto err_register_fs;
+
+	dax_mnt = kern_mount(&dax_type);
+	if (IS_ERR(dax_mnt)) {
+		rc = PTR_ERR(dax_mnt);
+		goto err_mount;
+	}
+	dax_superblock = dax_mnt->mnt_sb;
+
+	return 0;
+
+ err_mount:
+	unregister_filesystem(&dax_type);
+ err_register_fs:
+	kmem_cache_destroy(dax_cache);
+
+	return rc;
+}
+
+static void dax_inode_exit(void)
+{
+	kern_unmount(dax_mnt);
+	unregister_filesystem(&dax_type);
+	kmem_cache_destroy(dax_cache);
+}
+
+static int __init dax_fs_init(void)
+{
+	int rc;
+
+	rc = dax_inode_init();
+	if (rc)
+		return rc;
+
+	nr_dax = max(nr_dax, 256);
+	rc = alloc_chrdev_region(&dax_devt, 0, nr_dax, "dax");
+	if (rc)
+		dax_inode_exit();
+	return rc;
+}
+
+static void __exit dax_fs_exit(void)
+{
+	unregister_chrdev_region(dax_devt, nr_dax);
+	ida_destroy(&dax_minor_ida);
+	dax_inode_exit();
+}
+
+MODULE_AUTHOR("Intel Corporation");
+MODULE_LICENSE("GPL v2");
+subsys_initcall(dax_fs_init);
+module_exit(dax_fs_exit);
diff --git a/tools/testing/nvdimm/Kbuild b/tools/testing/nvdimm/Kbuild
index 405212be044a..a1ed891d239a 100644
--- a/tools/testing/nvdimm/Kbuild
+++ b/tools/testing/nvdimm/Kbuild
@@ -28,7 +28,7 @@ obj-$(CONFIG_ND_BTT) += nd_btt.o
 obj-$(CONFIG_ND_BLK) += nd_blk.o
 obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
 obj-$(CONFIG_ACPI_NFIT) += nfit.o
-obj-$(CONFIG_DEV_DAX) += dax.o
+obj-$(CONFIG_DEV_DAX) += device_dax.o
 obj-$(CONFIG_DEV_DAX_PMEM) += dax_pmem.o
 
 nfit-y := $(ACPI_SRC)/core.o
@@ -48,8 +48,8 @@ nd_blk-y += config_check.o
 nd_e820-y := $(NVDIMM_SRC)/e820.o
 nd_e820-y += config_check.o
 
-dax-y := $(DAX_SRC)/dax.o
-dax-y += config_check.o
+device_dax-y := $(DAX_SRC)/device.o
+device_dax-y += config_check.o
 
 dax_pmem-y := $(DAX_SRC)/pmem.o
 dax_pmem-y += config_check.o

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


* [RFC PATCH 01/17] dax: refactor dax-fs into a generic provider of dax inodes
@ 2017-01-28  8:36   ` Dan Williams
  0 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28  8:36 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: snitzer, toshi.kani, mawilcox, linux-block, jmoyer,
	linux-fsdevel, ross.zwisler, hch

We want dax capable drivers to be able to publish a set of dax
operations [1]. However, we do not want to further abuse block_devices
to advertise these operations. Instead we will attach these operations
to a dax inode and add a lookup mechanism to go from block device path
to a dax inode. A dax capable driver like pmem or brd is responsible for
registering a dax inode, alongside a block device, and then a dax
capable filesystem is responsible for retrieving the dax inode by path
name if it wants to call dax_operations.
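
As an illustration of the driver-side responsibilities described above,
here is a minimal userspace sketch, not kernel code, of the lifetime
the new API implies: allocate, register, kill, unregister, then drop
the last reference. The model_* names and the model_warnings counter
are hypothetical stand-ins for alloc_dax_inode(), dax_inode_register(),
kill_dax_inode(), dax_inode_unregister() and put_dax_inode(); the real
functions operate on a VFS inode and a cdev.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace stand-in for the patch's struct dax_inode; not kernel code. */
struct model_dax_inode {
	int refcount;     /* models the inode reference dropped by iput() */
	bool alive;       /* mirrors dax_inode->alive */
	bool registered;  /* models the cdev_add() / cdev_del() state */
	void *private;    /* driver data, cleared by the kill step */
};

/* Counts objects freed while still marked alive (hypothetical helper). */
static int model_warnings;

/* Models alloc_dax_inode(): new object, alive, one reference held. */
static struct model_dax_inode *model_alloc(void *private)
{
	struct model_dax_inode *di = calloc(1, sizeof(*di));

	if (!di)
		return NULL;
	di->refcount = 1;
	di->alive = true;
	di->private = private;
	return di;
}

/* Models dax_inode_register(): publish the character device. */
static int model_register(struct model_dax_inode *di)
{
	di->registered = true;
	return 0;
}

/* Models kill_dax_inode(): mark dead, then drop the driver data. */
static void model_kill(struct model_dax_inode *di)
{
	if (!di)
		return;
	di->alive = false;
	/* the real code waits for a grace period at this point */
	di->private = NULL;
}

/* Models dax_inode_unregister(): withdraw the character device. */
static void model_unregister(struct model_dax_inode *di)
{
	di->registered = false;
}

/* Models put_dax_inode(): returns true when the object was freed. */
static bool model_put(struct model_dax_inode *di)
{
	if (!di)
		return false;
	assert(di->refcount > 0);
	if (--di->refcount > 0)
		return false;
	if (di->alive)
		model_warnings++; /* stands in for the WARN_ONCE() on destroy */
	free(di);
	return true;
}
```

Dropping the last reference without running the kill step first is the
case the WARN_ONCE() in dax_destroy_inode() is meant to catch; in the
model it shows up as a non-zero model_warnings.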

For now, we refactor the dax pseudo-fs to be a generic facility rather
than an implementation detail of the device-dax use case. A "dax inode"
is just an inode plus dax infrastructure, and "Device DAX" is a mapping
service layered on top of that base inode. "Filesystem DAX" is then a
mapping service that layers a filesystem on top of the base dax inode.
Filesystem DAX goes through a block_device for now, but may go directly
to a dax inode in the future, or for new pmem-only filesystems.
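
The layering can be sketched in plain userspace C. This is only a
model: struct inode, struct cdev and struct dax_dev below are trimmed
stand-ins with illustrative fields, container_of() is a simplified copy
of the kernel macro, and inode_to_dax_dev() is a hypothetical helper
that combines the two lookups dax_open() performs in this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace version of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Trimmed stand-ins for the kernel types; fields are illustrative. */
struct cdev { int unused; };
struct inode { struct cdev *i_cdev; unsigned int i_rdev; };

/* Same embedding as the struct dax_inode added in super.c. */
struct dax_inode {
	struct inode inode;  /* base object: core vfs */
	struct cdev cdev;    /* optional character interface */
	void *private;       /* whatever the layered service stores */
};

/* Stand-in for the device-dax object kept in ->private. */
struct dax_dev { int id; };

/* Any inode whose i_cdev points at the embedded cdev maps back. */
static struct dax_inode *inode_to_dax_inode(struct inode *inode)
{
	assert(inode->i_cdev != NULL);
	return container_of(inode->i_cdev, struct dax_inode, cdev);
}

/* Hypothetical helper: device node inode -> device-dax private data. */
static struct dax_dev *inode_to_dax_dev(struct inode *inode)
{
	return inode_to_dax_inode(inode)->private;
}
```

The point of the embedding is that device-dax no longer owns the inode
or the cdev; it only hangs its own object off ->private, which is the
same hook a filesystem-dax provider could use.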

[1]: https://lkml.org/lkml/2017/1/19/880

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/Makefile            |    2 
 drivers/dax/Kconfig         |    8 +
 drivers/dax/Makefile        |    5 +
 drivers/dax/dax.h           |   24 ++-
 drivers/dax/device-dax.h    |   25 +++
 drivers/dax/device.c        |  241 +++++----------------------------
 drivers/dax/pmem.c          |    2 
 drivers/dax/super.c         |  310 +++++++++++++++++++++++++++++++++++++++++++
 tools/testing/nvdimm/Kbuild |    6 -
 9 files changed, 402 insertions(+), 221 deletions(-)
 create mode 100644 drivers/dax/device-dax.h
 rename drivers/dax/{dax.c => device.c} (75%)
 create mode 100644 drivers/dax/super.c

diff --git a/drivers/Makefile b/drivers/Makefile
index 060026a02f59..17f42e4a6717 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -68,7 +68,7 @@ obj-$(CONFIG_PARPORT)		+= parport/
 obj-$(CONFIG_NVM)		+= lightnvm/
 obj-y				+= base/ block/ misc/ mfd/ nfc/
 obj-$(CONFIG_LIBNVDIMM)		+= nvdimm/
-obj-$(CONFIG_DEV_DAX)		+= dax/
+obj-$(CONFIG_DAX)		+= dax/
 obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf/
 obj-$(CONFIG_NUBUS)		+= nubus/
 obj-y				+= macintosh/
diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index 3e2ab3b14eea..39bcbf4c5e40 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -1,6 +1,11 @@
-menuconfig DEV_DAX
+menuconfig DAX
 	tristate "DAX: direct access to differentiated memory"
 	default m if NVDIMM_DAX
+
+if DAX
+
+config DEV_DAX
+	tristate "Device DAX: direct access mapping device"
 	depends on TRANSPARENT_HUGEPAGE
 	help
 	  Support raw access to differentiated (persistence, bandwidth,
@@ -10,7 +15,6 @@ menuconfig DEV_DAX
 	  baseline memory pool.  Mappings of a /dev/daxX.Y device impose
 	  restrictions that make the mapping behavior deterministic.
 
-if DEV_DAX
 
 config DEV_DAX_PMEM
 	tristate "PMEM DAX: direct access to persistent memory"
diff --git a/drivers/dax/Makefile b/drivers/dax/Makefile
index 27c54e38478a..dc7422530462 100644
--- a/drivers/dax/Makefile
+++ b/drivers/dax/Makefile
@@ -1,4 +1,7 @@
-obj-$(CONFIG_DEV_DAX) += dax.o
+obj-$(CONFIG_DAX) += dax.o
+obj-$(CONFIG_DEV_DAX) += device_dax.o
 obj-$(CONFIG_DEV_DAX_PMEM) += dax_pmem.o
 
+dax-y := super.o
 dax_pmem-y := pmem.o
+device_dax-y := device.o
diff --git a/drivers/dax/dax.h b/drivers/dax/dax.h
index ddd829ab58c0..def061aa75f4 100644
--- a/drivers/dax/dax.h
+++ b/drivers/dax/dax.h
@@ -1,5 +1,5 @@
 /*
- * Copyright(c) 2016 Intel Corporation. All rights reserved.
+ * Copyright(c) 2016 - 2017 Intel Corporation. All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of version 2 of the GNU General Public License as
@@ -12,14 +12,16 @@
  */
 #ifndef __DAX_H__
 #define __DAX_H__
-struct device;
-struct dax_dev;
-struct resource;
-struct dax_region;
-void dax_region_put(struct dax_region *dax_region);
-struct dax_region *alloc_dax_region(struct device *parent,
-		int region_id, struct resource *res, unsigned int align,
-		void *addr, unsigned long flags);
-struct dax_dev *devm_create_dax_dev(struct dax_region *dax_region,
-		struct resource *res, int count);
+struct dax_inode;
+struct dax_inode *alloc_dax_inode(void *private);
+void put_dax_inode(struct dax_inode *dax_inode);
+bool dax_inode_alive(struct dax_inode *dax_inode);
+void kill_dax_inode(struct dax_inode *dax_inode);
+struct dax_inode *inode_to_dax_inode(struct inode *inode);
+struct inode *dax_inode_to_inode(struct dax_inode *dax_inode);
+void *dax_inode_get_private(struct dax_inode *dax_inode);
+int dax_inode_register(struct dax_inode *dax_inode,
+		const struct file_operations *fops, struct module *owner,
+		struct kobject *parent);
+void dax_inode_unregister(struct dax_inode *dax_inode);
 #endif /* __DAX_H__ */
diff --git a/drivers/dax/device-dax.h b/drivers/dax/device-dax.h
new file mode 100644
index 000000000000..c9b7e9cc227e
--- /dev/null
+++ b/drivers/dax/device-dax.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+#ifndef __DEVICE_DAX_H__
+#define __DEVICE_DAX_H__
+struct device;
+struct dax_dev;
+struct resource;
+struct dax_region;
+void dax_region_put(struct dax_region *dax_region);
+struct dax_region *alloc_dax_region(struct device *parent,
+		int region_id, struct resource *res, unsigned int align,
+		void *addr, unsigned long flags);
+struct dax_dev *devm_create_dax_dev(struct dax_region *dax_region,
+		struct resource *res, int count);
+#endif /* __DEVICE_DAX_H__ */
diff --git a/drivers/dax/dax.c b/drivers/dax/device.c
similarity index 75%
rename from drivers/dax/dax.c
rename to drivers/dax/device.c
index ed758b74ddf0..5b5572314929 100644
--- a/drivers/dax/dax.c
+++ b/drivers/dax/device.c
@@ -1,5 +1,5 @@
 /*
- * Copyright(c) 2016 Intel Corporation. All rights reserved.
+ * Copyright(c) 2016 - 2017 Intel Corporation. All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of version 2 of the GNU General Public License as
@@ -13,25 +13,14 @@
 #include <linux/pagemap.h>
 #include <linux/module.h>
 #include <linux/device.h>
-#include <linux/mount.h>
 #include <linux/pfn_t.h>
-#include <linux/hash.h>
-#include <linux/cdev.h>
 #include <linux/slab.h>
 #include <linux/dax.h>
 #include <linux/fs.h>
 #include <linux/mm.h>
 #include "dax.h"
 
-static dev_t dax_devt;
 static struct class *dax_class;
-static DEFINE_IDA(dax_minor_ida);
-static int nr_dax = CONFIG_NR_DEV_DAX;
-module_param(nr_dax, int, S_IRUGO);
-static struct vfsmount *dax_mnt;
-static struct kmem_cache *dax_cache __read_mostly;
-static struct super_block *dax_superblock __read_mostly;
-MODULE_PARM_DESC(nr_dax, "max number of device-dax instances");
 
 /**
  * struct dax_region - mapping infrastructure for dax devices
@@ -57,19 +46,16 @@ struct dax_region {
 /**
  * struct dax_dev - subdivision of a dax region
  * @region - parent region
- * @dev - device backing the character device
- * @cdev - core chardev data
- * @alive - !alive + rcu grace period == no new mappings can be established
+ * @dax_inode - core dax functionality
+ * @dev - device core
  * @id - child id in the region
  * @num_resources - number of physical address extents in this device
  * @res - array of physical address ranges
  */
 struct dax_dev {
 	struct dax_region *region;
-	struct inode *inode;
+	struct dax_inode *dax_inode;
 	struct device dev;
-	struct cdev cdev;
-	bool alive;
 	int id;
 	int num_resources;
 	struct resource res[0];
@@ -142,117 +128,6 @@ static const struct attribute_group *dax_region_attribute_groups[] = {
 	NULL,
 };
 
-static struct inode *dax_alloc_inode(struct super_block *sb)
-{
-	return kmem_cache_alloc(dax_cache, GFP_KERNEL);
-}
-
-static void dax_i_callback(struct rcu_head *head)
-{
-	struct inode *inode = container_of(head, struct inode, i_rcu);
-
-	kmem_cache_free(dax_cache, inode);
-}
-
-static void dax_destroy_inode(struct inode *inode)
-{
-	call_rcu(&inode->i_rcu, dax_i_callback);
-}
-
-static const struct super_operations dax_sops = {
-	.statfs = simple_statfs,
-	.alloc_inode = dax_alloc_inode,
-	.destroy_inode = dax_destroy_inode,
-	.drop_inode = generic_delete_inode,
-};
-
-static struct dentry *dax_mount(struct file_system_type *fs_type,
-		int flags, const char *dev_name, void *data)
-{
-	return mount_pseudo(fs_type, "dax:", &dax_sops, NULL, DAXFS_MAGIC);
-}
-
-static struct file_system_type dax_type = {
-	.name = "dax",
-	.mount = dax_mount,
-	.kill_sb = kill_anon_super,
-};
-
-static int dax_test(struct inode *inode, void *data)
-{
-	return inode->i_cdev == data;
-}
-
-static int dax_set(struct inode *inode, void *data)
-{
-	inode->i_cdev = data;
-	return 0;
-}
-
-static struct inode *dax_inode_get(struct cdev *cdev, dev_t devt)
-{
-	struct inode *inode;
-
-	inode = iget5_locked(dax_superblock, hash_32(devt + DAXFS_MAGIC, 31),
-			dax_test, dax_set, cdev);
-
-	if (!inode)
-		return NULL;
-
-	if (inode->i_state & I_NEW) {
-		inode->i_mode = S_IFCHR;
-		inode->i_flags = S_DAX;
-		inode->i_rdev = devt;
-		mapping_set_gfp_mask(&inode->i_data, GFP_USER);
-		unlock_new_inode(inode);
-	}
-	return inode;
-}
-
-static void init_once(void *inode)
-{
-	inode_init_once(inode);
-}
-
-static int dax_inode_init(void)
-{
-	int rc;
-
-	dax_cache = kmem_cache_create("dax_cache", sizeof(struct inode), 0,
-			(SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
-			 SLAB_MEM_SPREAD|SLAB_ACCOUNT),
-			init_once);
-	if (!dax_cache)
-		return -ENOMEM;
-
-	rc = register_filesystem(&dax_type);
-	if (rc)
-		goto err_register_fs;
-
-	dax_mnt = kern_mount(&dax_type);
-	if (IS_ERR(dax_mnt)) {
-		rc = PTR_ERR(dax_mnt);
-		goto err_mount;
-	}
-	dax_superblock = dax_mnt->mnt_sb;
-
-	return 0;
-
- err_mount:
-	unregister_filesystem(&dax_type);
- err_register_fs:
-	kmem_cache_destroy(dax_cache);
-
-	return rc;
-}
-
-static void dax_inode_exit(void)
-{
-	kern_unmount(dax_mnt);
-	unregister_filesystem(&dax_type);
-	kmem_cache_destroy(dax_cache);
-}
-
 static void dax_region_free(struct kref *kref)
 {
 	struct dax_region *dax_region;
@@ -361,7 +236,7 @@ static int check_vma(struct dax_dev *dax_dev, struct vm_area_struct *vma,
 	struct device *dev = &dax_dev->dev;
 	unsigned long mask;
 
-	if (!dax_dev->alive)
+	if (!dax_inode_alive(dax_dev->dax_inode))
 		return -ENXIO;
 
 	/* prevent private mappings from being established */
@@ -542,7 +417,13 @@ static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
 
 	dev_dbg(&dax_dev->dev, "%s\n", __func__);
 
+	/*
+	 * We lock to check dax_inode liveness and will re-check at
+	 * fault time.
+	 */
+	rcu_read_lock();
 	rc = check_vma(dax_dev, vma, __func__);
+	rcu_read_unlock();
 	if (rc)
 		return rc;
 
@@ -588,12 +469,13 @@ static unsigned long dax_get_unmapped_area(struct file *filp,
 
 static int dax_open(struct inode *inode, struct file *filp)
 {
-	struct dax_dev *dax_dev;
+	struct dax_inode *dax_inode = inode_to_dax_inode(inode);
+	struct inode *__dax_inode = dax_inode_to_inode(dax_inode);
+	struct dax_dev *dax_dev = dax_inode_get_private(dax_inode);
 
-	dax_dev = container_of(inode->i_cdev, struct dax_dev, cdev);
 	dev_dbg(&dax_dev->dev, "%s\n", __func__);
-	inode->i_mapping = dax_dev->inode->i_mapping;
-	inode->i_mapping->host = dax_dev->inode;
+	inode->i_mapping = __dax_inode->i_mapping;
+	inode->i_mapping->host = __dax_inode;
 	filp->f_mapping = inode->i_mapping;
 	filp->private_data = dax_dev;
 	inode->i_flags = S_DAX;
@@ -622,32 +504,25 @@ static void dax_dev_release(struct device *dev)
 {
 	struct dax_dev *dax_dev = to_dax_dev(dev);
 	struct dax_region *dax_region = dax_dev->region;
+	struct dax_inode *dax_inode = dax_dev->dax_inode;
 
 	ida_simple_remove(&dax_region->ida, dax_dev->id);
-	ida_simple_remove(&dax_minor_ida, MINOR(dev->devt));
 	dax_region_put(dax_region);
-	iput(dax_dev->inode);
+	put_dax_inode(dax_inode);
 	kfree(dax_dev);
 }
 
 static void unregister_dax_dev(void *dev)
 {
 	struct dax_dev *dax_dev = to_dax_dev(dev);
-	struct cdev *cdev = &dax_dev->cdev;
+	struct dax_inode *dax_inode = dax_dev->dax_inode;
+	struct inode *inode = dax_inode_to_inode(dax_inode);
 
 	dev_dbg(dev, "%s\n", __func__);
 
-	/*
-	 * Note, rcu is not protecting the liveness of dax_dev, rcu is
-	 * ensuring that any fault handlers that might have seen
-	 * dax_dev->alive == true, have completed.  Any fault handlers
-	 * that start after synchronize_rcu() has started will abort
-	 * upon seeing dax_dev->alive == false.
-	 */
-	dax_dev->alive = false;
-	synchronize_rcu();
-	unmap_mapping_range(dax_dev->inode->i_mapping, 0, 0, 1);
-	cdev_del(cdev);
+	kill_dax_inode(dax_inode);
+	unmap_mapping_range(inode->i_mapping, 0, 0, 1);
+	dax_inode_unregister(dax_inode);
 	device_unregister(dev);
 }
 
@@ -655,11 +530,11 @@ struct dax_dev *devm_create_dax_dev(struct dax_region *dax_region,
 		struct resource *res, int count)
 {
 	struct device *parent = dax_region->dev;
+	struct dax_inode *dax_inode;
 	struct dax_dev *dax_dev;
-	int rc = 0, minor, i;
+	struct inode *inode;
 	struct device *dev;
-	struct cdev *cdev;
-	dev_t dev_t;
+	int rc = 0, i;
 
 	dax_dev = kzalloc(sizeof(*dax_dev) + sizeof(*res) * count, GFP_KERNEL);
 	if (!dax_dev)
@@ -685,38 +560,27 @@ struct dax_dev *devm_create_dax_dev(struct dax_region *dax_region,
 		goto err_id;
 	}
 
-	minor = ida_simple_get(&dax_minor_ida, 0, 0, GFP_KERNEL);
-	if (minor < 0) {
-		rc = minor;
-		goto err_minor;
-	}
-
-	dev_t = MKDEV(MAJOR(dax_devt), minor);
-	dev = &dax_dev->dev;
-	dax_dev->inode = dax_inode_get(&dax_dev->cdev, dev_t);
-	if (!dax_dev->inode) {
-		rc = -ENOMEM;
+	dax_inode = alloc_dax_inode(dax_dev);
+	if (!dax_inode)
 		goto err_inode;
-	}
 
-	/* device_initialize() so cdev can reference kobj parent */
+	/* initialize now so dax_inode_register() can reference dev->kobj */
+	dax_dev->dax_inode = dax_inode;
+	dev = &dax_dev->dev;
 	device_initialize(dev);
 
-	cdev = &dax_dev->cdev;
-	cdev_init(cdev, &dax_fops);
-	cdev->owner = parent->driver->owner;
-	cdev->kobj.parent = &dev->kobj;
-	rc = cdev_add(&dax_dev->cdev, dev_t, 1);
+	rc = dax_inode_register(dax_inode, &dax_fops,
+			parent->driver->owner, &dev->kobj);
 	if (rc)
-		goto err_cdev;
+		goto err_register;
 
 	/* from here on we're committed to teardown via dax_dev_release() */
 	dax_dev->num_resources = count;
-	dax_dev->alive = true;
 	dax_dev->region = dax_region;
 	kref_get(&dax_region->kref);
 
-	dev->devt = dev_t;
+	inode = dax_inode_to_inode(dax_inode);
+	dev->devt = inode->i_rdev;
 	dev->class = dax_class;
 	dev->parent = parent;
 	dev->groups = dax_attribute_groups;
@@ -734,11 +598,9 @@ struct dax_dev *devm_create_dax_dev(struct dax_region *dax_region,
 
 	return dax_dev;
 
- err_cdev:
-	iput(dax_dev->inode);
+ err_register:
+	put_dax_inode(dax_inode);
  err_inode:
-	ida_simple_remove(&dax_minor_ida, minor);
- err_minor:
 	ida_simple_remove(&dax_region->ida, dax_dev->id);
  err_id:
 	kfree(dax_dev);
@@ -749,38 +611,13 @@ EXPORT_SYMBOL_GPL(devm_create_dax_dev);
 
 static int __init dax_init(void)
 {
-	int rc;
-
-	rc = dax_inode_init();
-	if (rc)
-		return rc;
-
-	nr_dax = max(nr_dax, 256);
-	rc = alloc_chrdev_region(&dax_devt, 0, nr_dax, "dax");
-	if (rc)
-		goto err_chrdev;
-
 	dax_class = class_create(THIS_MODULE, "dax");
-	if (IS_ERR(dax_class)) {
-		rc = PTR_ERR(dax_class);
-		goto err_class;
-	}
-
-	return 0;
-
- err_class:
-	unregister_chrdev_region(dax_devt, nr_dax);
- err_chrdev:
-	dax_inode_exit();
-	return rc;
+	return PTR_ERR_OR_ZERO(dax_class);
 }
 
 static void __exit dax_exit(void)
 {
 	class_destroy(dax_class);
-	unregister_chrdev_region(dax_devt, nr_dax);
-	ida_destroy(&dax_minor_ida);
-	dax_inode_exit();
 }
 
 MODULE_AUTHOR("Intel Corporation");
diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c
index 033f49b31fdc..9c98b1dd24c1 100644
--- a/drivers/dax/pmem.c
+++ b/drivers/dax/pmem.c
@@ -16,7 +16,7 @@
 #include <linux/pfn_t.h>
 #include "../nvdimm/pfn.h"
 #include "../nvdimm/nd.h"
-#include "dax.h"
+#include "device-dax.h"
 
 struct dax_pmem {
 	struct device *dev;
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
new file mode 100644
index 000000000000..e6369b851619
--- /dev/null
+++ b/drivers/dax/super.c
@@ -0,0 +1,310 @@
+/*
+ * Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+#include <linux/pagemap.h>
+#include <linux/module.h>
+#include <linux/mount.h>
+#include <linux/magic.h>
+#include <linux/cdev.h>
+#include <linux/hash.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+
+static int nr_dax = CONFIG_NR_DEV_DAX;
+module_param(nr_dax, int, S_IRUGO);
+MODULE_PARM_DESC(nr_dax, "max number of dax device instances");
+
+static dev_t dax_devt;
+static struct vfsmount *dax_mnt;
+static DEFINE_IDA(dax_minor_ida);
+static struct kmem_cache *dax_cache __read_mostly;
+static struct super_block *dax_superblock __read_mostly;
+
+/**
+ * struct dax_inode - anchor object for dax services
+ * @inode: core vfs
+ * @cdev: optional character interface for "device dax"
+ * @private: dax driver private data
+ * @alive: !alive + rcu grace period == no new operations / mappings
+ */
+struct dax_inode {
+	struct inode inode;
+	struct cdev cdev;
+	void *private;
+	bool alive;
+};
+
+bool dax_inode_alive(struct dax_inode *dax_inode)
+{
+	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
+			"dax operations require rcu_read_lock()\n");
+	return dax_inode->alive;
+}
+EXPORT_SYMBOL_GPL(dax_inode_alive);
+
+/*
+ * Note, rcu is not protecting the liveness of dax_inode, rcu is
+ * ensuring that any fault handlers or operations that might have seen
+ * dax_inode_alive(), have completed.  Any operations that start after
+ * synchronize_rcu() has run will abort upon seeing !dax_inode_alive().
+ */
+void kill_dax_inode(struct dax_inode *dax_inode)
+{
+	if (!dax_inode)
+		return;
+
+	dax_inode->alive = false;
+	synchronize_rcu();
+	dax_inode->private = NULL;
+}
+EXPORT_SYMBOL_GPL(kill_dax_inode);
+
+static struct inode *dax_alloc_inode(struct super_block *sb)
+{
+	struct dax_inode *dax_inode;
+
+	dax_inode = kmem_cache_alloc(dax_cache, GFP_KERNEL);
+	return &dax_inode->inode;
+}
+
+static struct dax_inode *to_dax_inode(struct inode *inode)
+{
+	return container_of(inode, struct dax_inode, inode);
+}
+
+static void dax_i_callback(struct rcu_head *head)
+{
+	struct inode *inode = container_of(head, struct inode, i_rcu);
+	struct dax_inode *dax_inode = to_dax_inode(inode);
+
+	ida_simple_remove(&dax_minor_ida, MINOR(inode->i_rdev));
+	kmem_cache_free(dax_cache, dax_inode);
+}
+
+static void dax_destroy_inode(struct inode *inode)
+{
+	struct dax_inode *dax_inode = to_dax_inode(inode);
+
+	WARN_ONCE(dax_inode->alive,
+			"kill_dax_inode() must be called before final iput()\n");
+	call_rcu(&inode->i_rcu, dax_i_callback);
+}
+
+static const struct super_operations dax_sops = {
+	.statfs = simple_statfs,
+	.alloc_inode = dax_alloc_inode,
+	.destroy_inode = dax_destroy_inode,
+	.drop_inode = generic_delete_inode,
+};
+
+static struct dentry *dax_mount(struct file_system_type *fs_type,
+		int flags, const char *dev_name, void *data)
+{
+	return mount_pseudo(fs_type, "dax:", &dax_sops, NULL, DAXFS_MAGIC);
+}
+
+static struct file_system_type dax_type = {
+	.name = "dax",
+	.mount = dax_mount,
+	.kill_sb = kill_anon_super,
+};
+
+static int dax_test(struct inode *inode, void *data)
+{
+	dev_t devt = *(dev_t *) data;
+
+	return inode->i_rdev == devt;
+}
+
+static int dax_set(struct inode *inode, void *data)
+{
+	dev_t devt = *(dev_t *) data;
+
+	inode->i_rdev = devt;
+	return 0;
+}
+
+static struct dax_inode *dax_inode_get(dev_t devt)
+{
+	struct dax_inode *dax_inode;
+	struct inode *inode;
+
+	inode = iget5_locked(dax_superblock, hash_32(devt + DAXFS_MAGIC, 31),
+			dax_test, dax_set, &devt);
+
+	if (!inode)
+		return NULL;
+
+	dax_inode = to_dax_inode(inode);
+	if (inode->i_state & I_NEW) {
+		dax_inode->alive = true;
+		inode->i_cdev = &dax_inode->cdev;
+		inode->i_mode = S_IFCHR;
+		inode->i_flags = S_DAX;
+		mapping_set_gfp_mask(&inode->i_data, GFP_USER);
+		unlock_new_inode(inode);
+	}
+
+	return dax_inode;
+}
+
+struct dax_inode *alloc_dax_inode(void *private)
+{
+	struct dax_inode *dax_inode;
+	dev_t devt;
+	int minor;
+
+	minor = ida_simple_get(&dax_minor_ida, 0, nr_dax, GFP_KERNEL);
+	if (minor < 0)
+		return NULL;
+
+	devt = MKDEV(MAJOR(dax_devt), minor);
+	dax_inode = dax_inode_get(devt);
+	if (!dax_inode)
+		goto err_inode;
+
+	dax_inode->private = private;
+	return dax_inode;
+
+ err_inode:
+	ida_simple_remove(&dax_minor_ida, minor);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(alloc_dax_inode);
+
+void put_dax_inode(struct dax_inode *dax_inode)
+{
+	if (!dax_inode)
+		return;
+	iput(&dax_inode->inode);
+}
+EXPORT_SYMBOL_GPL(put_dax_inode);
+
+/**
+ * inode_to_dax_inode: convert a public inode into its dax_inode
+ * @inode: An inode with i_cdev pointing to a dax_inode
+ */
+struct dax_inode *inode_to_dax_inode(struct inode *inode)
+{
+	struct cdev *cdev = inode->i_cdev;
+
+	return container_of(cdev, struct dax_inode, cdev);
+}
+EXPORT_SYMBOL_GPL(inode_to_dax_inode);
+
+struct inode *dax_inode_to_inode(struct dax_inode *dax_inode)
+{
+	return &dax_inode->inode;
+}
+EXPORT_SYMBOL_GPL(dax_inode_to_inode);
+
+void *dax_inode_get_private(struct dax_inode *dax_inode)
+{
+	return dax_inode->private;
+}
+EXPORT_SYMBOL_GPL(dax_inode_get_private);
+
+int dax_inode_register(struct dax_inode *dax_inode,
+		const struct file_operations *fops, struct module *owner,
+		struct kobject *parent)
+{
+	struct cdev *cdev = &dax_inode->cdev;
+	struct inode *inode = &dax_inode->inode;
+
+	cdev_init(cdev, fops);
+	cdev->owner = owner;
+	cdev->kobj.parent = parent;
+	return cdev_add(cdev, inode->i_rdev, 1);
+}
+EXPORT_SYMBOL_GPL(dax_inode_register);
+
+void dax_inode_unregister(struct dax_inode *dax_inode)
+{
+	struct cdev *cdev = &dax_inode->cdev;
+
+	cdev_del(cdev);
+}
+EXPORT_SYMBOL_GPL(dax_inode_unregister);
+
+static void init_once(void *_dax_inode)
+{
+	struct dax_inode *dax_inode = _dax_inode;
+	struct inode *inode = &dax_inode->inode;
+
+	inode_init_once(inode);
+}
+
+static int dax_inode_init(void)
+{
+	int rc;
+
+	dax_cache = kmem_cache_create("dax_cache", sizeof(struct dax_inode), 0,
+			(SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
+			 SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+			init_once);
+	if (!dax_cache)
+		return -ENOMEM;
+
+	rc = register_filesystem(&dax_type);
+	if (rc)
+		goto err_register_fs;
+
+	dax_mnt = kern_mount(&dax_type);
+	if (IS_ERR(dax_mnt)) {
+		rc = PTR_ERR(dax_mnt);
+		goto err_mount;
+	}
+	dax_superblock = dax_mnt->mnt_sb;
+
+	return 0;
+
+ err_mount:
+	unregister_filesystem(&dax_type);
+ err_register_fs:
+	kmem_cache_destroy(dax_cache);
+
+	return rc;
+}
+
+static void dax_inode_exit(void)
+{
+	kern_unmount(dax_mnt);
+	unregister_filesystem(&dax_type);
+	kmem_cache_destroy(dax_cache);
+}
+
+static int __init dax_fs_init(void)
+{
+	int rc;
+
+	rc = dax_inode_init();
+	if (rc)
+		return rc;
+
+	nr_dax = max(nr_dax, 256);
+	rc = alloc_chrdev_region(&dax_devt, 0, nr_dax, "dax");
+	if (rc)
+		dax_inode_exit();
+	return rc;
+}
+
+static void __exit dax_fs_exit(void)
+{
+	unregister_chrdev_region(dax_devt, nr_dax);
+	ida_destroy(&dax_minor_ida);
+	dax_inode_exit();
+}
+
+MODULE_AUTHOR("Intel Corporation");
+MODULE_LICENSE("GPL v2");
+subsys_initcall(dax_fs_init);
+module_exit(dax_fs_exit);
diff --git a/tools/testing/nvdimm/Kbuild b/tools/testing/nvdimm/Kbuild
index 405212be044a..a1ed891d239a 100644
--- a/tools/testing/nvdimm/Kbuild
+++ b/tools/testing/nvdimm/Kbuild
@@ -28,7 +28,7 @@ obj-$(CONFIG_ND_BTT) += nd_btt.o
 obj-$(CONFIG_ND_BLK) += nd_blk.o
 obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
 obj-$(CONFIG_ACPI_NFIT) += nfit.o
-obj-$(CONFIG_DEV_DAX) += dax.o
+obj-$(CONFIG_DEV_DAX) += device_dax.o
 obj-$(CONFIG_DEV_DAX_PMEM) += dax_pmem.o
 
 nfit-y := $(ACPI_SRC)/core.o
@@ -48,8 +48,8 @@ nd_blk-y += config_check.o
 nd_e820-y := $(NVDIMM_SRC)/e820.o
 nd_e820-y += config_check.o
 
-dax-y := $(DAX_SRC)/dax.o
-dax-y += config_check.o
+device_dax-y := $(DAX_SRC)/device.o
+device_dax-y += config_check.o
 
 dax_pmem-y := $(DAX_SRC)/pmem.o
 dax_pmem-y += config_check.o



* [RFC PATCH 02/17] dax: convert dax_inode locking to srcu
  2017-01-28  8:36 ` Dan Williams
@ 2017-01-28  8:36   ` Dan Williams
  -1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28  8:36 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch

In preparation for adding dax_operations that perform ->direct_access()
and user copy operations relative to a dax_inode, convert the existing
dax_inode locking from rcu to srcu. Some dax drivers need to sleep in
their ->direct_access() methods, and user copying may fault and sleep;
neither is allowed inside an rcu_read_lock() section, whereas srcu
read-side sections may block.
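
To make the read-side contract concrete, here is a toy, single-threaded
userspace model of an SRCU-style domain. It is a sketch under heavy
assumptions, not the kernel's SRCU implementation, which is
considerably more involved. All toy_* and model_* names are
hypothetical. It only shows why dax_read_lock() returns a cookie that
must be handed back to dax_read_unlock(), and that a kill has to wait
only for readers that entered before it started.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of an SRCU-style domain; single-threaded, illustration only. */
struct toy_srcu {
	int active;      /* index that new readers enter: 0 or 1 */
	int readers[2];  /* readers currently inside each index */
};

static struct toy_srcu dax_srcu_model;
static bool model_alive = true;

/* Like dax_read_lock(): enter a read section and return a cookie. */
static int toy_read_lock(void)
{
	int id = dax_srcu_model.active;

	dax_srcu_model.readers[id]++;
	return id;
}

/* Like dax_read_unlock(id): leave the section the cookie names. */
static void toy_read_unlock(int id)
{
	assert(dax_srcu_model.readers[id] > 0);
	dax_srcu_model.readers[id]--;
}

/* First half of a modeled kill: clear alive, redirect new readers. */
static int toy_kill_begin(void)
{
	int old = dax_srcu_model.active;

	model_alive = false;
	dax_srcu_model.active = !old;
	return old;
}

/* True once every reader that entered before the flip has left. */
static bool toy_grace_period_done(int old)
{
	return dax_srcu_model.readers[old] == 0;
}
```

A reader that is still inside its section when the kill begins holds
the grace period open, however long it sleeps; a reader that arrives
afterwards lands on the other index, sees !alive, and does not delay
the kill.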

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/Kconfig  |    1 +
 drivers/dax/device.c |   18 +++++++++---------
 drivers/dax/super.c  |   20 ++++++++++++++++----
 include/linux/dax.h  |    3 +++
 4 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index 39bcbf4c5e40..b7053eafd88e 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -1,5 +1,6 @@
 menuconfig DAX
 	tristate "DAX: direct access to differentiated memory"
+	select SRCU
 	default m if NVDIMM_DAX
 
 if DAX
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 5b5572314929..af06d0bfd6ea 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -333,16 +333,16 @@ static int __dax_dev_fault(struct dax_dev *dax_dev, struct vm_area_struct *vma,
 
 static int dax_dev_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
-	int rc;
+	int rc, id;
 	struct file *filp = vma->vm_file;
 	struct dax_dev *dax_dev = filp->private_data;
 
 	dev_dbg(&dax_dev->dev, "%s: %s: %s (%#lx - %#lx)\n", __func__,
 			current->comm, (vmf->flags & FAULT_FLAG_WRITE)
 			? "write" : "read", vma->vm_start, vma->vm_end);
-	rcu_read_lock();
+	id = dax_read_lock();
 	rc = __dax_dev_fault(dax_dev, vma, vmf);
-	rcu_read_unlock();
+	dax_read_unlock(id);
 
 	return rc;
 }
@@ -390,7 +390,7 @@ static int __dax_dev_pmd_fault(struct dax_dev *dax_dev,
 static int dax_dev_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
 		pmd_t *pmd, unsigned int flags)
 {
-	int rc;
+	int rc, id;
 	struct file *filp = vma->vm_file;
 	struct dax_dev *dax_dev = filp->private_data;
 
@@ -398,9 +398,9 @@ static int dax_dev_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
 			current->comm, (flags & FAULT_FLAG_WRITE)
 			? "write" : "read", vma->vm_start, vma->vm_end);
 
-	rcu_read_lock();
+	id = dax_read_lock();
 	rc = __dax_dev_pmd_fault(dax_dev, vma, addr, pmd, flags);
-	rcu_read_unlock();
+	dax_read_unlock(id);
 
 	return rc;
 }
@@ -412,8 +412,8 @@ static const struct vm_operations_struct dax_dev_vm_ops = {
 
 static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
 {
+	int rc, id;
 	struct dax_dev *dax_dev = filp->private_data;
-	int rc;
 
 	dev_dbg(&dax_dev->dev, "%s\n", __func__);
 
@@ -421,9 +421,9 @@ static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
 	 * We lock to check dax_inode liveness and will re-check at
 	 * fault time.
 	 */
-	rcu_read_lock();
+	id = dax_read_lock();
 	rc = check_vma(dax_dev, vma, __func__);
-	rcu_read_unlock();
+	dax_read_unlock(id);
 	if (rc)
 		return rc;
 
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e6369b851619..7c4dc97d53a8 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -24,11 +24,24 @@ module_param(nr_dax, int, S_IRUGO);
 MODULE_PARM_DESC(nr_dax, "max number of dax device instances");
 
 static dev_t dax_devt;
+DEFINE_STATIC_SRCU(dax_srcu);
 static struct vfsmount *dax_mnt;
 static DEFINE_IDA(dax_minor_ida);
 static struct kmem_cache *dax_cache __read_mostly;
 static struct super_block *dax_superblock __read_mostly;
 
+int dax_read_lock(void)
+{
+	return srcu_read_lock(&dax_srcu);
+}
+EXPORT_SYMBOL_GPL(dax_read_lock);
+
+void dax_read_unlock(int id)
+{
+	srcu_read_unlock(&dax_srcu, id);
+}
+EXPORT_SYMBOL_GPL(dax_read_unlock);
+
 /**
  * struct dax_inode - anchor object for dax services
  * @inode: core vfs
@@ -45,8 +58,7 @@ struct dax_inode {
 
 bool dax_inode_alive(struct dax_inode *dax_inode)
 {
-	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
-			"dax operations require rcu_read_lock()\n");
+	lockdep_assert_held(&dax_srcu);
 	return dax_inode->alive;
 }
 EXPORT_SYMBOL_GPL(dax_inode_alive);
@@ -55,7 +67,7 @@ EXPORT_SYMBOL_GPL(dax_inode_alive);
  * Note, rcu is not protecting the liveness of dax_inode, rcu is
  * ensuring that any fault handlers or operations that might have seen
  * dax_inode_alive(), have completed.  Any operations that start after
- * synchronize_rcu() has run will abort upon seeing !dax_inode_alive().
+ * synchronize_srcu() has run will abort upon seeing !dax_inode_alive().
  */
 void kill_dax_inode(struct dax_inode *dax_inode)
 {
@@ -63,7 +75,7 @@ void kill_dax_inode(struct dax_inode *dax_inode)
 		return;
 
 	dax_inode->alive = false;
-	synchronize_rcu();
+	synchronize_srcu(&dax_srcu);
 	dax_inode->private = NULL;
 }
 EXPORT_SYMBOL_GPL(kill_dax_inode);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 24ad71173995..67002898d130 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -8,6 +8,9 @@
 
 struct iomap_ops;
 
+int dax_read_lock(void);
+void dax_read_unlock(int id);
+
 /*
  * We use lowest available bit in exceptional entry for locking, one bit for
  * the entry size (PMD) and two more to tell us if the entry is a huge zero

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [RFC PATCH 02/17] dax: convert dax_inode locking to srcu
@ 2017-01-28  8:36   ` Dan Williams
  0 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28  8:36 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: snitzer, toshi.kani, mawilcox, linux-block, jmoyer,
	linux-fsdevel, ross.zwisler, hch

In preparation for adding dax_operations that perform ->direct_access()
and user copy operations relative to a dax_inode, convert the existing
dax_inode locking from rcu to srcu. Some dax drivers need to sleep in
their ->direct_access() methods and user copying may fault / sleep,
neither of which is permitted under rcu_read_lock().

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
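Note for readers, not part of the commit: the kill_dax_inode() ordering
that this conversion preserves (clear ->alive, wait for readers, then
clear ->private) can be illustrated with a single-threaded userspace
toy. Everything named toy_* is hypothetical, the two-slot counter is only
a loose model and not how the kernel implements SRCU, and where
synchronize_srcu() would sleep the toy simply reports that readers are
still pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an SRCU domain: one reader count per slot. */
struct toy_srcu {
	int readers[2];
	int active;		/* slot that new readers join */
};

/* Toy stand-in for the two dax_inode fields that matter here. */
struct toy_dax_inode {
	void *private;
	bool alive;
};

/* Models dax_read_lock(): join the active slot and return its index. */
int toy_read_lock(struct toy_srcu *s)
{
	int idx = s->active;

	s->readers[idx]++;
	return idx;
}

/* Models dax_read_unlock(): leave the slot recorded at lock time. */
void toy_read_unlock(struct toy_srcu *s, int idx)
{
	assert(s->readers[idx] > 0);
	s->readers[idx]--;
}

/*
 * First half of a kill: mark the object dead and send new readers to
 * the other slot.  Returns the slot that still has to drain.
 */
int toy_kill_begin(struct toy_srcu *s, struct toy_dax_inode *d)
{
	int old = s->active;

	d->alive = false;
	s->active = !old;
	return old;
}

/*
 * Second half: the real synchronize_srcu() would sleep until the old
 * readers are gone; this toy returns false instead of blocking.
 */
bool toy_kill_finish(struct toy_srcu *s, struct toy_dax_inode *d, int old)
{
	if (s->readers[old] != 0)
		return false;
	d->private = NULL;
	return true;
}
```

A reader that took toy_read_lock() before toy_kill_begin() keeps a valid
->private until it unlocks, while a reader that starts afterwards joins
the other slot and is expected to bail out on seeing !alive.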
 drivers/dax/Kconfig  |    1 +
 drivers/dax/device.c |   18 +++++++++---------
 drivers/dax/super.c  |   20 ++++++++++++++++----
 include/linux/dax.h  |    3 +++
 4 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index 39bcbf4c5e40..b7053eafd88e 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -1,5 +1,6 @@
 menuconfig DAX
 	tristate "DAX: direct access to differentiated memory"
+	select SRCU
 	default m if NVDIMM_DAX
 
 if DAX
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 5b5572314929..af06d0bfd6ea 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -333,16 +333,16 @@ static int __dax_dev_fault(struct dax_dev *dax_dev, struct vm_area_struct *vma,
 
 static int dax_dev_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
-	int rc;
+	int rc, id;
 	struct file *filp = vma->vm_file;
 	struct dax_dev *dax_dev = filp->private_data;
 
 	dev_dbg(&dax_dev->dev, "%s: %s: %s (%#lx - %#lx)\n", __func__,
 			current->comm, (vmf->flags & FAULT_FLAG_WRITE)
 			? "write" : "read", vma->vm_start, vma->vm_end);
-	rcu_read_lock();
+	id = dax_read_lock();
 	rc = __dax_dev_fault(dax_dev, vma, vmf);
-	rcu_read_unlock();
+	dax_read_unlock(id);
 
 	return rc;
 }
@@ -390,7 +390,7 @@ static int __dax_dev_pmd_fault(struct dax_dev *dax_dev,
 static int dax_dev_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
 		pmd_t *pmd, unsigned int flags)
 {
-	int rc;
+	int rc, id;
 	struct file *filp = vma->vm_file;
 	struct dax_dev *dax_dev = filp->private_data;
 
@@ -398,9 +398,9 @@ static int dax_dev_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
 			current->comm, (flags & FAULT_FLAG_WRITE)
 			? "write" : "read", vma->vm_start, vma->vm_end);
 
-	rcu_read_lock();
+	id = dax_read_lock();
 	rc = __dax_dev_pmd_fault(dax_dev, vma, addr, pmd, flags);
-	rcu_read_unlock();
+	dax_read_unlock(id);
 
 	return rc;
 }
@@ -412,8 +412,8 @@ static const struct vm_operations_struct dax_dev_vm_ops = {
 
 static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
 {
+	int rc, id;
 	struct dax_dev *dax_dev = filp->private_data;
-	int rc;
 
 	dev_dbg(&dax_dev->dev, "%s\n", __func__);
 
@@ -421,9 +421,9 @@ static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
 	 * We lock to check dax_inode liveness and will re-check at
 	 * fault time.
 	 */
-	rcu_read_lock();
+	id = dax_read_lock();
 	rc = check_vma(dax_dev, vma, __func__);
-	rcu_read_unlock();
+	dax_read_unlock(id);
 	if (rc)
 		return rc;
 
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e6369b851619..7c4dc97d53a8 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -24,11 +24,24 @@ module_param(nr_dax, int, S_IRUGO);
 MODULE_PARM_DESC(nr_dax, "max number of dax device instances");
 
 static dev_t dax_devt;
+DEFINE_STATIC_SRCU(dax_srcu);
 static struct vfsmount *dax_mnt;
 static DEFINE_IDA(dax_minor_ida);
 static struct kmem_cache *dax_cache __read_mostly;
 static struct super_block *dax_superblock __read_mostly;
 
+int dax_read_lock(void)
+{
+	return srcu_read_lock(&dax_srcu);
+}
+EXPORT_SYMBOL_GPL(dax_read_lock);
+
+void dax_read_unlock(int id)
+{
+	srcu_read_unlock(&dax_srcu, id);
+}
+EXPORT_SYMBOL_GPL(dax_read_unlock);
+
 /**
  * struct dax_inode - anchor object for dax services
  * @inode: core vfs
@@ -45,8 +58,7 @@ struct dax_inode {
 
 bool dax_inode_alive(struct dax_inode *dax_inode)
 {
-	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
-			"dax operations require rcu_read_lock()\n");
+	lockdep_assert_held(&dax_srcu);
 	return dax_inode->alive;
 }
 EXPORT_SYMBOL_GPL(dax_inode_alive);
@@ -55,7 +67,7 @@ EXPORT_SYMBOL_GPL(dax_inode_alive);
  * Note, rcu is not protecting the liveness of dax_inode, rcu is
  * ensuring that any fault handlers or operations that might have seen
  * dax_inode_alive(), have completed.  Any operations that start after
- * synchronize_rcu() has run will abort upon seeing !dax_inode_alive().
+ * synchronize_srcu() has run will abort upon seeing !dax_inode_alive().
  */
 void kill_dax_inode(struct dax_inode *dax_inode)
 {
@@ -63,7 +75,7 @@ void kill_dax_inode(struct dax_inode *dax_inode)
 		return;
 
 	dax_inode->alive = false;
-	synchronize_rcu();
+	synchronize_srcu(&dax_srcu);
 	dax_inode->private = NULL;
 }
 EXPORT_SYMBOL_GPL(kill_dax_inode);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 24ad71173995..67002898d130 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -8,6 +8,9 @@
 
 struct iomap_ops;
 
+int dax_read_lock(void);
+void dax_read_unlock(int id);
+
 /*
  * We use lowest available bit in exceptional entry for locking, one bit for
  * the entry size (PMD) and two more to tell us if the entry is a huge zero


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [RFC PATCH 03/17] dax: add a facility to lookup a dax inode by 'host' device name
  2017-01-28  8:36 ` Dan Williams
@ 2017-01-28  8:36   ` Dan Williams
  -1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28  8:36 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch

The current block_device based filesystem-dax path needs a way to look
up the dax_inode associated with a block_device. Add a 'host' property
to dax_inode that can be used for this purpose. It is a free-form
string, but for a dax_inode associated with a block device it is the
bdev name.

This is a band-aid until filesystems are able to mount on a dax-inode
directly.

We use a hash list since blkdev_writepages() will need to use this
interface to issue dax_writeback_mapping_range().

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
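Note for readers, not part of the commit: the lookup-by-host idea can be
sketched in userspace as a small chained hash table keyed by the host
string. All toy_* names are hypothetical, the djb2 hash stands in for the
kernel's hashlen_string(), and the sketch leaves out what the patch adds
on top: dax_host_lock, the srcu read section, and the igrab() reference
taken on a hit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_HASH_SIZE 64

struct toy_dax_inode {
	struct toy_dax_inode *next;	/* bucket chain */
	const char *host;		/* e.g. a block device name */
	bool alive;
};

struct toy_dax_inode *toy_host_list[TOY_HASH_SIZE];

/* djb2 string hash, reduced to a bucket index. */
unsigned int toy_host_hash(const char *host)
{
	unsigned int h = 5381;

	while (*host)
		h = h * 33 + (unsigned char)*host++;
	return h % TOY_HASH_SIZE;
}

/* Publish an inode under its host name; a NULL host stays unhashed. */
void toy_add_host(struct toy_dax_inode *d, const char *host)
{
	unsigned int hash;

	assert(d != NULL);
	d->next = NULL;
	d->host = host;
	d->alive = true;
	if (!host)
		return;

	hash = toy_host_hash(host);
	d->next = toy_host_list[hash];
	toy_host_list[hash] = d;
}

/* Return the live inode registered under @host, or NULL. */
struct toy_dax_inode *toy_get_by_host(const char *host)
{
	struct toy_dax_inode *d;

	if (!host)
		return NULL;

	for (d = toy_host_list[toy_host_hash(host)]; d; d = d->next)
		if (d->alive && strcmp(host, d->host) == 0)
			return d;
	return NULL;
}
```

An inode registered with a NULL host, as device-dax does in this patch,
is never reachable through the lookup, and an inode that is no longer
alive is skipped even if it is still on its bucket chain.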
 drivers/dax/dax.h    |    2 +
 drivers/dax/device.c |    2 +
 drivers/dax/super.c  |   79 +++++++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/dax.h  |    1 +
 4 files changed, 80 insertions(+), 4 deletions(-)

diff --git a/drivers/dax/dax.h b/drivers/dax/dax.h
index def061aa75f4..f33c16ed2ec6 100644
--- a/drivers/dax/dax.h
+++ b/drivers/dax/dax.h
@@ -13,7 +13,7 @@
 #ifndef __DAX_H__
 #define __DAX_H__
 struct dax_inode;
-struct dax_inode *alloc_dax_inode(void *private);
+struct dax_inode *alloc_dax_inode(void *private, const char *host);
 void put_dax_inode(struct dax_inode *dax_inode);
 bool dax_inode_alive(struct dax_inode *dax_inode);
 void kill_dax_inode(struct dax_inode *dax_inode);
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index af06d0bfd6ea..6d0a3241a608 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -560,7 +560,7 @@ struct dax_dev *devm_create_dax_dev(struct dax_region *dax_region,
 		goto err_id;
 	}
 
-	dax_inode = alloc_dax_inode(dax_dev);
+	dax_inode = alloc_dax_inode(dax_dev, NULL);
 	if (!dax_inode)
 		goto err_inode;
 
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 7c4dc97d53a8..7ac048f94b2b 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -30,6 +30,10 @@ static DEFINE_IDA(dax_minor_ida);
 static struct kmem_cache *dax_cache __read_mostly;
 static struct super_block *dax_superblock __read_mostly;
 
+#define DAX_HASH_SIZE (PAGE_SIZE / sizeof(struct hlist_head))
+static struct hlist_head dax_host_list[DAX_HASH_SIZE];
+static DEFINE_SPINLOCK(dax_host_lock);
+
 int dax_read_lock(void)
 {
 	return srcu_read_lock(&dax_srcu);
@@ -46,12 +50,15 @@ EXPORT_SYMBOL_GPL(dax_read_unlock);
  * struct dax_inode - anchor object for dax services
  * @inode: core vfs
  * @cdev: optional character interface for "device dax"
+ * @host: optional name for lookups where the device path is not available
  * @private: dax driver private data
  * @alive: !alive + rcu grace period == no new operations / mappings
  */
 struct dax_inode {
+	struct hlist_node list;
 	struct inode inode;
 	struct cdev cdev;
+	const char *host;
 	void *private;
 	bool alive;
 };
@@ -63,6 +70,11 @@ bool dax_inode_alive(struct dax_inode *dax_inode)
 }
 EXPORT_SYMBOL_GPL(dax_inode_alive);
 
+static int dax_host_hash(const char *host)
+{
+	return hashlen_hash(hashlen_string("DAX", host)) % DAX_HASH_SIZE;
+}
+
 /*
  * Note, rcu is not protecting the liveness of dax_inode, rcu is
  * ensuring that any fault handlers or operations that might have seen
@@ -75,6 +87,12 @@ void kill_dax_inode(struct dax_inode *dax_inode)
 		return;
 
 	dax_inode->alive = false;
+
+	spin_lock(&dax_host_lock);
+	if (!hlist_unhashed(&dax_inode->list))
+		hlist_del_init(&dax_inode->list);
+	spin_unlock(&dax_host_lock);
+
 	synchronize_srcu(&dax_srcu);
 	dax_inode->private = NULL;
 }
@@ -98,6 +116,8 @@ static void dax_i_callback(struct rcu_head *head)
 	struct inode *inode = container_of(head, struct inode, i_rcu);
 	struct dax_inode *dax_inode = to_dax_inode(inode);
 
+	kfree(dax_inode->host);
+	dax_inode->host = NULL;
 	ida_simple_remove(&dax_minor_ida, MINOR(inode->i_rdev));
 	kmem_cache_free(dax_cache, dax_inode);
 }
@@ -169,26 +189,49 @@ static struct dax_inode *dax_inode_get(dev_t devt)
 	return dax_inode;
 }
 
-struct dax_inode *alloc_dax_inode(void *private)
+static void dax_add_host(struct dax_inode *dax_inode, const char *host)
+{
+	int hash;
+
+	INIT_HLIST_NODE(&dax_inode->list);
+	if (!host)
+		return;
+
+	dax_inode->host = host;
+	hash = dax_host_hash(host);
+	spin_lock(&dax_host_lock);
+	hlist_add_head(&dax_inode->list, &dax_host_list[hash]);
+	spin_unlock(&dax_host_lock);
+}
+
+struct dax_inode *alloc_dax_inode(void *private, const char *__host)
 {
 	struct dax_inode *dax_inode;
+	const char *host;
 	dev_t devt;
 	int minor;
 
+	host = kstrdup(__host, GFP_KERNEL);
+	if (__host && !host)
+		return NULL;
+
 	minor = ida_simple_get(&dax_minor_ida, 0, nr_dax, GFP_KERNEL);
 	if (minor < 0)
-		return NULL;
+		goto err_minor;
 
 	devt = MKDEV(MAJOR(dax_devt), minor);
 	dax_inode = dax_inode_get(devt);
 	if (!dax_inode)
 		goto err_inode;
 
+	dax_add_host(dax_inode, host);
 	dax_inode->private = private;
 	return dax_inode;
 
  err_inode:
 	ida_simple_remove(&dax_minor_ida, minor);
+ err_minor:
+	kfree(host);
 	return NULL;
 }
 EXPORT_SYMBOL_GPL(alloc_dax_inode);
@@ -202,6 +245,38 @@ void put_dax_inode(struct dax_inode *dax_inode)
 EXPORT_SYMBOL_GPL(put_dax_inode);
 
 /**
+ * dax_get_by_host() - temporary lookup mechanism for filesystem-dax
+ * @host: alternate name for the inode registered by a dax driver
+ */
+struct dax_inode *dax_get_by_host(const char *host)
+{
+	struct dax_inode *dax_inode, *found = NULL;
+	int hash, id;
+
+	if (!host)
+		return NULL;
+
+	hash = dax_host_hash(host);
+
+	id = dax_read_lock();
+	spin_lock(&dax_host_lock);
+	hlist_for_each_entry(dax_inode, &dax_host_list[hash], list) {
+		if (!dax_inode_alive(dax_inode)
+				|| strcmp(host, dax_inode->host) != 0)
+			continue;
+
+		if (igrab(&dax_inode->inode))
+			found = dax_inode;
+		break;
+	}
+	spin_unlock(&dax_host_lock);
+	dax_read_unlock(id);
+
+	return found;
+}
+EXPORT_SYMBOL_GPL(dax_get_by_host);
+
+/**
  * inode_to_dax_inode: convert a public inode into its dax_inode
  * @inode: An inode with i_cdev pointing to a dax_inode
  */
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 67002898d130..8fe19230e118 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -10,6 +10,7 @@ struct iomap_ops;
 
 int dax_read_lock(void);
 void dax_read_unlock(int id);
+struct dax_inode *dax_get_by_host(const char *host);
 
 /*
  * We use lowest available bit in exceptional entry for locking, one bit for


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [RFC PATCH 04/17] dax: introduce dax_operations
  2017-01-28  8:36 ` Dan Williams
@ 2017-01-28  8:36   ` Dan Williams
  -1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28  8:36 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch

Track a set of dax_operations per dax_inode that can be set at
alloc_dax_inode() time. These operations will be used to stop the abuse
of block_device_operations for communicating dax capabilities to
filesystems. They will also be used to replace the "pmem api" and move
pmem-specific cache maintenance, and other dax-driver-specific
filesystem-dax operations, to dax inode methods. In particular this
allows us to replace the abuse of __copy_user_nocache(), via
memcpy_to_pmem(), with a driver-specific implementation.

This is a standalone introduction of the operations. Follow-on patches
convert each dax-driver and teach fs/dax.c to use ->direct_access() from
dax_operations instead of block_device_operations.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
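Note for readers, not part of the commit: a minimal userspace sketch of
how a per-inode operations table might be dispatched. The method
signature mirrors the ->direct_access() prototype added below, with
phys_addr_t and pfn_t replaced by plain integer typedefs. The toy_ram
driver and the toy_dax_direct_access() wrapper are hypothetical; this
patch only adds the table and stores the pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef unsigned long long toy_phys_addr_t;
typedef unsigned long toy_pfn_t;

struct toy_dax_inode;

struct toy_dax_operations {
	long (*direct_access)(struct toy_dax_inode *, toy_phys_addr_t,
			void **, toy_pfn_t *, long);
};

struct toy_dax_inode {
	void *private;
	const struct toy_dax_operations *ops;
};

/* Hypothetical driver state: a plain buffer standing in for the media. */
struct toy_ram {
	char *base;
	long size;
};

/* Driver method: translate a device offset, return the bytes available. */
long toy_ram_direct_access(struct toy_dax_inode *d, toy_phys_addr_t dev_addr,
		void **kaddr, toy_pfn_t *pfn, long size)
{
	struct toy_ram *ram = d->private;

	(void)size;	/* the toy does not clamp to the requested size */
	if (dev_addr >= (toy_phys_addr_t)ram->size)
		return -ENXIO;
	*kaddr = ram->base + dev_addr;
	/* toy pfn: page index within the buffer, assuming 4 KiB pages */
	*pfn = (toy_pfn_t)(dev_addr >> 12);
	return ram->size - (long)dev_addr;
}

const struct toy_dax_operations toy_ram_ops = {
	.direct_access = toy_ram_direct_access,
};

/* Hypothetical core-side wrapper: dispatch only if the driver opted in. */
long toy_dax_direct_access(struct toy_dax_inode *d, toy_phys_addr_t dev_addr,
		void **kaddr, toy_pfn_t *pfn, long size)
{
	if (!d->ops || !d->ops->direct_access)
		return -EOPNOTSUPP;
	return d->ops->direct_access(d, dev_addr, kaddr, pfn, size);
}
```

An inode allocated with NULL ops, as device-dax does in this patch, is
rejected by the wrapper in the sketch rather than dereferenced.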
 drivers/dax/dax.h    |    4 +++-
 drivers/dax/device.c |    6 +++++-
 drivers/dax/super.c  |    6 +++++-
 include/linux/dax.h  |    5 +++++
 4 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/dax/dax.h b/drivers/dax/dax.h
index f33c16ed2ec6..aeb1d49aafb8 100644
--- a/drivers/dax/dax.h
+++ b/drivers/dax/dax.h
@@ -13,7 +13,9 @@
 #ifndef __DAX_H__
 #define __DAX_H__
 struct dax_inode;
-struct dax_inode *alloc_dax_inode(void *private, const char *host);
+struct dax_operations;
+struct dax_inode *alloc_dax_inode(void *private, const char *host,
+		const struct dax_operations *ops);
 void put_dax_inode(struct dax_inode *dax_inode);
 bool dax_inode_alive(struct dax_inode *dax_inode);
 void kill_dax_inode(struct dax_inode *dax_inode);
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 6d0a3241a608..c3d9405ec285 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -560,7 +560,11 @@ struct dax_dev *devm_create_dax_dev(struct dax_region *dax_region,
 		goto err_id;
 	}
 
-	dax_inode = alloc_dax_inode(dax_dev, NULL);
+	/*
+	 * No 'host' or dax_operations since there is no access to this
+	 * device outside of mmap of the resulting character device.
+	 */
+	dax_inode = alloc_dax_inode(dax_dev, NULL, NULL);
 	if (!dax_inode)
 		goto err_inode;
 
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 7ac048f94b2b..eb844ffea3cf 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -17,6 +17,7 @@
 #include <linux/cdev.h>
 #include <linux/hash.h>
 #include <linux/slab.h>
+#include <linux/dax.h>
 #include <linux/fs.h>
 
 static int nr_dax = CONFIG_NR_DEV_DAX;
@@ -61,6 +62,7 @@ struct dax_inode {
 	const char *host;
 	void *private;
 	bool alive;
+	const struct dax_operations *ops;
 };
 
 bool dax_inode_alive(struct dax_inode *dax_inode)
@@ -204,7 +206,8 @@ static void dax_add_host(struct dax_inode *dax_inode, const char *host)
 	spin_unlock(&dax_host_lock);
 }
 
-struct dax_inode *alloc_dax_inode(void *private, const char *__host)
+struct dax_inode *alloc_dax_inode(void *private, const char *__host,
+		const struct dax_operations *ops)
 {
 	struct dax_inode *dax_inode;
 	const char *host;
@@ -225,6 +228,7 @@ struct dax_inode *alloc_dax_inode(void *private, const char *__host)
 		goto err_inode;
 
 	dax_add_host(dax_inode, host);
+	dax_inode->ops = ops;
 	dax_inode->private = private;
 	return dax_inode;
 
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 8fe19230e118..def9a9d118c9 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -7,6 +7,11 @@
 #include <asm/pgtable.h>
 
 struct iomap_ops;
+struct dax_inode;
+struct dax_operations {
+	long (*direct_access)(struct dax_inode *, phys_addr_t, void **,
+			pfn_t *, long);
+};
 
 int dax_read_lock(void);
 void dax_read_unlock(int id);


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [RFC PATCH 05/17] pmem: add dax_operations support
  2017-01-28  8:36 ` Dan Williams
@ 2017-01-28  8:36   ` Dan Williams
  -1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28  8:36 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch

Set up a dax_inode with the same lifetime as the pmem block device and
add a ->direct_access() method that is equivalent to
pmem_direct_access(). Once fs/dax.c has been converted to use
dax_operations, the old pmem_direct_access() will be removed.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
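Note for readers, not part of the commit: the refactoring below moves the
address arithmetic into one helper that takes a byte offset, with the
block entry point converting from 512-byte sectors first. A minimal
userspace sketch of that split follows. The toy_* names are hypothetical
and the sketch omits the bad-block check and the pfn output.

```c
#include <assert.h>

#define TOY_SECTOR_SIZE 512ULL

struct toy_pmem {
	char *virt_addr;
	unsigned long long data_offset;	/* start of user data */
	unsigned long long size;	/* size of the whole region */
	unsigned long long pfn_pad;	/* unusable tail, if any */
};

/* Shared helper: works purely in byte offsets from the device start. */
long toy_pmem_direct_access(struct toy_pmem *pmem,
		unsigned long long dev_addr, void **kaddr)
{
	unsigned long long offset = dev_addr + pmem->data_offset;

	*kaddr = pmem->virt_addr + offset;
	return (long)(pmem->size - pmem->pfn_pad - offset);
}

/* Block-layer entry point: callers think in 512-byte sectors. */
long toy_pmem_blk_direct_access(struct toy_pmem *pmem,
		unsigned long long sector, void **kaddr)
{
	return toy_pmem_direct_access(pmem, sector * TOY_SECTOR_SIZE, kaddr);
}

/* dax entry point: callers already pass a byte offset. */
long toy_pmem_dax_direct_access(struct toy_pmem *pmem,
		unsigned long long dev_addr, void **kaddr)
{
	return toy_pmem_direct_access(pmem, dev_addr, kaddr);
}
```

With this split, sector 8 through the block entry point and byte offset
4096 through the dax entry point resolve to the same address and the
same remaining length.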
 drivers/dax/dax.h               |    7 -----
 drivers/nvdimm/Kconfig          |    1 +
 drivers/nvdimm/pmem.c           |   55 +++++++++++++++++++++++++++++++--------
 drivers/nvdimm/pmem.h           |    7 ++++-
 include/linux/dax.h             |    6 ++++
 tools/testing/nvdimm/pmem-dax.c |   12 ++++-----
 6 files changed, 61 insertions(+), 27 deletions(-)

diff --git a/drivers/dax/dax.h b/drivers/dax/dax.h
index aeb1d49aafb8..b4c686d2d446 100644
--- a/drivers/dax/dax.h
+++ b/drivers/dax/dax.h
@@ -13,15 +13,8 @@
 #ifndef __DAX_H__
 #define __DAX_H__
 struct dax_inode;
-struct dax_operations;
-struct dax_inode *alloc_dax_inode(void *private, const char *host,
-		const struct dax_operations *ops);
-void put_dax_inode(struct dax_inode *dax_inode);
-bool dax_inode_alive(struct dax_inode *dax_inode);
-void kill_dax_inode(struct dax_inode *dax_inode);
 struct dax_inode *inode_to_dax_inode(struct inode *inode);
 struct inode *dax_inode_to_inode(struct dax_inode *dax_inode);
-void *dax_inode_get_private(struct dax_inode *dax_inode);
 int dax_inode_register(struct dax_inode *dax_inode,
 		const struct file_operations *fops, struct module *owner,
 		struct kobject *parent);
diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index 59e750183b7f..5bdd499b5f4f 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -20,6 +20,7 @@ if LIBNVDIMM
 config BLK_DEV_PMEM
 	tristate "PMEM: Persistent memory block device support"
 	default LIBNVDIMM
+	select DAX
 	select ND_BTT if BTT
 	select ND_PFN if NVDIMM_PFN
 	help
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 5b536be5a12e..d3d7de645e20 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -28,6 +28,7 @@
 #include <linux/pfn_t.h>
 #include <linux/slab.h>
 #include <linux/pmem.h>
+#include <linux/dax.h>
 #include <linux/nd.h>
 #include "pmem.h"
 #include "pfn.h"
@@ -199,13 +200,12 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
 }
 
 /* see "strong" declaration in tools/testing/nvdimm/pmem-dax.c */
-__weak long pmem_direct_access(struct block_device *bdev, sector_t sector,
-		      void **kaddr, pfn_t *pfn, long size)
+__weak long __pmem_direct_access(struct pmem_device *pmem, phys_addr_t dev_addr,
+		void **kaddr, pfn_t *pfn, long size)
 {
-	struct pmem_device *pmem = bdev->bd_queue->queuedata;
-	resource_size_t offset = sector * 512 + pmem->data_offset;
+	resource_size_t offset = dev_addr + pmem->data_offset;
 
-	if (unlikely(is_bad_pmem(&pmem->bb, sector, size)))
+	if (unlikely(is_bad_pmem(&pmem->bb, dev_addr / 512, size)))
 		return -EIO;
 	*kaddr = pmem->virt_addr + offset;
 	*pfn = phys_to_pfn_t(pmem->phys_addr + offset, pmem->pfn_flags);
@@ -219,22 +219,46 @@ __weak long pmem_direct_access(struct block_device *bdev, sector_t sector,
 	return pmem->size - pmem->pfn_pad - offset;
 }
 
+static long pmem_blk_direct_access(struct block_device *bdev, sector_t sector,
+		void **kaddr, pfn_t *pfn, long size)
+{
+	struct pmem_device *pmem = bdev->bd_queue->queuedata;
+
+	return __pmem_direct_access(pmem, sector * 512, kaddr, pfn, size);
+}
+
 static const struct block_device_operations pmem_fops = {
 	.owner =		THIS_MODULE,
 	.rw_page =		pmem_rw_page,
-	.direct_access =	pmem_direct_access,
+	.direct_access =	pmem_blk_direct_access,
 	.revalidate_disk =	nvdimm_revalidate_disk,
 };
 
+static long pmem_dax_direct_access(struct dax_inode *dax_inode,
+		phys_addr_t dev_addr, void **kaddr, pfn_t *pfn, long size)
+{
+	struct pmem_device *pmem = dax_inode_get_private(dax_inode);
+
+	return __pmem_direct_access(pmem, dev_addr, kaddr, pfn, size);
+}
+
+static const struct dax_operations pmem_dax_ops = {
+	.direct_access = pmem_dax_direct_access,
+};
+
 static void pmem_release_queue(void *q)
 {
 	blk_cleanup_queue(q);
 }
 
-static void pmem_release_disk(void *disk)
+static void pmem_release_disk(void *__pmem)
 {
-	del_gendisk(disk);
-	put_disk(disk);
+	struct pmem_device *pmem = __pmem;
+
+	kill_dax_inode(pmem->dax_inode);
+	put_dax_inode(pmem->dax_inode);
+	del_gendisk(pmem->disk);
+	put_disk(pmem->disk);
 }
 
 static int pmem_attach_disk(struct device *dev,
@@ -245,6 +269,7 @@ static int pmem_attach_disk(struct device *dev,
 	struct vmem_altmap __altmap, *altmap = NULL;
 	struct resource *res = &nsio->res;
 	struct nd_pfn *nd_pfn = NULL;
+	struct dax_inode *dax_inode;
 	int nid = dev_to_node(dev);
 	struct nd_pfn_sb *pfn_sb;
 	struct pmem_device *pmem;
@@ -325,6 +350,7 @@ static int pmem_attach_disk(struct device *dev,
 	disk = alloc_disk_node(0, nid);
 	if (!disk)
 		return -ENOMEM;
+	pmem->disk = disk;
 
 	disk->fops		= &pmem_fops;
 	disk->queue		= q;
@@ -336,9 +362,16 @@ static int pmem_attach_disk(struct device *dev,
 		return -ENOMEM;
 	nvdimm_badblocks_populate(nd_region, &pmem->bb, res);
 	disk->bb = &pmem->bb;
-	device_add_disk(dev, disk);
 
-	if (devm_add_action_or_reset(dev, pmem_release_disk, disk))
+	dax_inode = alloc_dax_inode(pmem, disk->disk_name, &pmem_dax_ops);
+	if (!dax_inode) {
+		put_disk(disk);
+		return -ENOMEM;
+	}
+	pmem->dax_inode = dax_inode;
+
+	device_add_disk(dev, disk);
+	if (devm_add_action_or_reset(dev, pmem_release_disk, pmem))
 		return -ENOMEM;
 
 	revalidate_disk(disk);
diff --git a/drivers/nvdimm/pmem.h b/drivers/nvdimm/pmem.h
index b4ee4f71b4a1..a26ade213eb5 100644
--- a/drivers/nvdimm/pmem.h
+++ b/drivers/nvdimm/pmem.h
@@ -5,8 +5,6 @@
 #include <linux/pfn_t.h>
 #include <linux/fs.h>
 
-long pmem_direct_access(struct block_device *bdev, sector_t sector,
-		      void **kaddr, pfn_t *pfn, long size);
 /* this definition is in it's own header for tools/testing/nvdimm to consume */
 struct pmem_device {
 	/* One contiguous memory region per device */
@@ -20,5 +18,10 @@ struct pmem_device {
 	/* trim size when namespace capacity has been section aligned */
 	u32			pfn_pad;
 	struct badblocks	bb;
+	struct dax_inode	*dax_inode;
+	struct gendisk		*disk;
 };
+
+long __pmem_direct_access(struct pmem_device *pmem, phys_addr_t dev_addr,
+		void **kaddr, pfn_t *pfn, long size);
 #endif /* __NVDIMM_PMEM_H__ */
diff --git a/include/linux/dax.h b/include/linux/dax.h
index def9a9d118c9..5aa620e8e5a2 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -16,6 +16,12 @@ struct dax_operations {
 int dax_read_lock(void);
 void dax_read_unlock(int id);
 struct dax_inode *dax_get_by_host(const char *host);
+struct dax_inode *alloc_dax_inode(void *private, const char *host,
+		const struct dax_operations *ops);
+void *dax_inode_get_private(struct dax_inode *dax_inode);
+void put_dax_inode(struct dax_inode *dax_inode);
+bool dax_inode_alive(struct dax_inode *dax_inode);
+void kill_dax_inode(struct dax_inode *dax_inode);
 
 /*
  * We use lowest available bit in exceptional entry for locking, one bit for
diff --git a/tools/testing/nvdimm/pmem-dax.c b/tools/testing/nvdimm/pmem-dax.c
index c9b8c48f85fc..2c93836c169e 100644
--- a/tools/testing/nvdimm/pmem-dax.c
+++ b/tools/testing/nvdimm/pmem-dax.c
@@ -15,13 +15,12 @@
 #include <pmem.h>
 #include <nd.h>
 
-long pmem_direct_access(struct block_device *bdev, sector_t sector,
+long __pmem_direct_access(struct pmem_device *pmem, phys_addr_t dev_addr,
 		void **kaddr, pfn_t *pfn, long size)
 {
-	struct pmem_device *pmem = bdev->bd_queue->queuedata;
-	resource_size_t offset = sector * 512 + pmem->data_offset;
+	resource_size_t offset = dev_addr + pmem->data_offset;
 
-	if (unlikely(is_bad_pmem(&pmem->bb, sector, size)))
+	if (unlikely(is_bad_pmem(&pmem->bb, dev_addr / 512, size)))
 		return -EIO;
 
 	/*
@@ -34,9 +33,8 @@ long pmem_direct_access(struct block_device *bdev, sector_t sector,
 		*kaddr = pmem->virt_addr + offset;
 		page = vmalloc_to_page(pmem->virt_addr + offset);
 		*pfn = page_to_pfn_t(page);
-		dev_dbg_ratelimited(disk_to_dev(bdev->bd_disk)->parent,
-				"%s: sector: %#llx pfn: %#lx\n", __func__,
-				(unsigned long long) sector, page_to_pfn(page));
+		pr_debug_ratelimited("%s: pmem: %p dev_addr: %pa pfn: %#lx\n",
+				__func__, pmem, &dev_addr, page_to_pfn(page));
 
 		return PAGE_SIZE;
 	}


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [RFC PATCH 05/17] pmem: add dax_operations support
@ 2017-01-28  8:36   ` Dan Williams
  0 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28  8:36 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: snitzer, toshi.kani, mawilcox, linux-block, jmoyer,
	linux-fsdevel, ross.zwisler, hch

Set up a dax_inode to have the same lifetime as the pmem block device and
add a ->direct_access() method that is equivalent to
pmem_direct_access(). Once fs/dax.c has been converted to use
dax_operations, the old pmem_direct_access() will be removed.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
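Illustration only, not part of the patch: a minimal userspace sketch of
the wrapper pattern described above, where a sector-based entry point and
a byte-addressed entry point both funnel into one shared helper. All
toy_* names and the struct layout are hypothetical stand-ins for
struct pmem_device and the *_direct_access() functions; bad-block checks
and pfn handling are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Hypothetical stand-in for struct pmem_device: one contiguous buffer. */
struct toy_dev {
	unsigned char *virt_addr;	/* start of the mapped range */
	size_t data_offset;		/* bytes reserved at the start */
	size_t size;			/* total bytes in the range */
};

/* Shared helper: takes a byte offset, returns bytes available or -1. */
static long toy_direct_access(struct toy_dev *dev, uint64_t dev_addr,
		void **kaddr)
{
	uint64_t offset = dev_addr + dev->data_offset;

	if (offset >= dev->size)
		return -1;
	*kaddr = dev->virt_addr + offset;
	return (long)(dev->size - offset);
}

/* Block-style entry point: addresses the device in 512-byte sectors. */
static long toy_blk_direct_access(struct toy_dev *dev, uint64_t sector,
		void **kaddr)
{
	return toy_direct_access(dev, sector * SECTOR_SIZE, kaddr);
}

/* DAX-style entry point: addresses the device in bytes. */
static long toy_dax_direct_access(struct toy_dev *dev, uint64_t dev_addr,
		void **kaddr)
{
	return toy_direct_access(dev, dev_addr, kaddr);
}
```

With this shape, sector 2 through the block wrapper and byte address 1024
through the dax wrapper resolve to the same address and the same remaining
length, which is the sense in which the two methods are equivalent.
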
 drivers/dax/dax.h               |    7 -----
 drivers/nvdimm/Kconfig          |    1 +
 drivers/nvdimm/pmem.c           |   55 +++++++++++++++++++++++++++++++--------
 drivers/nvdimm/pmem.h           |    7 ++++-
 include/linux/dax.h             |    6 ++++
 tools/testing/nvdimm/pmem-dax.c |   12 ++++-----
 6 files changed, 61 insertions(+), 27 deletions(-)

diff --git a/drivers/dax/dax.h b/drivers/dax/dax.h
index aeb1d49aafb8..b4c686d2d446 100644
--- a/drivers/dax/dax.h
+++ b/drivers/dax/dax.h
@@ -13,15 +13,8 @@
 #ifndef __DAX_H__
 #define __DAX_H__
 struct dax_inode;
-struct dax_operations;
-struct dax_inode *alloc_dax_inode(void *private, const char *host,
-		const struct dax_operations *ops);
-void put_dax_inode(struct dax_inode *dax_inode);
-bool dax_inode_alive(struct dax_inode *dax_inode);
-void kill_dax_inode(struct dax_inode *dax_inode);
 struct dax_inode *inode_to_dax_inode(struct inode *inode);
 struct inode *dax_inode_to_inode(struct dax_inode *dax_inode);
-void *dax_inode_get_private(struct dax_inode *dax_inode);
 int dax_inode_register(struct dax_inode *dax_inode,
 		const struct file_operations *fops, struct module *owner,
 		struct kobject *parent);
diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index 59e750183b7f..5bdd499b5f4f 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -20,6 +20,7 @@ if LIBNVDIMM
 config BLK_DEV_PMEM
 	tristate "PMEM: Persistent memory block device support"
 	default LIBNVDIMM
+	select DAX
 	select ND_BTT if BTT
 	select ND_PFN if NVDIMM_PFN
 	help
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 5b536be5a12e..d3d7de645e20 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -28,6 +28,7 @@
 #include <linux/pfn_t.h>
 #include <linux/slab.h>
 #include <linux/pmem.h>
+#include <linux/dax.h>
 #include <linux/nd.h>
 #include "pmem.h"
 #include "pfn.h"
@@ -199,13 +200,12 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
 }
 
 /* see "strong" declaration in tools/testing/nvdimm/pmem-dax.c */
-__weak long pmem_direct_access(struct block_device *bdev, sector_t sector,
-		      void **kaddr, pfn_t *pfn, long size)
+__weak long __pmem_direct_access(struct pmem_device *pmem, phys_addr_t dev_addr,
+		void **kaddr, pfn_t *pfn, long size)
 {
-	struct pmem_device *pmem = bdev->bd_queue->queuedata;
-	resource_size_t offset = sector * 512 + pmem->data_offset;
+	resource_size_t offset = dev_addr + pmem->data_offset;
 
-	if (unlikely(is_bad_pmem(&pmem->bb, sector, size)))
+	if (unlikely(is_bad_pmem(&pmem->bb, dev_addr / 512, size)))
 		return -EIO;
 	*kaddr = pmem->virt_addr + offset;
 	*pfn = phys_to_pfn_t(pmem->phys_addr + offset, pmem->pfn_flags);
@@ -219,22 +219,46 @@ __weak long pmem_direct_access(struct block_device *bdev, sector_t sector,
 	return pmem->size - pmem->pfn_pad - offset;
 }
 
+static long pmem_blk_direct_access(struct block_device *bdev, sector_t sector,
+		void **kaddr, pfn_t *pfn, long size)
+{
+	struct pmem_device *pmem = bdev->bd_queue->queuedata;
+
+	return __pmem_direct_access(pmem, sector * 512, kaddr, pfn, size);
+}
+
 static const struct block_device_operations pmem_fops = {
 	.owner =		THIS_MODULE,
 	.rw_page =		pmem_rw_page,
-	.direct_access =	pmem_direct_access,
+	.direct_access =	pmem_blk_direct_access,
 	.revalidate_disk =	nvdimm_revalidate_disk,
 };
 
+static long pmem_dax_direct_access(struct dax_inode *dax_inode,
+		phys_addr_t dev_addr, void **kaddr, pfn_t *pfn, long size)
+{
+	struct pmem_device *pmem = dax_inode_get_private(dax_inode);
+
+	return __pmem_direct_access(pmem, dev_addr, kaddr, pfn, size);
+}
+
+static const struct dax_operations pmem_dax_ops = {
+	.direct_access = pmem_dax_direct_access,
+};
+
 static void pmem_release_queue(void *q)
 {
 	blk_cleanup_queue(q);
 }
 
-static void pmem_release_disk(void *disk)
+static void pmem_release_disk(void *__pmem)
 {
-	del_gendisk(disk);
-	put_disk(disk);
+	struct pmem_device *pmem = __pmem;
+
+	kill_dax_inode(pmem->dax_inode);
+	put_dax_inode(pmem->dax_inode);
+	del_gendisk(pmem->disk);
+	put_disk(pmem->disk);
 }
 
 static int pmem_attach_disk(struct device *dev,
@@ -245,6 +269,7 @@ static int pmem_attach_disk(struct device *dev,
 	struct vmem_altmap __altmap, *altmap = NULL;
 	struct resource *res = &nsio->res;
 	struct nd_pfn *nd_pfn = NULL;
+	struct dax_inode *dax_inode;
 	int nid = dev_to_node(dev);
 	struct nd_pfn_sb *pfn_sb;
 	struct pmem_device *pmem;
@@ -325,6 +350,7 @@ static int pmem_attach_disk(struct device *dev,
 	disk = alloc_disk_node(0, nid);
 	if (!disk)
 		return -ENOMEM;
+	pmem->disk = disk;
 
 	disk->fops		= &pmem_fops;
 	disk->queue		= q;
@@ -336,9 +362,16 @@ static int pmem_attach_disk(struct device *dev,
 		return -ENOMEM;
 	nvdimm_badblocks_populate(nd_region, &pmem->bb, res);
 	disk->bb = &pmem->bb;
-	device_add_disk(dev, disk);
 
-	if (devm_add_action_or_reset(dev, pmem_release_disk, disk))
+	dax_inode = alloc_dax_inode(pmem, disk->disk_name, &pmem_dax_ops);
+	if (!dax_inode) {
+		put_disk(disk);
+		return -ENOMEM;
+	}
+	pmem->dax_inode = dax_inode;
+
+	device_add_disk(dev, disk);
+	if (devm_add_action_or_reset(dev, pmem_release_disk, pmem))
 		return -ENOMEM;
 
 	revalidate_disk(disk);
diff --git a/drivers/nvdimm/pmem.h b/drivers/nvdimm/pmem.h
index b4ee4f71b4a1..a26ade213eb5 100644
--- a/drivers/nvdimm/pmem.h
+++ b/drivers/nvdimm/pmem.h
@@ -5,8 +5,6 @@
 #include <linux/pfn_t.h>
 #include <linux/fs.h>
 
-long pmem_direct_access(struct block_device *bdev, sector_t sector,
-		      void **kaddr, pfn_t *pfn, long size);
 /* this definition is in it's own header for tools/testing/nvdimm to consume */
 struct pmem_device {
 	/* One contiguous memory region per device */
@@ -20,5 +18,10 @@ struct pmem_device {
 	/* trim size when namespace capacity has been section aligned */
 	u32			pfn_pad;
 	struct badblocks	bb;
+	struct dax_inode	*dax_inode;
+	struct gendisk		*disk;
 };
+
+long __pmem_direct_access(struct pmem_device *pmem, phys_addr_t dev_addr,
+		void **kaddr, pfn_t *pfn, long size);
 #endif /* __NVDIMM_PMEM_H__ */
diff --git a/include/linux/dax.h b/include/linux/dax.h
index def9a9d118c9..5aa620e8e5a2 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -16,6 +16,12 @@ struct dax_operations {
 int dax_read_lock(void);
 void dax_read_unlock(int id);
 struct dax_inode *dax_get_by_host(const char *host);
+struct dax_inode *alloc_dax_inode(void *private, const char *host,
+		const struct dax_operations *ops);
+void *dax_inode_get_private(struct dax_inode *dax_inode);
+void put_dax_inode(struct dax_inode *dax_inode);
+bool dax_inode_alive(struct dax_inode *dax_inode);
+void kill_dax_inode(struct dax_inode *dax_inode);
 
 /*
  * We use lowest available bit in exceptional entry for locking, one bit for
diff --git a/tools/testing/nvdimm/pmem-dax.c b/tools/testing/nvdimm/pmem-dax.c
index c9b8c48f85fc..2c93836c169e 100644
--- a/tools/testing/nvdimm/pmem-dax.c
+++ b/tools/testing/nvdimm/pmem-dax.c
@@ -15,13 +15,12 @@
 #include <pmem.h>
 #include <nd.h>
 
-long pmem_direct_access(struct block_device *bdev, sector_t sector,
+long __pmem_direct_access(struct pmem_device *pmem, phys_addr_t dev_addr,
 		void **kaddr, pfn_t *pfn, long size)
 {
-	struct pmem_device *pmem = bdev->bd_queue->queuedata;
-	resource_size_t offset = sector * 512 + pmem->data_offset;
+	resource_size_t offset = dev_addr + pmem->data_offset;
 
-	if (unlikely(is_bad_pmem(&pmem->bb, sector, size)))
+	if (unlikely(is_bad_pmem(&pmem->bb, dev_addr / 512, size)))
 		return -EIO;
 
 	/*
@@ -34,9 +33,8 @@ long pmem_direct_access(struct block_device *bdev, sector_t sector,
 		*kaddr = pmem->virt_addr + offset;
 		page = vmalloc_to_page(pmem->virt_addr + offset);
 		*pfn = page_to_pfn_t(page);
-		dev_dbg_ratelimited(disk_to_dev(bdev->bd_disk)->parent,
-				"%s: sector: %#llx pfn: %#lx\n", __func__,
-				(unsigned long long) sector, page_to_pfn(page));
+		pr_debug_ratelimited("%s: pmem: %p dev_addr: %pa pfn: %#lx\n",
+				__func__, pmem, &dev_addr, page_to_pfn(page));
 
 		return PAGE_SIZE;
 	}


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [RFC PATCH 06/17] axon_ram: add dax_operations support
@ 2017-01-28  8:36   ` Dan Williams
  0 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28  8:36 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: snitzer, toshi.kani, mawilcox, linux-block, jmoyer,
	linux-fsdevel, ross.zwisler, hch

Set up a dax_inode to have the same lifetime as the axon_ram block device
and add a ->direct_access() method that is equivalent to
axon_ram_direct_access(). Once fs/dax.c has been converted to use
dax_operations, the old axon_ram_direct_access() will be removed.
---
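Illustration only, not part of the patch: a single-threaded userspace
sketch of the kill-then-put teardown order used in the probe-failure and
remove paths. It assumes that "kill" marks the object dead so later
operations are refused and that "put" drops a reference; the real
kill_dax_inode() and put_dax_inode() are defined earlier in this series
and may behave differently. All toy_* names are hypothetical, and the
sketch does no locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct dax_inode. */
struct toy_handle {
	int refcount;	/* references held; object is freed at zero */
	bool alive;	/* cleared once the provider is going away */
	void *private;	/* provider data, e.g. the driver's bank struct */
};

static struct toy_handle *toy_alloc(void *private)
{
	struct toy_handle *h = calloc(1, sizeof(*h));

	if (!h)
		return NULL;
	h->refcount = 1;
	h->alive = true;
	h->private = private;
	return h;
}

static void toy_get(struct toy_handle *h)
{
	h->refcount++;
}

/* Mark the handle dead so later operations are refused. */
static void toy_kill(struct toy_handle *h)
{
	h->alive = false;
}

/* Drop one reference; returns true when the object was freed. */
static bool toy_put(struct toy_handle *h)
{
	if (--h->refcount > 0)
		return false;
	free(h);
	return true;
}

/* An operation that only proceeds while the provider is alive. */
static int toy_access(struct toy_handle *h)
{
	return h->alive ? 0 : -1;
}
```

Under these assumptions a consumer that still holds a reference after the
provider has called kill and put sees its operations fail, and the memory
is released only when that last reference is dropped.
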
 arch/powerpc/platforms/Kconfig |    1 +
 arch/powerpc/sysdev/axonram.c  |   46 +++++++++++++++++++++++++++++++++++-----
 2 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index 7e3a2ebba29b..33244e3d9375 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -284,6 +284,7 @@ config CPM2
 config AXON_RAM
 	tristate "Axon DDR2 memory device driver"
 	depends on PPC_IBM_CELL_BLADE && BLOCK
+	select DAX
 	default m
 	help
 	  It registers one block device per Axon's DDR2 memory bank found
diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index ada29eaed6e2..4e1f58187726 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -25,6 +25,7 @@
 
 #include <linux/bio.h>
 #include <linux/blkdev.h>
+#include <linux/dax.h>
 #include <linux/device.h>
 #include <linux/errno.h>
 #include <linux/fs.h>
@@ -62,6 +63,7 @@ static int azfs_major, azfs_minor;
 struct axon_ram_bank {
 	struct platform_device	*device;
 	struct gendisk		*disk;
+	struct dax_inode	*dax_inode;
 	unsigned int		irq_id;
 	unsigned long		ph_addr;
 	unsigned long		io_addr;
@@ -137,25 +139,45 @@ axon_ram_make_request(struct request_queue *queue, struct bio *bio)
 	return BLK_QC_T_NONE;
 }
 
+static long
+__axon_ram_direct_access(struct axon_ram_bank *bank, phys_addr_t offset,
+		       void **kaddr, pfn_t *pfn, long size)
+{
+	*kaddr = (void *) bank->io_addr + offset;
+	*pfn = phys_to_pfn_t(bank->ph_addr + offset, PFN_DEV);
+	return bank->size - offset;
+}
+
 /**
  * axon_ram_direct_access - direct_access() method for block device
  * @device, @sector, @data: see block_device_operations method
  */
 static long
-axon_ram_direct_access(struct block_device *device, sector_t sector,
+axon_ram_blk_direct_access(struct block_device *device, sector_t sector,
 		       void **kaddr, pfn_t *pfn, long size)
 {
 	struct axon_ram_bank *bank = device->bd_disk->private_data;
-	loff_t offset = (loff_t)sector << AXON_RAM_SECTOR_SHIFT;
 
-	*kaddr = (void *) bank->io_addr + offset;
-	*pfn = phys_to_pfn_t(bank->ph_addr + offset, PFN_DEV);
-	return bank->size - offset;
+	return __axon_ram_direct_access(bank, sector << AXON_RAM_SECTOR_SHIFT,
+			kaddr, pfn, size);
 }
 
 static const struct block_device_operations axon_ram_devops = {
 	.owner		= THIS_MODULE,
-	.direct_access	= axon_ram_direct_access
+	.direct_access	= axon_ram_blk_direct_access
+};
+
+static long
+axon_ram_dax_direct_access(struct dax_inode *dax_inode, phys_addr_t dev_addr,
+		       void **kaddr, pfn_t *pfn, long size)
+{
+	struct axon_ram_bank *bank = dax_inode_get_private(dax_inode);
+
+	return __axon_ram_direct_access(bank, dev_addr, kaddr, pfn, size);
+}
+
+static const struct dax_operations axon_ram_dax_ops = {
+	.direct_access = axon_ram_dax_direct_access,
 };
 
 /**
@@ -219,6 +241,7 @@ static int axon_ram_probe(struct platform_device *device)
 		goto failed;
 	}
 
+
 	bank->disk->major = azfs_major;
 	bank->disk->first_minor = azfs_minor;
 	bank->disk->fops = &axon_ram_devops;
@@ -227,6 +250,11 @@ static int axon_ram_probe(struct platform_device *device)
 	sprintf(bank->disk->disk_name, "%s%d",
 			AXON_RAM_DEVICE_NAME, axon_ram_bank_id);
 
+	bank->dax_inode = alloc_dax_inode(bank, bank->disk->disk_name,
+			&axon_ram_dax_ops);
+	if (!bank->dax_inode)
+		goto failed;
+
 	bank->disk->queue = blk_alloc_queue(GFP_KERNEL);
 	if (bank->disk->queue == NULL) {
 		dev_err(&device->dev, "Cannot register disk queue\n");
@@ -276,6 +304,10 @@ static int axon_ram_probe(struct platform_device *device)
 						bank->disk->disk_name);
 			del_gendisk(bank->disk);
 		}
+		if (bank->dax_inode) {
+			kill_dax_inode(bank->dax_inode);
+			put_dax_inode(bank->dax_inode);
+		}
 		device->dev.platform_data = NULL;
 		if (bank->io_addr != 0)
 			iounmap((void __iomem *) bank->io_addr);
@@ -298,6 +330,8 @@ axon_ram_remove(struct platform_device *device)
 
 	device_remove_file(&device->dev, &dev_attr_ecc);
 	free_irq(bank->irq_id, device);
+	kill_dax_inode(bank->dax_inode);
+	put_dax_inode(bank->dax_inode);
 	del_gendisk(bank->disk);
 	iounmap((void __iomem *) bank->io_addr);
 	kfree(bank);


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [RFC PATCH 07/17] brd: add dax_operations support
@ 2017-01-28  8:36   ` Dan Williams
  0 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28  8:36 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: snitzer, toshi.kani, mawilcox, linux-block, jmoyer,
	linux-fsdevel, ross.zwisler, hch

Set up a dax_inode to have the same lifetime as the brd block device and
add a ->direct_access() method that is equivalent to
brd_direct_access(). Once fs/dax.c has been converted to use
dax_operations, the old brd_direct_access() will be removed.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
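Illustration only, not part of the patch: a userspace sketch of the
goto-based unwind pattern that an allocation routine like brd_alloc()
follows when it acquires several resources in order. A failed step jumps
to the label that releases what was already acquired, and each handle is
stored in the owning structure so the teardown path can find it later.
All toy_* names are hypothetical, and the fail_at parameter exists only
to exercise the error paths.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical owner object with two resources acquired in order. */
struct toy_owner {
	void *queue;
	void *handle;
};

/* fail_at selects which allocation to fail (0 = none), for illustration. */
static struct toy_owner *toy_owner_alloc(int fail_at)
{
	struct toy_owner *o = calloc(1, sizeof(*o));

	if (!o)
		goto out;
	o->queue = fail_at == 1 ? NULL : malloc(16);
	if (!o->queue)
		goto out_free_owner;
	/* Store the handle in the owner so teardown can reach it. */
	o->handle = fail_at == 2 ? NULL : malloc(16);
	if (!o->handle)
		goto out_free_queue;	/* the handle itself needs no undo */
	return o;

out_free_queue:
	free(o->queue);
out_free_owner:
	free(o);
out:
	return NULL;
}

/* Teardown releases resources in the reverse order of acquisition. */
static void toy_owner_free(struct toy_owner *o)
{
	free(o->handle);
	free(o->queue);
	free(o);
}
```

The label for a failed step never releases the resource that failed to
allocate, only the ones acquired before it.
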
 drivers/block/Kconfig |    1 +
 drivers/block/brd.c   |   57 +++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 49 insertions(+), 9 deletions(-)

diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index 223ff2fcae7e..604b51a884b6 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -337,6 +337,7 @@ config BLK_DEV_SX8
 
 config BLK_DEV_RAM
 	tristate "RAM block device support"
+	select DAX if BLK_DEV_RAM_DAX
 	---help---
 	  Saying Y here will allow you to use a portion of your RAM memory as
 	  a block device, so that you can make file systems on it, read and
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 3adc32a3153b..1279df4dc07c 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -21,6 +21,7 @@
 #include <linux/slab.h>
 #ifdef CONFIG_BLK_DEV_RAM_DAX
 #include <linux/pfn_t.h>
+#include <linux/dax.h>
 #endif
 
 #include <linux/uaccess.h>
@@ -41,6 +42,9 @@ struct brd_device {
 
 	struct request_queue	*brd_queue;
 	struct gendisk		*brd_disk;
+#ifdef CONFIG_BLK_DEV_RAM_DAX
+	struct dax_inode	*dax_inode;
+#endif
 	struct list_head	brd_list;
 
 	/*
@@ -375,15 +379,14 @@ static int brd_rw_page(struct block_device *bdev, sector_t sector,
 }
 
 #ifdef CONFIG_BLK_DEV_RAM_DAX
-static long brd_direct_access(struct block_device *bdev, sector_t sector,
+static long __brd_direct_access(struct brd_device *brd, phys_addr_t dev_addr,
 			void **kaddr, pfn_t *pfn, long size)
 {
-	struct brd_device *brd = bdev->bd_disk->private_data;
 	struct page *page;
 
 	if (!brd)
 		return -ENODEV;
-	page = brd_insert_page(brd, sector);
+	page = brd_insert_page(brd, dev_addr / 512);
 	if (!page)
 		return -ENOSPC;
 	*kaddr = page_address(page);
@@ -391,14 +394,34 @@ static long brd_direct_access(struct block_device *bdev, sector_t sector,
 
 	return PAGE_SIZE;
 }
+
+static long brd_blk_direct_access(struct block_device *bdev, sector_t sector,
+			void **kaddr, pfn_t *pfn, long size)
+{
+	struct brd_device *brd = bdev->bd_disk->private_data;
+
+	return __brd_direct_access(brd, sector * 512, kaddr, pfn, size);
+}
+
+static long brd_dax_direct_access(struct dax_inode *dax_inode,
+		phys_addr_t dev_addr, void **kaddr, pfn_t *pfn, long size)
+{
+	struct brd_device *brd = dax_inode_get_private(dax_inode);
+
+	return __brd_direct_access(brd, dev_addr, kaddr, pfn, size);
+}
+
+static const struct dax_operations brd_dax_ops = {
+	.direct_access = brd_dax_direct_access,
+};
 #else
-#define brd_direct_access NULL
+#define brd_blk_direct_access NULL
 #endif
 
 static const struct block_device_operations brd_fops = {
 	.owner =		THIS_MODULE,
 	.rw_page =		brd_rw_page,
-	.direct_access =	brd_direct_access,
+	.direct_access =	brd_blk_direct_access,
 };
 
 /*
@@ -441,7 +464,9 @@ static struct brd_device *brd_alloc(int i)
 {
 	struct brd_device *brd;
 	struct gendisk *disk;
-
+#ifdef CONFIG_BLK_DEV_RAM_DAX
+	struct dax_inode *dax_inode;
+#endif
 	brd = kzalloc(sizeof(*brd), GFP_KERNEL);
 	if (!brd)
 		goto out;
@@ -469,9 +494,6 @@ static struct brd_device *brd_alloc(int i)
 	blk_queue_max_discard_sectors(brd->brd_queue, UINT_MAX);
 	brd->brd_queue->limits.discard_zeroes_data = 1;
 	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, brd->brd_queue);
-#ifdef CONFIG_BLK_DEV_RAM_DAX
-	queue_flag_set_unlocked(QUEUE_FLAG_DAX, brd->brd_queue);
-#endif
 	disk = brd->brd_disk = alloc_disk(max_part);
 	if (!disk)
 		goto out_free_queue;
@@ -484,8 +506,21 @@ static struct brd_device *brd_alloc(int i)
 	sprintf(disk->disk_name, "ram%d", i);
 	set_capacity(disk, rd_size * 2);
 
+#ifdef CONFIG_BLK_DEV_RAM_DAX
+	queue_flag_set_unlocked(QUEUE_FLAG_DAX, brd->brd_queue);
+	dax_inode = alloc_dax_inode(brd, disk->disk_name, &brd_dax_ops);
+	if (!dax_inode)
+		goto out_free_inode;
+	brd->dax_inode = dax_inode;
+#endif
+
 	return brd;
 
+#ifdef CONFIG_BLK_DEV_RAM_DAX
+out_free_inode:
+	kill_dax_inode(dax_inode);
+	put_dax_inode(dax_inode);
+#endif
 out_free_queue:
 	blk_cleanup_queue(brd->brd_queue);
 out_free_dev:
@@ -525,6 +560,10 @@ static struct brd_device *brd_init_one(int i, bool *new)
 static void brd_del_one(struct brd_device *brd)
 {
 	list_del(&brd->brd_list);
+#ifdef CONFIG_BLK_DEV_RAM_DAX
+	kill_dax_inode(brd->dax_inode);
+	put_dax_inode(brd->dax_inode);
+#endif
 	del_gendisk(brd->brd_disk);
 	brd_free(brd);
 }


* [RFC PATCH 08/17] dcssblk: add dax_operations support
  2017-01-28  8:36 ` Dan Williams
@ 2017-01-28  8:36   ` Dan Williams
  -1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28  8:36 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch

Set up a dax_inode to have the same lifetime as the dcssblk block device
and add a ->direct_access() method that is equivalent to
dcssblk_direct_access(). Once fs/dax.c has been converted to use
dax_operations, the old dcssblk_direct_access() will be removed.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
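Not part of the patch: a minimal user-space sketch of the wrapper split
described above, assuming a simplified device that is just a [start, end)
byte range. The names model_dev, model_direct_access(),
model_blk_direct_access() and model_dax_direct_access() are hypothetical
stand-ins for the dcssblk equivalents. The only point is that the block
entry point converts a sector to a byte offset, while the dax entry point
passes a byte offset straight through to the shared helper.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_SECTOR_SIZE 512	/* bytes per sector, as in the patch's "* 512" */

/* Hypothetical stand-in for struct dcssblk_dev_info: just the mapped range. */
struct model_dev {
	uintptr_t start;	/* first byte of the segment */
	uintptr_t end;		/* one past the last byte (simplification of this model) */
};

/* Shared helper: takes a byte offset, like __dcssblk_direct_access(). */
static long model_direct_access(const struct model_dev *dev, uint64_t offset,
				void **kaddr)
{
	uint64_t dev_sz = dev->end - dev->start;

	*kaddr = (void *)(uintptr_t)(dev->start + offset);
	return (long)(dev_sz - offset);	/* bytes available from offset onward */
}

/* Block-layer entry point: converts a sector number to a byte offset. */
static long model_blk_direct_access(const struct model_dev *dev,
				    uint64_t sector, void **kaddr)
{
	return model_direct_access(dev, sector * MODEL_SECTOR_SIZE, kaddr);
}

/* dax entry point: already receives a byte offset. */
static long model_dax_direct_access(const struct model_dev *dev,
				    uint64_t dev_addr, void **kaddr)
{
	return model_direct_access(dev, dev_addr, kaddr);
}
```

With a 16-page device, sector 8 and byte offset 4096 resolve to the same
address and the same remaining length, which is what lets the old and new
entry points coexist during the conversion.
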
 drivers/s390/block/Kconfig   |    1 +
 drivers/s390/block/dcssblk.c |   53 +++++++++++++++++++++++++++++++++++-------
 2 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/drivers/s390/block/Kconfig b/drivers/s390/block/Kconfig
index 4a3b62326183..0acb8c2f9475 100644
--- a/drivers/s390/block/Kconfig
+++ b/drivers/s390/block/Kconfig
@@ -14,6 +14,7 @@ config BLK_DEV_XPRAM
 
 config DCSSBLK
 	def_tristate m
+	select DAX
 	prompt "DCSSBLK support"
 	depends on S390 && BLOCK
 	help
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 9d66b4fb174b..67b0885b4d12 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -18,6 +18,7 @@
 #include <linux/interrupt.h>
 #include <linux/platform_device.h>
 #include <linux/pfn_t.h>
+#include <linux/dax.h>
 #include <asm/extmem.h>
 #include <asm/io.h>
 
@@ -30,8 +31,10 @@ static int dcssblk_open(struct block_device *bdev, fmode_t mode);
 static void dcssblk_release(struct gendisk *disk, fmode_t mode);
 static blk_qc_t dcssblk_make_request(struct request_queue *q,
 						struct bio *bio);
-static long dcssblk_direct_access(struct block_device *bdev, sector_t secnum,
+static long dcssblk_blk_direct_access(struct block_device *bdev, sector_t secnum,
 			 void **kaddr, pfn_t *pfn, long size);
+static long dcssblk_dax_direct_access(struct dax_inode *dax_inode,
+		phys_addr_t dev_addr, void **kaddr, pfn_t *pfn, long size);
 
 static char dcssblk_segments[DCSSBLK_PARM_LEN] = "\0";
 
@@ -40,7 +43,11 @@ static const struct block_device_operations dcssblk_devops = {
 	.owner   	= THIS_MODULE,
 	.open    	= dcssblk_open,
 	.release 	= dcssblk_release,
-	.direct_access 	= dcssblk_direct_access,
+	.direct_access 	= dcssblk_blk_direct_access,
+};
+
+static const struct dax_operations dcssblk_dax_ops = {
+	.direct_access = dcssblk_dax_direct_access,
 };
 
 struct dcssblk_dev_info {
@@ -57,6 +64,7 @@ struct dcssblk_dev_info {
 	struct request_queue *dcssblk_queue;
 	int num_of_segments;
 	struct list_head seg_list;
+	struct dax_inode *dax_inode;
 };
 
 struct segment_info {
@@ -389,6 +397,8 @@ dcssblk_shared_store(struct device *dev, struct device_attribute *attr, const ch
 	}
 	list_del(&dev_info->lh);
 
+	kill_dax_inode(dev_info->dax_inode);
+	put_dax_inode(dev_info->dax_inode);
 	del_gendisk(dev_info->gd);
 	blk_cleanup_queue(dev_info->dcssblk_queue);
 	dev_info->gd->queue = NULL;
@@ -525,6 +535,7 @@ dcssblk_add_store(struct device *dev, struct device_attribute *attr, const char
 	int rc, i, j, num_of_segments;
 	struct dcssblk_dev_info *dev_info;
 	struct segment_info *seg_info, *temp;
+	struct dax_inode *dax_inode;
 	char *local_buf;
 	unsigned long seg_byte_size;
 
@@ -654,6 +665,11 @@ dcssblk_add_store(struct device *dev, struct device_attribute *attr, const char
 	if (rc)
 		goto put_dev;
 
+	dax_inode = alloc_dax_inode(dev_info, dev_info->gd->disk_name,
+			&dcssblk_dax_ops);
+	if (!dax_inode)
+		goto put_dev;
+	dev_info->dax_inode = dax_inode;
 	get_device(&dev_info->dev);
 	device_add_disk(&dev_info->dev, dev_info->gd);
 
@@ -752,6 +768,8 @@ dcssblk_remove_store(struct device *dev, struct device_attribute *attr, const ch
 	}
 
 	list_del(&dev_info->lh);
+	kill_dax_inode(dev_info->dax_inode);
+	put_dax_inode(dev_info->dax_inode);
 	del_gendisk(dev_info->gd);
 	blk_cleanup_queue(dev_info->dcssblk_queue);
 	dev_info->gd->queue = NULL;
@@ -883,21 +901,38 @@ dcssblk_make_request(struct request_queue *q, struct bio *bio)
 }
 
 static long
-dcssblk_direct_access (struct block_device *bdev, sector_t secnum,
+__dcssblk_direct_access(struct dcssblk_dev_info *dev_info, phys_addr_t offset,
+		void **kaddr, pfn_t *pfn, long size)
+{
+	unsigned long dev_sz;
+
+	dev_sz = dev_info->end - dev_info->start;
+	*kaddr = (void *) dev_info->start + offset;
+	*pfn = __pfn_to_pfn_t(PFN_DOWN(dev_info->start + offset), PFN_DEV);
+
+	return dev_sz - offset;
+}
+
+static long
+dcssblk_blk_direct_access(struct block_device *bdev, sector_t secnum,
 			void **kaddr, pfn_t *pfn, long size)
 {
 	struct dcssblk_dev_info *dev_info;
-	unsigned long offset, dev_sz;
 
 	dev_info = bdev->bd_disk->private_data;
 	if (!dev_info)
 		return -ENODEV;
-	dev_sz = dev_info->end - dev_info->start;
-	offset = secnum * 512;
-	*kaddr = (void *) dev_info->start + offset;
-	*pfn = __pfn_to_pfn_t(PFN_DOWN(dev_info->start + offset), PFN_DEV);
+	return __dcssblk_direct_access(dev_info, secnum * 512, kaddr, pfn,
+			size);
+}
 
-	return dev_sz - offset;
+static long
+dcssblk_dax_direct_access(struct dax_inode *dax_inode, phys_addr_t dev_addr,
+			void **kaddr, pfn_t *pfn, long size)
+{
+	struct dcssblk_dev_info *dev_info = dax_inode_get_private(dax_inode);
+
+	return __dcssblk_direct_access(dev_info, dev_addr, kaddr, pfn, size);
 }
 
 static void

* [RFC PATCH 09/17] block: kill bdev_dax_capable()
  2017-01-28  8:36 ` Dan Williams
@ 2017-01-28  8:36   ` Dan Williams
  -1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28  8:36 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch

This is leftover dead code that has since been replaced by
bdev_dax_supported().

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/block_dev.c         |   24 ------------------------
 include/linux/blkdev.h |    1 -
 2 files changed, 25 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 601b71b76d7f..edb1d2b16b8f 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -807,30 +807,6 @@ int bdev_dax_supported(struct super_block *sb, int blocksize)
 }
 EXPORT_SYMBOL_GPL(bdev_dax_supported);
 
-/**
- * bdev_dax_capable() - Return if the raw device is capable for dax
- * @bdev: The device for raw block device access
- */
-bool bdev_dax_capable(struct block_device *bdev)
-{
-	struct blk_dax_ctl dax = {
-		.size = PAGE_SIZE,
-	};
-
-	if (!IS_ENABLED(CONFIG_FS_DAX))
-		return false;
-
-	dax.sector = 0;
-	if (bdev_direct_access(bdev, &dax) < 0)
-		return false;
-
-	dax.sector = bdev->bd_part->nr_sects - (PAGE_SIZE / 512);
-	if (bdev_direct_access(bdev, &dax) < 0)
-		return false;
-
-	return true;
-}
-
 /*
  * pseudo-fs
  */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3c0ff78b1219..5e7706f7d533 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1904,7 +1904,6 @@ extern int bdev_write_page(struct block_device *, sector_t, struct page *,
 						struct writeback_control *);
 extern long bdev_direct_access(struct block_device *, struct blk_dax_ctl *);
 extern int bdev_dax_supported(struct super_block *, int);
-extern bool bdev_dax_capable(struct block_device *);
 #else /* CONFIG_BLOCK */
 
 struct block_device;

* [RFC PATCH 10/17] block: introduce bdev_dax_direct_access()
  2017-01-28  8:36 ` Dan Williams
@ 2017-01-28  8:36   ` Dan Williams
  -1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28  8:36 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch

Provide a replacement for bdev_direct_access() that uses
dax_operations.direct_access() instead of
block_device_operations.direct_access(). Once all consumers of the old
API have been converted, bdev_direct_access() will be deleted.

Given that block device partitioning decisions can cause dax page
alignment constraints to be violated, we still need to validate the
block_device before calling the dax ->direct_access() method.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
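Not part of the patch: a minimal user-space sketch of the validation
described above, assuming a 4096-byte page and 512-byte sectors. The
names model_part and model_bdev_to_dev_addr() are hypothetical. The
function follows the order of the range check in bdev_dax_direct_access()
and the alignment test in dax_direct_access(), and leaves out the
blk_queue_dax() and liveness checks.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_SECTOR_SIZE 512
#define MODEL_PAGE_SIZE   4096	/* assumed page size for this model */

/* Hypothetical stand-in for a partition: start and length in sectors. */
struct model_part {
	uint64_t start_sect;
	uint64_t nr_sects;
};

/*
 * On success, returns 0 and stores the whole-device byte address in
 * *dev_addr. On failure, returns a negative errno and leaves *dev_addr
 * untouched.
 */
static int model_bdev_to_dev_addr(const struct model_part *part,
				  uint64_t sector, uint64_t size,
				  uint64_t *dev_addr)
{
	/* Round the byte count up to whole sectors, like DIV_ROUND_UP(). */
	uint64_t nr = (size + MODEL_SECTOR_SIZE - 1) / MODEL_SECTOR_SIZE;

	if (sector + nr > part->nr_sects)
		return -ERANGE;	/* request runs past the end of the partition */

	/* Translate from partition-relative to whole-device sectors. */
	sector += part->start_sect;

	if ((sector * MODEL_SECTOR_SIZE) % MODEL_PAGE_SIZE)
		return -EINVAL;	/* resulting byte address is not page aligned */

	*dev_addr = sector * MODEL_SECTOR_SIZE;
	return 0;
}
```

A partition starting at sector 2048 keeps sector 0 page aligned, while a
legacy start at sector 63 places it at byte 32256, which is not a multiple
of 4096. That is the kind of partitioning decision the block-level check
has to catch before the dax method is called.
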
 block/Kconfig          |    1 +
 drivers/dax/super.c    |   33 +++++++++++++++++++++++++++++++++
 fs/block_dev.c         |   28 ++++++++++++++++++++++++++++
 include/linux/blkdev.h |    3 +++
 include/linux/dax.h    |    2 ++
 5 files changed, 67 insertions(+)

diff --git a/block/Kconfig b/block/Kconfig
index 8bf114a3858a..9be785173280 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -6,6 +6,7 @@ menuconfig BLOCK
        default y
        select SBITMAP
        select SRCU
+       select DAX
        help
 	 Provide block layer support for the kernel.
 
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index eb844ffea3cf..ab5b082df5dd 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -65,6 +65,39 @@ struct dax_inode {
 	const struct dax_operations *ops;
 };
 
+long dax_direct_access(struct dax_inode *dax_inode, phys_addr_t dev_addr,
+		void **kaddr, pfn_t *pfn, long size)
+{
+	long avail;
+
+	/*
+	 * The device driver is allowed to sleep, in order to make the
+	 * memory directly accessible.
+	 */
+	might_sleep();
+
+	if (!dax_inode)
+		return -EOPNOTSUPP;
+
+	if (!dax_inode_alive(dax_inode))
+		return -ENXIO;
+
+	if (size < 0)
+		return size;
+
+	if (dev_addr % PAGE_SIZE)
+		return -EINVAL;
+
+	avail = dax_inode->ops->direct_access(dax_inode, dev_addr, kaddr, pfn,
+			size);
+	if (!avail)
+		return -ERANGE;
+	if (avail > 0 && avail & ~PAGE_MASK)
+		return -ENXIO;
+	return min(avail, size);
+}
+EXPORT_SYMBOL_GPL(dax_direct_access);
+
 bool dax_inode_alive(struct dax_inode *dax_inode)
 {
 	lockdep_assert_held(&dax_srcu);
diff --git a/fs/block_dev.c b/fs/block_dev.c
index edb1d2b16b8f..bf4b51a3a412 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -18,6 +18,7 @@
 #include <linux/module.h>
 #include <linux/blkpg.h>
 #include <linux/magic.h>
+#include <linux/dax.h>
 #include <linux/buffer_head.h>
 #include <linux/swap.h>
 #include <linux/pagevec.h>
@@ -763,6 +764,33 @@ long bdev_direct_access(struct block_device *bdev, struct blk_dax_ctl *dax)
 EXPORT_SYMBOL_GPL(bdev_direct_access);
 
 /**
+ * bdev_dax_direct_access() - bdev-sector to pfn_t and kernel virtual address
+ * @bdev: host block device for @dax_inode
+ * @dax_inode: interface data and operations for a memory device
+ * @dax: control and output parameters for ->direct_access
+ *
+ * Return: negative errno if an error occurs, otherwise the number of bytes
+ * accessible at this address.
+ *
+ * Locking: must be called with dax_read_lock() held
+ */
+long bdev_dax_direct_access(struct block_device *bdev,
+		struct dax_inode *dax_inode, struct blk_dax_ctl *dax)
+{
+	sector_t sector = dax->sector;
+
+	if (!blk_queue_dax(bdev->bd_queue))
+		return -EOPNOTSUPP;
+	if ((sector + DIV_ROUND_UP(dax->size, 512))
+			> part_nr_sects_read(bdev->bd_part))
+		return -ERANGE;
+	sector += get_start_sect(bdev);
+	return dax_direct_access(dax_inode, sector * 512, &dax->addr,
+			&dax->pfn, dax->size);
+}
+EXPORT_SYMBOL_GPL(bdev_dax_direct_access);
+
+/**
  * bdev_dax_supported() - Check if the device supports dax for filesystem
  * @sb: The superblock of the device
  * @blocksize: The block size of the device
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5e7706f7d533..3b3c5ce376fd 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1903,6 +1903,9 @@ extern int bdev_read_page(struct block_device *, sector_t, struct page *);
 extern int bdev_write_page(struct block_device *, sector_t, struct page *,
 						struct writeback_control *);
 extern long bdev_direct_access(struct block_device *, struct blk_dax_ctl *);
+struct dax_inode;
+extern long bdev_dax_direct_access(struct block_device *bdev,
+		struct dax_inode *dax_inode, struct blk_dax_ctl *dax);
 extern int bdev_dax_supported(struct super_block *, int);
 #else /* CONFIG_BLOCK */
 
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 5aa620e8e5a2..2ef8e18e2587 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -22,6 +22,8 @@ void *dax_inode_get_private(struct dax_inode *dax_inode);
 void put_dax_inode(struct dax_inode *dax_inode);
 bool dax_inode_alive(struct dax_inode *dax_inode);
 void kill_dax_inode(struct dax_inode *dax_inode);
+long dax_direct_access(struct dax_inode *dax_inode, phys_addr_t dev_addr,
+		void **kaddr, pfn_t *pfn, long size);
 
 /*
  * We use lowest available bit in exceptional entry for locking, one bit for

* [RFC PATCH 11/17] dm: add dax_operations support (producer)
  2017-01-28  8:36 ` Dan Williams
@ 2017-01-28  8:37   ` Dan Williams
  -1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28  8:37 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch

Set up a dax_inode to have the same lifetime as the dm block device and
add a ->direct_access() method that is equivalent to
dm_blk_direct_access(). Once fs/dax.c has been converted to use
dax_operations, the old dm_blk_direct_access() will be removed.

This enabling is only for the top-level dm representation to upper
layers. Subsequent patches are needed to convert the bottom layer
interface to backing devices.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
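Not part of the patch: a minimal user-space sketch of the lifetime rule
described above, where the dax_inode is allocated with the device and
killed, then put, on teardown. All names are hypothetical. The liveness
flag and the plain integer reference count are assumptions of this model,
since this patch does not show how the real dax_inode tracks references.
The return codes of model_access() follow the first two checks in
dax_direct_access().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a dax_inode: liveness flag plus a reference count. */
struct model_dax_inode {
	bool alive;
	int refs;
	void *private_data;	/* the owning device, e.g. a mapped_device */
};

/* Model of alloc_dax_inode(): returns NULL on allocation failure. */
static struct model_dax_inode *model_alloc(void *private_data)
{
	struct model_dax_inode *di = calloc(1, sizeof(*di));

	if (!di)
		return NULL;
	di->alive = true;
	di->refs = 1;
	di->private_data = private_data;
	return di;
}

/* Model of kill_dax_inode(): new accesses fail from here on. */
static void model_kill(struct model_dax_inode *di)
{
	di->alive = false;
}

/* Model of put_dax_inode(): drop a reference; returns true if freed. */
static bool model_put(struct model_dax_inode *di)
{
	if (--di->refs > 0)
		return false;
	free(di);
	return true;
}

/* Model of the gate at the top of dax_direct_access(). */
static long model_access(const struct model_dax_inode *di, long size)
{
	if (!di)
		return -EOPNOTSUPP;
	if (!di->alive)
		return -ENXIO;
	return size;
}
```

In this model, an access succeeds while the device exists, fails with
-ENXIO once the inode has been killed, and the final put releases the
memory. That ordering matches the kill-then-put sequence added to
cleanup_mapped_device().
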
 drivers/md/Kconfig   |    1 +
 drivers/md/dm-core.h |    3 +++
 drivers/md/dm.c      |   42 +++++++++++++++++++++++++++++++++++++++---
 3 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index b7767da50c26..1de8372d9459 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -200,6 +200,7 @@ config BLK_DEV_DM_BUILTIN
 config BLK_DEV_DM
 	tristate "Device mapper support"
 	select BLK_DEV_DM_BUILTIN
+	select DAX
 	---help---
 	  Device-mapper is a low level volume manager.  It works by allowing
 	  people to specify mappings for ranges of logical sectors.  Various
diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index 40ceba1fe8be..f6eb8d8db646 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -24,6 +24,8 @@ struct dm_kobject_holder {
 	struct completion completion;
 };
 
+struct dax_inode;
+
 /*
  * DM core internal structure that used directly by dm.c and dm-rq.c
  * DM targets must _not_ deference a mapped_device to directly access its members!
@@ -58,6 +60,7 @@ struct mapped_device {
 	struct target_type *immutable_target_type;
 
 	struct gendisk *disk;
+	struct dax_inode *dax_inode;
 	char name[16];
 
 	void *interface_ptr;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index db934b1dba9d..1b3d9253e92c 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -15,6 +15,7 @@
 #include <linux/blkpg.h>
 #include <linux/bio.h>
 #include <linux/mempool.h>
+#include <linux/dax.h>
 #include <linux/slab.h>
 #include <linux/idr.h>
 #include <linux/hdreg.h>
@@ -905,10 +906,10 @@ int dm_set_target_max_io_len(struct dm_target *ti, sector_t len)
 }
 EXPORT_SYMBOL_GPL(dm_set_target_max_io_len);
 
-static long dm_blk_direct_access(struct block_device *bdev, sector_t sector,
-				 void **kaddr, pfn_t *pfn, long size)
+static long __dm_direct_access(struct mapped_device *md, phys_addr_t dev_addr,
+			       void **kaddr, pfn_t *pfn, long size)
 {
-	struct mapped_device *md = bdev->bd_disk->private_data;
+	sector_t sector = dev_addr >> SECTOR_SHIFT;
 	struct dm_table *map;
 	struct dm_target *ti;
 	int srcu_idx;
@@ -932,6 +933,23 @@ static long dm_blk_direct_access(struct block_device *bdev, sector_t sector,
 	return min(ret, size);
 }
 
+static long dm_blk_direct_access(struct block_device *bdev, sector_t sector,
+				 void **kaddr, pfn_t *pfn, long size)
+{
+	struct mapped_device *md = bdev->bd_disk->private_data;
+
+	return __dm_direct_access(md, sector << SECTOR_SHIFT, kaddr, pfn, size);
+}
+
+static long dm_dax_direct_access(struct dax_inode *dax_inode,
+				 phys_addr_t dev_addr, void **kaddr, pfn_t *pfn,
+				 long size)
+{
+	struct mapped_device *md = dax_inode_get_private(dax_inode);
+
+	return __dm_direct_access(md, dev_addr, kaddr, pfn, size);
+}
+
 /*
  * A target may call dm_accept_partial_bio only from the map routine.  It is
  * allowed for all bio types except REQ_PREFLUSH.
@@ -1376,6 +1394,7 @@ static int next_free_minor(int *minor)
 }
 
 static const struct block_device_operations dm_blk_dops;
+static const struct dax_operations dm_dax_ops;
 
 static void dm_wq_work(struct work_struct *work);
 
@@ -1423,6 +1442,12 @@ static void cleanup_mapped_device(struct mapped_device *md)
 	if (md->bs)
 		bioset_free(md->bs);
 
+	if (md->dax_inode) {
+		kill_dax_inode(md->dax_inode);
+		put_dax_inode(md->dax_inode);
+		md->dax_inode = NULL;
+	}
+
 	if (md->disk) {
 		spin_lock(&_minor_lock);
 		md->disk->private_data = NULL;
@@ -1450,6 +1475,7 @@ static void cleanup_mapped_device(struct mapped_device *md)
 static struct mapped_device *alloc_dev(int minor)
 {
 	int r, numa_node_id = dm_get_numa_node();
+	struct dax_inode *dax_inode;
 	struct mapped_device *md;
 	void *old_md;
 
@@ -1514,6 +1540,12 @@ static struct mapped_device *alloc_dev(int minor)
 	md->disk->queue = md->queue;
 	md->disk->private_data = md;
 	sprintf(md->disk->disk_name, "dm-%d", minor);
+
+	dax_inode = alloc_dax_inode(md, md->disk->disk_name, &dm_dax_ops);
+	if (!dax_inode)
+		goto bad;
+	md->dax_inode = dax_inode;
+
 	add_disk(md->disk);
 	format_dev_t(md->name, MKDEV(_major, minor));
 
@@ -2735,6 +2767,10 @@ static const struct block_device_operations dm_blk_dops = {
 	.owner = THIS_MODULE
 };
 
+static const struct dax_operations dm_dax_ops = {
+	.direct_access = dm_dax_direct_access,
+};
+
 /*
  * module hooks
  */

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

* [RFC PATCH 11/17] dm: add dax_operations support (producer)
@ 2017-01-28  8:37   ` Dan Williams
  0 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28  8:37 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: snitzer, toshi.kani, mawilcox, linux-block, jmoyer,
	linux-fsdevel, ross.zwisler, hch

Set up a dax_inode to have the same lifetime as the dm block device and
add a ->direct_access() method that is equivalent to
dm_blk_direct_access(). Once fs/dax.c has been converted to use
dax_operations, the old dm_blk_direct_access() will be removed.

This enabling is only for the top-level dm representation to upper
layers. Subsequent patches are needed to convert the bottom layer
interface to backing devices.
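
For readers following along, here is a minimal userspace sketch of the
shape this patch gives dm.c: a sector-based entry point and a byte-based
entry point that both funnel into one helper. It is not kernel code.
struct model_md and the model_*() functions are hypothetical stand-ins
for struct mapped_device and the real dm functions, and the flat buffer
replaces the table lookup that the real helper performs.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors, matching the kernel's value */

typedef uint64_t sector_t;
typedef uint64_t phys_addr_t;

/* Hypothetical stand-in for struct mapped_device: one flat buffer. */
struct model_md {
	char *base;	/* start of the modelled persistent memory */
	long len;	/* bytes available */
};

/* Shared helper taking a byte offset, in the spirit of __dm_direct_access(). */
static long model_direct_access(struct model_md *md, phys_addr_t dev_addr,
				void **kaddr, long size)
{
	sector_t sector = dev_addr >> SECTOR_SHIFT;
	phys_addr_t off = sector << SECTOR_SHIFT;	/* rounded down to a sector */
	long avail;

	assert(md && kaddr);
	if (off >= (phys_addr_t)md->len)
		return -EIO;

	avail = md->len - (long)off;
	*kaddr = md->base + off;
	return avail < size ? avail : size;
}

/* Legacy entry point: the caller supplies a sector number. */
static long model_blk_direct_access(struct model_md *md, sector_t sector,
				    void **kaddr, long size)
{
	return model_direct_access(md, sector << SECTOR_SHIFT, kaddr, size);
}

/* New entry point: the caller supplies a byte offset directly. */
static long model_dax_direct_access(struct model_md *md, phys_addr_t dev_addr,
				    void **kaddr, long size)
{
	return model_direct_access(md, dev_addr, kaddr, size);
}
```

In this model, sector 2 through the block entry point and byte offset
1024 through the dax entry point resolve to the same address, which is
the equivalence the changelog describes.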

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/md/Kconfig   |    1 +
 drivers/md/dm-core.h |    3 +++
 drivers/md/dm.c      |   42 +++++++++++++++++++++++++++++++++++++++---
 3 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index b7767da50c26..1de8372d9459 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -200,6 +200,7 @@ config BLK_DEV_DM_BUILTIN
 config BLK_DEV_DM
 	tristate "Device mapper support"
 	select BLK_DEV_DM_BUILTIN
+	select DAX
 	---help---
 	  Device-mapper is a low level volume manager.  It works by allowing
 	  people to specify mappings for ranges of logical sectors.  Various
diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index 40ceba1fe8be..f6eb8d8db646 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -24,6 +24,8 @@ struct dm_kobject_holder {
 	struct completion completion;
 };
 
+struct dax_inode;
+
 /*
  * DM core internal structure that used directly by dm.c and dm-rq.c
  * DM targets must _not_ deference a mapped_device to directly access its members!
@@ -58,6 +60,7 @@ struct mapped_device {
 	struct target_type *immutable_target_type;
 
 	struct gendisk *disk;
+	struct dax_inode *dax_inode;
 	char name[16];
 
 	void *interface_ptr;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index db934b1dba9d..1b3d9253e92c 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -15,6 +15,7 @@
 #include <linux/blkpg.h>
 #include <linux/bio.h>
 #include <linux/mempool.h>
+#include <linux/dax.h>
 #include <linux/slab.h>
 #include <linux/idr.h>
 #include <linux/hdreg.h>
@@ -905,10 +906,10 @@ int dm_set_target_max_io_len(struct dm_target *ti, sector_t len)
 }
 EXPORT_SYMBOL_GPL(dm_set_target_max_io_len);
 
-static long dm_blk_direct_access(struct block_device *bdev, sector_t sector,
-				 void **kaddr, pfn_t *pfn, long size)
+static long __dm_direct_access(struct mapped_device *md, phys_addr_t dev_addr,
+			       void **kaddr, pfn_t *pfn, long size)
 {
-	struct mapped_device *md = bdev->bd_disk->private_data;
+	sector_t sector = dev_addr >> SECTOR_SHIFT;
 	struct dm_table *map;
 	struct dm_target *ti;
 	int srcu_idx;
@@ -932,6 +933,23 @@ static long dm_blk_direct_access(struct block_device *bdev, sector_t sector,
 	return min(ret, size);
 }
 
+static long dm_blk_direct_access(struct block_device *bdev, sector_t sector,
+				 void **kaddr, pfn_t *pfn, long size)
+{
+	struct mapped_device *md = bdev->bd_disk->private_data;
+
+	return __dm_direct_access(md, sector << SECTOR_SHIFT, kaddr, pfn, size);
+}
+
+static long dm_dax_direct_access(struct dax_inode *dax_inode,
+				 phys_addr_t dev_addr, void **kaddr, pfn_t *pfn,
+				 long size)
+{
+	struct mapped_device *md = dax_inode_get_private(dax_inode);
+
+	return __dm_direct_access(md, dev_addr, kaddr, pfn, size);
+}
+
 /*
  * A target may call dm_accept_partial_bio only from the map routine.  It is
  * allowed for all bio types except REQ_PREFLUSH.
@@ -1376,6 +1394,7 @@ static int next_free_minor(int *minor)
 }
 
 static const struct block_device_operations dm_blk_dops;
+static const struct dax_operations dm_dax_ops;
 
 static void dm_wq_work(struct work_struct *work);
 
@@ -1423,6 +1442,12 @@ static void cleanup_mapped_device(struct mapped_device *md)
 	if (md->bs)
 		bioset_free(md->bs);
 
+	if (md->dax_inode) {
+		kill_dax_inode(md->dax_inode);
+		put_dax_inode(md->dax_inode);
+		md->dax_inode = NULL;
+	}
+
 	if (md->disk) {
 		spin_lock(&_minor_lock);
 		md->disk->private_data = NULL;
@@ -1450,6 +1475,7 @@ static void cleanup_mapped_device(struct mapped_device *md)
 static struct mapped_device *alloc_dev(int minor)
 {
 	int r, numa_node_id = dm_get_numa_node();
+	struct dax_inode *dax_inode;
 	struct mapped_device *md;
 	void *old_md;
 
@@ -1514,6 +1540,12 @@ static struct mapped_device *alloc_dev(int minor)
 	md->disk->queue = md->queue;
 	md->disk->private_data = md;
 	sprintf(md->disk->disk_name, "dm-%d", minor);
+
+	dax_inode = alloc_dax_inode(md, md->disk->disk_name, &dm_dax_ops);
+	if (!dax_inode)
+		goto bad;
+	md->dax_inode = dax_inode;
+
 	add_disk(md->disk);
 	format_dev_t(md->name, MKDEV(_major, minor));
 
@@ -2735,6 +2767,10 @@ static const struct block_device_operations dm_blk_dops = {
 	.owner = THIS_MODULE
 };
 
+static const struct dax_operations dm_dax_ops = {
+	.direct_access = dm_dax_direct_access,
+};
+
 /*
  * module hooks
  */


* [RFC PATCH 12/17] dm: add dax_operations support (consumer)
  2017-01-28  8:36 ` Dan Williams
@ 2017-01-28  8:37   ` Dan Williams
  -1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28  8:37 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch

Arrange for dm to look up the dax services available from member
devices. Update the dax-capable targets, linear and stripe, to route dax
operations to the underlying device.
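
As a rough illustration of the routing, the userspace sketch below
models the dispatch that the patched __dm_direct_access() performs and
the linear remap onto a backing device. It is not kernel code. All
model_* names are hypothetical, and the methods return the
backing-device sector instead of a byte count purely so that the
routing is observable. The -EIO default is an assumption of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT 9

typedef uint64_t sector_t;
typedef uint64_t phys_addr_t;

struct model_target;

/* Per-target dax methods, shaped like struct dm_dax_operations. */
struct model_dax_ops {
	long (*dm_direct_access)(struct model_target *ti, phys_addr_t dev_addr);
};

/* Hypothetical, trimmed-down stand-in for struct target_type. */
struct model_target_type {
	long (*direct_access)(struct model_target *ti, sector_t sector);
	const struct model_dax_ops *dax_ops;
};

struct model_target {
	const struct model_target_type *type;
	sector_t begin;	/* first sector of this target in the dm device */
	sector_t start;	/* first sector used on the backing device */
};

/* Linear remap: offset within the target, shifted onto the backing device. */
static sector_t model_linear_map_sector(struct model_target *ti, sector_t sector)
{
	return ti->start + (sector - ti->begin);
}

/* Legacy method: reports the backing-device sector it would access. */
static long model_linear_direct_access(struct model_target *ti, sector_t sector)
{
	return (long)model_linear_map_sector(ti, sector);
}

/* dax method: takes bytes, converts to a sector, then remaps the same way. */
static long model_linear_dax_direct_access(struct model_target *ti,
					   phys_addr_t dev_addr)
{
	return (long)model_linear_map_sector(ti, dev_addr >> SECTOR_SHIFT);
}

static const struct model_dax_ops model_linear_dax_ops = {
	.dm_direct_access = model_linear_dax_direct_access,
};

static const struct model_target_type model_linear = {
	.direct_access = model_linear_direct_access,
	.dax_ops = &model_linear_dax_ops,
};

/* Dispatch in the shape of the patched __dm_direct_access(). */
static long model_dm_direct_access(struct model_target *ti,
				   phys_addr_t dev_addr, bool blk)
{
	sector_t sector = dev_addr >> SECTOR_SHIFT;
	long ret = -EIO;	/* model's default when no method applies */

	assert(ti && ti->type);
	if (blk && ti->type->direct_access)
		ret = ti->type->direct_access(ti, sector);
	else if (ti->type->dax_ops)
		ret = ti->type->dax_ops->dm_direct_access(ti, dev_addr);
	return ret;
}
```

With a target that begins at sector 8 and maps onto backing sector 100,
both the blk and the dax path send dm sector 10 to backing sector 102. A
target type that provides neither method falls through to the error.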

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/md/dm-linear.c        |   24 ++++++++++++++++++++++++
 drivers/md/dm-snap.c          |   12 ++++++++++++
 drivers/md/dm-stripe.c        |   30 ++++++++++++++++++++++++++++++
 drivers/md/dm-target.c        |   11 +++++++++++
 drivers/md/dm.c               |   16 ++++++++++++----
 include/linux/device-mapper.h |    7 +++++++
 6 files changed, 96 insertions(+), 4 deletions(-)

diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 4788b0b989a9..e91ca8089333 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -159,6 +159,29 @@ static long linear_direct_access(struct dm_target *ti, sector_t sector,
 	return ret;
 }
 
+static long linear_dax_direct_access(struct dm_target *ti, phys_addr_t dev_addr,
+				     void **kaddr, pfn_t *pfn, long size)
+{
+	struct linear_c *lc = ti->private;
+	struct block_device *bdev = lc->dev->bdev;
+	struct dax_inode *dax_inode = lc->dev->dax_inode;
+	struct blk_dax_ctl dax = {
+		.sector = linear_map_sector(ti, dev_addr >> SECTOR_SHIFT),
+		.size = size,
+	};
+	long ret;
+
+	ret = bdev_dax_direct_access(bdev, dax_inode, &dax);
+	*kaddr = dax.addr;
+	*pfn = dax.pfn;
+
+	return ret;
+}
+
+static const struct dm_dax_operations linear_dax_ops = {
+	.dm_direct_access = linear_dax_direct_access,
+};
+
 static struct target_type linear_target = {
 	.name   = "linear",
 	.version = {1, 3, 0},
@@ -170,6 +193,7 @@ static struct target_type linear_target = {
 	.prepare_ioctl = linear_prepare_ioctl,
 	.iterate_devices = linear_iterate_devices,
 	.direct_access = linear_direct_access,
+	.dax_ops = &linear_dax_ops,
 };
 
 int __init dm_linear_init(void)
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index c65feeada864..1990e3bd6958 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -2309,6 +2309,13 @@ static long origin_direct_access(struct dm_target *ti, sector_t sector,
 	return -EIO;
 }
 
+static long origin_dax_direct_access(struct dm_target *ti, phys_addr_t dev_addr,
+		void **kaddr, pfn_t *pfn, long size)
+{
+	DMWARN("device does not support dax.");
+	return -EIO;
+}
+
 /*
  * Set the target "max_io_len" field to the minimum of all the snapshots'
  * chunk sizes.
@@ -2357,6 +2364,10 @@ static int origin_iterate_devices(struct dm_target *ti,
 	return fn(ti, o->dev, 0, ti->len, data);
 }
 
+static const struct dm_dax_operations origin_dax_ops = {
+	.dm_direct_access = origin_dax_direct_access,
+};
+
 static struct target_type origin_target = {
 	.name    = "snapshot-origin",
 	.version = {1, 9, 0},
@@ -2369,6 +2380,7 @@ static struct target_type origin_target = {
 	.status  = origin_status,
 	.iterate_devices = origin_iterate_devices,
 	.direct_access = origin_direct_access,
+	.dax_ops = &origin_dax_ops,
 };
 
 static struct target_type snapshot_target = {
diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index 28193a57bf47..47fb56a6184a 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -331,6 +331,31 @@ static long stripe_direct_access(struct dm_target *ti, sector_t sector,
 	return ret;
 }
 
+static long stripe_dax_direct_access(struct dm_target *ti, phys_addr_t dev_addr,
+		void **kaddr, pfn_t *pfn, long size)
+{
+	struct stripe_c *sc = ti->private;
+	uint32_t stripe;
+	struct block_device *bdev;
+	struct dax_inode *dax_inode;
+	struct blk_dax_ctl dax = {
+		.size = size,
+	};
+	long ret;
+
+	stripe_map_sector(sc, dev_addr >> SECTOR_SHIFT, &stripe, &dax.sector);
+
+	dax.sector += sc->stripe[stripe].physical_start;
+	bdev = sc->stripe[stripe].dev->bdev;
+	dax_inode = sc->stripe[stripe].dev->dax_inode;
+
+	ret = bdev_dax_direct_access(bdev, dax_inode, &dax);
+	*kaddr = dax.addr;
+	*pfn = dax.pfn;
+
+	return ret;
+}
+
 /*
  * Stripe status:
  *
@@ -437,6 +462,10 @@ static void stripe_io_hints(struct dm_target *ti,
 	blk_limits_io_opt(limits, chunk_size * sc->stripes);
 }
 
+static const struct dm_dax_operations stripe_dax_ops = {
+	.dm_direct_access = stripe_dax_direct_access,
+};
+
 static struct target_type stripe_target = {
 	.name   = "striped",
 	.version = {1, 6, 0},
@@ -449,6 +478,7 @@ static struct target_type stripe_target = {
 	.iterate_devices = stripe_iterate_devices,
 	.io_hints = stripe_io_hints,
 	.direct_access = stripe_direct_access,
+	.dax_ops = &stripe_dax_ops,
 };
 
 int __init dm_stripe_init(void)
diff --git a/drivers/md/dm-target.c b/drivers/md/dm-target.c
index 710ae28fd618..ab072f53cf24 100644
--- a/drivers/md/dm-target.c
+++ b/drivers/md/dm-target.c
@@ -154,6 +154,16 @@ static long io_err_direct_access(struct dm_target *ti, sector_t sector,
 	return -EIO;
 }
 
+static long io_err_dax_direct_access(struct dm_target *ti, phys_addr_t dev_addr,
+				     void **kaddr, pfn_t *pfn, long size)
+{
+	return -EIO;
+}
+
+static const struct dm_dax_operations err_dax_ops = {
+	.dm_direct_access = io_err_dax_direct_access,
+};
+
 static struct target_type error_target = {
 	.name = "error",
 	.version = {1, 5, 0},
@@ -165,6 +175,7 @@ static struct target_type error_target = {
 	.clone_and_map_rq = io_err_clone_and_map_rq,
 	.release_clone_rq = io_err_release_clone_rq,
 	.direct_access = io_err_direct_access,
+	.dax_ops = &err_dax_ops,
 };
 
 int __init dm_target_init(void)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 1b3d9253e92c..5c5eeda0eb0a 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -627,6 +627,7 @@ static int open_table_device(struct table_device *td, dev_t dev,
 	}
 
 	td->dm_dev.bdev = bdev;
+	td->dm_dev.dax_inode = dax_get_by_host(bdev->bd_disk->disk_name);
 	return 0;
 }
 
@@ -640,7 +641,9 @@ static void close_table_device(struct table_device *td, struct mapped_device *md
 
 	bd_unlink_disk_holder(td->dm_dev.bdev, dm_disk(md));
 	blkdev_put(td->dm_dev.bdev, td->dm_dev.mode | FMODE_EXCL);
+	put_dax_inode(td->dm_dev.dax_inode);
 	td->dm_dev.bdev = NULL;
+	td->dm_dev.dax_inode = NULL;
 }
 
 static struct table_device *find_table_device(struct list_head *l, dev_t dev,
@@ -907,7 +910,7 @@ int dm_set_target_max_io_len(struct dm_target *ti, sector_t len)
 EXPORT_SYMBOL_GPL(dm_set_target_max_io_len);
 
 static long __dm_direct_access(struct mapped_device *md, phys_addr_t dev_addr,
-			       void **kaddr, pfn_t *pfn, long size)
+			       void **kaddr, pfn_t *pfn, long size, bool blk)
 {
 	sector_t sector = dev_addr >> SECTOR_SHIFT;
 	struct dm_table *map;
@@ -926,8 +929,11 @@ static long __dm_direct_access(struct mapped_device *md, phys_addr_t dev_addr,
 	len = max_io_len(sector, ti) << SECTOR_SHIFT;
 	size = min(len, size);
 
-	if (ti->type->direct_access)
+	if (blk && ti->type->direct_access)
 		ret = ti->type->direct_access(ti, sector, kaddr, pfn, size);
+	else if (ti->type->dax_ops)
+		ret = ti->type->dax_ops->dm_direct_access(ti, dev_addr, kaddr,
+				pfn, size);
 out:
 	dm_put_live_table(md, srcu_idx);
 	return min(ret, size);
@@ -938,7 +944,8 @@ static long dm_blk_direct_access(struct block_device *bdev, sector_t sector,
 {
 	struct mapped_device *md = bdev->bd_disk->private_data;
 
-	return __dm_direct_access(md, sector << SECTOR_SHIFT, kaddr, pfn, size);
+	return __dm_direct_access(md, sector << SECTOR_SHIFT, kaddr, pfn, size,
+			true);
 }
 
 static long dm_dax_direct_access(struct dax_inode *dax_inode,
@@ -947,7 +954,8 @@ static long dm_dax_direct_access(struct dax_inode *dax_inode,
 {
 	struct mapped_device *md = dax_inode_get_private(dax_inode);
 
-	return __dm_direct_access(md, dev_addr, kaddr, pfn, size);
+	return __dm_direct_access(md, dev_addr, kaddr, pfn, size,
+			false);
 }
 
 /*
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index ef7962e84444..1b64f412bb45 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -137,12 +137,18 @@ void dm_error(const char *message);
 
 struct dm_dev {
 	struct block_device *bdev;
+	struct dax_inode *dax_inode;
 	fmode_t mode;
 	char name[16];
 };
 
 dev_t dm_get_dev_t(const char *path);
 
+struct dm_dax_operations {
+	long (*dm_direct_access)(struct dm_target *ti, phys_addr_t dev_addr,
+			void **kaddr, pfn_t *pfn, long size);
+};
+
 /*
  * Constructors should call these functions to ensure destination devices
  * are opened/closed correctly.
@@ -180,6 +186,7 @@ struct target_type {
 	dm_iterate_devices_fn iterate_devices;
 	dm_io_hints_fn io_hints;
 	dm_direct_access_fn direct_access;
+	const struct dm_dax_operations *dax_ops;
 
 	/* For internal device-mapper use. */
 	struct list_head list;


* [RFC PATCH 13/17] fs: update mount_bdev() to lookup dax infrastructure
  2017-01-28  8:36 ` Dan Williams
@ 2017-01-28  8:37   ` Dan Williams
  -1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28  8:37 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch

This is in preparation for removing the ->direct_access() method from
block_device_operations.
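
The mechanism in fs/super.c is that the block device and the dax inode
now travel together through the single opaque argument handed to
sget(). The userspace sketch below models that hand-off. It is not
kernel code. model_sget() is a toy stand-in for sget(), and the other
model_* names are equally hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct super_block. */
struct model_sb {
	const void *s_bdev;
	const void *s_dax;
	int in_use;
};

/* Both handles travel through the single opaque 'data' argument. */
struct model_mount_data {
	const void *bdev;
	const void *dax_inode;
};

/* Identity is decided by the block device alone, as in test_bdev_super(). */
static int model_test_super(struct model_sb *s, void *data)
{
	struct model_mount_data *mb_data = data;

	return s->s_bdev == mb_data->bdev;
}

/* A new superblock records both handles, as in set_bdev_super(). */
static int model_set_super(struct model_sb *s, void *data)
{
	struct model_mount_data *mb_data = data;

	s->s_bdev = mb_data->bdev;
	s->s_dax = mb_data->dax_inode;
	return 0;
}

/* Toy sget(): reuse a matching entry, otherwise claim a free slot. */
static struct model_sb *model_sget(struct model_sb *table, size_t n,
				   int (*test)(struct model_sb *, void *),
				   int (*set)(struct model_sb *, void *),
				   void *data)
{
	size_t i;

	assert(table && test && set);
	for (i = 0; i < n; i++)
		if (table[i].in_use && test(&table[i], data))
			return &table[i];
	for (i = 0; i < n; i++) {
		if (table[i].in_use)
			continue;
		if (set(&table[i], data))
			return NULL;
		table[i].in_use = 1;
		return &table[i];
	}
	return NULL;
}
```

In the model, a second lookup with the same block device returns the
existing entry and leaves its dax handle untouched, because only the
block device takes part in the comparison.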

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/block_dev.c     |    6 ++++--
 fs/super.c         |   32 +++++++++++++++++++++++++++++---
 include/linux/fs.h |    1 +
 3 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index bf4b51a3a412..a73f2388c515 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -806,14 +806,16 @@ int bdev_dax_supported(struct super_block *sb, int blocksize)
 		.sector = 0,
 		.size = PAGE_SIZE,
 	};
-	int err;
+	int err, id;
 
 	if (blocksize != PAGE_SIZE) {
 		vfs_msg(sb, KERN_ERR, "error: unsupported blocksize for dax");
 		return -EINVAL;
 	}
 
-	err = bdev_direct_access(sb->s_bdev, &dax);
+	id = dax_read_lock();
+	err = bdev_dax_direct_access(sb->s_bdev, sb->s_dax, &dax);
+	dax_read_unlock(id);
 	if (err < 0) {
 		switch (err) {
 		case -EOPNOTSUPP:
diff --git a/fs/super.c b/fs/super.c
index ea662b0e5e78..5e64d11c46c1 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -26,6 +26,7 @@
 #include <linux/mount.h>
 #include <linux/security.h>
 #include <linux/writeback.h>		/* for the emergency remount stuff */
+#include <linux/dax.h>
 #include <linux/idr.h>
 #include <linux/mutex.h>
 #include <linux/backing-dev.h>
@@ -1038,9 +1039,17 @@ struct dentry *mount_ns(struct file_system_type *fs_type,
 EXPORT_SYMBOL(mount_ns);
 
 #ifdef CONFIG_BLOCK
+struct mount_bdev_data {
+	struct block_device *bdev;
+	struct dax_inode *dax_inode;
+};
+
 static int set_bdev_super(struct super_block *s, void *data)
 {
-	s->s_bdev = data;
+	struct mount_bdev_data *mb_data = data;
+
+	s->s_bdev = mb_data->bdev;
+	s->s_dax = mb_data->dax_inode;
 	s->s_dev = s->s_bdev->bd_dev;
 
 	/*
@@ -1053,14 +1062,18 @@ static int set_bdev_super(struct super_block *s, void *data)
 
 static int test_bdev_super(struct super_block *s, void *data)
 {
-	return (void *)s->s_bdev == data;
+	struct mount_bdev_data *mb_data = data;
+
+	return s->s_bdev == mb_data->bdev;
 }
 
 struct dentry *mount_bdev(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data,
 	int (*fill_super)(struct super_block *, void *, int))
 {
+	struct mount_bdev_data mb_data;
 	struct block_device *bdev;
+	struct dax_inode *dax_inode;
 	struct super_block *s;
 	fmode_t mode = FMODE_READ | FMODE_EXCL;
 	int error = 0;
@@ -1072,6 +1085,11 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
 	if (IS_ERR(bdev))
 		return ERR_CAST(bdev);
 
+	if (IS_ENABLED(CONFIG_FS_DAX))
+		dax_inode = dax_get_by_host(bdev->bd_disk->disk_name);
+	else
+		dax_inode = NULL;
+
 	/*
 	 * once the super is inserted into the list by sget, s_umount
 	 * will protect the lockfs code from trying to start a snapshot
@@ -1083,8 +1101,13 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
 		error = -EBUSY;
 		goto error_bdev;
 	}
+
+	mb_data = (struct mount_bdev_data) {
+		.bdev = bdev,
+		.dax_inode = dax_inode,
+	};
 	s = sget(fs_type, test_bdev_super, set_bdev_super, flags | MS_NOSEC,
-		 bdev);
+		 &mb_data);
 	mutex_unlock(&bdev->bd_fsfreeze_mutex);
 	if (IS_ERR(s))
 		goto error_s;
@@ -1126,6 +1149,7 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
 	error = PTR_ERR(s);
 error_bdev:
 	blkdev_put(bdev, mode);
+	put_dax_inode(dax_inode);
 error:
 	return ERR_PTR(error);
 }
@@ -1133,6 +1157,7 @@ EXPORT_SYMBOL(mount_bdev);
 
 void kill_block_super(struct super_block *sb)
 {
+	struct dax_inode *dax_inode = sb->s_dax;
 	struct block_device *bdev = sb->s_bdev;
 	fmode_t mode = sb->s_mode;
 
@@ -1141,6 +1166,7 @@ void kill_block_super(struct super_block *sb)
 	sync_blockdev(bdev);
 	WARN_ON_ONCE(!(mode & FMODE_EXCL));
 	blkdev_put(bdev, mode | FMODE_EXCL);
+	put_dax_inode(dax_inode);
 }
 
 EXPORT_SYMBOL(kill_block_super);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c930cbc19342..fdad43169146 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1313,6 +1313,7 @@ struct super_block {
 	struct hlist_bl_head	s_anon;		/* anonymous dentries for (nfs) exporting */
 	struct list_head	s_mounts;	/* list of mounts; _not_ for fs use */
 	struct block_device	*s_bdev;
+	struct dax_inode	*s_dax;
 	struct backing_dev_info *s_bdi;
 	struct mtd_info		*s_mtd;
 	struct hlist_node	s_instances;

* [RFC PATCH 13/17] fs: update mount_bdev() to lookup dax infrastructure
@ 2017-01-28  8:37   ` Dan Williams
  0 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28  8:37 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: snitzer, toshi.kani, mawilcox, linux-block, jmoyer,
	linux-fsdevel, ross.zwisler, hch

This is in preparation for removing the ->direct_access() method from
block_device_operations.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/block_dev.c     |    6 ++++--
 fs/super.c         |   32 +++++++++++++++++++++++++++++---
 include/linux/fs.h |    1 +
 3 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index bf4b51a3a412..a73f2388c515 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -806,14 +806,16 @@ int bdev_dax_supported(struct super_block *sb, int blocksize)
 		.sector = 0,
 		.size = PAGE_SIZE,
 	};
-	int err;
+	int err, id;
 
 	if (blocksize != PAGE_SIZE) {
 		vfs_msg(sb, KERN_ERR, "error: unsupported blocksize for dax");
 		return -EINVAL;
 	}
 
-	err = bdev_direct_access(sb->s_bdev, &dax);
+	id = dax_read_lock();
+	err = bdev_dax_direct_access(sb->s_bdev, sb->s_dax, &dax);
+	dax_read_unlock(id);
 	if (err < 0) {
 		switch (err) {
 		case -EOPNOTSUPP:
diff --git a/fs/super.c b/fs/super.c
index ea662b0e5e78..5e64d11c46c1 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -26,6 +26,7 @@
 #include <linux/mount.h>
 #include <linux/security.h>
 #include <linux/writeback.h>		/* for the emergency remount stuff */
+#include <linux/dax.h>
 #include <linux/idr.h>
 #include <linux/mutex.h>
 #include <linux/backing-dev.h>
@@ -1038,9 +1039,17 @@ struct dentry *mount_ns(struct file_system_type *fs_type,
 EXPORT_SYMBOL(mount_ns);
 
 #ifdef CONFIG_BLOCK
+struct mount_bdev_data {
+	struct block_device *bdev;
+	struct dax_inode *dax_inode;
+};
+
 static int set_bdev_super(struct super_block *s, void *data)
 {
-	s->s_bdev = data;
+	struct mount_bdev_data *mb_data = data;
+
+	s->s_bdev = mb_data->bdev;
+	s->s_dax = mb_data->dax_inode;
 	s->s_dev = s->s_bdev->bd_dev;
 
 	/*
@@ -1053,14 +1062,18 @@ static int set_bdev_super(struct super_block *s, void *data)
 
 static int test_bdev_super(struct super_block *s, void *data)
 {
-	return (void *)s->s_bdev == data;
+	struct mount_bdev_data *mb_data = data;
+
+	return s->s_bdev == mb_data->bdev;
 }
 
 struct dentry *mount_bdev(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data,
 	int (*fill_super)(struct super_block *, void *, int))
 {
+	struct mount_bdev_data mb_data;
 	struct block_device *bdev;
+	struct dax_inode *dax_inode;
 	struct super_block *s;
 	fmode_t mode = FMODE_READ | FMODE_EXCL;
 	int error = 0;
@@ -1072,6 +1085,11 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
 	if (IS_ERR(bdev))
 		return ERR_CAST(bdev);
 
+	if (IS_ENABLED(CONFIG_FS_DAX))
+		dax_inode = dax_get_by_host(bdev->bd_disk->disk_name);
+	else
+		dax_inode = NULL;
+
 	/*
 	 * once the super is inserted into the list by sget, s_umount
 	 * will protect the lockfs code from trying to start a snapshot
@@ -1083,8 +1101,13 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
 		error = -EBUSY;
 		goto error_bdev;
 	}
+
+	mb_data = (struct mount_bdev_data) {
+		.bdev = bdev,
+		.dax_inode = dax_inode,
+	};
 	s = sget(fs_type, test_bdev_super, set_bdev_super, flags | MS_NOSEC,
-		 bdev);
+		 &mb_data);
 	mutex_unlock(&bdev->bd_fsfreeze_mutex);
 	if (IS_ERR(s))
 		goto error_s;
@@ -1126,6 +1149,7 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
 	error = PTR_ERR(s);
 error_bdev:
 	blkdev_put(bdev, mode);
+	put_dax_inode(dax_inode);
 error:
 	return ERR_PTR(error);
 }
@@ -1133,6 +1157,7 @@ EXPORT_SYMBOL(mount_bdev);
 
 void kill_block_super(struct super_block *sb)
 {
+	struct dax_inode *dax_inode = sb->s_dax;
 	struct block_device *bdev = sb->s_bdev;
 	fmode_t mode = sb->s_mode;
 
@@ -1141,6 +1166,7 @@ void kill_block_super(struct super_block *sb)
 	sync_blockdev(bdev);
 	WARN_ON_ONCE(!(mode & FMODE_EXCL));
 	blkdev_put(bdev, mode | FMODE_EXCL);
+	put_dax_inode(dax_inode);
 }
 
 EXPORT_SYMBOL(kill_block_super);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c930cbc19342..fdad43169146 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1313,6 +1313,7 @@ struct super_block {
 	struct hlist_bl_head	s_anon;		/* anonymous dentries for (nfs) exporting */
 	struct list_head	s_mounts;	/* list of mounts; _not_ for fs use */
 	struct block_device	*s_bdev;
+	struct dax_inode	*s_dax;
 	struct backing_dev_info *s_bdi;
 	struct mtd_info		*s_mtd;
 	struct hlist_node	s_instances;


* [RFC PATCH 14/17] ext2, ext4, xfs: retrieve dax_inode through iomap operations
  2017-01-28  8:36 ` Dan Williams
@ 2017-01-28  8:37   ` Dan Williams
  -1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28  8:37 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch

In preparation for converting fs/dax.c to use bdev_dax_direct_access()
instead of bdev_direct_access(), add the plumbing to retrieve the
dax_inode determined at mount through ->iomap_begin.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
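For illustration only, a minimal userspace sketch of the plumbing described
above. It is not kernel code: the struct layouts are cut-down stand-ins, and
toy_iomap_begin() and the is_realtime flag are hypothetical names. It shows
the dax_inode recorded in the super block at mount time being copied into the
iomap next to the block device, with an XFS-style realtime inode reporting no
dax_inode.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel types (hypothetical layouts). */
struct block_device { int id; };
struct dax_inode { int id; };

struct super_block {
	struct block_device *s_bdev;
	struct dax_inode *s_dax;	/* looked up once, at mount */
};

struct inode {
	struct super_block *i_sb;
	int is_realtime;	/* hypothetical flag for the XFS realtime case */
};

struct iomap {
	struct block_device *bdev;
	struct dax_inode *dax_inode;
};

/* Model of an ->iomap_begin() implementation filling in both handles. */
static void toy_iomap_begin(const struct inode *inode, struct iomap *iomap)
{
	iomap->bdev = inode->i_sb->s_bdev;
	/* Realtime inodes get no dax_inode in this sketch. */
	iomap->dax_inode = inode->is_realtime ? NULL : inode->i_sb->s_dax;
}
```

A consumer in the style of fs/dax.c would then read iomap->dax_inode instead
of repeating the lookup by block-device name for each mapping.
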
 fs/ext2/inode.c       |    1 +
 fs/ext4/inode.c       |    1 +
 fs/xfs/xfs_aops.c     |   13 +++++++++++++
 fs/xfs/xfs_aops.h     |    1 +
 fs/xfs/xfs_buf.h      |    1 +
 fs/xfs/xfs_iomap.c    |    1 +
 fs/xfs/xfs_super.c    |    3 +++
 include/linux/iomap.h |    1 +
 8 files changed, 22 insertions(+)

diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index f073bfca694b..c83f84748ec9 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -813,6 +813,7 @@ static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 
 	iomap->flags = 0;
 	iomap->bdev = inode->i_sb->s_bdev;
+	iomap->dax_inode = inode->i_sb->s_dax;
 	iomap->offset = (u64)first_block << blkbits;
 
 	if (ret == 0) {
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 88d57af1b516..ae6fa6a78d0d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3344,6 +3344,7 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 
 	iomap->flags = 0;
 	iomap->bdev = inode->i_sb->s_bdev;
+	iomap->dax_inode = inode->i_sb->s_dax;
 	iomap->offset = first_block << blkbits;
 
 	if (ret == 0) {
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 631e7c0e0a29..7d22938a4d8b 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -80,6 +80,19 @@ xfs_find_bdev_for_inode(
 		return mp->m_ddev_targp->bt_bdev;
 }
 
+struct dax_inode *
+xfs_find_dax_for_inode(
+	struct inode		*inode)
+{
+	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_mount	*mp = ip->i_mount;
+
+	if (XFS_IS_REALTIME_INODE(ip))
+		return NULL;
+	else
+		return mp->m_ddev_targp->bt_dax;
+}
+
 /*
  * We're now finished for good with this page.  Update the page state via the
  * associated buffer_heads, paying attention to the start and end offsets that
diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
index cc174ec6c2fd..e5b65f436acf 100644
--- a/fs/xfs/xfs_aops.h
+++ b/fs/xfs/xfs_aops.h
@@ -59,5 +59,6 @@ int	xfs_setfilesize(struct xfs_inode *ip, xfs_off_t offset, size_t size);
 
 extern void xfs_count_page_state(struct page *, int *, int *);
 extern struct block_device *xfs_find_bdev_for_inode(struct inode *);
+extern struct dax_inode *xfs_find_dax_for_inode(struct inode *);
 
 #endif /* __XFS_AOPS_H__ */
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 8a9d3a9599f0..1ff83f398649 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -109,6 +109,7 @@ typedef unsigned int xfs_buf_flags_t;
 typedef struct xfs_buftarg {
 	dev_t			bt_dev;
 	struct block_device	*bt_bdev;
+	struct dax_inode	*bt_dax;
 	struct backing_dev_info	*bt_bdi;
 	struct xfs_mount	*bt_mount;
 	unsigned int		bt_meta_sectorsize;
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 0d147428971e..1d08bd2433d5 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -69,6 +69,7 @@ xfs_bmbt_to_iomap(
 	iomap->offset = XFS_FSB_TO_B(mp, imap->br_startoff);
 	iomap->length = XFS_FSB_TO_B(mp, imap->br_blockcount);
 	iomap->bdev = xfs_find_bdev_for_inode(VFS_I(ip));
+	iomap->dax_inode = xfs_find_dax_for_inode(VFS_I(ip));
 }
 
 xfs_extlen_t
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index eecbaac08eba..1a99013a0701 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -774,6 +774,9 @@ xfs_open_devices(
 	if (!mp->m_ddev_targp)
 		goto out_close_rtdev;
 
+	/* associate dax inode for filesystem-dax */
+	mp->m_ddev_targp->bt_dax = mp->m_super->s_dax;
+
 	if (rtdev) {
 		mp->m_rtdev_targp = xfs_alloc_buftarg(mp, rtdev);
 		if (!mp->m_rtdev_targp)
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index a4c94b86401e..01e265e7cf55 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -41,6 +41,7 @@ struct iomap {
 	u16			type;	/* type of mapping */
 	u16			flags;	/* flags for mapping */
 	struct block_device	*bdev;	/* block device for I/O */
+	struct dax_inode	*dax_inode; /* dax_inode for dax operations */
 };
 
 /*

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

* [RFC PATCH 15/17] Revert "block: use DAX for partition table reads"
  2017-01-28  8:36 ` Dan Williams
@ 2017-01-28  8:37   ` Dan Williams
  -1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28  8:37 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch

commit d1a5f2b4d8a1 ("block: use DAX for partition table reads") was
part of a stalled effort to allow dax mappings of block devices. Since
then the device-dax mechanism has filled the role of dax-mapping static
device ranges.

Now that we are moving ->direct_access() from a block_device operation
to a dax_inode operation, block devices would need to map and carry
their own dax_inode reference for read_dax_sector() to keep working.

Unless / until we decide to revive dax mapping of raw block devices
through the dax_inode scheme, there is no need to carry
read_dax_sector(). Its removal in turn allows for the removal of
bdev_direct_access() and should have been included in commit
223757016837 ("block_dev: remove DAX leftovers").

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
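For illustration only, a small sketch of the sector arithmetic involved. The
removed read_dax_sector() rounded a sector down to the first sector of its
page, while the page-cache read that remains indexes the mapping by page. The
helper names are hypothetical, 4 KiB pages are assumed, and the offset helper
is an assumption about how a caller locates the sector inside the returned
page.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT		12	/* assumption: 4 KiB pages */
#define TOY_SECTOR_SHIFT	9	/* 512-byte sectors */
#define TOY_SECTORS_PER_PAGE	(1u << (TOY_PAGE_SHIFT - TOY_SECTOR_SHIFT))

/* Page-cache index of the page holding this sector. */
static uint64_t toy_page_index(uint64_t sector)
{
	return sector >> (TOY_PAGE_SHIFT - TOY_SECTOR_SHIFT);
}

/* First sector of that page, i.e. the sector rounded down to a page boundary. */
static uint64_t toy_first_sector_of_page(uint64_t sector)
{
	return sector & ~(uint64_t)(TOY_SECTORS_PER_PAGE - 1);
}

/* Byte offset of the sector within its page. */
static unsigned int toy_offset_in_page(uint64_t sector)
{
	return (unsigned int)(sector & (TOY_SECTORS_PER_PAGE - 1))
		<< TOY_SECTOR_SHIFT;
}
```

With these assumptions, sector 10 lives in page 1, that page starts at
sector 8, and the sector begins 1024 bytes into the page.
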
 block/partition-generic.c |   17 ++---------------
 fs/dax.c                  |   20 --------------------
 include/linux/dax.h       |    6 ------
 3 files changed, 2 insertions(+), 41 deletions(-)

diff --git a/block/partition-generic.c b/block/partition-generic.c
index 7afb9907821f..5dfac337b0f2 100644
--- a/block/partition-generic.c
+++ b/block/partition-generic.c
@@ -16,7 +16,6 @@
 #include <linux/kmod.h>
 #include <linux/ctype.h>
 #include <linux/genhd.h>
-#include <linux/dax.h>
 #include <linux/blktrace_api.h>
 
 #include "partitions/check.h"
@@ -631,24 +630,12 @@ int invalidate_partitions(struct gendisk *disk, struct block_device *bdev)
 	return 0;
 }
 
-static struct page *read_pagecache_sector(struct block_device *bdev, sector_t n)
-{
-	struct address_space *mapping = bdev->bd_inode->i_mapping;
-
-	return read_mapping_page(mapping, (pgoff_t)(n >> (PAGE_SHIFT-9)),
-				 NULL);
-}
-
 unsigned char *read_dev_sector(struct block_device *bdev, sector_t n, Sector *p)
 {
+	struct address_space *mapping = bdev->bd_inode->i_mapping;
 	struct page *page;
 
-	/* don't populate page cache for dax capable devices */
-	if (IS_DAX(bdev->bd_inode))
-		page = read_dax_sector(bdev, n);
-	else
-		page = read_pagecache_sector(bdev, n);
-
+	page = read_mapping_page(mapping, (pgoff_t)(n >> (PAGE_SHIFT-9)), NULL);
 	if (!IS_ERR(page)) {
 		if (PageError(page))
 			goto fail;
diff --git a/fs/dax.c b/fs/dax.c
index ddcddfeaa03b..a990211c8a3d 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -97,26 +97,6 @@ static int dax_is_empty_entry(void *entry)
 	return (unsigned long)entry & RADIX_DAX_EMPTY;
 }
 
-struct page *read_dax_sector(struct block_device *bdev, sector_t n)
-{
-	struct page *page = alloc_pages(GFP_KERNEL, 0);
-	struct blk_dax_ctl dax = {
-		.size = PAGE_SIZE,
-		.sector = n & ~((((int) PAGE_SIZE) / 512) - 1),
-	};
-	long rc;
-
-	if (!page)
-		return ERR_PTR(-ENOMEM);
-
-	rc = dax_map_atomic(bdev, &dax);
-	if (rc < 0)
-		return ERR_PTR(rc);
-	memcpy_from_pmem(page_address(page), dax.addr, PAGE_SIZE);
-	dax_unmap_atomic(bdev, &dax);
-	return page;
-}
-
 /*
  * DAX radix tree locking
  */
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 2ef8e18e2587..10b742af3d56 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -65,15 +65,9 @@ void dax_wake_mapping_entry_waiter(struct address_space *mapping,
 		pgoff_t index, void *entry, bool wake_all);
 
 #ifdef CONFIG_FS_DAX
-struct page *read_dax_sector(struct block_device *bdev, sector_t n);
 int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
 		unsigned int offset, unsigned int length);
 #else
-static inline struct page *read_dax_sector(struct block_device *bdev,
-		sector_t n)
-{
-	return ERR_PTR(-ENXIO);
-}
 static inline int __dax_zero_page_range(struct block_device *bdev,
 		sector_t sector, unsigned int offset, unsigned int length)
 {

* [RFC PATCH 16/17] fs, dax: convert filesystem-dax to bdev_dax_direct_access
  2017-01-28  8:36 ` Dan Williams
@ 2017-01-28  8:37   ` Dan Williams
  -1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28  8:37 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch

Now that a dax_inode is plumbed through all dax-capable drivers, we can
switch from block_device_operations to dax_operations for invoking
->direct_access().

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
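For illustration only, a minimal userspace sketch of the calling pattern this
conversion applies throughout fs/dax.c: take the dax read lock, perform the
direct-access lookup, use the returned address, and drop the lock on every
exit path. Everything prefixed with model_ is a hypothetical stand-in, as are
the struct layouts and the error codes chosen for the lookup. The counter
only models read-side nesting so that unbalanced paths trip an assertion. It
is not how the real lock is implemented.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-ins only; none of these match the kernel definitions. */
struct dax_inode {
	int alive;	/* hypothetical: cleared once the provider goes away */
	char *base;	/* start of the modelled memory range */
	size_t size;	/* length of that range in bytes */
};

struct blk_dax_ctl {
	unsigned long sector;
	size_t size;
	void *addr;
};

static int model_readers;	/* models read-side nesting depth */

static int model_dax_read_lock(void)
{
	return model_readers++;
}

static void model_dax_read_unlock(int id)
{
	model_readers--;
	assert(model_readers == id);	/* each unlock pairs with its lock */
}

/* Hypothetical lookup: returns the number of bytes available, or -errno. */
static long model_direct_access(struct dax_inode *dax_inode,
		struct blk_dax_ctl *dax)
{
	size_t off = (size_t)dax->sector * 512;
	size_t avail;

	if (!dax_inode || !dax_inode->alive)
		return -ENXIO;
	if (off >= dax_inode->size)
		return -ERANGE;
	avail = dax_inode->size - off;
	dax->addr = dax_inode->base + off;
	return (long)(avail < dax->size ? avail : dax->size);
}

static int model_copy_from_dax(struct dax_inode *dax_inode,
		unsigned long sector, void *to, size_t len)
{
	struct blk_dax_ctl dax = { .sector = sector, .size = len };
	long rc;
	int id;

	id = model_dax_read_lock();
	rc = model_direct_access(dax_inode, &dax);
	if (rc < 0) {
		model_dax_read_unlock(id);
		return (int)rc;
	}
	if ((size_t)rc < len) {
		model_dax_read_unlock(id);
		return -EIO;
	}
	memcpy(to, dax.addr, len);	/* address is used before unlocking */
	model_dax_read_unlock(id);
	return 0;
}
```

The point of the sketch is the shape of the error handling: with no separate
unmap step left, each early return has to drop the read lock itself.
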
 fs/dax.c            |  143 +++++++++++++++++++++++++++------------------------
 fs/iomap.c          |    3 +
 include/linux/dax.h |    6 +-
 3 files changed, 82 insertions(+), 70 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index a990211c8a3d..07b36a26db06 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -51,32 +51,6 @@ static int __init init_dax_wait_table(void)
 }
 fs_initcall(init_dax_wait_table);
 
-static long dax_map_atomic(struct block_device *bdev, struct blk_dax_ctl *dax)
-{
-	struct request_queue *q = bdev->bd_queue;
-	long rc = -EIO;
-
-	dax->addr = ERR_PTR(-EIO);
-	if (blk_queue_enter(q, true) != 0)
-		return rc;
-
-	rc = bdev_direct_access(bdev, dax);
-	if (rc < 0) {
-		dax->addr = ERR_PTR(rc);
-		blk_queue_exit(q);
-		return rc;
-	}
-	return rc;
-}
-
-static void dax_unmap_atomic(struct block_device *bdev,
-		const struct blk_dax_ctl *dax)
-{
-	if (IS_ERR(dax->addr))
-		return;
-	blk_queue_exit(bdev->bd_queue);
-}
-
 static int dax_is_pmd_entry(void *entry)
 {
 	return (unsigned long)entry & RADIX_DAX_PMD;
@@ -549,21 +523,28 @@ static int dax_load_hole(struct address_space *mapping, void **entry,
 	return ret;
 }
 
-static int copy_user_dax(struct block_device *bdev, sector_t sector, size_t size,
-		struct page *to, unsigned long vaddr)
+static int copy_user_dax(struct block_device *bdev, struct dax_inode *dax_inode,
+		sector_t sector, size_t size, struct page *to,
+		unsigned long vaddr)
 {
 	struct blk_dax_ctl dax = {
 		.sector = sector,
 		.size = size,
 	};
 	void *vto;
+	long rc;
+	int id;
 
-	if (dax_map_atomic(bdev, &dax) < 0)
-		return PTR_ERR(dax.addr);
+	id = dax_read_lock();
+	rc = bdev_dax_direct_access(bdev, dax_inode, &dax);
+	if (rc < 0) {
+		dax_read_unlock(id);
+		return rc;
+	}
 	vto = kmap_atomic(to);
 	copy_user_page(vto, (void __force *)dax.addr, vaddr, to);
 	kunmap_atomic(vto);
-	dax_unmap_atomic(bdev, &dax);
+	dax_read_unlock(id);
 	return 0;
 }
 
@@ -731,12 +712,13 @@ static void dax_mapping_entry_mkclean(struct address_space *mapping,
 }
 
 static int dax_writeback_one(struct block_device *bdev,
-		struct address_space *mapping, pgoff_t index, void *entry)
+		struct dax_inode *dax_inode, struct address_space *mapping,
+		pgoff_t index, void *entry)
 {
 	struct radix_tree_root *page_tree = &mapping->page_tree;
 	struct blk_dax_ctl dax;
 	void *entry2, **slot;
-	int ret = 0;
+	int ret = 0, id;
 
 	/*
 	 * A page got tagged dirty in DAX mapping? Something is seriously
@@ -789,18 +771,20 @@ static int dax_writeback_one(struct block_device *bdev,
 	dax.size = PAGE_SIZE << dax_radix_order(entry);
 
 	/*
-	 * We cannot hold tree_lock while calling dax_map_atomic() because it
-	 * eventually calls cond_resched().
+	 * bdev_dax_direct_access() may sleep, so cannot hold tree_lock
+	 * over its invocation.
 	 */
-	ret = dax_map_atomic(bdev, &dax);
+	id = dax_read_lock();
+	ret = bdev_dax_direct_access(bdev, dax_inode, &dax);
 	if (ret < 0) {
+		dax_read_unlock(id);
 		put_locked_mapping_entry(mapping, index, entry);
 		return ret;
 	}
 
 	if (WARN_ON_ONCE(ret < dax.size)) {
 		ret = -EIO;
-		goto unmap;
+		goto dax_unlock;
 	}
 
 	dax_mapping_entry_mkclean(mapping, index, pfn_t_to_pfn(dax.pfn));
@@ -814,8 +798,8 @@ static int dax_writeback_one(struct block_device *bdev,
 	spin_lock_irq(&mapping->tree_lock);
 	radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_DIRTY);
 	spin_unlock_irq(&mapping->tree_lock);
- unmap:
-	dax_unmap_atomic(bdev, &dax);
+ dax_unlock:
+	dax_read_unlock(id);
 	put_locked_mapping_entry(mapping, index, entry);
 	return ret;
 
@@ -836,6 +820,7 @@ int dax_writeback_mapping_range(struct address_space *mapping,
 	struct inode *inode = mapping->host;
 	pgoff_t start_index, end_index;
 	pgoff_t indices[PAGEVEC_SIZE];
+	struct dax_inode *dax_inode;
 	struct pagevec pvec;
 	bool done = false;
 	int i, ret = 0;
@@ -846,6 +831,10 @@ int dax_writeback_mapping_range(struct address_space *mapping,
 	if (!mapping->nrexceptional || wbc->sync_mode != WB_SYNC_ALL)
 		return 0;
 
+	dax_inode = dax_get_by_host(bdev->bd_disk->disk_name);
+	if (!dax_inode)
+		return -EIO;
+
 	start_index = wbc->range_start >> PAGE_SHIFT;
 	end_index = wbc->range_end >> PAGE_SHIFT;
 
@@ -866,19 +855,23 @@ int dax_writeback_mapping_range(struct address_space *mapping,
 				break;
 			}
 
-			ret = dax_writeback_one(bdev, mapping, indices[i],
-					pvec.pages[i]);
-			if (ret < 0)
+			ret = dax_writeback_one(bdev, dax_inode, mapping,
+					indices[i], pvec.pages[i]);
+			if (ret < 0) {
+				put_dax_inode(dax_inode);
 				return ret;
+			}
 		}
 	}
+	put_dax_inode(dax_inode);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(dax_writeback_mapping_range);
 
 static int dax_insert_mapping(struct address_space *mapping,
-		struct block_device *bdev, sector_t sector, size_t size,
-		void **entryp, struct vm_area_struct *vma, struct vm_fault *vmf)
+		struct block_device *bdev, struct dax_inode *dax_inode,
+		sector_t sector, size_t size, void **entryp,
+		struct vm_area_struct *vma, struct vm_fault *vmf)
 {
 	unsigned long vaddr = vmf->address;
 	struct blk_dax_ctl dax = {
@@ -887,10 +880,15 @@ static int dax_insert_mapping(struct address_space *mapping,
 	};
 	void *ret;
 	void *entry = *entryp;
+	int id, rc;
 
-	if (dax_map_atomic(bdev, &dax) < 0)
-		return PTR_ERR(dax.addr);
-	dax_unmap_atomic(bdev, &dax);
+	id = dax_read_lock();
+	rc = bdev_dax_direct_access(bdev, dax_inode, &dax);
+	if (rc < 0) {
+		dax_read_unlock(id);
+		return rc;
+	}
+	dax_read_unlock(id);
 
 	ret = dax_insert_mapping_entry(mapping, vmf, entry, dax.sector, 0);
 	if (IS_ERR(ret))
@@ -947,7 +945,8 @@ static bool dax_range_is_aligned(struct block_device *bdev,
 	return true;
 }
 
-int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
+int __dax_zero_page_range(struct block_device *bdev,
+		struct dax_inode *dax_inode, sector_t sector,
 		unsigned int offset, unsigned int length)
 {
 	struct blk_dax_ctl dax = {
@@ -961,10 +960,16 @@ int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
 		return blkdev_issue_zeroout(bdev, start_sector,
 				length >> 9, GFP_NOFS, true);
 	} else {
-		if (dax_map_atomic(bdev, &dax) < 0)
-			return PTR_ERR(dax.addr);
+		int rc, id;
+
+		id = dax_read_lock();
+		rc = bdev_dax_direct_access(bdev, dax_inode, &dax);
+		if (rc < 0) {
+			dax_read_unlock(id);
+			return rc;
+		}
 		clear_pmem(dax.addr + offset, length);
-		dax_unmap_atomic(bdev, &dax);
+		dax_read_unlock(id);
 	}
 	return 0;
 }
@@ -983,6 +988,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	struct iov_iter *iter = data;
 	loff_t end = pos + length, done = 0;
 	ssize_t ret = 0;
+	int id;
 
 	if (iov_iter_rw(iter) == READ) {
 		end = min(end, i_size_read(inode));
@@ -1007,6 +1013,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 					      (end - 1) >> PAGE_SHIFT);
 	}
 
+	id = dax_read_lock();
 	while (pos < end) {
 		unsigned offset = pos & (PAGE_SIZE - 1);
 		struct blk_dax_ctl dax = { 0 };
@@ -1014,7 +1021,8 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		dax.sector = dax_iomap_sector(iomap, pos);
 		dax.size = (length + offset + PAGE_SIZE - 1) & PAGE_MASK;
-		map_len = dax_map_atomic(iomap->bdev, &dax);
+		map_len = bdev_dax_direct_access(iomap->bdev, iomap->dax_inode,
+				&dax);
 		if (map_len < 0) {
 			ret = map_len;
 			break;
@@ -1029,7 +1037,6 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 			map_len = copy_from_iter_pmem(dax.addr, map_len, iter);
 		else
 			map_len = copy_to_iter(dax.addr, map_len, iter);
-		dax_unmap_atomic(iomap->bdev, &dax);
 		if (map_len <= 0) {
 			ret = map_len ? map_len : -EFAULT;
 			break;
@@ -1039,6 +1046,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		length -= map_len;
 		done += map_len;
 	}
+	dax_read_unlock(id);
 
 	return done ? done : ret;
 }
@@ -1151,8 +1159,8 @@ int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 			clear_user_highpage(vmf->cow_page, vaddr);
 			break;
 		case IOMAP_MAPPED:
-			error = copy_user_dax(iomap.bdev, sector, PAGE_SIZE,
-					vmf->cow_page, vaddr);
+			error = copy_user_dax(iomap.bdev, iomap.dax_inode,
+					sector, PAGE_SIZE, vmf->cow_page, vaddr);
 			break;
 		default:
 			WARN_ON_ONCE(1);
@@ -1177,8 +1185,8 @@ int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 			mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
 			major = VM_FAULT_MAJOR;
 		}
-		error = dax_insert_mapping(mapping, iomap.bdev, sector,
-				PAGE_SIZE, &entry, vma, vmf);
+		error = dax_insert_mapping(mapping, iomap.bdev, iomap.dax_inode,
+				sector, PAGE_SIZE, &entry, vma, vmf);
 		/* -EBUSY is fine, somebody else faulted on the same PTE */
 		if (error == -EBUSY)
 			error = 0;
@@ -1231,23 +1239,24 @@ static int dax_pmd_insert_mapping(struct vm_area_struct *vma, pmd_t *pmd,
 {
 	struct address_space *mapping = vma->vm_file->f_mapping;
 	struct block_device *bdev = iomap->bdev;
+	struct dax_inode *dax_inode = iomap->dax_inode;
 	struct blk_dax_ctl dax = {
 		.sector = dax_iomap_sector(iomap, pos),
 		.size = PMD_SIZE,
 	};
-	long length = dax_map_atomic(bdev, &dax);
+	long length;
 	void *ret;
+	int id;
 
-	if (length < 0) /* dax_map_atomic() failed */
-		return VM_FAULT_FALLBACK;
+	id = dax_read_lock();
+	length = bdev_dax_direct_access(bdev, dax_inode, &dax);
 	if (length < PMD_SIZE)
-		goto unmap_fallback;
+		goto unlock_fallback;
 	if (pfn_t_to_pfn(dax.pfn) & PG_PMD_COLOUR)
-		goto unmap_fallback;
+		goto unlock_fallback;
 	if (!pfn_t_devmap(dax.pfn))
-		goto unmap_fallback;
-
-	dax_unmap_atomic(bdev, &dax);
+		goto unlock_fallback;
+	dax_read_unlock(id);
 
 	ret = dax_insert_mapping_entry(mapping, vmf, *entryp, dax.sector,
 			RADIX_DAX_PMD);
@@ -1257,8 +1266,8 @@ static int dax_pmd_insert_mapping(struct vm_area_struct *vma, pmd_t *pmd,
 
 	return vmf_insert_pfn_pmd(vma, address, pmd, dax.pfn, write);
 
- unmap_fallback:
-	dax_unmap_atomic(bdev, &dax);
+ unlock_fallback:
+	dax_read_unlock(id);
 	return VM_FAULT_FALLBACK;
 }
 
diff --git a/fs/iomap.c b/fs/iomap.c
index 354a123f170e..279d18cc1cb6 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -355,7 +355,8 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
 	sector_t sector = iomap->blkno +
 		(((pos & ~(PAGE_SIZE - 1)) - iomap->offset) >> 9);
 
-	return __dax_zero_page_range(iomap->bdev, sector, offset, bytes);
+	return __dax_zero_page_range(iomap->bdev, iomap->dax_inode, sector,
+			offset, bytes);
 }
 
 static loff_t
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 10b742af3d56..b8e8e7896452 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -65,11 +65,13 @@ void dax_wake_mapping_entry_waiter(struct address_space *mapping,
 		pgoff_t index, void *entry, bool wake_all);
 
 #ifdef CONFIG_FS_DAX
-int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
+int __dax_zero_page_range(struct block_device *bdev,
+		struct dax_inode *dax_inode, sector_t sector,
 		unsigned int offset, unsigned int length);
 #else
 static inline int __dax_zero_page_range(struct block_device *bdev,
-		sector_t sector, unsigned int offset, unsigned int length)
+		struct dax_inode *dax_inode, sector_t sector,
+		unsigned int offset, unsigned int length)
 {
 	return -ENXIO;
 }

* [RFC PATCH 16/17] fs, dax: convert filesystem-dax to bdev_dax_direct_access
@ 2017-01-28  8:37   ` Dan Williams
  0 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28  8:37 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: snitzer, toshi.kani, mawilcox, linux-block, jmoyer,
	linux-fsdevel, ross.zwisler, hch

Now that a dax_inode is plumbed through all dax-capable drivers, we can
switch from block_device_operations to dax_operations for invoking
->direct_access().

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/dax.c            |  143 +++++++++++++++++++++++++++------------------------
 fs/iomap.c          |    3 +
 include/linux/dax.h |    6 +-
 3 files changed, 82 insertions(+), 70 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index a990211c8a3d..07b36a26db06 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -51,32 +51,6 @@ static int __init init_dax_wait_table(void)
 }
 fs_initcall(init_dax_wait_table);
 
-static long dax_map_atomic(struct block_device *bdev, struct blk_dax_ctl *dax)
-{
-	struct request_queue *q = bdev->bd_queue;
-	long rc = -EIO;
-
-	dax->addr = ERR_PTR(-EIO);
-	if (blk_queue_enter(q, true) != 0)
-		return rc;
-
-	rc = bdev_direct_access(bdev, dax);
-	if (rc < 0) {
-		dax->addr = ERR_PTR(rc);
-		blk_queue_exit(q);
-		return rc;
-	}
-	return rc;
-}
-
-static void dax_unmap_atomic(struct block_device *bdev,
-		const struct blk_dax_ctl *dax)
-{
-	if (IS_ERR(dax->addr))
-		return;
-	blk_queue_exit(bdev->bd_queue);
-}
-
 static int dax_is_pmd_entry(void *entry)
 {
 	return (unsigned long)entry & RADIX_DAX_PMD;
@@ -549,21 +523,28 @@ static int dax_load_hole(struct address_space *mapping, void **entry,
 	return ret;
 }
 
-static int copy_user_dax(struct block_device *bdev, sector_t sector, size_t size,
-		struct page *to, unsigned long vaddr)
+static int copy_user_dax(struct block_device *bdev, struct dax_inode *dax_inode,
+		sector_t sector, size_t size, struct page *to,
+		unsigned long vaddr)
 {
 	struct blk_dax_ctl dax = {
 		.sector = sector,
 		.size = size,
 	};
 	void *vto;
+	long rc;
+	int id;
 
-	if (dax_map_atomic(bdev, &dax) < 0)
-		return PTR_ERR(dax.addr);
+	id = dax_read_lock();
+	rc = bdev_dax_direct_access(bdev, dax_inode, &dax);
+	if (rc < 0) {
+		dax_read_unlock(id);
+		return rc;
+	}
 	vto = kmap_atomic(to);
 	copy_user_page(vto, (void __force *)dax.addr, vaddr, to);
 	kunmap_atomic(vto);
-	dax_unmap_atomic(bdev, &dax);
+	dax_read_unlock(id);
 	return 0;
 }
 
@@ -731,12 +712,13 @@ static void dax_mapping_entry_mkclean(struct address_space *mapping,
 }
 
 static int dax_writeback_one(struct block_device *bdev,
-		struct address_space *mapping, pgoff_t index, void *entry)
+		struct dax_inode *dax_inode, struct address_space *mapping,
+		pgoff_t index, void *entry)
 {
 	struct radix_tree_root *page_tree = &mapping->page_tree;
 	struct blk_dax_ctl dax;
 	void *entry2, **slot;
-	int ret = 0;
+	int ret = 0, id;
 
 	/*
 	 * A page got tagged dirty in DAX mapping? Something is seriously
@@ -789,18 +771,20 @@ static int dax_writeback_one(struct block_device *bdev,
 	dax.size = PAGE_SIZE << dax_radix_order(entry);
 
 	/*
-	 * We cannot hold tree_lock while calling dax_map_atomic() because it
-	 * eventually calls cond_resched().
+	 * bdev_dax_direct_access() may sleep, so we cannot hold tree_lock
+	 * across the call.
 	 */
-	ret = dax_map_atomic(bdev, &dax);
+	id = dax_read_lock();
+	ret = bdev_dax_direct_access(bdev, dax_inode, &dax);
 	if (ret < 0) {
+		dax_read_unlock(id);
 		put_locked_mapping_entry(mapping, index, entry);
 		return ret;
 	}
 
 	if (WARN_ON_ONCE(ret < dax.size)) {
 		ret = -EIO;
-		goto unmap;
+		goto dax_unlock;
 	}
 
 	dax_mapping_entry_mkclean(mapping, index, pfn_t_to_pfn(dax.pfn));
@@ -814,8 +798,8 @@ static int dax_writeback_one(struct block_device *bdev,
 	spin_lock_irq(&mapping->tree_lock);
 	radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_DIRTY);
 	spin_unlock_irq(&mapping->tree_lock);
- unmap:
-	dax_unmap_atomic(bdev, &dax);
+ dax_unlock:
+	dax_read_unlock(id);
 	put_locked_mapping_entry(mapping, index, entry);
 	return ret;
 
@@ -836,6 +820,7 @@ int dax_writeback_mapping_range(struct address_space *mapping,
 	struct inode *inode = mapping->host;
 	pgoff_t start_index, end_index;
 	pgoff_t indices[PAGEVEC_SIZE];
+	struct dax_inode *dax_inode;
 	struct pagevec pvec;
 	bool done = false;
 	int i, ret = 0;
@@ -846,6 +831,10 @@ int dax_writeback_mapping_range(struct address_space *mapping,
 	if (!mapping->nrexceptional || wbc->sync_mode != WB_SYNC_ALL)
 		return 0;
 
+	dax_inode = dax_get_by_host(bdev->bd_disk->disk_name);
+	if (!dax_inode)
+		return -EIO;
+
 	start_index = wbc->range_start >> PAGE_SHIFT;
 	end_index = wbc->range_end >> PAGE_SHIFT;
 
@@ -866,19 +855,23 @@ int dax_writeback_mapping_range(struct address_space *mapping,
 				break;
 			}
 
-			ret = dax_writeback_one(bdev, mapping, indices[i],
-					pvec.pages[i]);
-			if (ret < 0)
+			ret = dax_writeback_one(bdev, dax_inode, mapping,
+					indices[i], pvec.pages[i]);
+			if (ret < 0) {
+				put_dax_inode(dax_inode);
 				return ret;
+			}
 		}
 	}
+	put_dax_inode(dax_inode);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(dax_writeback_mapping_range);
 
 static int dax_insert_mapping(struct address_space *mapping,
-		struct block_device *bdev, sector_t sector, size_t size,
-		void **entryp, struct vm_area_struct *vma, struct vm_fault *vmf)
+		struct block_device *bdev, struct dax_inode *dax_inode,
+		sector_t sector, size_t size, void **entryp,
+		struct vm_area_struct *vma, struct vm_fault *vmf)
 {
 	unsigned long vaddr = vmf->address;
 	struct blk_dax_ctl dax = {
@@ -887,10 +880,15 @@ static int dax_insert_mapping(struct address_space *mapping,
 	};
 	void *ret;
 	void *entry = *entryp;
+	int id, rc;
 
-	if (dax_map_atomic(bdev, &dax) < 0)
-		return PTR_ERR(dax.addr);
-	dax_unmap_atomic(bdev, &dax);
+	id = dax_read_lock();
+	rc = bdev_dax_direct_access(bdev, dax_inode, &dax);
+	if (rc < 0) {
+		dax_read_unlock(id);
+		return rc;
+	}
+	dax_read_unlock(id);
 
 	ret = dax_insert_mapping_entry(mapping, vmf, entry, dax.sector, 0);
 	if (IS_ERR(ret))
@@ -947,7 +945,8 @@ static bool dax_range_is_aligned(struct block_device *bdev,
 	return true;
 }
 
-int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
+int __dax_zero_page_range(struct block_device *bdev,
+		struct dax_inode *dax_inode, sector_t sector,
 		unsigned int offset, unsigned int length)
 {
 	struct blk_dax_ctl dax = {
@@ -961,10 +960,16 @@ int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
 		return blkdev_issue_zeroout(bdev, start_sector,
 				length >> 9, GFP_NOFS, true);
 	} else {
-		if (dax_map_atomic(bdev, &dax) < 0)
-			return PTR_ERR(dax.addr);
+		int rc, id;
+
+		id = dax_read_lock();
+		rc = bdev_dax_direct_access(bdev, dax_inode, &dax);
+		if (rc < 0) {
+			dax_read_unlock(id);
+			return rc;
+		}
 		clear_pmem(dax.addr + offset, length);
-		dax_unmap_atomic(bdev, &dax);
+		dax_read_unlock(id);
 	}
 	return 0;
 }
@@ -983,6 +988,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	struct iov_iter *iter = data;
 	loff_t end = pos + length, done = 0;
 	ssize_t ret = 0;
+	int id;
 
 	if (iov_iter_rw(iter) == READ) {
 		end = min(end, i_size_read(inode));
@@ -1007,6 +1013,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 					      (end - 1) >> PAGE_SHIFT);
 	}
 
+	id = dax_read_lock();
 	while (pos < end) {
 		unsigned offset = pos & (PAGE_SIZE - 1);
 		struct blk_dax_ctl dax = { 0 };
@@ -1014,7 +1021,8 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		dax.sector = dax_iomap_sector(iomap, pos);
 		dax.size = (length + offset + PAGE_SIZE - 1) & PAGE_MASK;
-		map_len = dax_map_atomic(iomap->bdev, &dax);
+		map_len = bdev_dax_direct_access(iomap->bdev, iomap->dax_inode,
+				&dax);
 		if (map_len < 0) {
 			ret = map_len;
 			break;
@@ -1029,7 +1037,6 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 			map_len = copy_from_iter_pmem(dax.addr, map_len, iter);
 		else
 			map_len = copy_to_iter(dax.addr, map_len, iter);
-		dax_unmap_atomic(iomap->bdev, &dax);
 		if (map_len <= 0) {
 			ret = map_len ? map_len : -EFAULT;
 			break;
@@ -1039,6 +1046,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		length -= map_len;
 		done += map_len;
 	}
+	dax_read_unlock(id);
 
 	return done ? done : ret;
 }
@@ -1151,8 +1159,8 @@ int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 			clear_user_highpage(vmf->cow_page, vaddr);
 			break;
 		case IOMAP_MAPPED:
-			error = copy_user_dax(iomap.bdev, sector, PAGE_SIZE,
-					vmf->cow_page, vaddr);
+			error = copy_user_dax(iomap.bdev, iomap.dax_inode,
+					sector, PAGE_SIZE, vmf->cow_page, vaddr);
 			break;
 		default:
 			WARN_ON_ONCE(1);
@@ -1177,8 +1185,8 @@ int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 			mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
 			major = VM_FAULT_MAJOR;
 		}
-		error = dax_insert_mapping(mapping, iomap.bdev, sector,
-				PAGE_SIZE, &entry, vma, vmf);
+		error = dax_insert_mapping(mapping, iomap.bdev, iomap.dax_inode,
+				sector, PAGE_SIZE, &entry, vma, vmf);
 		/* -EBUSY is fine, somebody else faulted on the same PTE */
 		if (error == -EBUSY)
 			error = 0;
@@ -1231,23 +1239,24 @@ static int dax_pmd_insert_mapping(struct vm_area_struct *vma, pmd_t *pmd,
 {
 	struct address_space *mapping = vma->vm_file->f_mapping;
 	struct block_device *bdev = iomap->bdev;
+	struct dax_inode *dax_inode = iomap->dax_inode;
 	struct blk_dax_ctl dax = {
 		.sector = dax_iomap_sector(iomap, pos),
 		.size = PMD_SIZE,
 	};
-	long length = dax_map_atomic(bdev, &dax);
+	long length;
 	void *ret;
+	int id;
 
-	if (length < 0) /* dax_map_atomic() failed */
-		return VM_FAULT_FALLBACK;
+	id = dax_read_lock();
+	length = bdev_dax_direct_access(bdev, dax_inode, &dax);
 	if (length < PMD_SIZE)
-		goto unmap_fallback;
+		goto unlock_fallback;
 	if (pfn_t_to_pfn(dax.pfn) & PG_PMD_COLOUR)
-		goto unmap_fallback;
+		goto unlock_fallback;
 	if (!pfn_t_devmap(dax.pfn))
-		goto unmap_fallback;
-
-	dax_unmap_atomic(bdev, &dax);
+		goto unlock_fallback;
+	dax_read_unlock(id);
 
 	ret = dax_insert_mapping_entry(mapping, vmf, *entryp, dax.sector,
 			RADIX_DAX_PMD);
@@ -1257,8 +1266,8 @@ static int dax_pmd_insert_mapping(struct vm_area_struct *vma, pmd_t *pmd,
 
 	return vmf_insert_pfn_pmd(vma, address, pmd, dax.pfn, write);
 
- unmap_fallback:
-	dax_unmap_atomic(bdev, &dax);
+ unlock_fallback:
+	dax_read_unlock(id);
 	return VM_FAULT_FALLBACK;
 }
 
diff --git a/fs/iomap.c b/fs/iomap.c
index 354a123f170e..279d18cc1cb6 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -355,7 +355,8 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
 	sector_t sector = iomap->blkno +
 		(((pos & ~(PAGE_SIZE - 1)) - iomap->offset) >> 9);
 
-	return __dax_zero_page_range(iomap->bdev, sector, offset, bytes);
+	return __dax_zero_page_range(iomap->bdev, iomap->dax_inode, sector,
+			offset, bytes);
 }
 
 static loff_t
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 10b742af3d56..b8e8e7896452 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -65,11 +65,13 @@ void dax_wake_mapping_entry_waiter(struct address_space *mapping,
 		pgoff_t index, void *entry, bool wake_all);
 
 #ifdef CONFIG_FS_DAX
-int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
+int __dax_zero_page_range(struct block_device *bdev,
+		struct dax_inode *dax_inode, sector_t sector,
 		unsigned int offset, unsigned int length);
 #else
 static inline int __dax_zero_page_range(struct block_device *bdev,
-		sector_t sector, unsigned int offset, unsigned int length)
+		struct dax_inode *dax_inode, sector_t sector,
+		unsigned int offset, unsigned int length)
 {
 	return -ENXIO;
 }



* [RFC PATCH 17/17] block: remove block_device_operations.direct_access and related infrastructure
  2017-01-28  8:36 ` Dan Williams
@ 2017-01-28  8:37   ` Dan Williams
  -1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28  8:37 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: snitzer, mawilcox, linux-block, linux-fsdevel, hch

Now that all the producers and consumers of dax interfaces have been
converted to using dax_operations on a dax_inode, remove the now-unused
block_device_operations.direct_access method and its supporting code.
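
As a rough illustration of what remains once the block-device method is
gone: the removed *_blk_direct_access() wrappers only converted a sector
to a byte offset before calling a shared helper, so a sector-based
caller can do that conversion itself and dispatch through the ops table
attached to the dax object. The sketch below is a userspace model under
that assumption. The mock_* types and functions are hypothetical names
made up here and do not match the kernel's struct dax_inode or
struct dax_operations layouts.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct mock_dax_inode;

/* Hypothetical ops table: direct_access takes a byte address. */
struct mock_dax_operations {
	long (*direct_access)(struct mock_dax_inode *dax_inode,
			unsigned long long dev_addr, void **kaddr, long size);
};

struct mock_dax_inode {
	const struct mock_dax_operations *ops;
	void *private;		/* driver data, here a struct mock_ram */
};

struct mock_ram {
	char *base;
	long size;
};

/* Driver side: report how many bytes are reachable from dev_addr. */
static long mock_ram_direct_access(struct mock_dax_inode *dax_inode,
		unsigned long long dev_addr, void **kaddr, long size)
{
	struct mock_ram *ram = dax_inode->private;

	(void)size;		/* the caller clamps to the requested size */
	assert(kaddr != NULL);
	if (dev_addr >= (unsigned long long)ram->size)
		return -ERANGE;
	*kaddr = ram->base + dev_addr;
	return ram->size - (long)dev_addr;
}

static const struct mock_dax_operations mock_ram_dax_ops = {
	.direct_access = mock_ram_direct_access,
};

/* Caller side: sector in, byte address out to the ops table. */
static long mock_sector_direct_access(struct mock_dax_inode *dax_inode,
		unsigned long long sector, void **kaddr, long size)
{
	long avail;

	if (!dax_inode || !dax_inode->ops || !dax_inode->ops->direct_access)
		return -EOPNOTSUPP;	/* no dax support behind this device */
	avail = dax_inode->ops->direct_access(dax_inode, sector << 9,
			kaddr, size);
	if (avail < 0)
		return avail;
	return avail < size ? avail : size;
}
```

The NULL check on the ops pointer plays the same role as the
dm_table_supports_dax() test in this patch, which now looks at
->dax_ops instead of ->direct_access.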

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/powerpc/sysdev/axonram.c |   15 --------------
 drivers/block/brd.c           |   11 ----------
 drivers/md/dm-linear.c        |   19 -----------------
 drivers/md/dm-snap.c          |    8 -------
 drivers/md/dm-stripe.c        |   24 ----------------------
 drivers/md/dm-table.c         |    2 +-
 drivers/md/dm-target.c        |    7 ------
 drivers/md/dm.c               |   19 +++--------------
 drivers/nvdimm/pmem.c         |    9 --------
 drivers/s390/block/dcssblk.c  |   16 ---------------
 fs/block_dev.c                |   45 -----------------------------------------
 include/linux/blkdev.h        |    3 ---
 include/linux/device-mapper.h |    9 --------
 13 files changed, 4 insertions(+), 183 deletions(-)

diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index 4e1f58187726..1337b5829980 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -148,23 +148,8 @@ __axon_ram_direct_access(struct axon_ram_bank *bank, phys_addr_t offset,
 	return bank->size - offset;
 }
 
-/**
- * axon_ram_direct_access - direct_access() method for block device
- * @device, @sector, @data: see block_device_operations method
- */
-static long
-axon_ram_blk_direct_access(struct block_device *device, sector_t sector,
-		       void **kaddr, pfn_t *pfn, long size)
-{
-	struct axon_ram_bank *bank = device->bd_disk->private_data;
-
-	return __axon_ram_direct_access(bank, sector << AXON_RAM_SECTOR_SHIFT,
-			kaddr, pfn, size);
-}
-
 static const struct block_device_operations axon_ram_devops = {
 	.owner		= THIS_MODULE,
-	.direct_access	= axon_ram_blk_direct_access
 };
 
 static long
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 1279df4dc07c..52a1259f8ded 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -395,14 +395,6 @@ static long __brd_direct_access(struct brd_device *brd, phys_addr_t dev_addr,
 	return PAGE_SIZE;
 }
 
-static long brd_blk_direct_access(struct block_device *bdev, sector_t sector,
-			void **kaddr, pfn_t *pfn, long size)
-{
-	struct brd_device *brd = bdev->bd_disk->private_data;
-
-	return __brd_direct_access(brd, sector * 512, kaddr, pfn, size);
-}
-
 static long brd_dax_direct_access(struct dax_inode *dax_inode,
 		phys_addr_t dev_addr, void **kaddr, pfn_t *pfn, long size)
 {
@@ -414,14 +406,11 @@ static long brd_dax_direct_access(struct dax_inode *dax_inode,
 static const struct dax_operations brd_dax_ops = {
 	.direct_access = brd_dax_direct_access,
 };
-#else
-#define brd_blk_direct_access NULL
 #endif
 
 static const struct block_device_operations brd_fops = {
 	.owner =		THIS_MODULE,
 	.rw_page =		brd_rw_page,
-	.direct_access =	brd_blk_direct_access,
 };
 
 /*
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index e91ca8089333..7ec2a8eb8a14 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -141,24 +141,6 @@ static int linear_iterate_devices(struct dm_target *ti,
 	return fn(ti, lc->dev, lc->start, ti->len, data);
 }
 
-static long linear_direct_access(struct dm_target *ti, sector_t sector,
-				 void **kaddr, pfn_t *pfn, long size)
-{
-	struct linear_c *lc = ti->private;
-	struct block_device *bdev = lc->dev->bdev;
-	struct blk_dax_ctl dax = {
-		.sector = linear_map_sector(ti, sector),
-		.size = size,
-	};
-	long ret;
-
-	ret = bdev_direct_access(bdev, &dax);
-	*kaddr = dax.addr;
-	*pfn = dax.pfn;
-
-	return ret;
-}
-
 static long linear_dax_direct_access(struct dm_target *ti, phys_addr_t dev_addr,
 				     void **kaddr, pfn_t *pfn, long size)
 {
@@ -192,7 +174,6 @@ static struct target_type linear_target = {
 	.status = linear_status,
 	.prepare_ioctl = linear_prepare_ioctl,
 	.iterate_devices = linear_iterate_devices,
-	.direct_access = linear_direct_access,
 	.dax_ops = &linear_dax_ops,
 };
 
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index 1990e3bd6958..1d9407633bb5 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -2302,13 +2302,6 @@ static int origin_map(struct dm_target *ti, struct bio *bio)
 	return do_origin(o->dev, bio);
 }
 
-static long origin_direct_access(struct dm_target *ti, sector_t sector,
-		void **kaddr, pfn_t *pfn, long size)
-{
-	DMWARN("device does not support dax.");
-	return -EIO;
-}
-
 static long origin_dax_direct_access(struct dm_target *ti, phys_addr_t dev_addr,
 		void **kaddr, pfn_t *pfn, long size)
 {
@@ -2379,7 +2372,6 @@ static struct target_type origin_target = {
 	.postsuspend = origin_postsuspend,
 	.status  = origin_status,
 	.iterate_devices = origin_iterate_devices,
-	.direct_access = origin_direct_access,
 	.dax_ops = &origin_dax_ops,
 };
 
diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index 47fb56a6184a..229b2c543902 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -308,29 +308,6 @@ static int stripe_map(struct dm_target *ti, struct bio *bio)
 	return DM_MAPIO_REMAPPED;
 }
 
-static long stripe_direct_access(struct dm_target *ti, sector_t sector,
-				 void **kaddr, pfn_t *pfn, long size)
-{
-	struct stripe_c *sc = ti->private;
-	uint32_t stripe;
-	struct block_device *bdev;
-	struct blk_dax_ctl dax = {
-		.size = size,
-	};
-	long ret;
-
-	stripe_map_sector(sc, sector, &stripe, &dax.sector);
-
-	dax.sector += sc->stripe[stripe].physical_start;
-	bdev = sc->stripe[stripe].dev->bdev;
-
-	ret = bdev_direct_access(bdev, &dax);
-	*kaddr = dax.addr;
-	*pfn = dax.pfn;
-
-	return ret;
-}
-
 static long stripe_dax_direct_access(struct dm_target *ti, phys_addr_t dev_addr,
 		void **kaddr, pfn_t *pfn, long size)
 {
@@ -477,7 +454,6 @@ static struct target_type stripe_target = {
 	.status = stripe_status,
 	.iterate_devices = stripe_iterate_devices,
 	.io_hints = stripe_io_hints,
-	.direct_access = stripe_direct_access,
 	.dax_ops = &stripe_dax_ops,
 };
 
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 3ad16d9c9d5a..cd23be26384c 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -856,7 +856,7 @@ static bool dm_table_supports_dax(struct dm_table *t)
 	while (i < dm_table_get_num_targets(t)) {
 		ti = dm_table_get_target(t, i++);
 
-		if (!ti->type->direct_access)
+		if (!ti->type->dax_ops)
 			return false;
 
 		if (!ti->type->iterate_devices ||
diff --git a/drivers/md/dm-target.c b/drivers/md/dm-target.c
index ab072f53cf24..c3f55df90157 100644
--- a/drivers/md/dm-target.c
+++ b/drivers/md/dm-target.c
@@ -148,12 +148,6 @@ static void io_err_release_clone_rq(struct request *clone)
 {
 }
 
-static long io_err_direct_access(struct dm_target *ti, sector_t sector,
-				 void **kaddr, pfn_t *pfn, long size)
-{
-	return -EIO;
-}
-
 static long io_err_dax_direct_access(struct dm_target *ti, phys_addr_t dev_addr,
 				     void **kaddr, pfn_t *pfn, long size)
 {
@@ -174,7 +168,6 @@ static struct target_type error_target = {
 	.map_rq = io_err_map_rq,
 	.clone_and_map_rq = io_err_clone_and_map_rq,
 	.release_clone_rq = io_err_release_clone_rq,
-	.direct_access = io_err_direct_access,
 	.dax_ops = &err_dax_ops,
 };
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 5c5eeda0eb0a..497fb8adc660 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -910,7 +910,7 @@ int dm_set_target_max_io_len(struct dm_target *ti, sector_t len)
 EXPORT_SYMBOL_GPL(dm_set_target_max_io_len);
 
 static long __dm_direct_access(struct mapped_device *md, phys_addr_t dev_addr,
-			       void **kaddr, pfn_t *pfn, long size, bool blk)
+			       void **kaddr, pfn_t *pfn, long size)
 {
 	sector_t sector = dev_addr >> SECTOR_SHIFT;
 	struct dm_table *map;
@@ -929,9 +929,7 @@ static long __dm_direct_access(struct mapped_device *md, phys_addr_t dev_addr,
 	len = max_io_len(sector, ti) << SECTOR_SHIFT;
 	size = min(len, size);
 
-	if (blk && ti->type->direct_access)
-		ret = ti->type->direct_access(ti, sector, kaddr, pfn, size);
-	else if (ti->type->dax_ops)
+	if (ti->type->dax_ops)
 		ret = ti->type->dax_ops->dm_direct_access(ti, dev_addr, kaddr,
 				pfn, size);
 out:
@@ -939,23 +937,13 @@ static long __dm_direct_access(struct mapped_device *md, phys_addr_t dev_addr,
 	return min(ret, size);
 }
 
-static long dm_blk_direct_access(struct block_device *bdev, sector_t sector,
-				 void **kaddr, pfn_t *pfn, long size)
-{
-	struct mapped_device *md = bdev->bd_disk->private_data;
-
-	return __dm_direct_access(md, sector << SECTOR_SHIFT, kaddr, pfn, size,
-			true);
-}
-
 static long dm_dax_direct_access(struct dax_inode *dax_inode,
 				 phys_addr_t dev_addr, void **kaddr, pfn_t *pfn,
 				 long size)
 {
 	struct mapped_device *md = dax_inode_get_private(dax_inode);
 
-	return __dm_direct_access(md, dev_addr, kaddr, pfn, size,
-			false);
+	return __dm_direct_access(md, dev_addr, kaddr, pfn, size);
 }
 
 /*
@@ -2769,7 +2757,6 @@ static const struct block_device_operations dm_blk_dops = {
 	.open = dm_blk_open,
 	.release = dm_blk_close,
 	.ioctl = dm_blk_ioctl,
-	.direct_access = dm_blk_direct_access,
 	.getgeo = dm_blk_getgeo,
 	.pr_ops = &dm_pr_ops,
 	.owner = THIS_MODULE
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index d3d7de645e20..41781f853396 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -219,18 +219,9 @@ __weak long __pmem_direct_access(struct pmem_device *pmem, phys_addr_t dev_addr,
 	return pmem->size - pmem->pfn_pad - offset;
 }
 
-static long pmem_blk_direct_access(struct block_device *bdev, sector_t sector,
-		void **kaddr, pfn_t *pfn, long size)
-{
-	struct pmem_device *pmem = bdev->bd_queue->queuedata;
-
-	return __pmem_direct_access(pmem, sector * 512, kaddr, pfn, size);
-}
-
 static const struct block_device_operations pmem_fops = {
 	.owner =		THIS_MODULE,
 	.rw_page =		pmem_rw_page,
-	.direct_access =	pmem_blk_direct_access,
 	.revalidate_disk =	nvdimm_revalidate_disk,
 };
 
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 67b0885b4d12..03140c93dbd1 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -31,8 +31,6 @@ static int dcssblk_open(struct block_device *bdev, fmode_t mode);
 static void dcssblk_release(struct gendisk *disk, fmode_t mode);
 static blk_qc_t dcssblk_make_request(struct request_queue *q,
 						struct bio *bio);
-static long dcssblk_blk_direct_access(struct block_device *bdev, sector_t secnum,
-			 void **kaddr, pfn_t *pfn, long size);
 static long dcssblk_dax_direct_access(struct dax_inode *dax_inode,
 		phys_addr_t dev_addr, void **kaddr, pfn_t *pfn, long size);
 
@@ -43,7 +41,6 @@ static const struct block_device_operations dcssblk_devops = {
 	.owner   	= THIS_MODULE,
 	.open    	= dcssblk_open,
 	.release 	= dcssblk_release,
-	.direct_access 	= dcssblk_blk_direct_access,
 };
 
 static const struct dax_operations dcssblk_dax_ops = {
@@ -914,19 +911,6 @@ __dcssblk_direct_access(struct dcssblk_dev_info *dev_info, phys_addr_t offset,
 }
 
 static long
-dcssblk_blk_direct_access(struct block_device *bdev, sector_t secnum,
-			void **kaddr, pfn_t *pfn, long size)
-{
-	struct dcssblk_dev_info *dev_info;
-
-	dev_info = bdev->bd_disk->private_data;
-	if (!dev_info)
-		return -ENODEV;
-	return __dcssblk_direct_access(dev_info, secnum * 512, kaddr, pfn,
-			size);
-}
-
-static long
 dcssblk_dax_direct_access(struct dax_inode *dax_inode, phys_addr_t dev_addr,
 			void **kaddr, pfn_t *pfn, long size)
 {
diff --git a/fs/block_dev.c b/fs/block_dev.c
index a73f2388c515..ba0252736950 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -719,51 +719,6 @@ int bdev_write_page(struct block_device *bdev, sector_t sector,
 EXPORT_SYMBOL_GPL(bdev_write_page);
 
 /**
- * bdev_direct_access() - Get the address for directly-accessibly memory
- * @bdev: The device containing the memory
- * @dax: control and output parameters for ->direct_access
- *
- * If a block device is made up of directly addressable memory, this function
- * will tell the caller the PFN and the address of the memory.  The address
- * may be directly dereferenced within the kernel without the need to call
- * ioremap(), kmap() or similar.  The PFN is suitable for inserting into
- * page tables.
- *
- * Return: negative errno if an error occurs, otherwise the number of bytes
- * accessible at this address.
- */
-long bdev_direct_access(struct block_device *bdev, struct blk_dax_ctl *dax)
-{
-	sector_t sector = dax->sector;
-	long avail, size = dax->size;
-	const struct block_device_operations *ops = bdev->bd_disk->fops;
-
-	/*
-	 * The device driver is allowed to sleep, in order to make the
-	 * memory directly accessible.
-	 */
-	might_sleep();
-
-	if (size < 0)
-		return size;
-	if (!blk_queue_dax(bdev_get_queue(bdev)) || !ops->direct_access)
-		return -EOPNOTSUPP;
-	if ((sector + DIV_ROUND_UP(size, 512)) >
-					part_nr_sects_read(bdev->bd_part))
-		return -ERANGE;
-	sector += get_start_sect(bdev);
-	if (sector % (PAGE_SIZE / 512))
-		return -EINVAL;
-	avail = ops->direct_access(bdev, sector, &dax->addr, &dax->pfn, size);
-	if (!avail)
-		return -ERANGE;
-	if (avail > 0 && avail & ~PAGE_MASK)
-		return -ENXIO;
-	return min(avail, size);
-}
-EXPORT_SYMBOL_GPL(bdev_direct_access);
-
-/**
  * bdev_dax_direct_access() - bdev-sector to pfn_t and kernel virtual address
  * @bdev: host block device for @dax_inode
  * @dax_inode: interface data and operations for a memory device
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3b3c5ce376fd..bb87390a29b1 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1882,8 +1882,6 @@ struct block_device_operations {
 	int (*rw_page)(struct block_device *, sector_t, struct page *, bool);
 	int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
 	int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
-	long (*direct_access)(struct block_device *, sector_t, void **, pfn_t *,
-			long);
 	unsigned int (*check_events) (struct gendisk *disk,
 				      unsigned int clearing);
 	/* ->media_changed() is DEPRECATED, use ->check_events() instead */
@@ -1902,7 +1900,6 @@ extern int __blkdev_driver_ioctl(struct block_device *, fmode_t, unsigned int,
 extern int bdev_read_page(struct block_device *, sector_t, struct page *);
 extern int bdev_write_page(struct block_device *, sector_t, struct page *,
 						struct writeback_control *);
-extern long bdev_direct_access(struct block_device *, struct blk_dax_ctl *);
 struct dax_inode;
 extern long bdev_dax_direct_access(struct block_device *bdev,
 		struct dax_inode *dax_inode, struct blk_dax_ctl *dax);
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 1b64f412bb45..6e8762f093d3 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -125,14 +125,6 @@ typedef void (*dm_io_hints_fn) (struct dm_target *ti,
  */
 typedef int (*dm_busy_fn) (struct dm_target *ti);
 
-/*
- * Returns:
- *  < 0 : error
- * >= 0 : the number of bytes accessible at the address
- */
-typedef long (*dm_direct_access_fn) (struct dm_target *ti, sector_t sector,
-				     void **kaddr, pfn_t *pfn, long size);
-
 void dm_error(const char *message);
 
 struct dm_dev {
@@ -185,7 +177,6 @@ struct target_type {
 	dm_busy_fn busy;
 	dm_iterate_devices_fn iterate_devices;
 	dm_io_hints_fn io_hints;
-	dm_direct_access_fn direct_access;
 	const struct dm_dax_operations *dax_ops;
 
 	/* For internal device-mapper use. */



* [RFC PATCH 17/17] block: remove block_device_operations.direct_access and related infrastructure
@ 2017-01-28  8:37   ` Dan Williams
  0 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-28  8:37 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: snitzer, toshi.kani, mawilcox, linux-block, jmoyer,
	linux-fsdevel, ross.zwisler, hch

Now that all the producers and consumers of dax interfaces have been
converted to using dax_operations on a dax_inode, remove the now-unused
block_device_operations.direct_access method and its supporting code.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/powerpc/sysdev/axonram.c |   15 --------------
 drivers/block/brd.c           |   11 ----------
 drivers/md/dm-linear.c        |   19 -----------------
 drivers/md/dm-snap.c          |    8 -------
 drivers/md/dm-stripe.c        |   24 ----------------------
 drivers/md/dm-table.c         |    2 +-
 drivers/md/dm-target.c        |    7 ------
 drivers/md/dm.c               |   19 +++--------------
 drivers/nvdimm/pmem.c         |    9 --------
 drivers/s390/block/dcssblk.c  |   16 ---------------
 fs/block_dev.c                |   45 -----------------------------------------
 include/linux/blkdev.h        |    3 ---
 include/linux/device-mapper.h |    9 --------
 13 files changed, 4 insertions(+), 183 deletions(-)

diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index 4e1f58187726..1337b5829980 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -148,23 +148,8 @@ __axon_ram_direct_access(struct axon_ram_bank *bank, phys_addr_t offset,
 	return bank->size - offset;
 }
 
-/**
- * axon_ram_direct_access - direct_access() method for block device
- * @device, @sector, @data: see block_device_operations method
- */
-static long
-axon_ram_blk_direct_access(struct block_device *device, sector_t sector,
-		       void **kaddr, pfn_t *pfn, long size)
-{
-	struct axon_ram_bank *bank = device->bd_disk->private_data;
-
-	return __axon_ram_direct_access(bank, sector << AXON_RAM_SECTOR_SHIFT,
-			kaddr, pfn, size);
-}
-
 static const struct block_device_operations axon_ram_devops = {
 	.owner		= THIS_MODULE,
-	.direct_access	= axon_ram_blk_direct_access
 };
 
 static long
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 1279df4dc07c..52a1259f8ded 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -395,14 +395,6 @@ static long __brd_direct_access(struct brd_device *brd, phys_addr_t dev_addr,
 	return PAGE_SIZE;
 }
 
-static long brd_blk_direct_access(struct block_device *bdev, sector_t sector,
-			void **kaddr, pfn_t *pfn, long size)
-{
-	struct brd_device *brd = bdev->bd_disk->private_data;
-
-	return __brd_direct_access(brd, sector * 512, kaddr, pfn, size);
-}
-
 static long brd_dax_direct_access(struct dax_inode *dax_inode,
 		phys_addr_t dev_addr, void **kaddr, pfn_t *pfn, long size)
 {
@@ -414,14 +406,11 @@ static long brd_dax_direct_access(struct dax_inode *dax_inode,
 static const struct dax_operations brd_dax_ops = {
 	.direct_access = brd_dax_direct_access,
 };
-#else
-#define brd_blk_direct_access NULL
 #endif
 
 static const struct block_device_operations brd_fops = {
 	.owner =		THIS_MODULE,
 	.rw_page =		brd_rw_page,
-	.direct_access =	brd_blk_direct_access,
 };
 
 /*
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index e91ca8089333..7ec2a8eb8a14 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -141,24 +141,6 @@ static int linear_iterate_devices(struct dm_target *ti,
 	return fn(ti, lc->dev, lc->start, ti->len, data);
 }
 
-static long linear_direct_access(struct dm_target *ti, sector_t sector,
-				 void **kaddr, pfn_t *pfn, long size)
-{
-	struct linear_c *lc = ti->private;
-	struct block_device *bdev = lc->dev->bdev;
-	struct blk_dax_ctl dax = {
-		.sector = linear_map_sector(ti, sector),
-		.size = size,
-	};
-	long ret;
-
-	ret = bdev_direct_access(bdev, &dax);
-	*kaddr = dax.addr;
-	*pfn = dax.pfn;
-
-	return ret;
-}
-
 static long linear_dax_direct_access(struct dm_target *ti, phys_addr_t dev_addr,
 				     void **kaddr, pfn_t *pfn, long size)
 {
@@ -192,7 +174,6 @@ static struct target_type linear_target = {
 	.status = linear_status,
 	.prepare_ioctl = linear_prepare_ioctl,
 	.iterate_devices = linear_iterate_devices,
-	.direct_access = linear_direct_access,
 	.dax_ops = &linear_dax_ops,
 };
 
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index 1990e3bd6958..1d9407633bb5 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -2302,13 +2302,6 @@ static int origin_map(struct dm_target *ti, struct bio *bio)
 	return do_origin(o->dev, bio);
 }
 
-static long origin_direct_access(struct dm_target *ti, sector_t sector,
-		void **kaddr, pfn_t *pfn, long size)
-{
-	DMWARN("device does not support dax.");
-	return -EIO;
-}
-
 static long origin_dax_direct_access(struct dm_target *ti, phys_addr_t dev_addr,
 		void **kaddr, pfn_t *pfn, long size)
 {
@@ -2379,7 +2372,6 @@ static struct target_type origin_target = {
 	.postsuspend = origin_postsuspend,
 	.status  = origin_status,
 	.iterate_devices = origin_iterate_devices,
-	.direct_access = origin_direct_access,
 	.dax_ops = &origin_dax_ops,
 };
 
diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index 47fb56a6184a..229b2c543902 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -308,29 +308,6 @@ static int stripe_map(struct dm_target *ti, struct bio *bio)
 	return DM_MAPIO_REMAPPED;
 }
 
-static long stripe_direct_access(struct dm_target *ti, sector_t sector,
-				 void **kaddr, pfn_t *pfn, long size)
-{
-	struct stripe_c *sc = ti->private;
-	uint32_t stripe;
-	struct block_device *bdev;
-	struct blk_dax_ctl dax = {
-		.size = size,
-	};
-	long ret;
-
-	stripe_map_sector(sc, sector, &stripe, &dax.sector);
-
-	dax.sector += sc->stripe[stripe].physical_start;
-	bdev = sc->stripe[stripe].dev->bdev;
-
-	ret = bdev_direct_access(bdev, &dax);
-	*kaddr = dax.addr;
-	*pfn = dax.pfn;
-
-	return ret;
-}
-
 static long stripe_dax_direct_access(struct dm_target *ti, phys_addr_t dev_addr,
 		void **kaddr, pfn_t *pfn, long size)
 {
@@ -477,7 +454,6 @@ static struct target_type stripe_target = {
 	.status = stripe_status,
 	.iterate_devices = stripe_iterate_devices,
 	.io_hints = stripe_io_hints,
-	.direct_access = stripe_direct_access,
 	.dax_ops = &stripe_dax_ops,
 };
 
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 3ad16d9c9d5a..cd23be26384c 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -856,7 +856,7 @@ static bool dm_table_supports_dax(struct dm_table *t)
 	while (i < dm_table_get_num_targets(t)) {
 		ti = dm_table_get_target(t, i++);
 
-		if (!ti->type->direct_access)
+		if (!ti->type->dax_ops)
 			return false;
 
 		if (!ti->type->iterate_devices ||
diff --git a/drivers/md/dm-target.c b/drivers/md/dm-target.c
index ab072f53cf24..c3f55df90157 100644
--- a/drivers/md/dm-target.c
+++ b/drivers/md/dm-target.c
@@ -148,12 +148,6 @@ static void io_err_release_clone_rq(struct request *clone)
 {
 }
 
-static long io_err_direct_access(struct dm_target *ti, sector_t sector,
-				 void **kaddr, pfn_t *pfn, long size)
-{
-	return -EIO;
-}
-
 static long io_err_dax_direct_access(struct dm_target *ti, phys_addr_t dev_addr,
 				     void **kaddr, pfn_t *pfn, long size)
 {
@@ -174,7 +168,6 @@ static struct target_type error_target = {
 	.map_rq = io_err_map_rq,
 	.clone_and_map_rq = io_err_clone_and_map_rq,
 	.release_clone_rq = io_err_release_clone_rq,
-	.direct_access = io_err_direct_access,
 	.dax_ops = &err_dax_ops,
 };
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 5c5eeda0eb0a..497fb8adc660 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -910,7 +910,7 @@ int dm_set_target_max_io_len(struct dm_target *ti, sector_t len)
 EXPORT_SYMBOL_GPL(dm_set_target_max_io_len);
 
 static long __dm_direct_access(struct mapped_device *md, phys_addr_t dev_addr,
-			       void **kaddr, pfn_t *pfn, long size, bool blk)
+			       void **kaddr, pfn_t *pfn, long size)
 {
 	sector_t sector = dev_addr >> SECTOR_SHIFT;
 	struct dm_table *map;
@@ -929,9 +929,7 @@ static long __dm_direct_access(struct mapped_device *md, phys_addr_t dev_addr,
 	len = max_io_len(sector, ti) << SECTOR_SHIFT;
 	size = min(len, size);
 
-	if (blk && ti->type->direct_access)
-		ret = ti->type->direct_access(ti, sector, kaddr, pfn, size);
-	else if (ti->type->dax_ops)
+	if (ti->type->dax_ops)
 		ret = ti->type->dax_ops->dm_direct_access(ti, dev_addr, kaddr,
 				pfn, size);
 out:
@@ -939,23 +937,13 @@ static long __dm_direct_access(struct mapped_device *md, phys_addr_t dev_addr,
 	return min(ret, size);
 }
 
-static long dm_blk_direct_access(struct block_device *bdev, sector_t sector,
-				 void **kaddr, pfn_t *pfn, long size)
-{
-	struct mapped_device *md = bdev->bd_disk->private_data;
-
-	return __dm_direct_access(md, sector << SECTOR_SHIFT, kaddr, pfn, size,
-			true);
-}
-
 static long dm_dax_direct_access(struct dax_inode *dax_inode,
 				 phys_addr_t dev_addr, void **kaddr, pfn_t *pfn,
 				 long size)
 {
 	struct mapped_device *md = dax_inode_get_private(dax_inode);
 
-	return __dm_direct_access(md, dev_addr, kaddr, pfn, size,
-			false);
+	return __dm_direct_access(md, dev_addr, kaddr, pfn, size);
 }
 
 /*
@@ -2769,7 +2757,6 @@ static const struct block_device_operations dm_blk_dops = {
 	.open = dm_blk_open,
 	.release = dm_blk_close,
 	.ioctl = dm_blk_ioctl,
-	.direct_access = dm_blk_direct_access,
 	.getgeo = dm_blk_getgeo,
 	.pr_ops = &dm_pr_ops,
 	.owner = THIS_MODULE
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index d3d7de645e20..41781f853396 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -219,18 +219,9 @@ __weak long __pmem_direct_access(struct pmem_device *pmem, phys_addr_t dev_addr,
 	return pmem->size - pmem->pfn_pad - offset;
 }
 
-static long pmem_blk_direct_access(struct block_device *bdev, sector_t sector,
-		void **kaddr, pfn_t *pfn, long size)
-{
-	struct pmem_device *pmem = bdev->bd_queue->queuedata;
-
-	return __pmem_direct_access(pmem, sector * 512, kaddr, pfn, size);
-}
-
 static const struct block_device_operations pmem_fops = {
 	.owner =		THIS_MODULE,
 	.rw_page =		pmem_rw_page,
-	.direct_access =	pmem_blk_direct_access,
 	.revalidate_disk =	nvdimm_revalidate_disk,
 };
 
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 67b0885b4d12..03140c93dbd1 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -31,8 +31,6 @@ static int dcssblk_open(struct block_device *bdev, fmode_t mode);
 static void dcssblk_release(struct gendisk *disk, fmode_t mode);
 static blk_qc_t dcssblk_make_request(struct request_queue *q,
 						struct bio *bio);
-static long dcssblk_blk_direct_access(struct block_device *bdev, sector_t secnum,
-			 void **kaddr, pfn_t *pfn, long size);
 static long dcssblk_dax_direct_access(struct dax_inode *dax_inode,
 		phys_addr_t dev_addr, void **kaddr, pfn_t *pfn, long size);
 
@@ -43,7 +41,6 @@ static const struct block_device_operations dcssblk_devops = {
 	.owner   	= THIS_MODULE,
 	.open    	= dcssblk_open,
 	.release 	= dcssblk_release,
-	.direct_access 	= dcssblk_blk_direct_access,
 };
 
 static const struct dax_operations dcssblk_dax_ops = {
@@ -914,19 +911,6 @@ __dcssblk_direct_access(struct dcssblk_dev_info *dev_info, phys_addr_t offset,
 }
 
 static long
-dcssblk_blk_direct_access(struct block_device *bdev, sector_t secnum,
-			void **kaddr, pfn_t *pfn, long size)
-{
-	struct dcssblk_dev_info *dev_info;
-
-	dev_info = bdev->bd_disk->private_data;
-	if (!dev_info)
-		return -ENODEV;
-	return __dcssblk_direct_access(dev_info, secnum * 512, kaddr, pfn,
-			size);
-}
-
-static long
 dcssblk_dax_direct_access(struct dax_inode *dax_inode, phys_addr_t dev_addr,
 			void **kaddr, pfn_t *pfn, long size)
 {
diff --git a/fs/block_dev.c b/fs/block_dev.c
index a73f2388c515..ba0252736950 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -719,51 +719,6 @@ int bdev_write_page(struct block_device *bdev, sector_t sector,
 EXPORT_SYMBOL_GPL(bdev_write_page);
 
 /**
- * bdev_direct_access() - Get the address for directly-accessibly memory
- * @bdev: The device containing the memory
- * @dax: control and output parameters for ->direct_access
- *
- * If a block device is made up of directly addressable memory, this function
- * will tell the caller the PFN and the address of the memory.  The address
- * may be directly dereferenced within the kernel without the need to call
- * ioremap(), kmap() or similar.  The PFN is suitable for inserting into
- * page tables.
- *
- * Return: negative errno if an error occurs, otherwise the number of bytes
- * accessible at this address.
- */
-long bdev_direct_access(struct block_device *bdev, struct blk_dax_ctl *dax)
-{
-	sector_t sector = dax->sector;
-	long avail, size = dax->size;
-	const struct block_device_operations *ops = bdev->bd_disk->fops;
-
-	/*
-	 * The device driver is allowed to sleep, in order to make the
-	 * memory directly accessible.
-	 */
-	might_sleep();
-
-	if (size < 0)
-		return size;
-	if (!blk_queue_dax(bdev_get_queue(bdev)) || !ops->direct_access)
-		return -EOPNOTSUPP;
-	if ((sector + DIV_ROUND_UP(size, 512)) >
-					part_nr_sects_read(bdev->bd_part))
-		return -ERANGE;
-	sector += get_start_sect(bdev);
-	if (sector % (PAGE_SIZE / 512))
-		return -EINVAL;
-	avail = ops->direct_access(bdev, sector, &dax->addr, &dax->pfn, size);
-	if (!avail)
-		return -ERANGE;
-	if (avail > 0 && avail & ~PAGE_MASK)
-		return -ENXIO;
-	return min(avail, size);
-}
-EXPORT_SYMBOL_GPL(bdev_direct_access);
-
-/**
  * bdev_dax_direct_access() - bdev-sector to pfn_t and kernel virtual address
  * @bdev: host block device for @dax_inode
  * @dax_inode: interface data and operations for a memory device
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3b3c5ce376fd..bb87390a29b1 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1882,8 +1882,6 @@ struct block_device_operations {
 	int (*rw_page)(struct block_device *, sector_t, struct page *, bool);
 	int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
 	int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
-	long (*direct_access)(struct block_device *, sector_t, void **, pfn_t *,
-			long);
 	unsigned int (*check_events) (struct gendisk *disk,
 				      unsigned int clearing);
 	/* ->media_changed() is DEPRECATED, use ->check_events() instead */
@@ -1902,7 +1900,6 @@ extern int __blkdev_driver_ioctl(struct block_device *, fmode_t, unsigned int,
 extern int bdev_read_page(struct block_device *, sector_t, struct page *);
 extern int bdev_write_page(struct block_device *, sector_t, struct page *,
 						struct writeback_control *);
-extern long bdev_direct_access(struct block_device *, struct blk_dax_ctl *);
 struct dax_inode;
 extern long bdev_dax_direct_access(struct block_device *bdev,
 		struct dax_inode *dax_inode, struct blk_dax_ctl *dax);
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 1b64f412bb45..6e8762f093d3 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -125,14 +125,6 @@ typedef void (*dm_io_hints_fn) (struct dm_target *ti,
  */
 typedef int (*dm_busy_fn) (struct dm_target *ti);
 
-/*
- * Returns:
- *  < 0 : error
- * >= 0 : the number of bytes accessible at the address
- */
-typedef long (*dm_direct_access_fn) (struct dm_target *ti, sector_t sector,
-				     void **kaddr, pfn_t *pfn, long size);
-
 void dm_error(const char *message);
 
 struct dm_dev {
@@ -185,7 +177,6 @@ struct target_type {
 	dm_busy_fn busy;
 	dm_iterate_devices_fn iterate_devices;
 	dm_io_hints_fn io_hints;
-	dm_direct_access_fn direct_access;
 	const struct dm_dax_operations *dax_ops;
 
 	/* For internal device-mapper use. */


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 13/17] fs: update mount_bdev() to lookup dax infrastructure
  2017-01-28  8:37   ` Dan Williams
  (?)
@ 2017-01-30 12:26   ` Christoph Hellwig
  2017-01-30 18:29       ` Dan Williams
  -1 siblings, 1 reply; 55+ messages in thread
From: Christoph Hellwig @ 2017-01-30 12:26 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-nvdimm, snitzer, toshi.kani, mawilcox, linux-block, jmoyer,
	linux-fsdevel, ross.zwisler, hch

On Sat, Jan 28, 2017 at 12:37:14AM -0800, Dan Williams wrote:
> This is in preparation for removing the ->direct_access() method from
> block_device_operations.

I don't think mount_bdev has any business knowing about DAX. 
Just call dax_get_by_host manually from the affected file systems for
now, and in the future we can have a pure-DAX mount_dax helper.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 01/17] dax: refactor dax-fs into a generic provider of dax inodes
  2017-01-28  8:36   ` Dan Williams
  (?)
@ 2017-01-30 12:28   ` Christoph Hellwig
  2017-01-30 17:12       ` Dan Williams
  -1 siblings, 1 reply; 55+ messages in thread
From: Christoph Hellwig @ 2017-01-30 12:28 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-nvdimm, snitzer, toshi.kani, mawilcox, linux-block, jmoyer,
	linux-fsdevel, ross.zwisler, hch

I really don't like the dax_inode name.  Why not something like
dax_device or dax_region?

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 10/17] block: introduce bdev_dax_direct_access()
  2017-01-28  8:36   ` Dan Williams
  (?)
@ 2017-01-30 12:32   ` Christoph Hellwig
  2017-01-30 18:16       ` Dan Williams
  -1 siblings, 1 reply; 55+ messages in thread
From: Christoph Hellwig @ 2017-01-30 12:32 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-nvdimm, snitzer, toshi.kani, mawilcox, linux-block, jmoyer,
	linux-fsdevel, ross.zwisler, hch

On Sat, Jan 28, 2017 at 12:36:58AM -0800, Dan Williams wrote:
> Provide a replacement for bdev_direct_access() that uses
> dax_operations.direct_access() instead of
> block_device_operations.direct_access(). Once all consumers of the old
> api have been converted bdev_direct_access() will be deleted.
> 
> Given that block device partitioning decisions can cause dax page
> alignment constraints to be violated we still need to validate the
> block_device before calling the dax ->direct_access method.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  block/Kconfig          |    1 +
>  drivers/dax/super.c    |   33 +++++++++++++++++++++++++++++++++
>  fs/block_dev.c         |   28 ++++++++++++++++++++++++++++
>  include/linux/blkdev.h |    3 +++
>  include/linux/dax.h    |    2 ++
>  5 files changed, 67 insertions(+)
> 
> diff --git a/block/Kconfig b/block/Kconfig
> index 8bf114a3858a..9be785173280 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -6,6 +6,7 @@ menuconfig BLOCK
>         default y
>         select SBITMAP
>         select SRCU
> +       select DAX
>         help
>  	 Provide block layer support for the kernel.
>  
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index eb844ffea3cf..ab5b082df5dd 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -65,6 +65,39 @@ struct dax_inode {
>  	const struct dax_operations *ops;
>  };
>  
> +long dax_direct_access(struct dax_inode *dax_inode, phys_addr_t dev_addr,
> +		void **kaddr, pfn_t *pfn, long size)
> +{
> +	long avail;
> +
> +	/*
> +	 * The device driver is allowed to sleep, in order to make the
> +	 * memory directly accessible.
> +	 */
> +	might_sleep();
> +
> +	if (!dax_inode)
> +		return -EOPNOTSUPP;
> +
> +	if (!dax_inode_alive(dax_inode))
> +		return -ENXIO;
> +
> +	if (size < 0)
> +		return size;
> +
> +	if (dev_addr % PAGE_SIZE)
> +		return -EINVAL;
> +
> +	avail = dax_inode->ops->direct_access(dax_inode, dev_addr, kaddr, pfn,
> +			size);
> +	if (!avail)
> +		return -ERANGE;
> +	if (avail > 0 && avail & ~PAGE_MASK)
> +		return -ENXIO;
> +	return min(avail, size);
> +}
> +EXPORT_SYMBOL_GPL(dax_direct_access);
> +
>  bool dax_inode_alive(struct dax_inode *dax_inode)
>  {
>  	lockdep_assert_held(&dax_srcu);
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index edb1d2b16b8f..bf4b51a3a412 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -18,6 +18,7 @@
>  #include <linux/module.h>
>  #include <linux/blkpg.h>
>  #include <linux/magic.h>
> +#include <linux/dax.h>
>  #include <linux/buffer_head.h>
>  #include <linux/swap.h>
>  #include <linux/pagevec.h>
> @@ -763,6 +764,33 @@ long bdev_direct_access(struct block_device *bdev, struct blk_dax_ctl *dax)
>  EXPORT_SYMBOL_GPL(bdev_direct_access);
>  
>  /**
> + * bdev_dax_direct_access() - bdev-sector to pfn_t and kernel virtual address
> + * @bdev: host block device for @dax_inode
> + * @dax_inode: interface data and operations for a memory device
> + * @dax: control and output parameters for ->direct_access
> + *
> + * Return: negative errno if an error occurs, otherwise the number of bytes
> + * accessible at this address.
> + *
> + * Locking: must be called with dax_read_lock() held
> + */
> +long bdev_dax_direct_access(struct block_device *bdev,
> +		struct dax_inode *dax_inode, struct blk_dax_ctl *dax)
> +{
> +	sector_t sector = dax->sector;
> +
> +	if (!blk_queue_dax(bdev->bd_queue))
> +		return -EOPNOTSUPP;

I don't think this should take a bdev - the caller should know if
it has a dax_inode.  Also if you touch this anyway can we kill
the annoying struct blk_dax_ctl calling convention?  Passing the
four arguments explicitly is just a lot more readable and understandable.

> +	if ((sector + DIV_ROUND_UP(dax->size, 512))
> +			> part_nr_sects_read(bdev->bd_part))
> +		return -ERANGE;
> +	sector += get_start_sect(bdev);
> +	return dax_direct_access(dax_inode, sector * 512, &dax->addr,
> +			&dax->pfn, dax->size);

And please switch to using bytes as the granularity given that we're
dealing with byte addressable memory.
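
A minimal sketch of what a byte-granular version of the partition range
check quoted above could look like, written as plain userspace C rather
than kernel code. The helper name sketch_part_to_dev_addr() and its
parameters are hypothetical and not part of this series; it only assumes
that the partition start and size are available in bytes.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): translate a byte offset
 * inside a partition into a byte address on the whole device.
 *
 * Returns the device byte address, or -ERANGE if [offset, offset + size)
 * does not fit inside the partition.
 */
static int64_t sketch_part_to_dev_addr(uint64_t part_start, uint64_t part_size,
				       uint64_t offset, uint64_t size)
{
	/* the result must be representable in the signed return type */
	assert(part_start <= (uint64_t)INT64_MAX - part_size);

	/* written this way so that offset + size cannot overflow */
	if (size > part_size || offset > part_size - size)
		return -ERANGE;

	return (int64_t)(part_start + offset);
}
```

Working in bytes throughout removes the "* 512" and DIV_ROUND_UP()
conversions at the call site; page alignment of the resulting address
would still be checked separately.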

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 01/17] dax: refactor dax-fs into a generic provider of dax inodes
  2017-01-30 12:28   ` Christoph Hellwig
@ 2017-01-30 17:12       ` Dan Williams
  0 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-30 17:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Mike Snitzer, Matthew Wilcox, linux-nvdimm, linux-block, linux-fsdevel

On Mon, Jan 30, 2017 at 4:28 AM, Christoph Hellwig <hch@lst.de> wrote:
> I really don't like the dax_inode name.  Why not something like
> dax_device or dax_region?

Fair enough, I'll switch struct dax_inode to dax_device and switch the
existing struct dax_dev to dax_info.
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 10/17] block: introduce bdev_dax_direct_access()
  2017-01-30 12:32   ` Christoph Hellwig
@ 2017-01-30 18:16       ` Dan Williams
  0 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-30 18:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Mike Snitzer, Matthew Wilcox, linux-nvdimm, linux-block, linux-fsdevel

On Mon, Jan 30, 2017 at 4:32 AM, Christoph Hellwig <hch@lst.de> wrote:
> On Sat, Jan 28, 2017 at 12:36:58AM -0800, Dan Williams wrote:
>> Provide a replacement for bdev_direct_access() that uses
>> dax_operations.direct_access() instead of
>> block_device_operations.direct_access(). Once all consumers of the old
>> api have been converted bdev_direct_access() will be deleted.
>>
>> Given that block device partitioning decisions can cause dax page
>> alignment constraints to be violated we still need to validate the
>> block_device before calling the dax ->direct_access method.
>>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> ---
>>  block/Kconfig          |    1 +
>>  drivers/dax/super.c    |   33 +++++++++++++++++++++++++++++++++
>>  fs/block_dev.c         |   28 ++++++++++++++++++++++++++++
>>  include/linux/blkdev.h |    3 +++
>>  include/linux/dax.h    |    2 ++
>>  5 files changed, 67 insertions(+)
>>
>> diff --git a/block/Kconfig b/block/Kconfig
>> index 8bf114a3858a..9be785173280 100644
>> --- a/block/Kconfig
>> +++ b/block/Kconfig
>> @@ -6,6 +6,7 @@ menuconfig BLOCK
>>         default y
>>         select SBITMAP
>>         select SRCU
>> +       select DAX
>>         help
>>        Provide block layer support for the kernel.
>>
>> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
>> index eb844ffea3cf..ab5b082df5dd 100644
>> --- a/drivers/dax/super.c
>> +++ b/drivers/dax/super.c
>> @@ -65,6 +65,39 @@ struct dax_inode {
>>       const struct dax_operations *ops;
>>  };
>>
>> +long dax_direct_access(struct dax_inode *dax_inode, phys_addr_t dev_addr,
>> +             void **kaddr, pfn_t *pfn, long size)
>> +{
>> +     long avail;
>> +
>> +     /*
>> +      * The device driver is allowed to sleep, in order to make the
>> +      * memory directly accessible.
>> +      */
>> +     might_sleep();
>> +
>> +     if (!dax_inode)
>> +             return -EOPNOTSUPP;
>> +
>> +     if (!dax_inode_alive(dax_inode))
>> +             return -ENXIO;
>> +
>> +     if (size < 0)
>> +             return size;
>> +
>> +     if (dev_addr % PAGE_SIZE)
>> +             return -EINVAL;
>> +
>> +     avail = dax_inode->ops->direct_access(dax_inode, dev_addr, kaddr, pfn,
>> +                     size);
>> +     if (!avail)
>> +             return -ERANGE;
>> +     if (avail > 0 && avail & ~PAGE_MASK)
>> +             return -ENXIO;
>> +     return min(avail, size);
>> +}
>> +EXPORT_SYMBOL_GPL(dax_direct_access);
>> +
>>  bool dax_inode_alive(struct dax_inode *dax_inode)
>>  {
>>       lockdep_assert_held(&dax_srcu);
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index edb1d2b16b8f..bf4b51a3a412 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -18,6 +18,7 @@
>>  #include <linux/module.h>
>>  #include <linux/blkpg.h>
>>  #include <linux/magic.h>
>> +#include <linux/dax.h>
>>  #include <linux/buffer_head.h>
>>  #include <linux/swap.h>
>>  #include <linux/pagevec.h>
>> @@ -763,6 +764,33 @@ long bdev_direct_access(struct block_device *bdev, struct blk_dax_ctl *dax)
>>  EXPORT_SYMBOL_GPL(bdev_direct_access);
>>
>>  /**
>> + * bdev_dax_direct_access() - bdev-sector to pfn_t and kernel virtual address
>> + * @bdev: host block device for @dax_inode
>> + * @dax_inode: interface data and operations for a memory device
>> + * @dax: control and output parameters for ->direct_access
>> + *
>> + * Return: negative errno if an error occurs, otherwise the number of bytes
>> + * accessible at this address.
>> + *
>> + * Locking: must be called with dax_read_lock() held
>> + */
>> +long bdev_dax_direct_access(struct block_device *bdev,
>> +             struct dax_inode *dax_inode, struct blk_dax_ctl *dax)
>> +{
>> +     sector_t sector = dax->sector;
>> +
>> +     if (!blk_queue_dax(bdev->bd_queue))
>> +             return -EOPNOTSUPP;
>
> I don't think this should take a bdev - the caller should know if
> it has a dax_inode.  Also if you touch this anyway can we kill
> the annoying struct blk_dax_ctl calling convention?  Passing the
> four arguments explicitly is just a lot more readable and understandable.

Ok, now that dax_map_atomic() is gone, it's much easier to remove
struct blk_dax_ctl.

We can also move the partition alignment checks to be a one-time check
at bdev_dax_capable() time and kill bdev_dax_direct_access() in favor
of calling dax_direct_access() directly.
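
A minimal sketch of such a one-time alignment check, in plain userspace C
rather than kernel code. The helper name sketch_part_dax_aligned() and its
parameters are hypothetical; the start test mirrors the
"sector % (PAGE_SIZE / 512)" check in the old bdev_direct_access(), and
the test on the partition length is an extra assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512ULL	/* bytes per sector, as in the patch */

/*
 * Hypothetical helper (not from the patch): returns true when a
 * partition starting at start_sect and spanning nr_sects begins and
 * ends on a page boundary.
 */
static bool sketch_part_dax_aligned(uint64_t start_sect, uint64_t nr_sects,
				    uint64_t page_size)
{
	uint64_t sects_per_page;

	/* page_size must be a power of two no smaller than one sector */
	assert(page_size >= SKETCH_SECTOR_SIZE);
	assert((page_size & (page_size - 1)) == 0);

	sects_per_page = page_size / SKETCH_SECTOR_SIZE;

	if (start_sect % sects_per_page)
		return false;	/* partition starts mid-page */
	if (nr_sects % sects_per_page)
		return false;	/* partition ends mid-page */
	return true;
}
```

If a check like this runs once when dax capability is established, the
per-access path only has to add the partition start to the offset.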

>> +     if ((sector + DIV_ROUND_UP(dax->size, 512))
>> +                     > part_nr_sects_read(bdev->bd_part))
>> +             return -ERANGE;
>> +     sector += get_start_sect(bdev);
>> +     return dax_direct_access(dax_inode, sector * 512, &dax->addr,
>> +                     &dax->pfn, dax->size);
>
> And please switch to using bytes as the granularity given that we're
> dealing with byte addressable memory.

dax_direct_access() does take a byte aligned physical address, but it
needs to be at least page aligned since we are returning a pfn_t...

Hmm, perhaps the input should be a raw page frame number. We could
reduce one of the arguments by making the current 'pfn_t *' parameter
an in/out-parameter.
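
To make the trade-off concrete, here is a small userspace sketch of the
two input conventions. All names are hypothetical and 4 KiB pages are
assumed; the "pfn" here is a device-relative page index, not the kernel's
pfn_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumes 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)

/* Byte-address input: the callee has to reject unaligned addresses. */
static bool sketch_addr_page_aligned(uint64_t dev_addr)
{
	return (dev_addr & (SKETCH_PAGE_SIZE - 1)) == 0;
}

/* Convert an aligned byte address to a page index. */
static uint64_t sketch_addr_to_pfn(uint64_t dev_addr)
{
	assert(sketch_addr_page_aligned(dev_addr));
	return dev_addr >> SKETCH_PAGE_SHIFT;
}

/* Page-index input: every value maps to an aligned byte address. */
static uint64_t sketch_pfn_to_addr(uint64_t pfn)
{
	return pfn << SKETCH_PAGE_SHIFT;
}
```

With a page-index input the alignment check disappears because an
unaligned request cannot be expressed, at the cost of callers converting
from bytes themselves.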

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 13/17] fs: update mount_bdev() to lookup dax infrastructure
  2017-01-30 12:26   ` Christoph Hellwig
@ 2017-01-30 18:29       ` Dan Williams
  0 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-01-30 18:29 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Mike Snitzer, Matthew Wilcox, linux-nvdimm, linux-block, linux-fsdevel

On Mon, Jan 30, 2017 at 4:26 AM, Christoph Hellwig <hch@lst.de> wrote:
> On Sat, Jan 28, 2017 at 12:37:14AM -0800, Dan Williams wrote:
>> This is in preparation for removing the ->direct_access() method from
>> block_device_operations.
>
> I don't think mount_bdev has any business knowing about DAX.
> Just call dax_get_by_host manually from the affected file systems for
> now, and in the future we can have a pure-DAX mount_dax helper.

Ok, since we already need dax_get_by_host() in the blkdev_writepages()
path I can sprinkle a few more of those calls and leave mount_bdev
alone.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 13/17] fs: update mount_bdev() to lookup dax infrastructure
  2017-01-30 18:29       ` Dan Williams
@ 2017-02-01  8:08         ` Christoph Hellwig
  -1 siblings, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2017-02-01  8:08 UTC (permalink / raw)
  To: Dan Williams
  Cc: Mike Snitzer, Matthew Wilcox, linux-nvdimm, linux-block,
	linux-fsdevel, Christoph Hellwig

On Mon, Jan 30, 2017 at 10:29:12AM -0800, Dan Williams wrote:
> On Mon, Jan 30, 2017 at 4:26 AM, Christoph Hellwig <hch@lst.de> wrote:
> > On Sat, Jan 28, 2017 at 12:37:14AM -0800, Dan Williams wrote:
> >> This is in preparation for removing the ->direct_access() method from
> >> block_device_operations.
> >
> > I don't think mount_bdev has any business knowing about DAX.
> > Just call dax_get_by_host manually from the affected file systems for
> > now, and in the future we can have a pure-DAX mount_dax helper.
> 
> Ok, since we already need dax_get_by_host() in the blkdev_writepages()
> path I can sprinkle a few more of those calls and leave mount_bdev
> alone.

Huh?  I thought we stopped using DAX I/O for the block device nodes
a while ago?

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 10/17] block: introduce bdev_dax_direct_access()
  2017-01-30 18:16       ` Dan Williams
@ 2017-02-01  8:10         ` Christoph Hellwig
  -1 siblings, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2017-02-01  8:10 UTC (permalink / raw)
  To: Dan Williams
  Cc: Mike Snitzer, Matthew Wilcox, linux-nvdimm, linux-block,
	linux-fsdevel, Christoph Hellwig

On Mon, Jan 30, 2017 at 10:16:29AM -0800, Dan Williams wrote:
> Ok, now that dax_map_atomic() is gone, it's much easier to remove
> struct blk_dax_ctl.
> 
> We can also move the partition alignment checks to be a one-time check
> at bdev_dax_capable() time and kill bdev_dax_direct_access() in favor
> of calling dax_direct_access() directly.

Yes, please.

> >> +     if ((sector + DIV_ROUND_UP(dax->size, 512))
> >> +                     > part_nr_sects_read(bdev->bd_part))
> >> +             return -ERANGE;
> >> +     sector += get_start_sect(bdev);
> >> +     return dax_direct_access(dax_inode, sector * 512, &dax->addr,
> >> +                     &dax->pfn, dax->size);
> >
> > And please switch to using bytes as the granularity given that we're
> > dealing with byte addressable memory.
> 
> dax_direct_access() does take a byte aligned physical address, but it
> needs to be at least page aligned since we are returning a pfn_t...
> 
> Hmm, perhaps the input should be a raw page frame number. We could
> reduce one of the arguments by making the current 'pfn_t *' parameter
> an in/out-parameter.

In/Out parameters are always a bit problematic in terms of API clarity.
And updating a device-relative address with an absolute physical one
sounds like an odd API for sure.
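
To illustrate the byte-granularity point, here is a minimal user-space sketch of the partition range check from the quoted hunk, done in bytes with a one-time page-alignment check. It is not the patch's code: `part_model`, `part_dax_aligned()`, `part_to_dev_offset()`, the 4 KiB page size, and the choice of error codes are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SECTOR_BYTES 512ULL
#define MODEL_PAGE_BYTES 4096ULL /* assumption: 4 KiB pages */

/* Hypothetical stand-in for a partition, described in 512-byte sectors. */
struct part_model {
	uint64_t start_sect; /* first sector of the partition on the device */
	uint64_t nr_sects;   /* partition length in sectors */
};

/* One-time check: both the start and the length must be page aligned. */
static inline int part_dax_aligned(const struct part_model *p)
{
	uint64_t start = p->start_sect * SECTOR_BYTES;
	uint64_t len = p->nr_sects * SECTOR_BYTES;

	return (start % MODEL_PAGE_BYTES) == 0 && (len % MODEL_PAGE_BYTES) == 0;
}

/*
 * Translate a partition-relative byte range into a device-relative byte
 * offset. Returns 0 on success, -EINVAL for an unaligned offset, and
 * -ERANGE when the range runs past the end of the partition.
 */
static inline int part_to_dev_offset(const struct part_model *p, uint64_t off,
				     uint64_t len, uint64_t *dev_off)
{
	uint64_t part_bytes = p->nr_sects * SECTOR_BYTES;

	if (off % MODEL_PAGE_BYTES)
		return -EINVAL;
	/* Written this way so that off + len cannot overflow. */
	if (off > part_bytes || len > part_bytes - off)
		return -ERANGE;

	*dev_off = p->start_sect * SECTOR_BYTES + off;
	return 0;
}
```

With the alignment verified once up front, each later lookup only needs the bounds check and an addition, which is the shape of the "one-time check" proposed above.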

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 13/17] fs: update mount_bdev() to lookup dax infrastructure
  2017-02-01  8:08         ` Christoph Hellwig
@ 2017-02-01  9:16           ` Dan Williams
  -1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-02-01  9:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Mike Snitzer, Matthew Wilcox, linux-nvdimm, linux-block, linux-fsdevel

On Wed, Feb 1, 2017 at 12:08 AM, Christoph Hellwig <hch@lst.de> wrote:
> On Mon, Jan 30, 2017 at 10:29:12AM -0800, Dan Williams wrote:
>> On Mon, Jan 30, 2017 at 4:26 AM, Christoph Hellwig <hch@lst.de> wrote:
>> > On Sat, Jan 28, 2017 at 12:37:14AM -0800, Dan Williams wrote:
>> >> This is in preparation for removing the ->direct_access() method from
>> >> block_device_operations.
>> >
>> > I don't think mount_bdev has any business knowing about DAX.
>> > Just call dax_get_by_host manually from the affected file systems for
>> > now, and in the future we can have a pure-DAX mount_dax helper.
>>
>> Ok, since we already need dax_get_by_host() in the blkdev_writepages()
>> path I can sprinkle a few more of those calls and leave mount_bdev
>> alone.
>
> Huh?  I thought we stopped using DAX I/O for the block device nodes
> a while ago?

Oh, yeah, you're right. The blkdev_writepages() call to
dax_writeback_mapping_range() is likely leftover dead code. I'll clean
it up.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 10/17] block: introduce bdev_dax_direct_access()
  2017-02-01  8:10         ` Christoph Hellwig
@ 2017-02-01  9:21           ` Dan Williams
  -1 siblings, 0 replies; 55+ messages in thread
From: Dan Williams @ 2017-02-01  9:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Mike Snitzer, Matthew Wilcox, linux-nvdimm, linux-block, linux-fsdevel

On Wed, Feb 1, 2017 at 12:10 AM, Christoph Hellwig <hch@lst.de> wrote:
> On Mon, Jan 30, 2017 at 10:16:29AM -0800, Dan Williams wrote:
>> Ok, now that dax_map_atomic() is gone, it's much easier to remove
>> struct blk_dax_ctl.
>>
>> We can also move the partition alignment checks to be a one-time check
>> at bdev_dax_capable() time and kill bdev_dax_direct_access() in favor
>> of calling dax_direct_access() directly.
>
> Yes, please.
>
>> >> +     if ((sector + DIV_ROUND_UP(dax->size, 512))
>> >> +                     > part_nr_sects_read(bdev->bd_part))
>> >> +             return -ERANGE;
>> >> +     sector += get_start_sect(bdev);
>> >> +     return dax_direct_access(dax_inode, sector * 512, &dax->addr,
>> >> +                     &dax->pfn, dax->size);
>> >
>> > And please switch to using bytes as the granularity given that we're
>> > dealing with byte addressable memory.
>>
>> dax_direct_access() does take a byte aligned physical address, but it
>> needs to be at least page aligned since we are returning a pfn_t...
>>
>> Hmm, perhaps the input should be a raw page frame number. We could
>> reduce one of the arguments by making the current 'pfn_t *' parameter
>> an in/out-parameter.
>
> In/Out parameters are always a bit problematic in terms of API clarity.
> And updating a device-relative address with an absolute physical one
> sounds like an odd API for sure.

Yes, it does, and I thought better of it shortly after sending that. How about:

long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
                unsigned long nr_pages, void **kaddr, pfn_t *pfn)

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 10/17] block: introduce bdev_dax_direct_access()
  2017-02-01  9:21           ` Dan Williams
@ 2017-02-01  9:28             ` Christoph Hellwig
  -1 siblings, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2017-02-01  9:28 UTC (permalink / raw)
  To: Dan Williams
  Cc: Mike Snitzer, Matthew Wilcox, linux-nvdimm, linux-block,
	linux-fsdevel, Christoph Hellwig

On Wed, Feb 01, 2017 at 01:21:40AM -0800, Dan Williams wrote:
> > In/Out parameters are always a bit problematic in terms of API clarity.
> > And updating a device-relative address with an absolute physical one
> > sounds like an odd API for sure.
> 
> Yes, it does, and I thought better of it shortly after sending that. How about:
> 
> long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
>                 unsigned long nr_pages, void **kaddr, pfn_t *pfn)

Yes, that looks good to me.
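
For readers following along, the proposed signature can be mocked up in user space roughly as below. This is a sketch under stated assumptions, not the eventual kernel implementation: the `model_` types are hypothetical stand-ins for `struct dax_device`, `pgoff_t`, and `pfn_t`, the 4 KiB page size is assumed, and the return convention (number of pages available at `pgoff`, or a negative errno) is a guess that the thread does not spell out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical user-space stand-ins for the kernel types in the proposal. */
typedef unsigned long model_pgoff_t;
typedef struct { uint64_t val; } model_pfn_t;

struct model_dax_device {
	void *virt_base;        /* start of the mapped range */
	uint64_t phys_base;     /* page-aligned physical address of the range */
	unsigned long nr_pages; /* device capacity in pages */
};

/*
 * Device-relative page offset in, kernel address and absolute pfn out.
 * Returns how many contiguous pages are available starting at pgoff
 * (capped at nr_pages), or a negative errno. Inputs and outputs are kept
 * in separate parameters, so nothing is used as an in/out argument.
 */
static inline long model_dax_direct_access(struct model_dax_device *dax_dev,
		model_pgoff_t pgoff, unsigned long nr_pages,
		void **kaddr, model_pfn_t *pfn)
{
	unsigned long avail;

	if (!dax_dev || !kaddr || !pfn || nr_pages == 0)
		return -EINVAL;
	if (pgoff >= dax_dev->nr_pages)
		return -ERANGE;

	avail = dax_dev->nr_pages - pgoff;
	if (avail > nr_pages)
		avail = nr_pages;

	*kaddr = (char *)dax_dev->virt_base + ((size_t)pgoff << MODEL_PAGE_SHIFT);
	pfn->val = (dax_dev->phys_base >> MODEL_PAGE_SHIFT) + pgoff;
	return (long)avail;
}
```

Taking the offset in pages makes the alignment requirement part of the type rather than something each caller has to check, which is the concern raised earlier about passing a byte address while returning a `pfn_t`.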

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2017-02-01  9:28 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
2017-01-28  8:36 [RFC PATCH 00/17] introduce a dax_inode for dax_operations Dan Williams
2017-01-28  8:36 ` Dan Williams
2017-01-28  8:36 ` [RFC PATCH 01/17] dax: refactor dax-fs into a generic provider of dax inodes Dan Williams
2017-01-28  8:36   ` Dan Williams
2017-01-30 12:28   ` Christoph Hellwig
2017-01-30 17:12     ` Dan Williams
2017-01-30 17:12       ` Dan Williams
2017-01-28  8:36 ` [RFC PATCH 02/17] dax: convert dax_inode locking to srcu Dan Williams
2017-01-28  8:36   ` Dan Williams
2017-01-28  8:36 ` [RFC PATCH 03/17] dax: add a facility to lookup a dax inode by 'host' device name Dan Williams
2017-01-28  8:36   ` Dan Williams
2017-01-28  8:36 ` [RFC PATCH 04/17] dax: introduce dax_operations Dan Williams
2017-01-28  8:36   ` Dan Williams
2017-01-28  8:36 ` [RFC PATCH 05/17] pmem: add dax_operations support Dan Williams
2017-01-28  8:36   ` Dan Williams
2017-01-28  8:36 ` [RFC PATCH 06/17] axon_ram: " Dan Williams
2017-01-28  8:36   ` Dan Williams
2017-01-28  8:36 ` [RFC PATCH 07/17] brd: " Dan Williams
2017-01-28  8:36   ` Dan Williams
2017-01-28  8:36 ` [RFC PATCH 08/17] dcssblk: " Dan Williams
2017-01-28  8:36   ` Dan Williams
2017-01-28  8:36 ` [RFC PATCH 09/17] block: kill bdev_dax_capable() Dan Williams
2017-01-28  8:36   ` Dan Williams
2017-01-28  8:36 ` [RFC PATCH 10/17] block: introduce bdev_dax_direct_access() Dan Williams
2017-01-28  8:36   ` Dan Williams
2017-01-30 12:32   ` Christoph Hellwig
2017-01-30 18:16     ` Dan Williams
2017-01-30 18:16       ` Dan Williams
2017-02-01  8:10       ` Christoph Hellwig
2017-02-01  8:10         ` Christoph Hellwig
2017-02-01  9:21         ` Dan Williams
2017-02-01  9:21           ` Dan Williams
2017-02-01  9:28           ` Christoph Hellwig
2017-02-01  9:28             ` Christoph Hellwig
2017-01-28  8:37 ` [RFC PATCH 11/17] dm: add dax_operations support (producer) Dan Williams
2017-01-28  8:37   ` Dan Williams
2017-01-28  8:37 ` [RFC PATCH 12/17] dm: add dax_operations support (consumer) Dan Williams
2017-01-28  8:37   ` Dan Williams
2017-01-28  8:37 ` [RFC PATCH 13/17] fs: update mount_bdev() to lookup dax infrastructure Dan Williams
2017-01-28  8:37   ` Dan Williams
2017-01-30 12:26   ` Christoph Hellwig
2017-01-30 18:29     ` Dan Williams
2017-01-30 18:29       ` Dan Williams
2017-02-01  8:08       ` Christoph Hellwig
2017-02-01  8:08         ` Christoph Hellwig
2017-02-01  9:16         ` Dan Williams
2017-02-01  9:16           ` Dan Williams
2017-01-28  8:37 ` [RFC PATCH 14/17] ext2, ext4, xfs: retrieve dax_inode through iomap operations Dan Williams
2017-01-28  8:37   ` Dan Williams
2017-01-28  8:37 ` [RFC PATCH 15/17] Revert "block: use DAX for partition table reads" Dan Williams
2017-01-28  8:37   ` Dan Williams
2017-01-28  8:37 ` [RFC PATCH 16/17] fs, dax: convert filesystem-dax to bdev_dax_direct_access Dan Williams
2017-01-28  8:37   ` Dan Williams
2017-01-28  8:37 ` [RFC PATCH 17/17] block: remove block_device_operations.direct_access and related infrastructure Dan Williams
2017-01-28  8:37   ` Dan Williams
