linux-bcache.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v12 00/12] bcache: support NVDIMM for journaling
@ 2021-08-11 17:02 Coly Li
  2021-08-11 17:02 ` [PATCH v12 01/12] bcache: add initial data structures for nvm pages Coly Li
                   ` (12 more replies)
  0 siblings, 13 replies; 16+ messages in thread
From: Coly Li @ 2021-08-11 17:02 UTC (permalink / raw)
  To: linux-bcache
  Cc: linux-block, linux-nvdimm, axboe, hare, jack, dan.j.williams,
	hch, ying.huang, Coly Li

This is the v12 effort for supporting NVDIMM for bcache journal (some
versions may not posted with version numbers).

The major change of this version is the full pointer of on-media data
structure is replaced by per-namespace offset. Now a pointer address is
calculated by namespace base mapping address + per-namespace offset.
The code logic is same as previous version, all changes are only related
to the base+offset style pointer replacement.

The nvm-pages allocator is a buddy-like allocator, which allocates size
in power-of-2 pages from the NVDIMM namespace. User space tool 'bcache'
has a new added '-M' option to format a NVDIMM namespace and register it
via sysfs interface as a bcache meta device. The nvm-pages kernel code
does a DAX mapping to map the whole namespace into system's memory
address range, and allocating the pages to requestion like typical buddy
allocator does. The major difference is nvm-pages allocator maintains
the pages allocated to each requester by an allocation list which stored
on NVDIMM too. Allocation list of different requester is tracked by a
pre-defined UUID, all the pages tracked in all allocation lists are
treated as allocated busy pages and won't be initialized into buddy
system after the system reboot.

The bcache journal code may request a block of power-of-2 size pages
from the nvm-pages allocator, normally it is a range of 256MB or 512MB
continuous pages range. During meta data journaling, the in-memory jsets
go into the calculated nvdimm pages location by kernel memcpy routine.
So the journaling I/Os won't go into block device (e.g. SSD) anymore, 
the write and read for journal jsets happen on NVDIMM.

Intel developers Jianpeng Ma and Qiaowei Ren compose the initial code of
nvm-pages, the related patches are,
- bcache: initialize the nvm pages allocator
- bcache: initialization of the buddy
- bcache: bch_nvm_alloc_pages() of the buddy
- bcache: bch_nvm_free_pages() of the buddy
- bcache: get recs list head for allocated pages by specific uuid
All the code depends on Linux libnvdimm and dax drivers, the bcache nvm-
pages allocator can be treated as user of these two drivers.

I modify the bcache code to recognize the nvm meta device feature,
initialize journal on NVDIMM, and do journal I/Os on NVDIMM in the
following patches,
- bcache: add initial data structures for nvm pages
- bcache: use bucket index to set GC_MARK_METADATA for journal buckets
  in bch_btree_gc_finish()
- bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set
- bcache: initialize bcache journal for NVDIMM meta device
- bcache: support storing bcache journal into NVDIMM meta device
- bcache: read jset from NVDIMM pages for journal replay
- bcache: add sysfs interface register_nvdimm_meta to register NVDIMM
  meta device

In this series, all previously addressed issue via code reviews are all
fixed. And all known issue during testing are fixed. The code survives
from 24+ hours smoking and I/O pressure testing among many reboots, it
works well as expected.

All the code is EXPERIMENTAL, they won't be enabled by default until we
feel the NVDIMM support is completed and stable.

Although there are some experts helped to review the code logic, but we
do appreciate if more people may help to review the code. It is quite
common that bcache patches don't have enough code reviewer, but this
time I do need help for more review or comments on this series.

Thanks in advance.

Coly Li
---

Coly Li (7):
  bcache: add initial data structures for nvm pages
  bcache: use bucket index to set GC_MARK_METADATA for journal buckets
    in bch_btree_gc_finish()
  bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set
  bcache: initialize bcache journal for NVDIMM meta device
  bcache: support storing bcache journal into NVDIMM meta device
  bcache: read jset from NVDIMM pages for journal replay
  bcache: add sysfs interface register_nvdimm_meta to register NVDIMM
    meta device

Jianpeng Ma (5):
  bcache: initialize the nvm pages allocator
  bcache: initialization of the buddy
  bcache: bch_nvmpg_alloc_pages() of the buddy
  bcache: bch_nvmpg_free_pages() of the buddy allocator
  bcache: get recs list head for allocated pages by specific uuid

 drivers/md/bcache/Kconfig       |  10 +
 drivers/md/bcache/Makefile      |   1 +
 drivers/md/bcache/btree.c       |   6 +-
 drivers/md/bcache/features.h    |   9 +
 drivers/md/bcache/journal.c     | 325 +++++++++--
 drivers/md/bcache/journal.h     |   2 +-
 drivers/md/bcache/nvm-pages.c   | 931 ++++++++++++++++++++++++++++++++
 drivers/md/bcache/nvm-pages.h   | 127 +++++
 drivers/md/bcache/super.c       |  53 +-
 include/uapi/linux/bcache-nvm.h | 253 +++++++++
 10 files changed, 1649 insertions(+), 68 deletions(-)
 create mode 100644 drivers/md/bcache/nvm-pages.c
 create mode 100644 drivers/md/bcache/nvm-pages.h
 create mode 100644 include/uapi/linux/bcache-nvm.h

-- 
2.26.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v12 01/12] bcache: add initial data structures for nvm pages
  2021-08-11 17:02 [PATCH v12 00/12] bcache: support NVDIMM for journaling Coly Li
@ 2021-08-11 17:02 ` Coly Li
  2021-08-11 17:02 ` [PATCH v12 02/12] bcache: initialize the nvm pages allocator Coly Li
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: Coly Li @ 2021-08-11 17:02 UTC (permalink / raw)
  To: linux-bcache
  Cc: linux-block, linux-nvdimm, axboe, hare, jack, dan.j.williams,
	hch, ying.huang, Coly Li, Hannes Reinecke, Jianpeng Ma,
	Qiaowei Ren

This patch initializes the prototype data structures for nvm pages
allocator,

- struct bch_nvmpg_sb
  This is the super block allocated on each nvdimm namespace for the nvm
pages allocator. A nvdimm pages allocator set may have multiple name-
spaces, bch_nvmpg_sb->set_uuid is used to mark which nvdimm set this
namespace belongs to.

- struct bch_nvmpg_header
  This is a table for all heads of all allocation record lists. An allo-
cation record list traces all page(s) allocated from nvdimm namespace(s)
to a specific requester (identified by uuid). After system reboot, a
requester can retrieve all previously allocated nvdimm pages from its
record list by a pre-defined uuid.

- struct bch_nvmpg_head
  This is a head of an allocation record list. Each nvdimm pages
requester (typically it's a driver) has and only has one allocation
record list, and an allocated nvdimm page only belongs to a specific
allocation record list. Member uuid[] will be set as the requester's
uuid, e.g. for bcache it is the cache set uuid. Member label is not
mandatory, it is a human-readable string for debug purpose. The nvm
offset format pointers recs_offset[] point to the location of actual
allocator record lists on each namespace of the nvdimm pages allocator
set. Each per namespace record list is represented by the following
struct bch_nvmpg_recs.

- struct bch_nvmpg_recs
  This structure represents a requester's allocation record list. Member
uuid is same value as the uuid of its corresponding struct
bch_nvmpg_head. Member recs[] is a table of struct bch_pgalloc_rec
objects to trace all allocated nvmdimm pages. If the table recs[] is
full, the nvmpg format offset is a pointer points to the next struct
bch_nvmpg_recs object, nvm pages allocator will look for available free
allocation record there. All the linked struct bch_nvmpg_recs objects
compose a requester's alloction record list which is headed by the above
struct bch_nvmpg_head.

- struct bch_nvmpg_recs
  This structure records a range of allocated nvdimm pages. Member pgoff
is offset in unit of page size of this allocation range. Member order
indicates size of the allocation range by (1 << order) in unit of page
size. Because the nvdimm pages allocator set may have multiple nvdimm
namespaces, member ns_id is used to identify which namespace the pgoff
belongs to.
  - Bits  0 - 51: pgoff - is pages offset of the allocated pages.
  - Bits 52 - 57: order - allocaed size in page_size * order-of-2
  - Bits 58 - 60: ns_id - identify which namespace the pages stays on
  - Bits 61 - 63: reserved.
Since each of the allocated nvm pages are power of 2, using 6 bits to
represent allocated size can have (1<<(1<<64) - 1) * PAGE_SIZE maximum
value. It can be a 76 bits width range size in byte for 4KB page size,
which is large enough currently.

All the structure members having _offset suffix are in a special fomat.
E.g. bch_nvmpg_sb.{sb_offset, pages_offset, set_header_offset},
bch_nvmpg_head.recs_offset, bch_nvmpg_recs.{head_offset, next_offset},
the offset value is 64bit, the most significant 3 bits are used to
identify which namespace this offset belongs to, and the rested 61 bits
are actual offset inside the namespace. Following patches will have
helper routines to do the conversion between memory pointer and offset.

Signed-off-by: Coly Li <colyli@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Ying Huang <ying.huang@intel.com>
---
 include/uapi/linux/bcache-nvm.h | 253 ++++++++++++++++++++++++++++++++
 1 file changed, 253 insertions(+)
 create mode 100644 include/uapi/linux/bcache-nvm.h

diff --git a/include/uapi/linux/bcache-nvm.h b/include/uapi/linux/bcache-nvm.h
new file mode 100644
index 000000000000..0e1082bb88ee
--- /dev/null
+++ b/include/uapi/linux/bcache-nvm.h
@@ -0,0 +1,253 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef _UAPI_BCACHE_NVM_H
+#define _UAPI_BCACHE_NVM_H
+
+/*
+ * Bcache on NVDIMM data structures
+ */
+
+/*
+ * - struct bch_nvmpg_sb
+ *   This is the super block allocated on each nvdimm namespace for the nvm
+ * pages allocator. A nvdimm pages allocator set may have multiple namespaces,
+ * bch_nvmpg_sb->set_uuid is used to mark which nvdimm set this name space
+ * belongs to.
+ *
+ * - struct bch_nvmpg_header
+ *   This is a table for all heads of all allocation record lists. An allo-
+ * cation record list traces all page(s) allocated from nvdimm namespace(s) to
+ * a specific requester (identified by uuid). After system reboot, a requester
+ * can retrieve all previously allocated nvdimm pages from its record list by a
+ * pre-defined uuid.
+ *
+ * - struct bch_nvmpg_head
+ *   This is a head of an allocation record list. Each nvdimm pages requester
+ * (typically it's a driver) has and only has one allocation record list, and
+ * an allocated nvdimm page only bedlones to a specific allocation record list.
+ * Member uuid[] will be set as the requester's uuid, e.g. for bcache it is the
+ * cache set uuid. Member label is not mandatory, it is a human-readable string
+ * for debug purpose. The nvm offset format pointers recs_offset[] point to the
+ * location of actual allocator record lists on each name space of the nvdimm
+ * pages allocator set. Each per name space record list is represented by the
+ * following struct bch_nvmpg_recs.
+ *
+ * - struct bch_nvmpg_recs
+ *   This structure represents a requester's allocation record list. Member uuid
+ * is same value as the uuid of its corresponding struct bch_nvmpg_head. Member
+ * recs[] is a table of struct bch_pgalloc_rec objects to trace all allocated
+ * nvmdimm pages. If the table recs[] is full, the nvmpg format offset is a
+ * pointer points to the next struct bch_nvmpg_recs object, nvm pages allocator
+ * will look for available free allocation record there. All the linked
+ * struct bch_nvmpg_recs objects compose a requester's alloction record list
+ * which is headed by the above struct bch_nvmpg_head.
+ *
+ * - struct bch_nvmpg_rec
+ *   This structure records a range of allocated nvdimm pages. Member pgoff is
+ * offset in unit of page size of this allocation range. Member order indicates
+ * size of the allocation range by (1 << order) in unit of page size. Because
+ * the nvdimm pages allocator set may have multiple nvdimm name spaces, member
+ * ns_id is used to identify which name space the pgoff belongs to.
+ *
+ * All allocation record lists are stored on the first initialized nvdimm name-
+ * space (ns_id 0). The meta data default layout of nvm pages allocator on
+ * namespace 0 is,
+ *
+ *    0 +---------------------------------+
+ *      |                                 |
+ *  4KB +---------------------------------+ <-- BCH_NVMPG_SB_OFFSET
+ *      |          bch_nvmpg_sb           |
+ *  8KB +---------------------------------+ <-- BCH_NVMPG_RECLIST_HEAD_OFFSET
+ *      |        bch_nvmpg_header         |
+ *      |                                 |
+ * 16KB +---------------------------------+ <-- BCH_NVMPG_SYSRECS_OFFSET
+ *      |         bch_nvmpg_recs          |
+ *      |  (nvm pages internal usage)     |
+ * 24KB +---------------------------------+
+ *      |                                 |
+ *      |                                 |
+ * 16MB +---------------------------------+ <-- BCH_NVMPG_START
+ *      |      allocable nvm pages        |
+ *      |      for buddy allocator        |
+ * end  +---------------------------------+
+ *
+ *
+ *
+ * Meta data default layout on rested nvdimm namespaces,
+ *
+ *    0 +---------------------------------+
+ *      |                                 |
+ *  4KB +---------------------------------+ <-- BCH_NVMPG_SB_OFFSET
+ *      |          bch_nvmpg_sb           |
+ *  8KB +---------------------------------+
+ *      |                                 |
+ *      |                                 |
+ *      |                                 |
+ *      |                                 |
+ *      |                                 |
+ *      |                                 |
+ * 16MB +---------------------------------+ <-- BCH_NVMPG_START
+ *      |      allocable nvm pages        |
+ *      |      for buddy allocator        |
+ * end  +---------------------------------+
+ *
+ *
+ * - The nvmpg offset format pointer
+ *   All member names ending with _offset in this header are nvmpg offset
+ * format pointer. The offset format is,
+ *       [highest 3 bits: ns_id]
+ *       [rested 61 bits: offset in No. ns_id namespace]
+ *
+ * The above offset is byte unit, the procedure to reference a nvmpg offset
+ * format pointer is,
+ * 1) Identify the namespace related in-memory structure by ns_id from the
+ *    highest 3 bits of offset value.
+ * 2) Get the DAX mapping base address from the in-memory structure.
+ * 3) Calculate the actual memory address on nvdimm by plusing the DAX base
+ *    address with offset value in rested low 61 bits.
+ * All related in-memory structure and conversion routines don't belong to
+ * user space api, they are defined by nvm-pages allocator code in
+ * drivers/md/bcache/nvm-pages.{c,h}
+ *
+ */
+
+#include <linux/types.h>
+
+/* In sectors */
+#define BCH_NVMPG_SB_OFFSET		4096
+#define BCH_NVMPG_START			(16 << 20)
+
+#define BCH_NVMPG_LBL_SIZE		32
+#define BCH_NVMPG_NS_MAX		8
+
+#define BCH_NVMPG_RECLIST_HEAD_OFFSET	(8<<10)
+#define BCH_NVMPG_SYSRECS_OFFSET	(16<<10)
+
+#define BCH_NVMPG_SB_VERSION		0
+#define BCH_NVMPG_SB_VERSION_MAX	0
+
+static const __u8 bch_nvmpg_magic[] = {
+	0x17, 0xbd, 0x53, 0x7f, 0x1b, 0x23, 0xd6, 0x83,
+	0x46, 0xa4, 0xf8, 0x28, 0x17, 0xda, 0xec, 0xa9 };
+static const __u8 bch_nvmpg_recs_magic[] = {
+	0x39, 0x25, 0x3f, 0xf7, 0x27, 0x17, 0xd0, 0xb9,
+	0x10, 0xe6, 0xd2, 0xda, 0x38, 0x68, 0x26, 0xae };
+
+/* takes 64bit width */
+struct bch_nvmpg_rec {
+	union {
+		struct {
+			__u64	pgoff:52;
+			__u64	order:6;
+			__u64	ns_id:3;
+			__u64	reserved:3;
+		};
+		__u64	_v;
+	};
+};
+
+struct bch_nvmpg_recs {
+	union {
+		struct {
+			/*
+			 * A nvmpg offset format pointer to
+			 * struct bch_nvmpg_head
+			 */
+			__u64			head_offset;
+			/*
+			 * A nvmpg offset format pointer to
+			 * struct bch_nvm_pgalloc_recs which contains
+			 * the next recs[] array.
+			 */
+			__u64			next_offset;
+			__u8			magic[16];
+			__u8			uuid[16];
+			__u32			size;
+			__u32			used;
+			__u64			_pad[4];
+			struct bch_nvmpg_rec	recs[];
+		};
+		__u8				pad[8192];
+	};
+};
+
+#define BCH_NVMPG_MAX_RECS				\
+	((sizeof(struct bch_nvmpg_recs) -		\
+	  offsetof(struct bch_nvmpg_recs, recs)) /	\
+	 sizeof(struct bch_nvmpg_rec))
+
+#define BCH_NVMPG_HD_STAT_FREE		0x0
+#define BCH_NVMPG_HD_STAT_ALLOC		0x1
+struct bch_nvmpg_head {
+	__u8		uuid[16];
+	__u8		label[BCH_NVMPG_LBL_SIZE];
+	__u32		state;
+	__u32		flags;
+	/*
+	 * Array of offset values from the nvmpg offset format
+	 * pointers, each of the pointer points to a per-namespace
+	 * struct bch_nvmpg_recs.
+	 */
+	__u64		recs_offset[BCH_NVMPG_NS_MAX];
+};
+
+/* heads[0] is always for nvm_pages internal usage */
+struct bch_nvmpg_set_header {
+	union {
+		struct {
+			__u32			size;
+			__u32			used;
+			__u64			_pad[4];
+			struct bch_nvmpg_head	heads[];
+		};
+		__u8				pad[8192];
+	};
+};
+
+#define BCH_NVMPG_MAX_HEADS					\
+	((sizeof(struct bch_nvmpg_set_header) -			\
+	  offsetof(struct bch_nvmpg_set_header, heads)) /	\
+	 sizeof(struct bch_nvmpg_head))
+
+/* The on-media bit order is local CPU order */
+struct bch_nvmpg_sb {
+	__u64			csum;
+	__u64			sb_offset;
+	__u64			ns_start;
+	__u64			version;
+	__u8			magic[16];
+	__u8			uuid[16];
+	__u32			page_size;
+	__u32			total_ns;
+	__u32			this_ns;
+	union {
+		__u8		set_uuid[16];
+		__u64		set_magic;
+	};
+
+	__u64			flags;
+	__u64			seq;
+
+	__u64			feature_compat;
+	__u64			feature_incompat;
+	__u64			feature_ro_compat;
+
+	/* For allocable nvm pages from buddy systems */
+	__u64			pages_offset;
+	__u64			pages_total;
+
+	__u64			pad[8];
+
+	/*
+	 * A nvmpg offset format pointer, it points
+	 * to struct bch_nvmpg_set_header which is
+	 * stored only on the first name space.
+	 */
+	__u64			set_header_offset;
+
+	/* Just for csum_set() */
+	__u32			keys;
+	__u64			d[0];
+};
+
+#endif /* _UAPI_BCACHE_NVM_H */
-- 
2.26.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v12 02/12] bcache: initialize the nvm pages allocator
  2021-08-11 17:02 [PATCH v12 00/12] bcache: support NVDIMM for journaling Coly Li
  2021-08-11 17:02 ` [PATCH v12 01/12] bcache: add initial data structures for nvm pages Coly Li
@ 2021-08-11 17:02 ` Coly Li
  2021-08-12  5:43   ` Dan Williams
  2021-08-11 17:02 ` [PATCH v12 03/12] bcache: initialization of the buddy Coly Li
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 16+ messages in thread
From: Coly Li @ 2021-08-11 17:02 UTC (permalink / raw)
  To: linux-bcache
  Cc: linux-block, linux-nvdimm, axboe, hare, jack, dan.j.williams,
	hch, ying.huang, Jianpeng Ma, Randy Dunlap, Qiaowei Ren,
	Hannes Reinecke

From: Jianpeng Ma <jianpeng.ma@intel.com>

This patch define the prototype data structures in memory and
initializes the nvm pages allocator.

The nvm address space which is managed by this allocator can consist of
many nvm namespaces, and some namespaces can compose into one nvm set,
like cache set. For this initial implementation, only one set can be
supported.

The users of this nvm pages allocator need to call register_namespace()
to register the nvdimm device (like /dev/pmemX) into this allocator as
the instance of struct nvm_namespace.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
---
 drivers/md/bcache/Kconfig     |  10 +
 drivers/md/bcache/Makefile    |   1 +
 drivers/md/bcache/nvm-pages.c | 339 ++++++++++++++++++++++++++++++++++
 drivers/md/bcache/nvm-pages.h |  96 ++++++++++
 drivers/md/bcache/super.c     |   3 +
 5 files changed, 449 insertions(+)
 create mode 100644 drivers/md/bcache/nvm-pages.c
 create mode 100644 drivers/md/bcache/nvm-pages.h

diff --git a/drivers/md/bcache/Kconfig b/drivers/md/bcache/Kconfig
index d1ca4d059c20..a69f6c0e0507 100644
--- a/drivers/md/bcache/Kconfig
+++ b/drivers/md/bcache/Kconfig
@@ -35,3 +35,13 @@ config BCACHE_ASYNC_REGISTRATION
 	device path into this file will returns immediately and the real
 	registration work is handled in kernel work queue in asynchronous
 	way.
+
+config BCACHE_NVM_PAGES
+	bool "NVDIMM support for bcache (EXPERIMENTAL)"
+	depends on BCACHE
+	depends on 64BIT
+	depends on LIBNVDIMM
+	depends on DAX
+	help
+	  Allocate/release NV-memory pages for bcache and provide allocated pages
+	  for each requestor after system reboot.
diff --git a/drivers/md/bcache/Makefile b/drivers/md/bcache/Makefile
index 5b87e59676b8..2397bb7c7ffd 100644
--- a/drivers/md/bcache/Makefile
+++ b/drivers/md/bcache/Makefile
@@ -5,3 +5,4 @@ obj-$(CONFIG_BCACHE)	+= bcache.o
 bcache-y		:= alloc.o bset.o btree.o closure.o debug.o extents.o\
 	io.o journal.o movinggc.o request.o stats.o super.o sysfs.o trace.o\
 	util.o writeback.o features.o
+bcache-$(CONFIG_BCACHE_NVM_PAGES) += nvm-pages.o
diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
new file mode 100644
index 000000000000..6184c628d9cc
--- /dev/null
+++ b/drivers/md/bcache/nvm-pages.c
@@ -0,0 +1,339 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Nvdimm page-buddy allocator
+ *
+ * Copyright (c) 2021, Intel Corporation.
+ * Copyright (c) 2021, Qiaowei Ren <qiaowei.ren@intel.com>.
+ * Copyright (c) 2021, Jianpeng Ma <jianpeng.ma@intel.com>.
+ */
+
+#include "bcache.h"
+#include "nvm-pages.h"
+
+#include <linux/slab.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/dax.h>
+#include <linux/pfn_t.h>
+#include <linux/libnvdimm.h>
+#include <linux/mm_types.h>
+#include <linux/err.h>
+#include <linux/pagemap.h>
+#include <linux/bitmap.h>
+#include <linux/blkdev.h>
+
+struct bch_nvmpg_set *global_nvmpg_set;
+
+void *bch_nvmpg_offset_to_ptr(unsigned long offset)
+{
+	int ns_id = BCH_NVMPG_GET_NS_ID(offset);
+	struct bch_nvmpg_ns *ns = global_nvmpg_set->ns_tbl[ns_id];
+
+	if (offset == 0)
+		return NULL;
+
+	ns_id = BCH_NVMPG_GET_NS_ID(offset);
+	ns = global_nvmpg_set->ns_tbl[ns_id];
+
+	if (ns)
+		return (void *)(ns->base_addr + BCH_NVMPG_GET_OFFSET(offset));
+
+	pr_err("Invalid ns_id %u\n", ns_id);
+	return NULL;
+}
+
+unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr)
+{
+	int ns_id = ns->ns_id;
+	unsigned long offset = (unsigned long)(ptr - ns->base_addr);
+
+	return BCH_NVMPG_OFFSET(ns_id, offset);
+}
+
+static void release_ns_tbl(struct bch_nvmpg_set *set)
+{
+	int i;
+	struct bch_nvmpg_ns *ns;
+
+	for (i = 0; i < BCH_NVMPG_NS_MAX; i++) {
+		ns = set->ns_tbl[i];
+		if (ns) {
+			blkdev_put(ns->bdev, FMODE_READ|FMODE_WRITE|FMODE_EXEC);
+			set->ns_tbl[i] = NULL;
+			set->attached_ns--;
+			kfree(ns);
+		}
+	}
+
+	if (set->attached_ns)
+		pr_err("unexpected attached_ns: %u\n", set->attached_ns);
+}
+
+static void release_nvmpg_set(struct bch_nvmpg_set *set)
+{
+	release_ns_tbl(set);
+	kfree(set);
+}
+
+/* Namespace 0 contains all meta data of the nvmpg allocation set */
+static int init_nvmpg_set_header(struct bch_nvmpg_ns *ns)
+{
+	struct bch_nvmpg_set_header *set_header;
+
+	if (ns->ns_id != 0) {
+		pr_err("unexpected ns_id %u for first nvmpg namespace.\n",
+		       ns->ns_id);
+		return -EINVAL;
+	}
+
+	set_header = bch_nvmpg_offset_to_ptr(ns->sb->set_header_offset);
+
+	mutex_lock(&global_nvmpg_set->lock);
+	global_nvmpg_set->set_header = set_header;
+	global_nvmpg_set->heads_size = set_header->size;
+	global_nvmpg_set->heads_used = set_header->used;
+	mutex_unlock(&global_nvmpg_set->lock);
+
+	return 0;
+}
+
+static int attach_nvmpg_set(struct bch_nvmpg_ns *ns)
+{
+	struct bch_nvmpg_sb *sb = ns->sb;
+	int rc = 0;
+
+	mutex_lock(&global_nvmpg_set->lock);
+
+	if (global_nvmpg_set->ns_tbl[sb->this_ns]) {
+		pr_err("ns_id %u already attached.\n", ns->ns_id);
+		rc = -EEXIST;
+		goto unlock;
+	}
+
+	if (ns->ns_id != 0) {
+		pr_err("unexpected ns_id %u for first namespace.\n", ns->ns_id);
+		rc = -EINVAL;
+		goto unlock;
+	}
+
+	if (global_nvmpg_set->attached_ns > 0) {
+		pr_err("multiple namespace attaching not supported yet\n");
+		rc = -EOPNOTSUPP;
+		goto unlock;
+	}
+
+	if ((global_nvmpg_set->attached_ns + 1) > sb->total_ns) {
+		pr_err("namespace counters error: attached %u > total %u\n",
+		       global_nvmpg_set->attached_ns,
+		       global_nvmpg_set->total_ns);
+		rc = -EINVAL;
+		goto unlock;
+	}
+
+	memcpy(global_nvmpg_set->set_uuid, sb->set_uuid, 16);
+	global_nvmpg_set->ns_tbl[sb->this_ns] = ns;
+	global_nvmpg_set->attached_ns++;
+	global_nvmpg_set->total_ns = sb->total_ns;
+
+unlock:
+	mutex_unlock(&global_nvmpg_set->lock);
+	return rc;
+}
+
+static int read_nvdimm_meta_super(struct block_device *bdev,
+				  struct bch_nvmpg_ns *ns)
+{
+	struct page *page;
+	struct bch_nvmpg_sb *sb;
+	uint64_t expected_csum = 0;
+	int r;
+
+	page = read_cache_page_gfp(bdev->bd_inode->i_mapping,
+				BCH_NVMPG_SB_OFFSET >> PAGE_SHIFT, GFP_KERNEL);
+
+	if (IS_ERR(page))
+		return -EIO;
+
+	sb = (struct bch_nvmpg_sb *)
+	     (page_address(page) + offset_in_page(BCH_NVMPG_SB_OFFSET));
+
+	r = -EINVAL;
+	expected_csum = csum_set(sb);
+	if (expected_csum != sb->csum) {
+		pr_info("csum is not match with expected one\n");
+		goto put_page;
+	}
+
+	if (memcmp(sb->magic, bch_nvmpg_magic, 16)) {
+		pr_info("invalid bch_nvmpg_magic\n");
+		goto put_page;
+	}
+
+	if (sb->sb_offset !=
+	    BCH_NVMPG_OFFSET(sb->this_ns, BCH_NVMPG_SB_OFFSET)) {
+		pr_info("invalid superblock offset 0x%llx\n", sb->sb_offset);
+		goto put_page;
+	}
+
+	r = -EOPNOTSUPP;
+	if (sb->total_ns != 1) {
+		pr_info("multiple name space not supported yet.\n");
+		goto put_page;
+	}
+
+
+	r = 0;
+	/* Necessary for DAX mapping */
+	ns->page_size = sb->page_size;
+	ns->pages_total = sb->pages_total;
+
+put_page:
+	put_page(page);
+	return r;
+}
+
+struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path)
+{
+	struct bch_nvmpg_ns *ns = NULL;
+	struct bch_nvmpg_sb *sb = NULL;
+	char buf[BDEVNAME_SIZE];
+	struct block_device *bdev;
+	pgoff_t pgoff;
+	int id, err;
+	char *path;
+	long dax_ret = 0;
+
+	path = kstrndup(dev_path, 512, GFP_KERNEL);
+	if (!path) {
+		pr_err("kstrndup failed\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	bdev = blkdev_get_by_path(strim(path),
+				  FMODE_READ|FMODE_WRITE|FMODE_EXEC,
+				  global_nvmpg_set);
+	if (IS_ERR(bdev)) {
+		pr_err("get %s error: %ld\n", dev_path, PTR_ERR(bdev));
+		kfree(path);
+		return ERR_PTR(PTR_ERR(bdev));
+	}
+
+	err = -ENOMEM;
+	ns = kzalloc(sizeof(struct bch_nvmpg_ns), GFP_KERNEL);
+	if (!ns)
+		goto bdput;
+
+	err = -EIO;
+	if (read_nvdimm_meta_super(bdev, ns)) {
+		pr_err("%s read nvdimm meta super block failed.\n",
+		       bdevname(bdev, buf));
+		goto free_ns;
+	}
+
+	err = -EOPNOTSUPP;
+	if (!bdev_dax_supported(bdev, ns->page_size)) {
+		pr_err("%s don't support DAX\n", bdevname(bdev, buf));
+		goto free_ns;
+	}
+
+	err = -EINVAL;
+	if (bdev_dax_pgoff(bdev, 0, ns->page_size, &pgoff)) {
+		pr_err("invalid offset of %s\n", bdevname(bdev, buf));
+		goto free_ns;
+	}
+
+	err = -ENOMEM;
+	ns->dax_dev = fs_dax_get_by_bdev(bdev);
+	if (!ns->dax_dev) {
+		pr_err("can't by dax device by %s\n", bdevname(bdev, buf));
+		goto free_ns;
+	}
+
+	err = -EINVAL;
+	id = dax_read_lock();
+	dax_ret = dax_direct_access(ns->dax_dev, pgoff, ns->pages_total,
+				    &ns->base_addr, &ns->start_pfn);
+	if (dax_ret <= 0) {
+		pr_err("dax_direct_access error\n");
+		dax_read_unlock(id);
+		goto free_ns;
+	}
+
+	if (dax_ret < ns->pages_total) {
+		pr_warn("mapped range %ld is less than ns->pages_total %lu\n",
+			dax_ret, ns->pages_total);
+	}
+	dax_read_unlock(id);
+
+	sb = (struct bch_nvmpg_sb *)(ns->base_addr + BCH_NVMPG_SB_OFFSET);
+
+	err = -EINVAL;
+	/* Check magic again to make sure DAX mapping is correct */
+	if (memcmp(sb->magic, bch_nvmpg_magic, 16)) {
+		pr_err("invalid bch_nvmpg_magic after DAX mapping\n");
+		goto free_ns;
+	}
+
+	if ((global_nvmpg_set->attached_ns > 0) &&
+	     memcmp(sb->set_uuid, global_nvmpg_set->set_uuid, 16)) {
+		pr_err("set uuid does not match with ns_id %u\n", ns->ns_id);
+		goto free_ns;
+	}
+
+	if (sb->set_header_offset !=
+	    BCH_NVMPG_OFFSET(sb->this_ns, BCH_NVMPG_RECLIST_HEAD_OFFSET)) {
+		pr_err("Invalid header offset: this_ns %u, ns_id %llu, offset 0x%llx\n",
+		       sb->this_ns,
+		       BCH_NVMPG_GET_NS_ID(sb->set_header_offset),
+		       BCH_NVMPG_GET_OFFSET(sb->set_header_offset));
+		goto free_ns;
+	}
+
+	ns->page_size = sb->page_size;
+	ns->pages_offset = sb->pages_offset;
+	ns->pages_total = sb->pages_total;
+	ns->sb = sb;
+	ns->free = 0;
+	ns->bdev = bdev;
+	ns->set = global_nvmpg_set;
+
+	err = attach_nvmpg_set(ns);
+	if (err < 0)
+		goto free_ns;
+
+	mutex_init(&ns->lock);
+
+	err = init_nvmpg_set_header(ns);
+	if (err < 0)
+		goto free_ns;
+
+	kfree(path);
+	return ns;
+
+free_ns:
+	kfree(ns);
+bdput:
+	blkdev_put(bdev, FMODE_READ|FMODE_WRITE|FMODE_EXEC);
+	kfree(path);
+	return ERR_PTR(err);
+}
+EXPORT_SYMBOL_GPL(bch_register_namespace);
+
+int __init bch_nvmpg_init(void)
+{
+	global_nvmpg_set = kzalloc(sizeof(*global_nvmpg_set), GFP_KERNEL);
+	if (!global_nvmpg_set)
+		return -ENOMEM;
+
+	global_nvmpg_set->total_ns = 0;
+	mutex_init(&global_nvmpg_set->lock);
+
+	pr_info("bcache nvm init\n");
+	return 0;
+}
+
+void bch_nvmpg_exit(void)
+{
+	release_nvmpg_set(global_nvmpg_set);
+	pr_info("bcache nvm exit\n");
+}
diff --git a/drivers/md/bcache/nvm-pages.h b/drivers/md/bcache/nvm-pages.h
new file mode 100644
index 000000000000..827cff695608
--- /dev/null
+++ b/drivers/md/bcache/nvm-pages.h
@@ -0,0 +1,96 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _BCACHE_NVM_PAGES_H
+#define _BCACHE_NVM_PAGES_H
+
+#include <linux/bcache-nvm.h>
+#include <linux/libnvdimm.h>
+
+/*
+ * Bcache NVDIMM in memory data structures
+ */
+
+/*
+ * The following three structures in memory records which page(s) allocated
+ * to which owner. After reboot from power failure, they will be initialized
+ * based on nvm pages superblock in NVDIMM device.
+ */
+struct bch_nvmpg_ns {
+	struct bch_nvmpg_sb *sb;
+	void *base_addr;
+
+	unsigned char uuid[16];
+	int ns_id;
+	unsigned int page_size;
+	unsigned long free;
+	unsigned long pages_offset;
+	unsigned long pages_total;
+	pfn_t start_pfn;
+
+	struct dax_device *dax_dev;
+	struct block_device *bdev;
+	struct bch_nvmpg_set *set;
+
+	struct mutex lock;
+};
+
+/*
+ * A set of namespaces. Currently only one set can be supported.
+ */
+struct bch_nvmpg_set {
+	unsigned char set_uuid[16];
+
+	int heads_size;
+	int heads_used;
+	struct bch_nvmpg_set_header *set_header;
+
+	struct bch_nvmpg_ns *ns_tbl[BCH_NVMPG_NS_MAX];
+	int total_ns;
+	int attached_ns;
+
+	struct mutex lock;
+};
+
+#define BCH_NVMPG_NS_ID_BITS	3
+#define BCH_NVMPG_OFFSET_BITS	61
+#define BCH_NVMPG_NS_ID_MASK	((1UL<<BCH_NVMPG_NS_ID_BITS) - 1)
+#define BCH_NVMPG_OFFSET_MASK	((1UL<<BCH_NVMPG_OFFSET_BITS) - 1)
+
+#define BCH_NVMPG_GET_NS_ID(offset)					\
+	(((offset) >> BCH_NVMPG_OFFSET_BITS) & BCH_NVMPG_NS_ID_MASK)
+
+#define BCH_NVMPG_GET_OFFSET(offset)	((offset) & BCH_NVMPG_OFFSET_MASK)
+
+#define BCH_NVMPG_OFFSET(ns_id, offset)					\
+	((((ns_id) & BCH_NVMPG_NS_ID_MASK) << BCH_NVMPG_OFFSET_BITS) |	\
+	 ((offset) & BCH_NVMPG_OFFSET_MASK))
+
+/* Indicate which field in bch_nvmpg_sb to be updated */
+#define BCH_NVMPG_TOTAL_NS	0	/* total_ns */
+
+void *bch_nvmpg_offset_to_ptr(unsigned long offset);
+unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr);
+
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+
+struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path);
+int bch_nvmpg_init(void);
+void bch_nvmpg_exit(void);
+
+#else
+
+static inline struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path)
+{
+	return NULL;
+}
+
+static inline int bch_nvmpg_init(void)
+{
+	return 0;
+}
+
+static inline void bch_nvmpg_exit(void) { }
+
+#endif /* CONFIG_BCACHE_NVM_PAGES */
+
+#endif /* _BCACHE_NVM_PAGES_H */
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 185246a0d855..4326ffa0d21f 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -14,6 +14,7 @@
 #include "request.h"
 #include "writeback.h"
 #include "features.h"
+#include "nvm-pages.h"
 
 #include <linux/blkdev.h>
 #include <linux/pagemap.h>
@@ -2809,6 +2810,7 @@ static void bcache_exit(void)
 {
 	bch_debug_exit();
 	bch_request_exit();
+	bch_nvmpg_exit();
 	if (bcache_kobj)
 		kobject_put(bcache_kobj);
 	if (bcache_wq)
@@ -2907,6 +2909,7 @@ static int __init bcache_init(void)
 
 	bch_debug_init();
 	closure_debug_init();
+	bch_nvmpg_init();
 
 	bcache_is_reboot = false;
 
-- 
2.26.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v12 03/12] bcache: initialization of the buddy
  2021-08-11 17:02 [PATCH v12 00/12] bcache: support NVDIMM for journaling Coly Li
  2021-08-11 17:02 ` [PATCH v12 01/12] bcache: add initial data structures for nvm pages Coly Li
  2021-08-11 17:02 ` [PATCH v12 02/12] bcache: initialize the nvm pages allocator Coly Li
@ 2021-08-11 17:02 ` Coly Li
  2021-08-11 17:02 ` [PATCH v12 04/12] bcache: bch_nvmpg_alloc_pages() " Coly Li
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: Coly Li @ 2021-08-11 17:02 UTC (permalink / raw)
  To: linux-bcache
  Cc: linux-block, linux-nvdimm, axboe, hare, jack, dan.j.williams,
	hch, ying.huang, Jianpeng Ma, kernel test robot, Dan Carpenter,
	Qiaowei Ren, Hannes Reinecke

From: Jianpeng Ma <jianpeng.ma@intel.com>

This nvm pages allocator will implement the simple buddy allocator to
anage the nvm address space. This patch initializes this buddy allocator
for new namespace.

the unit of alloc/free of the buddy allocator is page. DAX device has
their struct page(in dram or PMEM).

        struct {        /* ZONE_DEVICE pages */
                /** @pgmap: Points to the hosting device page map. */
                struct dev_pagemap *pgmap;
                void *zone_device_data;
                /*
                 * ZONE_DEVICE private pages are counted as being
                 * mapped so the next 3 words hold the mapping, index,
                 * and private fields from the source anonymous or
                 * page cache page while the page is migrated to device
                 * private memory.
                 * ZONE_DEVICE MEMORY_DEVICE_FS_DAX pages also
                 * use the mapping, index, and private fields when
                 * pmem backed DAX files are mapped.
                 */
        };

ZONE_DEVICE pages only use pgmap. Other 4 words[16/32 bytes] don't use.
So the second/third word will be used as 'struct list_head ' which list
in buddy. The fourth word(that is normal struct page::index) store pgoff
which the page-offset in the dax device. And the fifth word (that is
normal struct page::private) store order of buddy. page_type will be used
to store buddy flags.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
---
 drivers/md/bcache/nvm-pages.c | 212 +++++++++++++++++++++++++++++++++-
 drivers/md/bcache/nvm-pages.h |  12 ++
 2 files changed, 221 insertions(+), 3 deletions(-)

diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
index 6184c628d9cc..677fdb62f737 100644
--- a/drivers/md/bcache/nvm-pages.c
+++ b/drivers/md/bcache/nvm-pages.c
@@ -50,6 +50,36 @@ unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr)
 	return BCH_NVMPG_OFFSET(ns_id, offset);
 }
 
+static struct page *bch_nvmpg_va_to_pg(void *addr)
+{
+	return virt_to_page(addr);
+}
+
+static void *bch_nvmpg_pgoff_to_ptr(struct bch_nvmpg_ns *ns, pgoff_t pgoff)
+{
+	return ns->base_addr + (pgoff << PAGE_SHIFT);
+}
+
+static void *bch_nvmpg_rec_to_ptr(struct bch_nvmpg_rec *r)
+{
+	struct bch_nvmpg_ns *ns = global_nvmpg_set->ns_tbl[r->ns_id];
+	pgoff_t pgoff = r->pgoff;
+
+	return bch_nvmpg_pgoff_to_ptr(ns, pgoff);
+}
+
+static inline void reserve_nvmpg_pages(struct bch_nvmpg_ns *ns,
+				       pgoff_t pgoff, u64 nr)
+{
+	while (nr > 0) {
+		unsigned int num = nr > UINT_MAX ? UINT_MAX : nr;
+
+		bitmap_set(ns->pages_bitmap, pgoff, num);
+		nr -= num;
+		pgoff += num;
+	}
+}
+
 static void release_ns_tbl(struct bch_nvmpg_set *set)
 {
 	int i;
@@ -58,6 +88,10 @@ static void release_ns_tbl(struct bch_nvmpg_set *set)
 	for (i = 0; i < BCH_NVMPG_NS_MAX; i++) {
 		ns = set->ns_tbl[i];
 		if (ns) {
+			kvfree(ns->pages_bitmap);
+			if (ns->recs_bitmap)
+				bitmap_free(ns->recs_bitmap);
+
 			blkdev_put(ns->bdev, FMODE_READ|FMODE_WRITE|FMODE_EXEC);
 			set->ns_tbl[i] = NULL;
 			set->attached_ns--;
@@ -75,10 +109,73 @@ static void release_nvmpg_set(struct bch_nvmpg_set *set)
 	kfree(set);
 }
 
+static int validate_recs(int ns_id,
+			 struct bch_nvmpg_head *head,
+			 struct bch_nvmpg_recs *recs)
+{
+	if (memcmp(recs->magic, bch_nvmpg_recs_magic, 16)) {
+		pr_err("Invalid bch_nvmpg_recs magic\n");
+		return -EINVAL;
+	}
+
+	if (memcmp(recs->uuid, head->uuid, 16)) {
+		pr_err("Invalid bch_nvmpg_recs uuid\n");
+		return -EINVAL;
+	}
+
+	if (recs->head_offset !=
+	    bch_nvmpg_ptr_to_offset(global_nvmpg_set->ns_tbl[ns_id], head)) {
+		pr_err("Invalid recs head_offset\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int reserve_nvmpg_recs(struct bch_nvmpg_recs *recs)
+{
+	int i, used = 0;
+
+	for (i = 0; i < recs->size; i++) {
+		struct bch_nvmpg_rec *r = &recs->recs[i];
+		struct bch_nvmpg_ns *ns;
+		struct page *page;
+		void *addr;
+
+		if (r->pgoff == 0)
+			continue;
+
+		ns = global_nvmpg_set->ns_tbl[r->ns_id];
+		addr = bch_nvmpg_rec_to_ptr(r);
+		if (addr < ns->base_addr) {
+			pr_err("Invalid recorded address\n");
+			return -EINVAL;
+		}
+
+		/* init struct page: index/private */
+		page = bch_nvmpg_va_to_pg(addr);
+		set_page_private(page, r->order);
+		page->index = r->pgoff;
+
+		reserve_nvmpg_pages(ns, r->pgoff, 1L << r->order);
+		used++;
+	}
+
+	if (used != recs->used) {
+		pr_err("used %d doesn't match recs->used %d\n",
+		       used, recs->used);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 /* Namespace 0 contains all meta data of the nvmpg allocation set */
 static int init_nvmpg_set_header(struct bch_nvmpg_ns *ns)
 {
 	struct bch_nvmpg_set_header *set_header;
+	struct bch_nvmpg_recs *sys_recs;
+	int i, j, used = 0, rc = 0;
 
 	if (ns->ns_id != 0) {
 		pr_err("unexpected ns_id %u for first nvmpg namespace.\n",
@@ -92,9 +189,83 @@ static int init_nvmpg_set_header(struct bch_nvmpg_ns *ns)
 	global_nvmpg_set->set_header = set_header;
 	global_nvmpg_set->heads_size = set_header->size;
 	global_nvmpg_set->heads_used = set_header->used;
+
+	/* Reserve the used space from buddy allocator */
+	reserve_nvmpg_pages(ns, 0, div_u64(ns->pages_offset, ns->page_size));
+
+	sys_recs = ns->base_addr + BCH_NVMPG_SYSRECS_OFFSET;
+	for (i = 0; i < set_header->size; i++) {
+		struct bch_nvmpg_head *head;
+
+		head = &set_header->heads[i];
+		if (head->state == BCH_NVMPG_HD_STAT_FREE)
+			continue;
+
+		used++;
+		if (used > global_nvmpg_set->heads_size) {
+			pr_err("used heads %d > heads size %d.\n",
+			       used, global_nvmpg_set->heads_size);
+			goto unlock;
+		}
+
+		for (j = 0; j < BCH_NVMPG_NS_MAX; j++) {
+			struct bch_nvmpg_recs *recs;
+
+			recs = bch_nvmpg_offset_to_ptr(head->recs_offset[j]);
+
+			/* Iterate the recs list */
+			while (recs) {
+				rc = validate_recs(j, head, recs);
+				if (rc < 0)
+					goto unlock;
+
+				rc = reserve_nvmpg_recs(recs);
+				if (rc < 0)
+					goto unlock;
+
+				bitmap_set(ns->recs_bitmap, recs - sys_recs, 1);
+				recs = bch_nvmpg_offset_to_ptr(recs->next_offset);
+			}
+		}
+	}
+unlock:
 	mutex_unlock(&global_nvmpg_set->lock);
+	return rc;
+}
 
-	return 0;
+static void bch_nvmpg_init_free_space(struct bch_nvmpg_ns *ns)
+{
+	unsigned int start, end, pages;
+	int i;
+	struct page *page;
+	pgoff_t pgoff_start;
+
+	bitmap_for_each_clear_region(ns->pages_bitmap,
+				     start, end, 0, ns->pages_total) {
+		pgoff_start = start;
+		pages = end - start;
+
+		while (pages) {
+			void *addr;
+
+			for (i = BCH_MAX_ORDER - 1; i >= 0; i--) {
+				if ((pgoff_start % (1L << i) == 0) &&
+				    (pages >= (1L << i)))
+					break;
+			}
+
+			addr = bch_nvmpg_pgoff_to_ptr(ns, pgoff_start);
+			page = bch_nvmpg_va_to_pg(addr);
+			set_page_private(page, i);
+			page->index = pgoff_start;
+			__SetPageBuddy(page);
+			list_add((struct list_head *)&page->zone_device_data,
+				 &ns->free_area[i]);
+
+			pgoff_start += 1L << i;
+			pages -= 1L << i;
+		}
+	}
 }
 
 static int attach_nvmpg_set(struct bch_nvmpg_ns *ns)
@@ -199,7 +370,7 @@ struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path)
 	char buf[BDEVNAME_SIZE];
 	struct block_device *bdev;
 	pgoff_t pgoff;
-	int id, err;
+	int id, i, err;
 	char *path;
 	long dax_ret = 0;
 
@@ -303,13 +474,48 @@ struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path)
 
 	mutex_init(&ns->lock);
 
+	/*
+	 * parameters of bitmap_set/clear are unsigned int.
+	 * Given currently size of nvm is far from exceeding this limit,
+	 * so only add a WARN_ON message.
+	 */
+	WARN_ON(BITS_TO_LONGS(ns->pages_total) > UINT_MAX);
+	ns->pages_bitmap = kvcalloc(BITS_TO_LONGS(ns->pages_total),
+				    sizeof(unsigned long), GFP_KERNEL);
+	if (!ns->pages_bitmap) {
+		err = -ENOMEM;
+		goto clear_ns_nr;
+	}
+
+	if (ns->sb->this_ns == 0) {
+		ns->recs_bitmap =
+			bitmap_zalloc(BCH_MAX_PGALLOC_RECS, GFP_KERNEL);
+		if (ns->recs_bitmap == NULL) {
+			err = -ENOMEM;
+			goto free_pages_bitmap;
+		}
+	}
+
+	for (i = 0; i < BCH_MAX_ORDER; i++)
+		INIT_LIST_HEAD(&ns->free_area[i]);
+
 	err = init_nvmpg_set_header(ns);
 	if (err < 0)
-		goto free_ns;
+		goto free_recs_bitmap;
+
+	if (ns->sb->this_ns == 0)
+		/* init buddy allocator */
+		bch_nvmpg_init_free_space(ns);
 
 	kfree(path);
 	return ns;
 
+free_recs_bitmap:
+	bitmap_free(ns->recs_bitmap);
+free_pages_bitmap:
+	kvfree(ns->pages_bitmap);
+clear_ns_nr:
+	global_nvmpg_set->ns_tbl[sb->this_ns] = NULL;
 free_ns:
 	kfree(ns);
 bdput:
diff --git a/drivers/md/bcache/nvm-pages.h b/drivers/md/bcache/nvm-pages.h
index 827cff695608..2116086c4d01 100644
--- a/drivers/md/bcache/nvm-pages.h
+++ b/drivers/md/bcache/nvm-pages.h
@@ -10,6 +10,8 @@
  * Bcache NVDIMM in memory data structures
  */
 
+#define BCH_MAX_ORDER 20
+
 /*
  * The following three structures in memory records which page(s) allocated
  * to which owner. After reboot from power failure, they will be initialized
@@ -27,6 +29,11 @@ struct bch_nvmpg_ns {
 	unsigned long pages_total;
 	pfn_t start_pfn;
 
+	unsigned long *pages_bitmap;
+	struct list_head free_area[BCH_MAX_ORDER];
+
+	unsigned long *recs_bitmap;
+
 	struct dax_device *dax_dev;
 	struct block_device *bdev;
 	struct bch_nvmpg_set *set;
@@ -68,6 +75,11 @@ struct bch_nvmpg_set {
 /* Indicate which field in bch_nvmpg_sb to be updated */
 #define BCH_NVMPG_TOTAL_NS	0	/* total_ns */
 
+#define BCH_MAX_PGALLOC_RECS						\
+	(min_t(unsigned int, 64,					\
+	       (BCH_NVMPG_START - BCH_NVMPG_SYSRECS_OFFSET) /		\
+	       sizeof(struct bch_nvmpg_recs)))
+
 void *bch_nvmpg_offset_to_ptr(unsigned long offset);
 unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr);
 
-- 
2.26.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v12 04/12] bcache: bch_nvmpg_alloc_pages() of the buddy
  2021-08-11 17:02 [PATCH v12 00/12] bcache: support NVDIMM for journaling Coly Li
                   ` (2 preceding siblings ...)
  2021-08-11 17:02 ` [PATCH v12 03/12] bcache: initialization of the buddy Coly Li
@ 2021-08-11 17:02 ` Coly Li
  2021-08-11 17:02 ` [PATCH v12 05/12] bcache: bch_nvmpg_free_pages() of the buddy allocator Coly Li
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: Coly Li @ 2021-08-11 17:02 UTC (permalink / raw)
  To: linux-bcache
  Cc: linux-block, linux-nvdimm, axboe, hare, jack, dan.j.williams,
	hch, ying.huang, Jianpeng Ma, Qiaowei Ren, Hannes Reinecke

From: Jianpeng Ma <jianpeng.ma@intel.com>

This patch implements the bch_nvmpg_alloc_pages() of the nvm pages buddy
allocator. In terms of function, this func is like current
page-buddy-alloc. But the differences are:
a: it need owner_uuid as parameter which record owner info. And it
make those info persistence.
b: it don't need flags like GFP_*. All allocs are the equal.
c: it don't trigger other ops etc swap/recycle.

Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
---
 drivers/md/bcache/nvm-pages.c | 210 ++++++++++++++++++++++++++++++++++
 drivers/md/bcache/nvm-pages.h |   9 ++
 2 files changed, 219 insertions(+)

diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
index 677fdb62f737..420b7c479057 100644
--- a/drivers/md/bcache/nvm-pages.c
+++ b/drivers/md/bcache/nvm-pages.c
@@ -50,6 +50,13 @@ unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr)
 	return BCH_NVMPG_OFFSET(ns_id, offset);
 }
 
+static unsigned long bch_nvmpg_ptr_to_pgoff(struct bch_nvmpg_ns *ns, void *ptr)
+{
+	unsigned long offset = (unsigned long)(ptr - ns->base_addr);
+
+	return offset >> PAGE_SHIFT;
+}
+
 static struct page *bch_nvmpg_va_to_pg(void *addr)
 {
 	return virt_to_page(addr);
@@ -268,6 +275,209 @@ static void bch_nvmpg_init_free_space(struct bch_nvmpg_ns *ns)
 	}
 }
 
+
+/* If not found, it will create if create == true */
+static struct bch_nvmpg_head *find_nvmpg_head(const char *uuid, bool create)
+{
+	struct bch_nvmpg_set_header *set_header = global_nvmpg_set->set_header;
+	struct bch_nvmpg_head *head = NULL;
+	int i;
+
+	if (set_header == NULL)
+		goto out;
+
+	for (i = 0; i < set_header->size; i++) {
+		struct bch_nvmpg_head *h = &set_header->heads[i];
+
+		if (h->state != BCH_NVMPG_HD_STAT_ALLOC)
+			continue;
+
+		if (!memcmp(uuid, h->uuid, 16)) {
+			head = h;
+			break;
+		}
+	}
+
+	if (!head && create) {
+		u32 used = set_header->used;
+
+		if (set_header->size > used) {
+			head = &set_header->heads[used];
+			memset(head, 0, sizeof(struct bch_nvmpg_head));
+			head->state = BCH_NVMPG_HD_STAT_ALLOC;
+			memcpy(head->uuid, uuid, 16);
+			global_nvmpg_set->heads_used++;
+			set_header->used++;
+		} else
+			pr_info("No free bch_nvmpg_head\n");
+	}
+
+out:
+	return head;
+}
+
+static struct bch_nvmpg_recs *find_empty_nvmpg_recs(void)
+{
+	unsigned int start;
+	struct bch_nvmpg_ns *ns = global_nvmpg_set->ns_tbl[0];
+	struct bch_nvmpg_recs *recs;
+
+	start = bitmap_find_next_zero_area(ns->recs_bitmap,
+					   BCH_MAX_PGALLOC_RECS, 0, 1, 0);
+	if (start > BCH_MAX_PGALLOC_RECS) {
+		pr_info("No free struct bch_nvmpg_recs\n");
+		return NULL;
+	}
+
+	bitmap_set(ns->recs_bitmap, start, 1);
+	recs = (struct bch_nvmpg_recs *)
+		bch_nvmpg_offset_to_ptr(BCH_NVMPG_SYSRECS_OFFSET)
+	       + start;
+
+	memset(recs, 0, sizeof(struct bch_nvmpg_recs));
+	return recs;
+}
+
+
+static struct bch_nvmpg_recs *find_nvmpg_recs(struct bch_nvmpg_ns *ns,
+					      struct bch_nvmpg_head *head,
+					      bool create)
+{
+	int ns_id = ns->sb->this_ns;
+	struct bch_nvmpg_recs *prev_recs = NULL, *recs = NULL;
+
+	recs = bch_nvmpg_offset_to_ptr(head->recs_offset[ns_id]);
+
+	/* If create=false, we return recs[nr] */
+	if (!create)
+		return recs;
+
+	/*
+	 * If create=true, it mean we need a empty struct bch_nvmpg_rec
+	 * So we should find non-empty struct bch_nvmpg_recs or alloc
+	 * new struct bch_nvmpg_recs. And return this bch_nvmpg_recs
+	 */
+	while (recs && (recs->used == recs->size)) {
+		prev_recs = recs;
+		recs = bch_nvmpg_offset_to_ptr(recs->next_offset);
+	}
+
+	/* Found empty struct bch_nvmpg_recs */
+	if (recs)
+		return recs;
+
+	/* Need alloc new struct bch_nvmpg_recs */
+	recs = find_empty_nvmpg_recs();
+	if (recs) {
+		unsigned long offset;
+
+		recs->next_offset = 0;
+		recs->head_offset = bch_nvmpg_ptr_to_offset(ns, head);
+		memcpy(recs->magic, bch_nvmpg_recs_magic, 16);
+		memcpy(recs->uuid, head->uuid, 16);
+		recs->size = BCH_NVMPG_MAX_RECS;
+		recs->used = 0;
+
+		offset = bch_nvmpg_ptr_to_offset(ns, recs);
+		if (prev_recs)
+			prev_recs->next_offset = offset;
+		else
+			head->recs_offset[ns_id] = offset;
+	}
+
+	return recs;
+}
+
+static void add_nvmpg_rec(struct bch_nvmpg_ns *ns,
+			  struct bch_nvmpg_recs *recs,
+			  void *kaddr, int order)
+{
+	int i;
+
+	for (i = 0; i < recs->size; i++) {
+		if (recs->recs[i].pgoff == 0) {
+			recs->recs[i].pgoff = bch_nvmpg_ptr_to_pgoff(ns, kaddr);
+			recs->recs[i].order = order;
+			recs->recs[i].ns_id = ns->sb->this_ns;
+			recs->used++;
+			break;
+		}
+	}
+	BUG_ON(i == recs->size);
+}
+
+
+void *bch_nvmpg_alloc_pages(int order, const char *uuid)
+{
+	void *kaddr = NULL;
+	struct bch_nvmpg_head *head;
+	int n, o;
+
+	mutex_lock(&global_nvmpg_set->lock);
+	head = find_nvmpg_head(uuid, true);
+
+	if (!head) {
+		pr_err("Cannot find bch_nvmpg_recs by uuid.\n");
+		goto unlock;
+	}
+
+	for (n = 0; n < global_nvmpg_set->total_ns; n++) {
+		struct bch_nvmpg_ns *ns = global_nvmpg_set->ns_tbl[n];
+
+		if (!ns || (ns->free < (1L << order)))
+			continue;
+
+		for (o = order; o < BCH_MAX_ORDER; o++) {
+			struct list_head *list;
+			struct page *page, *buddy_page;
+
+			if (list_empty(&ns->free_area[o]))
+				continue;
+
+			list = ns->free_area[o].next;
+			page = container_of((void *)list, struct page,
+					    zone_device_data);
+
+			list_del(list);
+
+			while (o != order) {
+				void *addr;
+				pgoff_t pgoff;
+
+				pgoff = page->index + (1L << (o - 1));
+				addr = bch_nvmpg_pgoff_to_ptr(ns, pgoff);
+				buddy_page = bch_nvmpg_va_to_pg(addr);
+				set_page_private(buddy_page, o - 1);
+				buddy_page->index = pgoff;
+				__SetPageBuddy(buddy_page);
+				list_add((struct list_head *)&buddy_page->zone_device_data,
+					 &ns->free_area[o - 1]);
+				o--;
+			}
+
+			set_page_private(page, order);
+			__ClearPageBuddy(page);
+			ns->free -= 1L << order;
+			kaddr = bch_nvmpg_pgoff_to_ptr(ns, page->index);
+			break;
+		}
+
+		if (o < BCH_MAX_ORDER) {
+			struct bch_nvmpg_recs *recs;
+
+			recs = find_nvmpg_recs(ns, head, true);
+			/* ToDo: handle pgalloc_recs==NULL */
+			add_nvmpg_rec(ns, recs, kaddr, order);
+			break;
+		}
+	}
+
+unlock:
+	mutex_unlock(&global_nvmpg_set->lock);
+	return kaddr;
+}
+EXPORT_SYMBOL_GPL(bch_nvmpg_alloc_pages);
+
 static int attach_nvmpg_set(struct bch_nvmpg_ns *ns)
 {
 	struct bch_nvmpg_sb *sb = ns->sb;
diff --git a/drivers/md/bcache/nvm-pages.h b/drivers/md/bcache/nvm-pages.h
index 2116086c4d01..1bcd7a4e1fd1 100644
--- a/drivers/md/bcache/nvm-pages.h
+++ b/drivers/md/bcache/nvm-pages.h
@@ -75,6 +75,9 @@ struct bch_nvmpg_set {
 /* Indicate which field in bch_nvmpg_sb to be updated */
 #define BCH_NVMPG_TOTAL_NS	0	/* total_ns */
 
+#define BCH_PGOFF_TO_KVADDR(pgoff)					\
+	((void *)((unsigned long)(pgoff) << PAGE_SHIFT))
+
 #define BCH_MAX_PGALLOC_RECS						\
 	(min_t(unsigned int, 64,					\
 	       (BCH_NVMPG_START - BCH_NVMPG_SYSRECS_OFFSET) /		\
@@ -88,6 +91,7 @@ unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr);
 struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path);
 int bch_nvmpg_init(void);
 void bch_nvmpg_exit(void);
+void *bch_nvmpg_alloc_pages(int order, const char *uuid);
 
 #else
 
@@ -103,6 +107,11 @@ static inline int bch_nvmpg_init(void)
 
 static inline void bch_nvmpg_exit(void) { }
 
+static inline void *bch_nvmpg_alloc_pages(int order, const char *uuid)
+{
+	return NULL;
+}
+
 #endif /* CONFIG_BCACHE_NVM_PAGES */
 
 #endif /* _BCACHE_NVM_PAGES_H */
-- 
2.26.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v12 05/12] bcache: bch_nvmpg_free_pages() of the buddy allocator
  2021-08-11 17:02 [PATCH v12 00/12] bcache: support NVDIMM for journaling Coly Li
                   ` (3 preceding siblings ...)
  2021-08-11 17:02 ` [PATCH v12 04/12] bcache: bch_nvmpg_alloc_pages() " Coly Li
@ 2021-08-11 17:02 ` Coly Li
  2021-08-11 17:02 ` [PATCH v12 06/12] bcache: get recs list head for allocated pages by specific uuid Coly Li
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: Coly Li @ 2021-08-11 17:02 UTC (permalink / raw)
  To: linux-bcache
  Cc: linux-block, linux-nvdimm, axboe, hare, jack, dan.j.williams,
	hch, ying.huang, Jianpeng Ma, Qiaowei Ren, Hannes Reinecke

From: Jianpeng Ma <jianpeng.ma@intel.com>

This patch implements the bch_nvmpg_free_pages() of the buddy allocator.

The difference between this and page-buddy-free:
it need owner_uuid to free owner allocated pages, and must
persistent after free.

Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
---
 drivers/md/bcache/nvm-pages.c | 167 +++++++++++++++++++++++++++++++++-
 drivers/md/bcache/nvm-pages.h |   3 +
 2 files changed, 167 insertions(+), 3 deletions(-)

diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
index 420b7c479057..ef61fdaaac28 100644
--- a/drivers/md/bcache/nvm-pages.c
+++ b/drivers/md/bcache/nvm-pages.c
@@ -240,6 +240,51 @@ static int init_nvmpg_set_header(struct bch_nvmpg_ns *ns)
 	return rc;
 }
 
+static void __free_space(struct bch_nvmpg_ns *ns, void *addr, int order)
+{
+	unsigned long add_pages = (1L << order);
+	pgoff_t pgoff;
+	struct page *page;
+	void *va;
+
+	page = bch_nvmpg_va_to_pg(addr);
+	WARN_ON((!page) || (page->private != order));
+	pgoff = page->index;
+
+	while (order < BCH_MAX_ORDER - 1) {
+		struct page *buddy_page;
+
+		pgoff_t buddy_pgoff = pgoff ^ (1L << order);
+		pgoff_t parent_pgoff = pgoff & ~(1L << order);
+
+		if ((parent_pgoff + (1L << (order + 1)) > ns->pages_total))
+			break;
+
+		va = bch_nvmpg_pgoff_to_ptr(ns, buddy_pgoff);
+		buddy_page = bch_nvmpg_va_to_pg(va);
+		WARN_ON(!buddy_page);
+
+		if (PageBuddy(buddy_page) && (buddy_page->private == order)) {
+			list_del((struct list_head *)&buddy_page->zone_device_data);
+			__ClearPageBuddy(buddy_page);
+			pgoff = parent_pgoff;
+			order++;
+			continue;
+		}
+		break;
+	}
+
+	va = bch_nvmpg_pgoff_to_ptr(ns, pgoff);
+	page = bch_nvmpg_va_to_pg(va);
+	WARN_ON(!page);
+	list_add((struct list_head *)&page->zone_device_data,
+		 &ns->free_area[order]);
+	page->index = pgoff;
+	set_page_private(page, order);
+	__SetPageBuddy(page);
+	ns->free += add_pages;
+}
+
 static void bch_nvmpg_init_free_space(struct bch_nvmpg_ns *ns)
 {
 	unsigned int start, end, pages;
@@ -265,9 +310,9 @@ static void bch_nvmpg_init_free_space(struct bch_nvmpg_ns *ns)
 			page = bch_nvmpg_va_to_pg(addr);
 			set_page_private(page, i);
 			page->index = pgoff_start;
-			__SetPageBuddy(page);
-			list_add((struct list_head *)&page->zone_device_data,
-				 &ns->free_area[i]);
+
+			/* In order to update ns->free */
+			__free_space(ns, addr, i);
 
 			pgoff_start += 1L << i;
 			pages -= 1L << i;
@@ -478,6 +523,121 @@ void *bch_nvmpg_alloc_pages(int order, const char *uuid)
 }
 EXPORT_SYMBOL_GPL(bch_nvmpg_alloc_pages);
 
+static inline void *nvm_end_addr(struct bch_nvmpg_ns *ns)
+{
+	return ns->base_addr + (ns->pages_total << PAGE_SHIFT);
+}
+
+static inline bool in_nvmpg_ns_range(struct bch_nvmpg_ns *ns,
+				     void *start_addr, void *end_addr)
+{
+	return (start_addr >= ns->base_addr) && (end_addr < nvm_end_addr(ns));
+}
+
+static struct bch_nvmpg_ns *find_nvmpg_ns_by_addr(void *addr, int order)
+{
+	int i;
+	struct bch_nvmpg_ns *ns;
+
+	for (i = 0; i < global_nvmpg_set->total_ns; i++) {
+		ns = global_nvmpg_set->ns_tbl[i];
+
+		if (ns && in_nvmpg_ns_range(ns, addr, addr + (1L << order)))
+			return ns;
+	}
+
+	return NULL;
+}
+
+static int remove_nvmpg_rec(struct bch_nvmpg_recs *recs, int ns_id,
+			    void *kaddr, int order)
+{
+	struct bch_nvmpg_head *head;
+	struct bch_nvmpg_recs *prev_recs, *sys_recs;
+	struct bch_nvmpg_ns *ns;
+	unsigned long pgoff;
+	int i;
+
+	ns = global_nvmpg_set->ns_tbl[0];
+	pgoff = bch_nvmpg_ptr_to_pgoff(ns, kaddr);
+
+	head = bch_nvmpg_offset_to_ptr(recs->head_offset);
+	prev_recs = recs;
+	sys_recs = bch_nvmpg_offset_to_ptr(BCH_NVMPG_SYSRECS_OFFSET);
+	while (recs) {
+		for (i = 0; i < recs->size; i++) {
+			struct bch_nvmpg_rec *rec = &(recs->recs[i]);
+
+			if ((rec->pgoff == pgoff) && (rec->ns_id == ns_id)) {
+				WARN_ON(rec->order != order);
+				rec->_v = 0;
+				recs->used--;
+
+				if (recs->used == 0) {
+					int recs_pos = recs - sys_recs;
+
+					if (recs == prev_recs)
+						head->recs_offset[ns_id] =
+							recs->next_offset;
+					else
+						prev_recs->next_offset =
+							recs->next_offset;
+
+					recs->next_offset = 0;
+					recs->head_offset = 0;
+
+					bitmap_clear(ns->recs_bitmap, recs_pos, 1);
+				}
+				goto out;
+			}
+		}
+		prev_recs = recs;
+		recs = bch_nvmpg_offset_to_ptr(recs->next_offset);
+	}
+out:
+	return (recs ? 0 : -ENOENT);
+}
+
+void bch_nvmpg_free_pages(void *addr, int order, const char *uuid)
+{
+	struct bch_nvmpg_ns *ns;
+	struct bch_nvmpg_head *head;
+	struct bch_nvmpg_recs *recs;
+	int r;
+
+	mutex_lock(&global_nvmpg_set->lock);
+
+	ns = find_nvmpg_ns_by_addr(addr, order);
+	if (!ns) {
+		pr_err("can't find namespace by given kaddr from namespace\n");
+		goto unlock;
+	}
+
+	head = find_nvmpg_head(uuid, false);
+	if (!head) {
+		pr_err("can't found bch_nvmpg_head by uuid\n");
+		goto unlock;
+	}
+
+	recs = find_nvmpg_recs(ns, head, false);
+	if (!recs) {
+		pr_err("can't find bch_nvmpg_recs by uuid\n");
+		goto unlock;
+	}
+
+	r = remove_nvmpg_rec(recs, ns->sb->this_ns, addr, order);
+	if (r < 0) {
+		pr_err("can't find bch_nvmpg_rec\n");
+		goto unlock;
+	}
+
+	__free_space(ns, addr, order);
+
+unlock:
+	mutex_unlock(&global_nvmpg_set->lock);
+}
+EXPORT_SYMBOL_GPL(bch_nvmpg_free_pages);
+
 static int attach_nvmpg_set(struct bch_nvmpg_ns *ns)
 {
 	struct bch_nvmpg_sb *sb = ns->sb;
@@ -674,6 +834,7 @@ struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path)
 	ns->pages_offset = sb->pages_offset;
 	ns->pages_total = sb->pages_total;
 	ns->sb = sb;
+	/* increase by __free_space() */
 	ns->free = 0;
 	ns->bdev = bdev;
 	ns->set = global_nvmpg_set;
diff --git a/drivers/md/bcache/nvm-pages.h b/drivers/md/bcache/nvm-pages.h
index 1bcd7a4e1fd1..2529dc8b9d49 100644
--- a/drivers/md/bcache/nvm-pages.h
+++ b/drivers/md/bcache/nvm-pages.h
@@ -92,6 +92,7 @@ struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path);
 int bch_nvmpg_init(void);
 void bch_nvmpg_exit(void);
 void *bch_nvmpg_alloc_pages(int order, const char *uuid);
+void bch_nvmpg_free_pages(void *addr, int order, const char *uuid);
 
 #else
 
@@ -112,6 +113,8 @@ static inline void *bch_nvmpg_alloc_pages(int order, const char *uuid)
 	return NULL;
 }
 
+static inline void bch_nvmpg_free_pages(void *addr, int order, const char *uuid) { }
+
 #endif /* CONFIG_BCACHE_NVM_PAGES */
 
 #endif /* _BCACHE_NVM_PAGES_H */
-- 
2.26.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v12 06/12] bcache: get recs list head for allocated pages by specific uuid
  2021-08-11 17:02 [PATCH v12 00/12] bcache: support NVDIMM for journaling Coly Li
                   ` (4 preceding siblings ...)
  2021-08-11 17:02 ` [PATCH v12 05/12] bcache: bch_nvmpg_free_pages() of the buddy allocator Coly Li
@ 2021-08-11 17:02 ` Coly Li
  2021-08-11 17:02 ` [PATCH v12 07/12] bcache: use bucket index to set GC_MARK_METADATA for journal buckets in bch_btree_gc_finish() Coly Li
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: Coly Li @ 2021-08-11 17:02 UTC (permalink / raw)
  To: linux-bcache
  Cc: linux-block, linux-nvdimm, axboe, hare, jack, dan.j.williams,
	hch, ying.huang, Jianpeng Ma, Qiaowei Ren, Hannes Reinecke

From: Jianpeng Ma <jianpeng.ma@intel.com>

This patch implements bch_get_nvmpg_head() of the buddy allocator
to be used to get recs list head for allocated pages by specific
uuid. Then the requester (owner) can find all previous allocated
nvdimm pages by iterating the recs list.

Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
---
 drivers/md/bcache/nvm-pages.c | 6 ++++++
 drivers/md/bcache/nvm-pages.h | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
index ef61fdaaac28..497360c60f26 100644
--- a/drivers/md/bcache/nvm-pages.c
+++ b/drivers/md/bcache/nvm-pages.c
@@ -523,6 +523,12 @@ void *bch_nvmpg_alloc_pages(int order, const char *uuid)
 }
 EXPORT_SYMBOL_GPL(bch_nvmpg_alloc_pages);
 
+struct bch_nvmpg_head *bch_get_nvmpg_head(const char *uuid)
+{
+	return find_nvmpg_head(uuid, false);
+}
+EXPORT_SYMBOL_GPL(bch_get_nvmpg_head);
+
 static inline void *nvm_end_addr(struct bch_nvmpg_ns *ns)
 {
 	return ns->base_addr + (ns->pages_total << PAGE_SHIFT);
diff --git a/drivers/md/bcache/nvm-pages.h b/drivers/md/bcache/nvm-pages.h
index 2529dc8b9d49..2f6f2ffbfd80 100644
--- a/drivers/md/bcache/nvm-pages.h
+++ b/drivers/md/bcache/nvm-pages.h
@@ -93,6 +93,7 @@ int bch_nvmpg_init(void);
 void bch_nvmpg_exit(void);
 void *bch_nvmpg_alloc_pages(int order, const char *uuid);
 void bch_nvmpg_free_pages(void *addr, int order, const char *uuid);
+struct bch_nvmpg_head *bch_get_nvmpg_head(const char *uuid);
 
 #else
 
@@ -115,6 +116,11 @@ static inline void *bch_nvmpg_alloc_pages(int order, const char *uuid)
 
 static inline void bch_nvmpg_free_pages(void *addr, int order, const char *uuid) { }
 
+static inline struct bch_nvmpg_head *bch_get_nvmpg_head(const char *uuid)
+{
+	return NULL;
+}
+
 #endif /* CONFIG_BCACHE_NVM_PAGES */
 
 #endif /* _BCACHE_NVM_PAGES_H */
-- 
2.26.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v12 07/12] bcache: use bucket index to set GC_MARK_METADATA for journal buckets in bch_btree_gc_finish()
  2021-08-11 17:02 [PATCH v12 00/12] bcache: support NVDIMM for journaling Coly Li
                   ` (5 preceding siblings ...)
  2021-08-11 17:02 ` [PATCH v12 06/12] bcache: get recs list head for allocated pages by specific uuid Coly Li
@ 2021-08-11 17:02 ` Coly Li
  2021-08-11 17:02 ` [PATCH v12 08/12] bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set Coly Li
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: Coly Li @ 2021-08-11 17:02 UTC (permalink / raw)
  To: linux-bcache
  Cc: linux-block, linux-nvdimm, axboe, hare, jack, dan.j.williams,
	hch, ying.huang, Coly Li, Hannes Reinecke, Jianpeng Ma,
	Qiaowei Ren

Currently the meta data bucket locations on cache device are reserved
after the meta data stored on NVDIMM pages, for the meta data layout
consistentcy temporarily. So these buckets are still marked as meta data
by SET_GC_MARK() in bch_btree_gc_finish().

When BCH_FEATURE_INCOMPAT_NVDIMM_META is set, the sb.d[] stores linear
address of NVDIMM pages and not bucket index anymore. Therefore we
should avoid to find bucket index from sb.d[], and directly use bucket
index from ca->sb.first_bucket to (ca->sb.first_bucket +
ca->sb.njournal_bucketsi) for setting the gc mark of journal bucket.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
---
 drivers/md/bcache/btree.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 183a58c89377..e0d7135669ca 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1761,8 +1761,10 @@ static void bch_btree_gc_finish(struct cache_set *c)
 	ca = c->cache;
 	ca->invalidate_needs_gc = 0;
 
-	for (k = ca->sb.d; k < ca->sb.d + ca->sb.keys; k++)
-		SET_GC_MARK(ca->buckets + *k, GC_MARK_METADATA);
+	/* Range [first_bucket, first_bucket + keys) is for journal buckets */
+	for (i = ca->sb.first_bucket;
+	     i < ca->sb.first_bucket + ca->sb.njournal_buckets; i++)
+		SET_GC_MARK(ca->buckets + i, GC_MARK_METADATA);
 
 	for (k = ca->prio_buckets;
 	     k < ca->prio_buckets + prio_buckets(ca) * 2; k++)
-- 
2.26.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v12 08/12] bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set
  2021-08-11 17:02 [PATCH v12 00/12] bcache: support NVDIMM for journaling Coly Li
                   ` (6 preceding siblings ...)
  2021-08-11 17:02 ` [PATCH v12 07/12] bcache: use bucket index to set GC_MARK_METADATA for journal buckets in bch_btree_gc_finish() Coly Li
@ 2021-08-11 17:02 ` Coly Li
  2021-08-11 17:02 ` [PATCH v12 09/12] bcache: initialize bcache journal for NVDIMM meta device Coly Li
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: Coly Li @ 2021-08-11 17:02 UTC (permalink / raw)
  To: linux-bcache
  Cc: linux-block, linux-nvdimm, axboe, hare, jack, dan.j.williams,
	hch, ying.huang, Coly Li, Hannes Reinecke, Jianpeng Ma,
	Qiaowei Ren

This patch adds BCH_FEATURE_INCOMPAT_NVDIMM_META (value 0x0004) into the
incompat feature set. When this bit is set by bcache-tools, it indicates
bcache meta data should be stored on specific NVDIMM meta device.

The bcache meta data mainly includes journal and btree nodes, when this
bit is set in incompat feature set, bcache will ask the nvm-pages
allocator for NVDIMM space to store the meta data.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
---
 drivers/md/bcache/features.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/md/bcache/features.h b/drivers/md/bcache/features.h
index d1c8fd3977fc..45d2508d5532 100644
--- a/drivers/md/bcache/features.h
+++ b/drivers/md/bcache/features.h
@@ -17,11 +17,19 @@
 #define BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET		0x0001
 /* real bucket size is (1 << bucket_size) */
 #define BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE	0x0002
+/* store bcache meta data on nvdimm */
+#define BCH_FEATURE_INCOMPAT_NVDIMM_META		0x0004
 
 #define BCH_FEATURE_COMPAT_SUPP		0
 #define BCH_FEATURE_RO_COMPAT_SUPP	0
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+#define BCH_FEATURE_INCOMPAT_SUPP	(BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET| \
+					 BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE| \
+					 BCH_FEATURE_INCOMPAT_NVDIMM_META)
+#else
 #define BCH_FEATURE_INCOMPAT_SUPP	(BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET| \
 					 BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE)
+#endif
 
 #define BCH_HAS_COMPAT_FEATURE(sb, mask) \
 		((sb)->feature_compat & (mask))
@@ -89,6 +97,7 @@ static inline void bch_clear_feature_##name(struct cache_sb *sb) \
 
 BCH_FEATURE_INCOMPAT_FUNCS(obso_large_bucket, OBSO_LARGE_BUCKET);
 BCH_FEATURE_INCOMPAT_FUNCS(large_bucket, LOG_LARGE_BUCKET_SIZE);
+BCH_FEATURE_INCOMPAT_FUNCS(nvdimm_meta, NVDIMM_META);
 
 static inline bool bch_has_unknown_compat_features(struct cache_sb *sb)
 {
-- 
2.26.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v12 09/12] bcache: initialize bcache journal for NVDIMM meta device
  2021-08-11 17:02 [PATCH v12 00/12] bcache: support NVDIMM for journaling Coly Li
                   ` (7 preceding siblings ...)
  2021-08-11 17:02 ` [PATCH v12 08/12] bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set Coly Li
@ 2021-08-11 17:02 ` Coly Li
  2021-08-11 17:02 ` [PATCH v12 10/12] bcache: support storing bcache journal into " Coly Li
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: Coly Li @ 2021-08-11 17:02 UTC (permalink / raw)
  To: linux-bcache
  Cc: linux-block, linux-nvdimm, axboe, hare, jack, dan.j.williams,
	hch, ying.huang, Coly Li, Hannes Reinecke, Jianpeng Ma,
	Qiaowei Ren

The nvm-pages allocator may store and index the NVDIMM pages allocated
for bcache journal. This patch adds the initialization to store bcache
journal space on NVDIMM pages if BCH_FEATURE_INCOMPAT_NVDIMM_META bit is
set by bcache-tools.

If BCH_FEATURE_INCOMPAT_NVDIMM_META is set, get_nvdimm_journal_space()
will return the linear address of NVDIMM pages for bcache journal,
- If there is previously allocated space, find it from nvm-pages owner
  list and return to bch_journal_init().
- If there is no previously allocated space, require a new NVDIMM range
  from the nvm-pages allocator, and return it to bch_journal_init().

And in bch_journal_init(), keys in sb.d[] store the corresponding linear
address from NVDIMM into sb.d[i].ptr[0] where 'i' is the bucket index to
iterate all journal buckets.

Later when bcache journaling code stores the journaling jset, the target
NVDIMM linear address stored (and updated) in sb.d[i].ptr[0] can be used
directly in memory copy from DRAM pages into NVDIMM pages.

Signed-off-by: Coly Li <colyli@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
---
 drivers/md/bcache/journal.c   | 117 ++++++++++++++++++++++++++++++++++
 drivers/md/bcache/journal.h   |   2 +-
 drivers/md/bcache/nvm-pages.c |   9 +++
 drivers/md/bcache/nvm-pages.h |   1 +
 drivers/md/bcache/super.c     |  18 +++---
 5 files changed, 136 insertions(+), 11 deletions(-)

diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 61bd79babf7a..9fe6c1abfd84 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -9,6 +9,8 @@
 #include "btree.h"
 #include "debug.h"
 #include "extents.h"
+#include "nvm-pages.h"
+#include "features.h"
 
 #include <trace/events/bcache.h>
 
@@ -982,3 +984,118 @@ int bch_journal_alloc(struct cache_set *c)
 
 	return 0;
 }
+
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+
+static void *find_journal_nvmpg_base(struct bch_nvmpg_head *nvmpg_head,
+				   struct cache *ca)
+{
+	void *addr = NULL;
+	unsigned long jnl_offset, jnl_pgoff, jnl_ns_id;
+	int i;
+
+	jnl_offset = (unsigned long)ca->sb.d[0];
+	jnl_ns_id = BCH_NVMPG_GET_NS_ID(jnl_offset);
+	jnl_pgoff = BCH_NVMPG_GET_OFFSET(jnl_offset) >> PAGE_SHIFT;
+
+	for (i = 0; i < BCH_NVMPG_NS_MAX; i++) {
+		struct bch_nvmpg_recs *recs;
+		struct bch_nvmpg_rec *rec;
+		unsigned long recs_offset = 0;
+		int j;
+
+		recs_offset = nvmpg_head->recs_offset[i];
+		recs = bch_nvmpg_offset_to_ptr(recs_offset);
+		while (recs) {
+			for (j = 0; j < recs->size; j++) {
+				rec = &recs->recs[j];
+				if ((rec->pgoff != jnl_pgoff) ||
+				    (rec->ns_id != jnl_ns_id))
+					continue;
+
+				addr = bch_nvmpg_offset_to_ptr(jnl_offset);
+				goto out;
+			}
+			recs_offset = recs->next_offset;
+			recs = bch_nvmpg_offset_to_ptr(recs_offset);
+		}
+	}
+
+out:
+	return addr;
+}
+
+static void *get_journal_nvmpg_space(struct cache *ca)
+{
+	struct bch_nvmpg_head *head = NULL;
+	void *ret = NULL;
+	int order;
+
+	head = bch_get_nvmpg_head(ca->sb.set_uuid);
+	if (head) {
+		ret = find_journal_nvmpg_base(head, ca);
+		if (ret)
+			goto found;
+	}
+
+	order = ilog2((ca->sb.bucket_size *
+		       ca->sb.njournal_buckets) / PAGE_SECTORS);
+	ret = bch_nvmpg_alloc_pages(order, ca->sb.set_uuid);
+	if (ret)
+		memset(ret, 0, (1 << order) * PAGE_SIZE);
+found:
+	return ret;
+}
+
+#endif /* CONFIG_BCACHE_NVM_PAGES */
+
+static int __bch_journal_nvdimm_init(struct cache *ca)
+{
+	int ret = -1;
+
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+	int i;
+	void *jnl_base = NULL;
+
+	jnl_base = get_journal_nvmpg_space(ca);
+	if (!jnl_base) {
+		pr_err("Failed to get journal space from nvdimm\n");
+		goto out;
+	}
+
+	/* Iniialized and reloaded from on-disk super block already */
+	if (ca->sb.d[0] != 0)
+		goto out;
+
+	for (i = 0; i < ca->sb.keys; i++) {
+		unsigned long jnl_offset;
+
+		jnl_offset = bch_nvmpg_ptr_to_offset(bch_nvmpg_id_to_ns(0),
+					jnl_base + (bucket_bytes(ca) * i));
+		ca->sb.d[i] = jnl_offset;
+	}
+
+	ret = 0;
+out:
+#endif /* CONFIG_BCACHE_NVM_PAGES */
+
+	return ret;
+}
+
+
+int bch_journal_init(struct cache_set *c)
+{
+	int i, ret = 0;
+	struct cache *ca = c->cache;
+
+	ca->sb.keys = clamp_t(int, ca->sb.nbuckets >> 7,
+			      2, SB_JOURNAL_BUCKETS);
+
+	if (!bch_has_feature_nvdimm_meta(&ca->sb)) {
+		for (i = 0; i < ca->sb.keys; i++)
+			ca->sb.d[i] = ca->sb.first_bucket + i;
+	} else
+		ret = __bch_journal_nvdimm_init(ca);
+
+	return ret;
+}
diff --git a/drivers/md/bcache/journal.h b/drivers/md/bcache/journal.h
index f2ea34d5f431..e3a7fa5a8fda 100644
--- a/drivers/md/bcache/journal.h
+++ b/drivers/md/bcache/journal.h
@@ -179,7 +179,7 @@ void bch_journal_mark(struct cache_set *c, struct list_head *list);
 void bch_journal_meta(struct cache_set *c, struct closure *cl);
 int bch_journal_read(struct cache_set *c, struct list_head *list);
 int bch_journal_replay(struct cache_set *c, struct list_head *list);
-
+int bch_journal_init(struct cache_set *c);
 void bch_journal_free(struct cache_set *c);
 int bch_journal_alloc(struct cache_set *c);
 
diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
index 497360c60f26..55f3f9b7fb0c 100644
--- a/drivers/md/bcache/nvm-pages.c
+++ b/drivers/md/bcache/nvm-pages.c
@@ -24,6 +24,15 @@
 
 struct bch_nvmpg_set *global_nvmpg_set;
 
+struct bch_nvmpg_ns *bch_nvmpg_id_to_ns(int ns_id)
+{
+	if ((ns_id >= 0) && (ns_id < BCH_NVMPG_NS_MAX))
+		return global_nvmpg_set->ns_tbl[ns_id];
+
+	pr_emerg("Invalid ns_id: %d\n", ns_id);
+	return NULL;
+}
+
 void *bch_nvmpg_offset_to_ptr(unsigned long offset)
 {
 	int ns_id = BCH_NVMPG_GET_NS_ID(offset);
diff --git a/drivers/md/bcache/nvm-pages.h b/drivers/md/bcache/nvm-pages.h
index 2f6f2ffbfd80..13cc6a532bda 100644
--- a/drivers/md/bcache/nvm-pages.h
+++ b/drivers/md/bcache/nvm-pages.h
@@ -94,6 +94,7 @@ void bch_nvmpg_exit(void);
 void *bch_nvmpg_alloc_pages(int order, const char *uuid);
 void bch_nvmpg_free_pages(void *addr, int order, const char *uuid);
 struct bch_nvmpg_head *bch_get_nvmpg_head(const char *uuid);
+struct bch_nvmpg_ns *bch_nvmpg_id_to_ns(int ns_id);
 
 #else
 
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 4326ffa0d21f..e66e1d6ef260 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -147,9 +147,11 @@ static const char *read_super_common(struct cache_sb *sb,  struct block_device *
 		goto err;
 
 	err = "Journal buckets not sequential";
-	for (i = 0; i < sb->keys; i++)
-		if (sb->d[i] != sb->first_bucket + i)
-			goto err;
+	if (!bch_has_feature_nvdimm_meta(sb)) {
+		for (i = 0; i < sb->keys; i++)
+			if (sb->d[i] != sb->first_bucket + i)
+				goto err;
+	}
 
 	err = "Too many journal buckets";
 	if (sb->first_bucket + sb->keys > sb->nbuckets)
@@ -2065,14 +2067,10 @@ static int run_cache_set(struct cache_set *c)
 		if (bch_journal_replay(c, &journal))
 			goto err;
 	} else {
-		unsigned int j;
-
 		pr_notice("invalidating existing data\n");
-		ca->sb.keys = clamp_t(int, ca->sb.nbuckets >> 7,
-					2, SB_JOURNAL_BUCKETS);
-
-		for (j = 0; j < ca->sb.keys; j++)
-			ca->sb.d[j] = ca->sb.first_bucket + j;
+		err = "error initializing journal";
+		if (bch_journal_init(c))
+			goto err;
 
 		bch_initial_gc_finish(c);
 
-- 
2.26.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v12 10/12] bcache: support storing bcache journal into NVDIMM meta device
  2021-08-11 17:02 [PATCH v12 00/12] bcache: support NVDIMM for journaling Coly Li
                   ` (8 preceding siblings ...)
  2021-08-11 17:02 ` [PATCH v12 09/12] bcache: initialize bcache journal for NVDIMM meta device Coly Li
@ 2021-08-11 17:02 ` Coly Li
  2021-08-11 17:02 ` [PATCH v12 11/12] bcache: read jset from NVDIMM pages for journal replay Coly Li
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: Coly Li @ 2021-08-11 17:02 UTC (permalink / raw)
  To: linux-bcache
  Cc: linux-block, linux-nvdimm, axboe, hare, jack, dan.j.williams,
	hch, ying.huang, Coly Li, Hannes Reinecke, Jianpeng Ma,
	Qiaowei Ren

This patch implements two methods to store bcache journal to,
1) __journal_write_unlocked() for block interface device
   The latency method to compose bio and issue the jset bio to cache
   device (e.g. SSD). c->journal.key.ptr[0] indicates the LBA on cache
   device to store the journal jset.
2) __journal_nvdimm_write_unlocked() for memory interface NVDIMM
   Use memory interface to access NVDIMM pages and store the jset by
   memcpy_flushcache(). c->journal.key.ptr[0] indicates the linear
   address from the NVDIMM pages to store the journal jset.

For legacy configuration without NVDIMM meta device, journal I/O is
handled by __journal_write_unlocked() with existing code logic. If the
NVDIMM meta device is used (by bcache-tools), the journal I/O will
be handled by __journal_nvdimm_write_unlocked() and go into the NVDIMM
pages.

And when NVDIMM meta device is used, sb.d[] stores the linear addresses
from NVDIMM pages (no more bucket index), in journal_reclaim() the
journaling location in c->journal.key.ptr[0] should also be updated by
linear address from NVDIMM pages (no more LBA combined by sectors offset
and bucket index).

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
---
 drivers/md/bcache/journal.c | 120 +++++++++++++++++++++++++-----------
 drivers/md/bcache/super.c   |   3 +-
 2 files changed, 85 insertions(+), 38 deletions(-)

diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 9fe6c1abfd84..8cd0c4dc9137 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -596,6 +596,8 @@ static void do_journal_discard(struct cache *ca)
 		return;
 	}
 
+	BUG_ON(bch_has_feature_nvdimm_meta(&ca->sb));
+
 	switch (atomic_read(&ja->discard_in_flight)) {
 	case DISCARD_IN_FLIGHT:
 		return;
@@ -661,9 +663,16 @@ static void journal_reclaim(struct cache_set *c)
 		goto out;
 
 	ja->cur_idx = next;
-	k->ptr[0] = MAKE_PTR(0,
-			     bucket_to_sector(c, ca->sb.d[ja->cur_idx]),
-			     ca->sb.nr_this_dev);
+	if (!bch_has_feature_nvdimm_meta(&ca->sb))
+		k->ptr[0] = MAKE_PTR(0,
+			bucket_to_sector(c, ca->sb.d[ja->cur_idx]),
+			ca->sb.nr_this_dev);
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+	else
+		k->ptr[0] = (unsigned long)bch_nvmpg_offset_to_ptr(
+						ca->sb.d[ja->cur_idx]);
+#endif
+
 	atomic_long_inc(&c->reclaimed_journal_buckets);
 
 	bkey_init(k);
@@ -729,46 +738,21 @@ static void journal_write_unlock(struct closure *cl)
 	spin_unlock(&c->journal.lock);
 }
 
-static void journal_write_unlocked(struct closure *cl)
+
+static void __journal_write_unlocked(struct cache_set *c)
 	__releases(c->journal.lock)
 {
-	struct cache_set *c = container_of(cl, struct cache_set, journal.io);
-	struct cache *ca = c->cache;
-	struct journal_write *w = c->journal.cur;
 	struct bkey *k = &c->journal.key;
-	unsigned int i, sectors = set_blocks(w->data, block_bytes(ca)) *
-		ca->sb.block_size;
-
+	struct journal_write *w = c->journal.cur;
+	struct closure *cl = &c->journal.io;
+	struct cache *ca = c->cache;
 	struct bio *bio;
 	struct bio_list list;
+	unsigned int i, sectors = set_blocks(w->data, block_bytes(ca)) *
+		ca->sb.block_size;
 
 	bio_list_init(&list);
 
-	if (!w->need_write) {
-		closure_return_with_destructor(cl, journal_write_unlock);
-		return;
-	} else if (journal_full(&c->journal)) {
-		journal_reclaim(c);
-		spin_unlock(&c->journal.lock);
-
-		btree_flush_write(c);
-		continue_at(cl, journal_write, bch_journal_wq);
-		return;
-	}
-
-	c->journal.blocks_free -= set_blocks(w->data, block_bytes(ca));
-
-	w->data->btree_level = c->root->level;
-
-	bkey_copy(&w->data->btree_root, &c->root->key);
-	bkey_copy(&w->data->uuid_bucket, &c->uuid_bucket);
-
-	w->data->prio_bucket[ca->sb.nr_this_dev] = ca->prio_buckets[0];
-	w->data->magic		= jset_magic(&ca->sb);
-	w->data->version	= BCACHE_JSET_VERSION;
-	w->data->last_seq	= last_seq(&c->journal);
-	w->data->csum		= csum_set(w->data);
-
 	for (i = 0; i < KEY_PTRS(k); i++) {
 		ca = c->cache;
 		bio = &ca->journal.bio;
@@ -793,7 +777,6 @@ static void journal_write_unlocked(struct closure *cl)
 
 		ca->journal.seq[ca->journal.cur_idx] = w->data->seq;
 	}
-
 	/* If KEY_PTRS(k) == 0, this jset gets lost in air */
 	BUG_ON(i == 0);
 
@@ -805,6 +788,71 @@ static void journal_write_unlocked(struct closure *cl)
 
 	while ((bio = bio_list_pop(&list)))
 		closure_bio_submit(c, bio, cl);
+}
+
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+
+static void __journal_nvdimm_write_unlocked(struct cache_set *c)
+	__releases(c->journal.lock)
+{
+	struct journal_write *w = c->journal.cur;
+	struct cache *ca = c->cache;
+	unsigned int sectors;
+
+	sectors = set_blocks(w->data, block_bytes(ca)) * ca->sb.block_size;
+	atomic_long_add(sectors, &ca->meta_sectors_written);
+
+	memcpy_flushcache((void *)c->journal.key.ptr[0], w->data, sectors << 9);
+
+	c->journal.key.ptr[0] += sectors << 9;
+	ca->journal.seq[ca->journal.cur_idx] = w->data->seq;
+
+	atomic_dec_bug(&fifo_back(&c->journal.pin));
+	bch_journal_next(&c->journal);
+	journal_reclaim(c);
+
+	spin_unlock(&c->journal.lock);
+}
+
+#endif /* CONFIG_BCACHE_NVM_PAGES */
+
+static void journal_write_unlocked(struct closure *cl)
+{
+	struct cache_set *c = container_of(cl, struct cache_set, journal.io);
+	struct cache *ca = c->cache;
+	struct journal_write *w = c->journal.cur;
+
+	if (!w->need_write) {
+		closure_return_with_destructor(cl, journal_write_unlock);
+		return;
+	} else if (journal_full(&c->journal)) {
+		journal_reclaim(c);
+		spin_unlock(&c->journal.lock);
+
+		btree_flush_write(c);
+		continue_at(cl, journal_write, bch_journal_wq);
+		return;
+	}
+
+	c->journal.blocks_free -= set_blocks(w->data, block_bytes(ca));
+
+	w->data->btree_level = c->root->level;
+
+	bkey_copy(&w->data->btree_root, &c->root->key);
+	bkey_copy(&w->data->uuid_bucket, &c->uuid_bucket);
+
+	w->data->prio_bucket[ca->sb.nr_this_dev] = ca->prio_buckets[0];
+	w->data->magic		= jset_magic(&ca->sb);
+	w->data->version	= BCACHE_JSET_VERSION;
+	w->data->last_seq	= last_seq(&c->journal);
+	w->data->csum		= csum_set(w->data);
+
+	if (!bch_has_feature_nvdimm_meta(&ca->sb))
+		__journal_write_unlocked(c);
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+	else
+		__journal_nvdimm_write_unlocked(c);
+#endif
 
 	continue_at(cl, journal_write_done, NULL);
 }
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index e66e1d6ef260..24734250d005 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1676,7 +1676,7 @@ void bch_cache_set_release(struct kobject *kobj)
 static void cache_set_free(struct closure *cl)
 {
 	struct cache_set *c = container_of(cl, struct cache_set, cl);
-	struct cache *ca;
+	struct cache *ca = c->cache;
 
 	debugfs_remove(c->debug);
 
@@ -1688,7 +1688,6 @@ static void cache_set_free(struct closure *cl)
 	bch_bset_sort_state_free(&c->sort);
 	free_pages((unsigned long) c->uuids, ilog2(meta_bucket_pages(&c->cache->sb)));
 
-	ca = c->cache;
 	if (ca) {
 		ca->set = NULL;
 		c->cache = NULL;
-- 
2.26.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v12 11/12] bcache: read jset from NVDIMM pages for journal replay
  2021-08-11 17:02 [PATCH v12 00/12] bcache: support NVDIMM for journaling Coly Li
                   ` (9 preceding siblings ...)
  2021-08-11 17:02 ` [PATCH v12 10/12] bcache: support storing bcache journal into " Coly Li
@ 2021-08-11 17:02 ` Coly Li
  2021-08-11 17:02 ` [PATCH v12 12/12] bcache: add sysfs interface register_nvdimm_meta to register NVDIMM meta device Coly Li
  2021-08-15 16:21 ` [PATCH v12 00/12] bcache: support NVDIMM for journaling Coly Li
  12 siblings, 0 replies; 16+ messages in thread
From: Coly Li @ 2021-08-11 17:02 UTC (permalink / raw)
  To: linux-bcache
  Cc: linux-block, linux-nvdimm, axboe, hare, jack, dan.j.williams,
	hch, ying.huang, Coly Li, kernel test robot, Dan Carpenter,
	Hannes Reinecke, Jianpeng Ma, Qiaowei Ren

This patch implements two methods to read jset from media for journal
replay,
- __jnl_rd_bkt() for block device
  This is the legacy method to read jset via block device interface.
- __jnl_rd_nvm_bkt() for NVDIMM
  This is the method to read jset from NVDIMM memory interface, a.k.a
  memcopy() from NVDIMM pages to DRAM pages.

If BCH_FEATURE_INCOMPAT_NVDIMM_META is set in incompat feature set,
during running cache set, journal_read_bucket() will read the journal
content from NVDIMM by __jnl_rd_nvm_bkt(). The linear addresses of
NVDIMM pages to read jset are stored in sb.d[SB_JOURNAL_BUCKETS], which
were initialized and maintained in previous runs of the cache set.

A thing should be noticed is, when bch_journal_read() is called, the
linear address of NVDIMM pages is not loaded and initialized yet, it
is necessary to call __bch_journal_nvdimm_init() before reading the jset
from NVDIMM pages.

The code comments added in journal_read_bucket() is noticed by kernel
test robot and Dan Carpenter, it explains why it is safe to only check
!bch_has_feature_nvdimm_meta() condition in the if() statement when
CONFIG_BCACHE_NVM_PAGES is not configured. To avoid confusion from the
bogus warning message from static checking tool.

Signed-off-by: Coly Li <colyli@suse.de>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
---
 drivers/md/bcache/journal.c | 88 ++++++++++++++++++++++++++++++-------
 1 file changed, 71 insertions(+), 17 deletions(-)

diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 8cd0c4dc9137..987306b4db20 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -34,18 +34,60 @@ static void journal_read_endio(struct bio *bio)
 	closure_put(cl);
 }
 
+static struct jset *__jnl_rd_bkt(struct cache *ca, unsigned int bkt_idx,
+				    unsigned int len, unsigned int offset,
+				    struct closure *cl)
+{
+	sector_t bucket = bucket_to_sector(ca->set, ca->sb.d[bkt_idx]);
+	struct bio *bio = &ca->journal.bio;
+	struct jset *data = ca->set->journal.w[0].data;
+
+	bio_reset(bio);
+	bio->bi_iter.bi_sector	= bucket + offset;
+	bio_set_dev(bio, ca->bdev);
+	bio->bi_iter.bi_size	= len << 9;
+
+	bio->bi_end_io	= journal_read_endio;
+	bio->bi_private = cl;
+	bio_set_op_attrs(bio, REQ_OP_READ, 0);
+	bch_bio_map(bio, data);
+
+	closure_bio_submit(ca->set, bio, cl);
+	closure_sync(cl);
+
+	/* Indeed journal.w[0].data */
+	return data;
+}
+
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+
+static struct jset *__jnl_rd_nvm_bkt(struct cache *ca, unsigned int bkt_idx,
+				     unsigned int len, unsigned int offset)
+{
+	void *jset_addr;
+	struct jset *data;
+
+	jset_addr = bch_nvmpg_offset_to_ptr(ca->sb.d[bkt_idx]) + (offset << 9);
+	data = ca->set->journal.w[0].data;
+
+	memcpy(data, jset_addr, len << 9);
+
+	/* Indeed journal.w[0].data */
+	return data;
+}
+
+#endif /* CONFIG_BCACHE_NVM_PAGES */
+
 static int journal_read_bucket(struct cache *ca, struct list_head *list,
 			       unsigned int bucket_index)
 {
 	struct journal_device *ja = &ca->journal;
-	struct bio *bio = &ja->bio;
 
 	struct journal_replay *i;
-	struct jset *j, *data = ca->set->journal.w[0].data;
+	struct jset *j;
 	struct closure cl;
 	unsigned int len, left, offset = 0;
 	int ret = 0;
-	sector_t bucket = bucket_to_sector(ca->set, ca->sb.d[bucket_index]);
 
 	closure_init_stack(&cl);
 
@@ -55,26 +97,27 @@ static int journal_read_bucket(struct cache *ca, struct list_head *list,
 reread:		left = ca->sb.bucket_size - offset;
 		len = min_t(unsigned int, left, PAGE_SECTORS << JSET_BITS);
 
-		bio_reset(bio);
-		bio->bi_iter.bi_sector	= bucket + offset;
-		bio_set_dev(bio, ca->bdev);
-		bio->bi_iter.bi_size	= len << 9;
-
-		bio->bi_end_io	= journal_read_endio;
-		bio->bi_private = &cl;
-		bio_set_op_attrs(bio, REQ_OP_READ, 0);
-		bch_bio_map(bio, data);
-
-		closure_bio_submit(ca->set, bio, &cl);
-		closure_sync(&cl);
+		if (!bch_has_feature_nvdimm_meta(&ca->sb))
+			j = __jnl_rd_bkt(ca, bucket_index, len, offset, &cl);
+		/*
+		 * If CONFIG_BCACHE_NVM_PAGES is not defined, the feature bit
+		 * BCH_FEATURE_INCOMPAT_NVDIMM_META won't in incompatible
+		 * support feature set, a cache device format with feature bit
+		 * BCH_FEATURE_INCOMPAT_NVDIMM_META will fail much earlier in
+		 * read_super() by bch_has_unknown_incompat_features().
+		 * Therefore when CONFIG_BCACHE_NVM_PAGES is not define, it is
+		 * safe to ignore the bch_has_feature_nvdimm_meta() condition.
+		 */
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+		else
+			j = __jnl_rd_nvm_bkt(ca, bucket_index, len, offset);
+#endif
 
 		/* This function could be simpler now since we no longer write
 		 * journal entries that overlap bucket boundaries; this means
 		 * the start of a bucket will always have a valid journal entry
 		 * if it has any journal entries at all.
 		 */
-
-		j = data;
 		while (len) {
 			struct list_head *where;
 			size_t blocks, bytes = set_bytes(j);
@@ -170,6 +213,8 @@ reread:		left = ca->sb.bucket_size - offset;
 	return ret;
 }
 
+static int __bch_journal_nvdimm_init(struct cache *ca);
+
 int bch_journal_read(struct cache_set *c, struct list_head *list)
 {
 #define read_bucket(b)							\
@@ -188,6 +233,15 @@ int bch_journal_read(struct cache_set *c, struct list_head *list)
 	unsigned int i, l, r, m;
 	uint64_t seq;
 
+	/*
+	 * Linear addresses of NVDIMM pages for journaling is not
+	 * initialized yet, do it before read jset from NVDIMM pages.
+	 */
+	if (bch_has_feature_nvdimm_meta(&ca->sb)) {
+		if (__bch_journal_nvdimm_init(ca) < 0)
+			return -ENXIO;
+	}
+
 	bitmap_zero(bitmap, SB_JOURNAL_BUCKETS);
 	pr_debug("%u journal buckets\n", ca->sb.njournal_buckets);
 
-- 
2.26.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v12 12/12] bcache: add sysfs interface register_nvdimm_meta to register NVDIMM meta device
  2021-08-11 17:02 [PATCH v12 00/12] bcache: support NVDIMM for journaling Coly Li
                   ` (10 preceding siblings ...)
  2021-08-11 17:02 ` [PATCH v12 11/12] bcache: read jset from NVDIMM pages for journal replay Coly Li
@ 2021-08-11 17:02 ` Coly Li
  2021-08-15 16:21 ` [PATCH v12 00/12] bcache: support NVDIMM for journaling Coly Li
  12 siblings, 0 replies; 16+ messages in thread
From: Coly Li @ 2021-08-11 17:02 UTC (permalink / raw)
  To: linux-bcache
  Cc: linux-block, linux-nvdimm, axboe, hare, jack, dan.j.williams,
	hch, ying.huang, Coly Li, Hannes Reinecke, Jianpeng Ma,
	Qiaowei Ren

This patch adds a sysfs interface register_nvdimm_meta to register
NVDIMM meta device. The sysfs interface file only shows up when
CONFIG_BCACHE_NVM_PAGES=y. Then a NVDIMM name space formatted by
bcache-tools can be registered into bcache by e.g.,
  echo /dev/pmem0 > /sys/fs/bcache/register_nvdimm_meta

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
---
 drivers/md/bcache/super.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 24734250d005..434739594baf 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2403,10 +2403,18 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
 static ssize_t bch_pending_bdevs_cleanup(struct kobject *k,
 					 struct kobj_attribute *attr,
 					 const char *buffer, size_t size);
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+static ssize_t register_nvdimm_meta(struct kobject *k,
+				    struct kobj_attribute *attr,
+				    const char *buffer, size_t size);
+#endif
 
 kobj_attribute_write(register,		register_bcache);
 kobj_attribute_write(register_quiet,	register_bcache);
 kobj_attribute_write(pendings_cleanup,	bch_pending_bdevs_cleanup);
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+kobj_attribute_write(register_nvdimm_meta, register_nvdimm_meta);
+#endif
 
 static bool bch_is_open_backing(dev_t dev)
 {
@@ -2520,6 +2528,24 @@ static void register_device_async(struct async_reg_args *args)
 	queue_delayed_work(system_wq, &args->reg_work, 10);
 }
 
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+static ssize_t register_nvdimm_meta(struct kobject *k, struct kobj_attribute *attr,
+				    const char *buffer, size_t size)
+{
+	ssize_t ret = size;
+
+	struct bch_nvmpg_ns *ns = bch_register_namespace(buffer);
+
+	if (IS_ERR(ns)) {
+		pr_err("register nvdimm namespace %s for meta device failed.\n",
+			buffer);
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+#endif
+
 static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
 			       const char *buffer, size_t size)
 {
@@ -2855,6 +2881,9 @@ static int __init bcache_init(void)
 	static const struct attribute *files[] = {
 		&ksysfs_register.attr,
 		&ksysfs_register_quiet.attr,
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+		&ksysfs_register_nvdimm_meta.attr,
+#endif
 		&ksysfs_pendings_cleanup.attr,
 		NULL
 	};
-- 
2.26.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v12 02/12] bcache: initialize the nvm pages allocator
  2021-08-11 17:02 ` [PATCH v12 02/12] bcache: initialize the nvm pages allocator Coly Li
@ 2021-08-12  5:43   ` Dan Williams
  2021-08-12  8:26     ` Coly Li
  0 siblings, 1 reply; 16+ messages in thread
From: Dan Williams @ 2021-08-12  5:43 UTC (permalink / raw)
  To: Coly Li
  Cc: linux-bcache, linux-block, linux-nvdimm, Jens Axboe,
	Hannes Reinecke, Jan Kara, Christoph Hellwig, Huang, Ying,
	Jianpeng Ma, Randy Dunlap, Qiaowei Ren, Hannes Reinecke

On Wed, Aug 11, 2021 at 10:04 AM Coly Li <colyli@suse.de> wrote:
>
> From: Jianpeng Ma <jianpeng.ma@intel.com>
>
> This patch define the prototype data structures in memory and
> initializes the nvm pages allocator.
>
> The nvm address space which is managed by this allocator can consist of
> many nvm namespaces, and some namespaces can compose into one nvm set,
> like cache set. For this initial implementation, only one set can be
> supported.
>
> The users of this nvm pages allocator need to call register_namespace()
> to register the nvdimm device (like /dev/pmemX) into this allocator as
> the instance of struct nvm_namespace.
>
> Reported-by: Randy Dunlap <rdunlap@infradead.org>
> Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
> Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
> Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Jens Axboe <axboe@kernel.dk>
> ---
>  drivers/md/bcache/Kconfig     |  10 +
>  drivers/md/bcache/Makefile    |   1 +
>  drivers/md/bcache/nvm-pages.c | 339 ++++++++++++++++++++++++++++++++++
>  drivers/md/bcache/nvm-pages.h |  96 ++++++++++
>  drivers/md/bcache/super.c     |   3 +
>  5 files changed, 449 insertions(+)
>  create mode 100644 drivers/md/bcache/nvm-pages.c
>  create mode 100644 drivers/md/bcache/nvm-pages.h
>
> diff --git a/drivers/md/bcache/Kconfig b/drivers/md/bcache/Kconfig
> index d1ca4d059c20..a69f6c0e0507 100644
> --- a/drivers/md/bcache/Kconfig
> +++ b/drivers/md/bcache/Kconfig
> @@ -35,3 +35,13 @@ config BCACHE_ASYNC_REGISTRATION
>         device path into this file will returns immediately and the real
>         registration work is handled in kernel work queue in asynchronous
>         way.
> +
> +config BCACHE_NVM_PAGES
> +       bool "NVDIMM support for bcache (EXPERIMENTAL)"
> +       depends on BCACHE
> +       depends on 64BIT
> +       depends on LIBNVDIMM
> +       depends on DAX
> +       help
> +         Allocate/release NV-memory pages for bcache and provide allocated pages
> +         for each requestor after system reboot.
> diff --git a/drivers/md/bcache/Makefile b/drivers/md/bcache/Makefile
> index 5b87e59676b8..2397bb7c7ffd 100644
> --- a/drivers/md/bcache/Makefile
> +++ b/drivers/md/bcache/Makefile
> @@ -5,3 +5,4 @@ obj-$(CONFIG_BCACHE)    += bcache.o
>  bcache-y               := alloc.o bset.o btree.o closure.o debug.o extents.o\
>         io.o journal.o movinggc.o request.o stats.o super.o sysfs.o trace.o\
>         util.o writeback.o features.o
> +bcache-$(CONFIG_BCACHE_NVM_PAGES) += nvm-pages.o
> diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
> new file mode 100644
> index 000000000000..6184c628d9cc
> --- /dev/null
> +++ b/drivers/md/bcache/nvm-pages.c
> @@ -0,0 +1,339 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Nvdimm page-buddy allocator
> + *
> + * Copyright (c) 2021, Intel Corporation.
> + * Copyright (c) 2021, Qiaowei Ren <qiaowei.ren@intel.com>.
> + * Copyright (c) 2021, Jianpeng Ma <jianpeng.ma@intel.com>.
> + */
> +
> +#include "bcache.h"
> +#include "nvm-pages.h"
> +
> +#include <linux/slab.h>
> +#include <linux/list.h>
> +#include <linux/mutex.h>
> +#include <linux/dax.h>
> +#include <linux/pfn_t.h>
> +#include <linux/libnvdimm.h>
> +#include <linux/mm_types.h>
> +#include <linux/err.h>
> +#include <linux/pagemap.h>
> +#include <linux/bitmap.h>
> +#include <linux/blkdev.h>
> +
> +struct bch_nvmpg_set *global_nvmpg_set;
> +
> +void *bch_nvmpg_offset_to_ptr(unsigned long offset)
> +{
> +       int ns_id = BCH_NVMPG_GET_NS_ID(offset);
> +       struct bch_nvmpg_ns *ns = global_nvmpg_set->ns_tbl[ns_id];
> +
> +       if (offset == 0)
> +               return NULL;
> +
> +       ns_id = BCH_NVMPG_GET_NS_ID(offset);
> +       ns = global_nvmpg_set->ns_tbl[ns_id];
> +
> +       if (ns)
> +               return (void *)(ns->base_addr + BCH_NVMPG_GET_OFFSET(offset));
> +
> +       pr_err("Invalid ns_id %u\n", ns_id);
> +       return NULL;
> +}
> +
> +unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr)
> +{
> +       int ns_id = ns->ns_id;
> +       unsigned long offset = (unsigned long)(ptr - ns->base_addr);
> +
> +       return BCH_NVMPG_OFFSET(ns_id, offset);
> +}
> +
> +static void release_ns_tbl(struct bch_nvmpg_set *set)
> +{
> +       int i;
> +       struct bch_nvmpg_ns *ns;
> +
> +       for (i = 0; i < BCH_NVMPG_NS_MAX; i++) {
> +               ns = set->ns_tbl[i];
> +               if (ns) {
> +                       blkdev_put(ns->bdev, FMODE_READ|FMODE_WRITE|FMODE_EXEC);
> +                       set->ns_tbl[i] = NULL;
> +                       set->attached_ns--;
> +                       kfree(ns);
> +               }
> +       }
> +
> +       if (set->attached_ns)
> +               pr_err("unexpected attached_ns: %u\n", set->attached_ns);
> +}
> +
> +static void release_nvmpg_set(struct bch_nvmpg_set *set)
> +{
> +       release_ns_tbl(set);
> +       kfree(set);
> +}
> +
> +/* Namespace 0 contains all meta data of the nvmpg allocation set */
> +static int init_nvmpg_set_header(struct bch_nvmpg_ns *ns)
> +{
> +       struct bch_nvmpg_set_header *set_header;
> +
> +       if (ns->ns_id != 0) {
> +               pr_err("unexpected ns_id %u for first nvmpg namespace.\n",
> +                      ns->ns_id);
> +               return -EINVAL;
> +       }
> +
> +       set_header = bch_nvmpg_offset_to_ptr(ns->sb->set_header_offset);
> +
> +       mutex_lock(&global_nvmpg_set->lock);
> +       global_nvmpg_set->set_header = set_header;
> +       global_nvmpg_set->heads_size = set_header->size;
> +       global_nvmpg_set->heads_used = set_header->used;
> +       mutex_unlock(&global_nvmpg_set->lock);
> +
> +       return 0;
> +}
> +
> +static int attach_nvmpg_set(struct bch_nvmpg_ns *ns)
> +{
> +       struct bch_nvmpg_sb *sb = ns->sb;
> +       int rc = 0;
> +
> +       mutex_lock(&global_nvmpg_set->lock);
> +
> +       if (global_nvmpg_set->ns_tbl[sb->this_ns]) {
> +               pr_err("ns_id %u already attached.\n", ns->ns_id);
> +               rc = -EEXIST;
> +               goto unlock;
> +       }
> +
> +       if (ns->ns_id != 0) {
> +               pr_err("unexpected ns_id %u for first namespace.\n", ns->ns_id);
> +               rc = -EINVAL;
> +               goto unlock;
> +       }
> +
> +       if (global_nvmpg_set->attached_ns > 0) {
> +               pr_err("multiple namespace attaching not supported yet\n");
> +               rc = -EOPNOTSUPP;
> +               goto unlock;
> +       }
> +
> +       if ((global_nvmpg_set->attached_ns + 1) > sb->total_ns) {
> +               pr_err("namespace counters error: attached %u > total %u\n",
> +                      global_nvmpg_set->attached_ns,
> +                      global_nvmpg_set->total_ns);
> +               rc = -EINVAL;
> +               goto unlock;
> +       }
> +
> +       memcpy(global_nvmpg_set->set_uuid, sb->set_uuid, 16);
> +       global_nvmpg_set->ns_tbl[sb->this_ns] = ns;
> +       global_nvmpg_set->attached_ns++;
> +       global_nvmpg_set->total_ns = sb->total_ns;
> +
> +unlock:
> +       mutex_unlock(&global_nvmpg_set->lock);
> +       return rc;
> +}
> +
> +static int read_nvdimm_meta_super(struct block_device *bdev,
> +                                 struct bch_nvmpg_ns *ns)
> +{
> +       struct page *page;
> +       struct bch_nvmpg_sb *sb;
> +       uint64_t expected_csum = 0;
> +       int r;
> +
> +       page = read_cache_page_gfp(bdev->bd_inode->i_mapping,
> +                               BCH_NVMPG_SB_OFFSET >> PAGE_SHIFT, GFP_KERNEL);
> +
> +       if (IS_ERR(page))
> +               return -EIO;
> +
> +       sb = (struct bch_nvmpg_sb *)
> +            (page_address(page) + offset_in_page(BCH_NVMPG_SB_OFFSET));
> +
> +       r = -EINVAL;
> +       expected_csum = csum_set(sb);
> +       if (expected_csum != sb->csum) {
> +               pr_info("csum is not match with expected one\n");
> +               goto put_page;
> +       }
> +
> +       if (memcmp(sb->magic, bch_nvmpg_magic, 16)) {
> +               pr_info("invalid bch_nvmpg_magic\n");
> +               goto put_page;
> +       }
> +
> +       if (sb->sb_offset !=
> +           BCH_NVMPG_OFFSET(sb->this_ns, BCH_NVMPG_SB_OFFSET)) {
> +               pr_info("invalid superblock offset 0x%llx\n", sb->sb_offset);
> +               goto put_page;
> +       }
> +
> +       r = -EOPNOTSUPP;
> +       if (sb->total_ns != 1) {
> +               pr_info("multiple name space not supported yet.\n");
> +               goto put_page;
> +       }
> +
> +
> +       r = 0;
> +       /* Necessary for DAX mapping */
> +       ns->page_size = sb->page_size;
> +       ns->pages_total = sb->pages_total;
> +
> +put_page:
> +       put_page(page);
> +       return r;
> +}
> +
> +struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path)
> +{
> +       struct bch_nvmpg_ns *ns = NULL;
> +       struct bch_nvmpg_sb *sb = NULL;
> +       char buf[BDEVNAME_SIZE];
> +       struct block_device *bdev;
> +       pgoff_t pgoff;
> +       int id, err;
> +       char *path;
> +       long dax_ret = 0;
> +
> +       path = kstrndup(dev_path, 512, GFP_KERNEL);
> +       if (!path) {
> +               pr_err("kstrndup failed\n");
> +               return ERR_PTR(-ENOMEM);
> +       }
> +
> +       bdev = blkdev_get_by_path(strim(path),
> +                                 FMODE_READ|FMODE_WRITE|FMODE_EXEC,
> +                                 global_nvmpg_set);
> +       if (IS_ERR(bdev)) {
> +               pr_err("get %s error: %ld\n", dev_path, PTR_ERR(bdev));
> +               kfree(path);
> +               return ERR_PTR(PTR_ERR(bdev));
> +       }
> +
> +       err = -ENOMEM;
> +       ns = kzalloc(sizeof(struct bch_nvmpg_ns), GFP_KERNEL);
> +       if (!ns)
> +               goto bdput;
> +
> +       err = -EIO;
> +       if (read_nvdimm_meta_super(bdev, ns)) {
> +               pr_err("%s read nvdimm meta super block failed.\n",
> +                      bdevname(bdev, buf));
> +               goto free_ns;
> +       }
> +
> +       err = -EOPNOTSUPP;
> +       if (!bdev_dax_supported(bdev, ns->page_size)) {
> +               pr_err("%s don't support DAX\n", bdevname(bdev, buf));
> +               goto free_ns;
> +       }
> +
> +       err = -EINVAL;
> +       if (bdev_dax_pgoff(bdev, 0, ns->page_size, &pgoff)) {
> +               pr_err("invalid offset of %s\n", bdevname(bdev, buf));
> +               goto free_ns;
> +       }
> +
> +       err = -ENOMEM;
> +       ns->dax_dev = fs_dax_get_by_bdev(bdev);
> +       if (!ns->dax_dev) {
> +               pr_err("can't by dax device by %s\n", bdevname(bdev, buf));
> +               goto free_ns;
> +       }
> +
> +       err = -EINVAL;
> +       id = dax_read_lock();
> +       dax_ret = dax_direct_access(ns->dax_dev, pgoff, ns->pages_total,
> +                                   &ns->base_addr, &ns->start_pfn);
> +       if (dax_ret <= 0) {
> +               pr_err("dax_direct_access error\n");
> +               dax_read_unlock(id);
> +               goto free_ns;
> +       }
> +
> +       if (dax_ret < ns->pages_total) {
> +               pr_warn("mapped range %ld is less than ns->pages_total %lu\n",
> +                       dax_ret, ns->pages_total);

This failure will become a common occurrence with CXL namespaces that
will have discontiguous range support. It's already the case for
dax-devices for soft-reserved memory [1]. In the CXL case the
discontinuity will be 256MB aligned, for the soft-reserved dax-devices
the discontinuity granularity can be as small as 4K.

[1]: https://elixir.bootlin.com/linux/v5.14-rc5/source/drivers/dax/device.c#L414

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v12 02/12] bcache: initialize the nvm pages allocator
  2021-08-12  5:43   ` Dan Williams
@ 2021-08-12  8:26     ` Coly Li
  0 siblings, 0 replies; 16+ messages in thread
From: Coly Li @ 2021-08-12  8:26 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-bcache, linux-block, linux-nvdimm, Jens Axboe,
	Hannes Reinecke, Jan Kara, Christoph Hellwig, Huang, Ying,
	Jianpeng Ma, Randy Dunlap, Qiaowei Ren, Hannes Reinecke

On 8/12/21 1:43 PM, Dan Williams wrote:
> On Wed, Aug 11, 2021 at 10:04 AM Coly Li <colyli@suse.de> wrote:
>> From: Jianpeng Ma <jianpeng.ma@intel.com>
>>
>> This patch define the prototype data structures in memory and
>> initializes the nvm pages allocator.
>>
>> The nvm address space which is managed by this allocator can consist of
>> many nvm namespaces, and some namespaces can compose into one nvm set,
>> like cache set. For this initial implementation, only one set can be
>> supported.
>>
>> The users of this nvm pages allocator need to call register_namespace()
>> to register the nvdimm device (like /dev/pmemX) into this allocator as
>> the instance of struct nvm_namespace.
>>
>> Reported-by: Randy Dunlap <rdunlap@infradead.org>
>> Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
>> Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
>> Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
>> Cc: Christoph Hellwig <hch@lst.de>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: Hannes Reinecke <hare@suse.de>
>> Cc: Jens Axboe <axboe@kernel.dk>
>> ---
>>  drivers/md/bcache/Kconfig     |  10 +
>>  drivers/md/bcache/Makefile    |   1 +
>>  drivers/md/bcache/nvm-pages.c | 339 ++++++++++++++++++++++++++++++++++
>>  drivers/md/bcache/nvm-pages.h |  96 ++++++++++
>>  drivers/md/bcache/super.c     |   3 +
>>  5 files changed, 449 insertions(+)
>>  create mode 100644 drivers/md/bcache/nvm-pages.c
>>  create mode 100644 drivers/md/bcache/nvm-pages.h
>>
[snipped]
>> +
>> +       err = -EOPNOTSUPP;
>> +       if (!bdev_dax_supported(bdev, ns->page_size)) {
>> +               pr_err("%s don't support DAX\n", bdevname(bdev, buf));
>> +               goto free_ns;
>> +       }
>> +
>> +       err = -EINVAL;
>> +       if (bdev_dax_pgoff(bdev, 0, ns->page_size, &pgoff)) {
>> +               pr_err("invalid offset of %s\n", bdevname(bdev, buf));
>> +               goto free_ns;
>> +       }
>> +
>> +       err = -ENOMEM;
>> +       ns->dax_dev = fs_dax_get_by_bdev(bdev);
>> +       if (!ns->dax_dev) {
>> +               pr_err("can't by dax device by %s\n", bdevname(bdev, buf));
>> +               goto free_ns;
>> +       }
>> +
>> +       err = -EINVAL;
>> +       id = dax_read_lock();
>> +       dax_ret = dax_direct_access(ns->dax_dev, pgoff, ns->pages_total,
>> +                                   &ns->base_addr, &ns->start_pfn);
>> +       if (dax_ret <= 0) {
>> +               pr_err("dax_direct_access error\n");
>> +               dax_read_unlock(id);
>> +               goto free_ns;
>> +       }
>> +
>> +       if (dax_ret < ns->pages_total) {
>> +               pr_warn("mapped range %ld is less than ns->pages_total %lu\n",
>> +                       dax_ret, ns->pages_total);

Hi Dan,

Many thanks for your information.

> This failure will become a common occurrence with CXL namespaces that
> will have discontiguous range support. It's already the case for
> dax-devices for soft-reserved memory [1]. In the CXL case the
> discontinuity will be 256MB aligned, for the soft-reserved dax-devices
> the discontinuity granularity can be as small as 4K.
>
> [1]: https://elixir.bootlin.com/linux/v5.14-rc5/source/drivers/dax/device.c#L414

Fortunately the on-media allocation list format works with multiple
ranges of the namespace. For the in-memory struct bch_nvmpg_ns currently
assumes the namespace is a flat continuous range. Yes, we need to
consider and support multiple ranges in struct bch_nvmpg_ns for buddy
allocation initialization to skip the discontinuous gap.

It will be in the to-do list for next work. Thanks for your comments and
hint.

Coly Li

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v12 00/12] bcache: support NVDIMM for journaling
  2021-08-11 17:02 [PATCH v12 00/12] bcache: support NVDIMM for journaling Coly Li
                   ` (11 preceding siblings ...)
  2021-08-11 17:02 ` [PATCH v12 12/12] bcache: add sysfs interface register_nvdimm_meta to register NVDIMM meta device Coly Li
@ 2021-08-15 16:21 ` Coly Li
  12 siblings, 0 replies; 16+ messages in thread
From: Coly Li @ 2021-08-15 16:21 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-nvdimm, hare, jack, dan.j.williams, hch,
	ying.huang, linux-bcache

Hi Jens,

Could you please consider take the v12 series for Linux v5.15 merge window?

In this version the full pointer in the on-media data structures are
modified to per-namespace offset, and all previous review comments are
fixed.
There is no more comments for 4 hours, and this series survives in my
smoking test for 24+ hours, as an EXPERIMENTAL code the current status
is fine IMHO.

Thanks in advance.

Coly Li


On 8/12/21 1:02 AM, Coly Li wrote:
> This is the v12 effort for supporting NVDIMM for bcache journal (some
> versions may not posted with version numbers).
>
> The major change of this version is the full pointer of on-media data
> structure is replaced by per-namespace offset. Now a pointer address is
> calculated by namespace base mapping address + per-namespace offset.
> The code logic is same as previous version, all changes are only related
> to the base+offset style pointer replacement.
>
> The nvm-pages allocator is a buddy-like allocator, which allocates size
> in power-of-2 pages from the NVDIMM namespace. User space tool 'bcache'
> has a new added '-M' option to format a NVDIMM namespace and register it
> via sysfs interface as a bcache meta device. The nvm-pages kernel code
> does a DAX mapping to map the whole namespace into system's memory
> address range, and allocating the pages to requestion like typical buddy
> allocator does. The major difference is nvm-pages allocator maintains
> the pages allocated to each requester by an allocation list which stored
> on NVDIMM too. Allocation list of different requester is tracked by a
> pre-defined UUID, all the pages tracked in all allocation lists are
> treated as allocated busy pages and won't be initialized into buddy
> system after the system reboot.
>
> The bcache journal code may request a block of power-of-2 size pages
> from the nvm-pages allocator, normally it is a range of 256MB or 512MB
> continuous pages range. During meta data journaling, the in-memory jsets
> go into the calculated nvdimm pages location by kernel memcpy routine.
> So the journaling I/Os won't go into block device (e.g. SSD) anymore, 
> the write and read for journal jsets happen on NVDIMM.
>
> Intel developers Jianpeng Ma and Qiaowei Ren compose the initial code of
> nvm-pages, the related patches are,
> - bcache: initialize the nvm pages allocator
> - bcache: initialization of the buddy
> - bcache: bch_nvm_alloc_pages() of the buddy
> - bcache: bch_nvm_free_pages() of the buddy
> - bcache: get recs list head for allocated pages by specific uuid
> All the code depends on Linux libnvdimm and dax drivers, the bcache nvm-
> pages allocator can be treated as user of these two drivers.
>
> I modify the bcache code to recognize the nvm meta device feature,
> initialize journal on NVDIMM, and do journal I/Os on NVDIMM in the
> following patches,
> - bcache: add initial data structures for nvm pages
> - bcache: use bucket index to set GC_MARK_METADATA for journal buckets
>   in bch_btree_gc_finish()
> - bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set
> - bcache: initialize bcache journal for NVDIMM meta device
> - bcache: support storing bcache journal into NVDIMM meta device
> - bcache: read jset from NVDIMM pages for journal replay
> - bcache: add sysfs interface register_nvdimm_meta to register NVDIMM
>   meta device
>
> In this series, all previously addressed issue via code reviews are all
> fixed. And all known issue during testing are fixed. The code survives
> from 24+ hours smoking and I/O pressure testing among many reboots, it
> works well as expected.
>
> All the code is EXPERIMENTAL, they won't be enabled by default until we
> feel the NVDIMM support is completed and stable.
>
> Although there are some experts helped to review the code logic, but we
> do appreciate if more people may help to review the code. It is quite
> common that bcache patches don't have enough code reviewer, but this
> time I do need help for more review or comments on this series.
>
> Thanks in advance.
>
> Coly Li
> ---
>
> Coly Li (7):
>   bcache: add initial data structures for nvm pages
>   bcache: use bucket index to set GC_MARK_METADATA for journal buckets
>     in bch_btree_gc_finish()
>   bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set
>   bcache: initialize bcache journal for NVDIMM meta device
>   bcache: support storing bcache journal into NVDIMM meta device
>   bcache: read jset from NVDIMM pages for journal replay
>   bcache: add sysfs interface register_nvdimm_meta to register NVDIMM
>     meta device
>
> Jianpeng Ma (5):
>   bcache: initialize the nvm pages allocator
>   bcache: initialization of the buddy
>   bcache: bch_nvmpg_alloc_pages() of the buddy
>   bcache: bch_nvmpg_free_pages() of the buddy allocator
>   bcache: get recs list head for allocated pages by specific uuid
>
>  drivers/md/bcache/Kconfig       |  10 +
>  drivers/md/bcache/Makefile      |   1 +
>  drivers/md/bcache/btree.c       |   6 +-
>  drivers/md/bcache/features.h    |   9 +
>  drivers/md/bcache/journal.c     | 325 +++++++++--
>  drivers/md/bcache/journal.h     |   2 +-
>  drivers/md/bcache/nvm-pages.c   | 931 ++++++++++++++++++++++++++++++++
>  drivers/md/bcache/nvm-pages.h   | 127 +++++
>  drivers/md/bcache/super.c       |  53 +-
>  include/uapi/linux/bcache-nvm.h | 253 +++++++++
>  10 files changed, 1649 insertions(+), 68 deletions(-)
>  create mode 100644 drivers/md/bcache/nvm-pages.c
>  create mode 100644 drivers/md/bcache/nvm-pages.h
>  create mode 100644 include/uapi/linux/bcache-nvm.h
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-08-15 16:22 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-11 17:02 [PATCH v12 00/12] bcache: support NVDIMM for journaling Coly Li
2021-08-11 17:02 ` [PATCH v12 01/12] bcache: add initial data structures for nvm pages Coly Li
2021-08-11 17:02 ` [PATCH v12 02/12] bcache: initialize the nvm pages allocator Coly Li
2021-08-12  5:43   ` Dan Williams
2021-08-12  8:26     ` Coly Li
2021-08-11 17:02 ` [PATCH v12 03/12] bcache: initialization of the buddy Coly Li
2021-08-11 17:02 ` [PATCH v12 04/12] bcache: bch_nvmpg_alloc_pages() " Coly Li
2021-08-11 17:02 ` [PATCH v12 05/12] bcache: bch_nvmpg_free_pages() of the buddy allocator Coly Li
2021-08-11 17:02 ` [PATCH v12 06/12] bcache: get recs list head for allocated pages by specific uuid Coly Li
2021-08-11 17:02 ` [PATCH v12 07/12] bcache: use bucket index to set GC_MARK_METADATA for journal buckets in bch_btree_gc_finish() Coly Li
2021-08-11 17:02 ` [PATCH v12 08/12] bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set Coly Li
2021-08-11 17:02 ` [PATCH v12 09/12] bcache: initialize bcache journal for NVDIMM meta device Coly Li
2021-08-11 17:02 ` [PATCH v12 10/12] bcache: support storing bcache journal into " Coly Li
2021-08-11 17:02 ` [PATCH v12 11/12] bcache: read jset from NVDIMM pages for journal replay Coly Li
2021-08-11 17:02 ` [PATCH v12 12/12] bcache: add sysfs interface register_nvdimm_meta to register NVDIMM meta device Coly Li
2021-08-15 16:21 ` [PATCH v12 00/12] bcache: support NVDIMM for journaling Coly Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).