linux-block.vger.kernel.org archive mirror
* [PATCH v13 00/12] bcache for 5.17: enable NVDIMM for bcache journal
@ 2021-12-12 17:05 Coly Li
  2021-12-12 17:05 ` [PATCH v13 01/12] bcache: add initial data structures for nvm pages Coly Li
                   ` (12 more replies)
  0 siblings, 13 replies; 28+ messages in thread
From: Coly Li @ 2021-12-12 17:05 UTC (permalink / raw)
  To: axboe
  Cc: linux-bcache, linux-block, Coly Li, Christoph Hellwig,
	Dan Williams, Hannes Reinecke, Jianpeng Ma, Qiaowei Ren,
	Ying Huang

Hi Jens,

This is the v13 effort for enabling NVDIMM for the bcache journal. The
code has been under testing for months and is quite stable now. Please
consider taking it for the Linux v5.17 merge window.

All code logic and the on-media format are consistent with the previous
v12 series. The major differences from the v12 series are,
- More typos in code comments and commit logs are fixed.
- A kernel message is added to indicate that only the first range is
  used if the NVDIMM namespace has multiple mapping ranges.
- The nvm-pages allocator APIs are not exported any more; this is
  unnecessary since currently only bcache uses them.

Now all previous bcache related UAPI headers are moved into the bcache
private code directory; no global headers are exported to either kernel
or user space source code.

Bcache uses the nvm-pages allocator to allocate pages from an NVDIMM
namespace for its journaling space. The nvm-pages allocator is a
buddy-like allocator, which allocates power-of-2 sized ranges of pages
from the NVDIMM namespace. The user space tool 'bcache' has a newly
added '-M' option to format an NVDIMM namespace and register it via the
sysfs interface as a bcache meta device. The nvm-pages allocator code
does a DAX mapping to map the whole namespace into the system's memory
address range, and allocates pages to requesters as a typical buddy
allocator does. The major difference is that the nvm-pages allocator
maintains the pages allocated to each requester in an allocation list
which is stored on NVDIMM too. The allocation list of each requester is
tracked by a pre-defined UUID; all the pages tracked in all allocation
lists are treated as allocated busy pages and won't be put back into
the buddy free lists after the system reboots.
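
As an illustration only (not text from the patches themselves), a
requester identified by a 16-byte uuid obtains pages roughly like the
sketch below, using the bch_nvmpg_alloc_pages() and
bch_nvmpg_offset_to_ptr() interfaces added by this series; the function
name alloc_nvm_space is made up for the example:

	/* Sketch: allocate 2^order NVDIMM pages tracked under owner_uuid */
	static void *alloc_nvm_space(int order, const char *owner_uuid)
	{
		unsigned long nvmpg_offset;

		nvmpg_offset = bch_nvmpg_alloc_pages(order, owner_uuid);
		if (!nvmpg_offset)
			return NULL;

		/* translate the persistent offset to a DAX-mapped address */
		return bch_nvmpg_offset_to_ptr(nvmpg_offset);
	}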

The bcache journal code may request a block of power-of-2 sized pages
from the nvm-pages allocator, normally a contiguous range of 256MB or
512MB. During metadata journaling, the in-memory jsets are copied into
the calculated NVDIMM page locations by the kernel memcpy routine. So
the journaling I/Os won't go to a block device (e.g. SSD) anymore; the
writes and reads of journal jsets happen on NVDIMM.
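
Conceptually (an illustrative sketch, not the exact code in the
patches), a journal write then becomes a plain memcpy into the DAX
mapping instead of a bio submitted to a block device; journal_nvmpg_offset
and jset_bytes below are made-up names for the example:

	/* destination is a DAX-mapped kernel address inside the NVDIMM */
	void *jset_dst = bch_nvmpg_offset_to_ptr(journal_nvmpg_offset);

	/* persist the in-memory jset; no bio is allocated or submitted */
	memcpy(jset_dst, w->data, jset_bytes);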

Intel developers Jianpeng Ma and Qiaowei Ren composed the initial code
of the nvm-pages allocator, the related patches are,
- bcache: initialize the nvm pages allocator
- bcache: initialization of the buddy
- bcache: bch_nvmpg_alloc_pages() of the buddy
- bcache: bch_nvmpg_free_pages() of the buddy allocator
- bcache: get recs list head for allocated pages by specific uuid
All the code depends on the Linux libnvdimm and dax drivers; the bcache
nvm-pages allocator can be treated as a user of these two drivers.

I modified the bcache code to recognize the nvm meta device feature,
initialize the journal on NVDIMM, and do journal I/Os on NVDIMM in the
following patches,
- bcache: add initial data structures for nvm pages
- bcache: use bucket index to set GC_MARK_METADATA for journal buckets
  in bch_btree_gc_finish()
- bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set
- bcache: initialize bcache journal for NVDIMM meta device
- bcache: support storing bcache journal into NVDIMM meta device
- bcache: read jset from NVDIMM pages for journal replay
- bcache: add sysfs interface register_nvdimm_meta to register NVDIMM
  meta device

All the code is EXPERIMENTAL; it won't be enabled by default until we
feel the NVDIMM support is complete and stable. The current code has
been tested internally for months, and we don't observe any issue
during all tests, with or without the configuration enabled.

Please consider picking this series for the Linux v5.17 merge window.
If any issue is detected, we will respond in time and fix it ASAP.

Thank you in advance.

Coly Li

Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Ying Huang <ying.huang@intel.com>
---

Coly Li (7):
  bcache: add initial data structures for nvm pages
  bcache: use bucket index to set GC_MARK_METADATA for journal buckets
    in bch_btree_gc_finish()
  bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set
  bcache: initialize bcache journal for NVDIMM meta device
  bcache: support storing bcache journal into NVDIMM meta device
  bcache: read jset from NVDIMM pages for journal replay
  bcache: add sysfs interface register_nvdimm_meta to register NVDIMM
    meta device

Jianpeng Ma (5):
  bcache: initialize the nvm pages allocator
  bcache: initialization of the buddy
  bcache: bch_nvmpg_alloc_pages() of the buddy
  bcache: bch_nvmpg_free_pages() of the buddy allocator
  bcache: get recs list head for allocated pages by specific uuid

 drivers/md/bcache/Kconfig        |  10 +
 drivers/md/bcache/Makefile       |   1 +
 drivers/md/bcache/btree.c        |   6 +-
 drivers/md/bcache/features.h     |   9 +
 drivers/md/bcache/journal.c      | 321 +++++++++--
 drivers/md/bcache/journal.h      |   2 +-
 drivers/md/bcache/nvmpg.c        | 931 +++++++++++++++++++++++++++++++
 drivers/md/bcache/nvmpg.h        | 128 +++++
 drivers/md/bcache/nvmpg_format.h | 253 +++++++++
 drivers/md/bcache/super.c        |  53 +-
 10 files changed, 1646 insertions(+), 68 deletions(-)
 create mode 100644 drivers/md/bcache/nvmpg.c
 create mode 100644 drivers/md/bcache/nvmpg.h
 create mode 100644 drivers/md/bcache/nvmpg_format.h

-- 
2.31.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v13 01/12] bcache: add initial data structures for nvm pages
  2021-12-12 17:05 [PATCH v13 00/12] bcache for 5.17: enable NVDIMM for bcache journal Coly Li
@ 2021-12-12 17:05 ` Coly Li
  2021-12-12 17:05 ` [PATCH v13 02/12] bcache: initialize the nvm pages allocator Coly Li
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Coly Li @ 2021-12-12 17:05 UTC (permalink / raw)
  To: axboe
  Cc: linux-bcache, linux-block, Coly Li, Christoph Hellwig,
	Dan Williams, Hannes Reinecke, Jianpeng Ma, Qiaowei Ren,
	Ying Huang

This patch introduces the prototype data structures for the nvm pages
allocator:

- struct bch_nvmpg_sb
  This is the super block allocated on each nvdimm namespace for the nvm
pages allocator. An nvdimm pages allocator set may have multiple
namespaces; bch_nvmpg_sb->set_uuid is used to mark which nvdimm set this
namespace belongs to.

- struct bch_nvmpg_header
  This is a table for the heads of all allocation record lists. An
allocation record list traces all page(s) allocated from nvdimm
namespace(s) to a specific requester (identified by uuid). After a
system reboot, a requester can retrieve all previously allocated nvdimm
pages from its record list by a pre-defined uuid.

- struct bch_nvmpg_head
  This is the head of an allocation record list. Each nvdimm pages
requester (typically it's a driver) has one and only one allocation
record list, and an allocated nvdimm page only belongs to a specific
allocation record list. Member uuid[] will be set to the requester's
uuid, e.g. for bcache it is the cache set uuid. Member label is not
mandatory; it is a human-readable string for debugging purposes. The nvm
offset format pointers recs_offset[] point to the locations of the
actual allocation record lists on each namespace of the nvdimm pages
allocator set. Each per-namespace record list is represented by the
following struct bch_nvmpg_recs.

- struct bch_nvmpg_recs
  This structure represents a requester's allocation record list. Member
uuid has the same value as the uuid of its corresponding struct
bch_nvmpg_head. Member recs[] is a table of struct bch_nvmpg_rec
objects to trace all allocated nvdimm pages. If the table recs[] is
full, the nvmpg offset format pointer next_offset points to the next
struct bch_nvmpg_recs object, where the nvm pages allocator will look
for an available free allocation record. All the linked struct
bch_nvmpg_recs objects compose a requester's allocation record list,
which is headed by the above struct bch_nvmpg_head.
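
For illustration, walking such a record list is essentially the
following sketch (bch_nvmpg_offset_to_ptr() is a helper added by a
later patch in this series; head and ns_id are assumed to be in scope):

	struct bch_nvmpg_recs *recs;

	recs = bch_nvmpg_offset_to_ptr(head->recs_offset[ns_id]);
	while (recs) {
		/* scan recs->recs[0 .. recs->size - 1] here */
		recs = bch_nvmpg_offset_to_ptr(recs->next_offset);
	}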

- struct bch_nvmpg_rec
  This structure records a range of allocated nvdimm pages. Member pgoff
is the offset, in units of page size, of this allocation range. Member
order indicates the size of the allocation range as (1 << order) pages.
Because the nvdimm pages allocator set may have multiple nvdimm
namespaces, member ns_id is used to identify which namespace the pgoff
belongs to.
  - Bits  0 - 51: pgoff - page offset of the allocated pages.
  - Bits 52 - 57: order - allocation size is page_size * (1 << order).
  - Bits 58 - 60: ns_id - identifies which namespace the pages stay on.
  - Bits 61 - 63: reserved.
Since each allocation is a power-of-2 number of nvm pages, using 6 bits
to represent the allocation order gives a maximum size of
(1 << ((1 << 6) - 1)) * PAGE_SIZE. That is up to a 76-bit wide range
size in bytes for a 4KB page size, which is large enough currently.
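
For illustration, one such record maps to an address range roughly as
below; the in-memory structures used here (global_nvmpg_set, ns_tbl[],
base_addr) are added by later patches of this series, so this is only a
sketch:

	/* Sketch: interpret one struct bch_nvmpg_rec */
	struct bch_nvmpg_ns *ns = global_nvmpg_set->ns_tbl[rec->ns_id];
	void *start = ns->base_addr + ((unsigned long)rec->pgoff << PAGE_SHIFT);
	unsigned long bytes = (1UL << rec->order) << PAGE_SHIFT;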

All the structure members having an _offset suffix are in a special
format, e.g. bch_nvmpg_sb.{sb_offset, pages_offset, set_header_offset},
bch_nvmpg_head.recs_offset, bch_nvmpg_recs.{head_offset, next_offset}.
The offset value is 64-bit; the most significant 3 bits are used to
identify which namespace this offset belongs to, and the remaining 61
bits are the actual offset inside the namespace. The following patches
add helper routines to do the conversion between memory pointers and
offsets.
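
A minimal sketch of that conversion (the real helpers and the in-memory
structures are introduced by the following patches):

	/* Sketch: decode a nvmpg offset format pointer */
	int ns_id = offset >> 61;			/* highest 3 bits */
	unsigned long off = offset & ((1UL << 61) - 1);	/* low 61 bits */
	void *vaddr = global_nvmpg_set->ns_tbl[ns_id]->base_addr + off;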

Signed-off-by: Coly Li <colyli@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Ying Huang <ying.huang@intel.com>
---
 drivers/md/bcache/nvmpg_format.h | 253 +++++++++++++++++++++++++++++++
 1 file changed, 253 insertions(+)
 create mode 100644 drivers/md/bcache/nvmpg_format.h

diff --git a/drivers/md/bcache/nvmpg_format.h b/drivers/md/bcache/nvmpg_format.h
new file mode 100644
index 000000000000..414bcafa31ee
--- /dev/null
+++ b/drivers/md/bcache/nvmpg_format.h
@@ -0,0 +1,253 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef _NVMPG_FORMAT_H
+#define _NVMPG_FORMAT_H
+
+/*
+ * Bcache on NVDIMM data structures
+ */
+
+/*
+ * - struct bch_nvmpg_sb
+ *   This is the super block allocated on each nvdimm namespace for the nvm
+ * pages allocator. An nvdimm pages allocator set may have multiple namespaces;
+ * bch_nvmpg_sb->set_uuid is used to mark which nvdimm set this namespace
+ * belongs to.
+ *
+ * - struct bch_nvmpg_header
+ *   This is a table for all heads of all allocation record lists. An allo-
+ * cation record list traces all page(s) allocated from nvdimm namespace(s) to
+ * a specific requester (identified by uuid). After system reboot, a requester
+ * can retrieve all previously allocated nvdimm pages from its record list by a
+ * pre-defined uuid.
+ *
+ * - struct bch_nvmpg_head
+ *   This is a head of an allocation record list. Each nvdimm pages requester
+ * (typically it's a driver) has and only has one allocation record list, and
+ * an allocated nvdimm page only belongs to a specific allocation record list.
+ * Member uuid[] will be set as the requester's uuid, e.g. for bcache it is the
+ * cache set uuid. Member label is not mandatory, it is a human-readable string
+ * for debug purpose. The nvm offset format pointers recs_offset[] point to the
+ * location of actual allocator record lists on each name space of the nvdimm
+ * pages allocator set. Each per name space record list is represented by the
+ * following struct bch_nvmpg_recs.
+ *
+ * - struct bch_nvmpg_recs
+ *   This structure represents a requester's allocation record list. Member
+ * uuid has the same value as the uuid of its corresponding struct
+ * bch_nvmpg_head. Member recs[] is a table of struct bch_nvmpg_rec objects to
+ * trace all allocated nvdimm pages. If the table recs[] is full, the nvmpg
+ * offset format pointer next_offset points to the next struct bch_nvmpg_recs,
+ * where the allocator will look for a free allocation record. All the linked
+ * struct bch_nvmpg_recs objects compose a requester's allocation record list
+ * which is headed by the above struct bch_nvmpg_head.
+ *
+ * - struct bch_nvmpg_rec
+ *   This structure records a range of allocated nvdimm pages. Member pgoff is
+ * the offset, in units of page size, of this allocation range. Member order
+ * indicates the size of the allocation range as (1 << order) pages. Because
+ * the nvdimm pages allocator set may have multiple nvdimm namespaces, member
+ * ns_id is used to identify which namespace the pgoff belongs to.
+ *
+ * All allocation record lists are stored on the first initialized nvdimm
+ * namespace (ns_id 0). The default metadata layout of the nvm pages allocator
+ * on namespace 0 is,
+ *
+ *    0 +---------------------------------+
+ *      |                                 |
+ *  4KB +---------------------------------+ <-- BCH_NVMPG_SB_OFFSET
+ *      |          bch_nvmpg_sb           |
+ *  8KB +---------------------------------+ <-- BCH_NVMPG_RECLIST_HEAD_OFFSET
+ *      |        bch_nvmpg_header         |
+ *      |                                 |
+ * 16KB +---------------------------------+ <-- BCH_NVMPG_SYSRECS_OFFSET
+ *      |         bch_nvmpg_recs          |
+ *      |  (nvm pages internal usage)     |
+ * 24KB +---------------------------------+
+ *      |                                 |
+ *      |                                 |
+ * 16MB +---------------------------------+ <-- BCH_NVMPG_START
+ *      |      allocable nvm pages        |
+ *      |      for buddy allocator        |
+ * end  +---------------------------------+
+ *
+ *
+ *
+ * The default metadata layout on the remaining nvdimm namespaces,
+ *
+ *    0 +---------------------------------+
+ *      |                                 |
+ *  4KB +---------------------------------+ <-- BCH_NVMPG_SB_OFFSET
+ *      |          bch_nvmpg_sb           |
+ *  8KB +---------------------------------+
+ *      |                                 |
+ *      |                                 |
+ *      |                                 |
+ *      |                                 |
+ *      |                                 |
+ *      |                                 |
+ * 16MB +---------------------------------+ <-- BCH_NVMPG_START
+ *      |      allocable nvm pages        |
+ *      |      for buddy allocator        |
+ * end  +---------------------------------+
+ *
+ *
+ * - The nvmpg offset format pointer
+ *   All member names ending with _offset in this header are nvmpg offset
+ * format pointers. The offset format is,
+ *       [highest 3 bits: ns_id]
+ *       [remaining 61 bits: offset in the No. ns_id namespace]
+ *
+ * The above offset is in byte units. The procedure to dereference a nvmpg
+ * offset format pointer is,
+ * 1) Identify the namespace related in-memory structure by ns_id from the
+ *    highest 3 bits of the offset value.
+ * 2) Get the DAX mapping base address from the in-memory structure.
+ * 3) Calculate the actual memory address on nvdimm by adding the offset
+ *    value in the remaining low 61 bits to the DAX base address.
+ * All related in-memory structures and conversion routines don't belong to
+ * the user space API, they are defined by the nvm pages allocator code in
+ * drivers/md/bcache/nvmpg.{c,h}
+ *
+ */
+
+#include <linux/types.h>
+
+/* In bytes */
+#define BCH_NVMPG_SB_OFFSET		4096
+#define BCH_NVMPG_START			(16 << 20)
+
+#define BCH_NVMPG_LBL_SIZE		32
+#define BCH_NVMPG_NS_MAX		8
+
+#define BCH_NVMPG_RECLIST_HEAD_OFFSET	(8<<10)
+#define BCH_NVMPG_SYSRECS_OFFSET	(16<<10)
+
+#define BCH_NVMPG_SB_VERSION		0
+#define BCH_NVMPG_SB_VERSION_MAX	0
+
+static const __u8 bch_nvmpg_magic[] = {
+	0x17, 0xbd, 0x53, 0x7f, 0x1b, 0x23, 0xd6, 0x83,
+	0x46, 0xa4, 0xf8, 0x28, 0x17, 0xda, 0xec, 0xa9 };
+static const __u8 bch_nvmpg_recs_magic[] = {
+	0x39, 0x25, 0x3f, 0xf7, 0x27, 0x17, 0xd0, 0xb9,
+	0x10, 0xe6, 0xd2, 0xda, 0x38, 0x68, 0x26, 0xae };
+
+/* takes 64bit width */
+struct bch_nvmpg_rec {
+	union {
+		struct {
+			__u64	pgoff:52;
+			__u64	order:6;
+			__u64	ns_id:3;
+			__u64	reserved:3;
+		};
+		__u64	_v;
+	};
+};
+
+struct bch_nvmpg_recs {
+	union {
+		struct {
+			/*
+			 * A nvmpg offset format pointer to
+			 * struct bch_nvmpg_head
+			 */
+			__u64			head_offset;
+			/*
+			 * A nvmpg offset format pointer to
+			 * struct bch_nvmpg_recs which contains
+			 * the next recs[] array.
+			 */
+			__u64			next_offset;
+			__u8			magic[16];
+			__u8			uuid[16];
+			__u32			size;
+			__u32			used;
+			__u64			_pad[4];
+			struct bch_nvmpg_rec	recs[];
+		};
+		__u8				pad[8192];
+	};
+};
+
+#define BCH_NVMPG_MAX_RECS				\
+	((sizeof(struct bch_nvmpg_recs) -		\
+	  offsetof(struct bch_nvmpg_recs, recs)) /	\
+	 sizeof(struct bch_nvmpg_rec))
+
+#define BCH_NVMPG_HD_STAT_FREE		0x0
+#define BCH_NVMPG_HD_STAT_ALLOC		0x1
+struct bch_nvmpg_head {
+	__u8		uuid[16];
+	__u8		label[BCH_NVMPG_LBL_SIZE];
+	__u32		state;
+	__u32		flags;
+	/*
+	 * Array of offset values from the nvmpg offset format
+	 * pointers, each of the pointer points to a per-namespace
+	 * struct bch_nvmpg_recs.
+	 */
+	__u64		recs_offset[BCH_NVMPG_NS_MAX];
+};
+
+/* heads[0] is always for nvm_pages internal usage */
+struct bch_nvmpg_set_header {
+	union {
+		struct {
+			__u32			size;
+			__u32			used;
+			__u64			_pad[4];
+			struct bch_nvmpg_head	heads[];
+		};
+		__u8				pad[8192];
+	};
+};
+
+#define BCH_NVMPG_MAX_HEADS					\
+	((sizeof(struct bch_nvmpg_set_header) -			\
+	  offsetof(struct bch_nvmpg_set_header, heads)) /	\
+	 sizeof(struct bch_nvmpg_head))
+
+/* The on-media bit order is local CPU order */
+struct bch_nvmpg_sb {
+	__u64			csum;
+	__u64			sb_offset;
+	__u64			ns_start;
+	__u64			version;
+	__u8			magic[16];
+	__u8			uuid[16];
+	__u32			page_size;
+	__u32			total_ns;
+	__u32			this_ns;
+	union {
+		__u8		set_uuid[16];
+		__u64		set_magic;
+	};
+
+	__u64			flags;
+	__u64			seq;
+
+	__u64			feature_compat;
+	__u64			feature_incompat;
+	__u64			feature_ro_compat;
+
+	/* For allocable nvm pages from buddy systems */
+	__u64			pages_offset;
+	__u64			pages_total;
+
+	__u64			pad[8];
+
+	/*
+	 * A nvmpg offset format pointer, it points
+	 * to struct bch_nvmpg_set_header which is
+	 * stored only on the first namespace.
+	 */
+	__u64			set_header_offset;
+
+	/* Just for csum_set() */
+	__u32			keys;
+	__u64			d[0];
+};
+
+#endif /* _NVMPG_FORMAT_H */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v13 02/12] bcache: initialize the nvm pages allocator
  2021-12-12 17:05 [PATCH v13 00/12] bcache for 5.17: enable NVDIMM for bcache journal Coly Li
  2021-12-12 17:05 ` [PATCH v13 01/12] bcache: add initial data structures for nvm pages Coly Li
@ 2021-12-12 17:05 ` Coly Li
  2021-12-12 19:34   ` Jens Axboe
  2021-12-12 17:05 ` [PATCH v13 03/12] bcache: initialization of the buddy Coly Li
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Coly Li @ 2021-12-12 17:05 UTC (permalink / raw)
  To: axboe
  Cc: linux-bcache, linux-block, Jianpeng Ma, Randy Dunlap,
	Qiaowei Ren, Christoph Hellwig, Dan Williams, Hannes Reinecke

From: Jianpeng Ma <jianpeng.ma@intel.com>

This patch defines the prototype data structures in memory and
initializes the nvm pages allocator.

The nvm address space which is managed by this allocator can consist of
many nvm namespaces, and several namespaces can be composed into one nvm
set, like a cache set. For this initial implementation, only one set is
supported.

The users of this nvm pages allocator need to call
bch_register_namespace() to register the nvdimm device (like /dev/pmemX)
with this allocator as an instance of struct bch_nvmpg_ns.
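
A minimal usage sketch (error handling trimmed; /dev/pmem0 is just an
example path):

	struct bch_nvmpg_ns *ns;

	ns = bch_register_namespace("/dev/pmem0");
	if (IS_ERR(ns))
		return PTR_ERR(ns);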

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
---
 drivers/md/bcache/Kconfig  |  10 ++
 drivers/md/bcache/Makefile |   1 +
 drivers/md/bcache/nvmpg.c  | 340 +++++++++++++++++++++++++++++++++++++
 drivers/md/bcache/nvmpg.h  |  97 +++++++++++
 drivers/md/bcache/super.c  |   3 +
 5 files changed, 451 insertions(+)
 create mode 100644 drivers/md/bcache/nvmpg.c
 create mode 100644 drivers/md/bcache/nvmpg.h

diff --git a/drivers/md/bcache/Kconfig b/drivers/md/bcache/Kconfig
index cf3e8096942a..4a7c13e882bb 100644
--- a/drivers/md/bcache/Kconfig
+++ b/drivers/md/bcache/Kconfig
@@ -36,3 +36,13 @@ config BCACHE_ASYNC_REGISTRATION
 	device path into this file will returns immediately and the real
 	registration work is handled in kernel work queue in asynchronous
 	way.
+
+config BCACHE_NVM_PAGES
+	bool "NVDIMM support for bcache (EXPERIMENTAL)"
+	depends on BCACHE
+	depends on 64BIT
+	depends on LIBNVDIMM
+	depends on DAX
+	help
+	  Allocate/release NV-memory pages for bcache and provide allocated pages
+	  for each requestor after system reboot.
diff --git a/drivers/md/bcache/Makefile b/drivers/md/bcache/Makefile
index 5b87e59676b8..276b33be5ad5 100644
--- a/drivers/md/bcache/Makefile
+++ b/drivers/md/bcache/Makefile
@@ -5,3 +5,4 @@ obj-$(CONFIG_BCACHE)	+= bcache.o
 bcache-y		:= alloc.o bset.o btree.o closure.o debug.o extents.o\
 	io.o journal.o movinggc.o request.o stats.o super.o sysfs.o trace.o\
 	util.o writeback.o features.o
+bcache-$(CONFIG_BCACHE_NVM_PAGES) += nvmpg.o
diff --git a/drivers/md/bcache/nvmpg.c b/drivers/md/bcache/nvmpg.c
new file mode 100644
index 000000000000..b654bbbda03e
--- /dev/null
+++ b/drivers/md/bcache/nvmpg.c
@@ -0,0 +1,340 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Nvdimm page-buddy allocator
+ *
+ * Copyright (c) 2021, Intel Corporation.
+ * Copyright (c) 2021, Qiaowei Ren <qiaowei.ren@intel.com>.
+ * Copyright (c) 2021, Jianpeng Ma <jianpeng.ma@intel.com>.
+ */
+
+#include "bcache.h"
+#include "nvmpg.h"
+
+#include <linux/slab.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/dax.h>
+#include <linux/pfn_t.h>
+#include <linux/libnvdimm.h>
+#include <linux/mm_types.h>
+#include <linux/err.h>
+#include <linux/pagemap.h>
+#include <linux/bitmap.h>
+#include <linux/blkdev.h>
+
+struct bch_nvmpg_set *global_nvmpg_set;
+
+void *bch_nvmpg_offset_to_ptr(unsigned long offset)
+{
+	int ns_id;
+	struct bch_nvmpg_ns *ns;
+
+	if (offset == 0)
+		return NULL;
+
+	ns_id = BCH_NVMPG_GET_NS_ID(offset);
+	ns = global_nvmpg_set->ns_tbl[ns_id];
+
+	if (ns)
+		return (void *)(ns->base_addr + BCH_NVMPG_GET_OFFSET(offset));
+
+	pr_err("Invalid ns_id %u\n", ns_id);
+	return NULL;
+}
+
+unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr)
+{
+	int ns_id = ns->ns_id;
+	unsigned long offset = (unsigned long)(ptr - ns->base_addr);
+
+	return BCH_NVMPG_OFFSET(ns_id, offset);
+}
+
+static void release_ns_tbl(struct bch_nvmpg_set *set)
+{
+	int i;
+	struct bch_nvmpg_ns *ns;
+
+	for (i = 0; i < BCH_NVMPG_NS_MAX; i++) {
+		ns = set->ns_tbl[i];
+		if (ns) {
+			fs_put_dax(ns->dax_dev);
+			blkdev_put(ns->bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL);
+			set->ns_tbl[i] = NULL;
+			set->attached_ns--;
+			kfree(ns);
+		}
+	}
+
+	if (set->attached_ns)
+		pr_err("unexpected attached_ns: %u\n", set->attached_ns);
+}
+
+static void release_nvmpg_set(struct bch_nvmpg_set *set)
+{
+	release_ns_tbl(set);
+	kfree(set);
+}
+
+/* Namespace 0 contains all meta data of the nvmpg allocation set */
+static int init_nvmpg_set_header(struct bch_nvmpg_ns *ns)
+{
+	struct bch_nvmpg_set_header *set_header;
+
+	if (ns->ns_id != 0) {
+		pr_err("unexpected ns_id %u for first nvmpg namespace.\n",
+		       ns->ns_id);
+		return -EINVAL;
+	}
+
+	set_header = bch_nvmpg_offset_to_ptr(ns->sb->set_header_offset);
+
+	mutex_lock(&global_nvmpg_set->lock);
+	global_nvmpg_set->set_header = set_header;
+	global_nvmpg_set->heads_size = set_header->size;
+	global_nvmpg_set->heads_used = set_header->used;
+	mutex_unlock(&global_nvmpg_set->lock);
+
+	return 0;
+}
+
+static int attach_nvmpg_set(struct bch_nvmpg_ns *ns)
+{
+	struct bch_nvmpg_sb *sb = ns->sb;
+	int rc = 0;
+
+	mutex_lock(&global_nvmpg_set->lock);
+
+	if (global_nvmpg_set->ns_tbl[sb->this_ns]) {
+		pr_err("ns_id %u already attached.\n", ns->ns_id);
+		rc = -EEXIST;
+		goto unlock;
+	}
+
+	if (ns->ns_id != 0) {
+		pr_err("unexpected ns_id %u for first namespace.\n", ns->ns_id);
+		rc = -EINVAL;
+		goto unlock;
+	}
+
+	if (global_nvmpg_set->attached_ns > 0) {
+		pr_err("multiple namespace attaching not supported yet\n");
+		rc = -EOPNOTSUPP;
+		goto unlock;
+	}
+
+	if ((global_nvmpg_set->attached_ns + 1) > sb->total_ns) {
+		pr_err("namespace counters error: attached %u > total %u\n",
+		       global_nvmpg_set->attached_ns,
+		       global_nvmpg_set->total_ns);
+		rc = -EINVAL;
+		goto unlock;
+	}
+
+	memcpy(global_nvmpg_set->set_uuid, sb->set_uuid, 16);
+	global_nvmpg_set->ns_tbl[sb->this_ns] = ns;
+	global_nvmpg_set->attached_ns++;
+	global_nvmpg_set->total_ns = sb->total_ns;
+
+unlock:
+	mutex_unlock(&global_nvmpg_set->lock);
+	return rc;
+}
+
+static int read_nvdimm_meta_super(struct block_device *bdev,
+				  struct bch_nvmpg_ns *ns)
+{
+	struct page *page;
+	struct bch_nvmpg_sb *sb;
+	uint64_t expected_csum = 0;
+	int r;
+
+	page = read_cache_page_gfp(bdev->bd_inode->i_mapping,
+				BCH_NVMPG_SB_OFFSET >> PAGE_SHIFT, GFP_KERNEL);
+
+	if (IS_ERR(page))
+		return -EIO;
+
+	sb = (struct bch_nvmpg_sb *)
+	     (page_address(page) + offset_in_page(BCH_NVMPG_SB_OFFSET));
+
+	r = -EINVAL;
+	expected_csum = csum_set(sb);
+	if (expected_csum != sb->csum) {
+		pr_info("csum does not match the expected value\n");
+		goto put_page;
+	}
+
+	if (memcmp(sb->magic, bch_nvmpg_magic, 16)) {
+		pr_info("invalid bch_nvmpg_magic\n");
+		goto put_page;
+	}
+
+	if (sb->sb_offset !=
+	    BCH_NVMPG_OFFSET(sb->this_ns, BCH_NVMPG_SB_OFFSET)) {
+		pr_info("invalid superblock offset 0x%llx\n", sb->sb_offset);
+		goto put_page;
+	}
+
+	r = -EOPNOTSUPP;
+	if (sb->total_ns != 1) {
+		pr_info("multiple namespaces are not supported yet.\n");
+		goto put_page;
+	}
+
+
+	r = 0;
+	/* Necessary for DAX mapping */
+	ns->page_size = sb->page_size;
+	ns->pages_total = sb->pages_total;
+
+put_page:
+	put_page(page);
+	return r;
+}
+
+struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path)
+{
+	struct bch_nvmpg_ns *ns = NULL;
+	struct bch_nvmpg_sb *sb = NULL;
+	char buf[BDEVNAME_SIZE];
+	struct block_device *bdev;
+	pgoff_t pgoff;
+	int id, err;
+	char *path;
+	long dax_ret = 0;
+
+	path = kstrndup(dev_path, 512, GFP_KERNEL);
+	if (!path) {
+		pr_err("kstrndup failed\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	bdev = blkdev_get_by_path(strim(path),
+				  FMODE_READ|FMODE_WRITE|FMODE_EXCL,
+				  global_nvmpg_set);
+	if (IS_ERR(bdev)) {
+		pr_err("get %s error: %ld\n", dev_path, PTR_ERR(bdev));
+		kfree(path);
+		return ERR_PTR(PTR_ERR(bdev));
+	}
+
+	err = -ENOMEM;
+	ns = kzalloc(sizeof(struct bch_nvmpg_ns), GFP_KERNEL);
+	if (!ns)
+		goto bdput;
+
+	err = -EIO;
+	if (read_nvdimm_meta_super(bdev, ns)) {
+		pr_err("%s read nvdimm meta super block failed.\n",
+		       bdevname(bdev, buf));
+		goto free_ns;
+	}
+
+	err = -EOPNOTSUPP;
+	ns->dax_dev = fs_dax_get_by_bdev(bdev);
+	if (!ns->dax_dev) {
+		pr_err("can't get dax device by %s\n", bdevname(bdev, buf));
+		goto free_ns;
+	}
+
+	if (!dax_supported(ns->dax_dev, bdev, ns->page_size, 0,
+			   bdev_nr_sectors(bdev))) {
+		pr_err("%s don't support DAX\n", bdevname(bdev, buf));
+		goto free_ns;
+	}
+
+	err = -EINVAL;
+	if (bdev_dax_pgoff(bdev, 0, ns->page_size, &pgoff)) {
+		pr_err("invalid offset of %s\n", bdevname(bdev, buf));
+		goto free_ns;
+	}
+
+	err = -EINVAL;
+	id = dax_read_lock();
+	dax_ret = dax_direct_access(ns->dax_dev, pgoff, ns->pages_total,
+				    &ns->base_addr, &ns->start_pfn);
+	if (dax_ret <= 0) {
+		pr_err("dax_direct_access error\n");
+		dax_read_unlock(id);
+		goto free_ns;
+	}
+
+	if (dax_ret < ns->pages_total) {
+		pr_warn("currently first %ld pages (from %lu in total) are used\n",
+			dax_ret, ns->pages_total);
+	}
+	dax_read_unlock(id);
+
+	sb = (struct bch_nvmpg_sb *)(ns->base_addr + BCH_NVMPG_SB_OFFSET);
+
+	err = -EINVAL;
+	/* Check magic again to make sure DAX mapping is correct */
+	if (memcmp(sb->magic, bch_nvmpg_magic, 16)) {
+		pr_err("invalid bch_nvmpg_magic after DAX mapping\n");
+		goto free_ns;
+	}
+
+	if ((global_nvmpg_set->attached_ns > 0) &&
+	     memcmp(sb->set_uuid, global_nvmpg_set->set_uuid, 16)) {
+		pr_err("set uuid does not match with ns_id %u\n", ns->ns_id);
+		goto free_ns;
+	}
+
+	if (sb->set_header_offset !=
+	    BCH_NVMPG_OFFSET(sb->this_ns, BCH_NVMPG_RECLIST_HEAD_OFFSET)) {
+		pr_err("Invalid header offset: this_ns %u, ns_id %llu, offset 0x%llx\n",
+		       sb->this_ns,
+		       BCH_NVMPG_GET_NS_ID(sb->set_header_offset),
+		       BCH_NVMPG_GET_OFFSET(sb->set_header_offset));
+		goto free_ns;
+	}
+
+	ns->page_size = sb->page_size;
+	ns->pages_offset = sb->pages_offset;
+	ns->pages_total = sb->pages_total;
+	ns->sb = sb;
+	ns->free = 0;
+	ns->bdev = bdev;
+	ns->set = global_nvmpg_set;
+
+	err = attach_nvmpg_set(ns);
+	if (err < 0)
+		goto free_ns;
+
+	mutex_init(&ns->lock);
+
+	err = init_nvmpg_set_header(ns);
+	if (err < 0)
+		goto free_ns;
+
+	kfree(path);
+	return ns;
+
+free_ns:
+	fs_put_dax(ns->dax_dev);
+	kfree(ns);
+bdput:
+	blkdev_put(bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL);
+	kfree(path);
+	return ERR_PTR(err);
+}
+
+int __init bch_nvmpg_init(void)
+{
+	global_nvmpg_set = kzalloc(sizeof(*global_nvmpg_set), GFP_KERNEL);
+	if (!global_nvmpg_set)
+		return -ENOMEM;
+
+	global_nvmpg_set->total_ns = 0;
+	mutex_init(&global_nvmpg_set->lock);
+
+	pr_info("bcache nvm init\n");
+	return 0;
+}
+
+void bch_nvmpg_exit(void)
+{
+	release_nvmpg_set(global_nvmpg_set);
+	pr_info("bcache nvm exit\n");
+}
diff --git a/drivers/md/bcache/nvmpg.h b/drivers/md/bcache/nvmpg.h
new file mode 100644
index 000000000000..698c890b2d15
--- /dev/null
+++ b/drivers/md/bcache/nvmpg.h
@@ -0,0 +1,97 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _BCACHE_NVM_PAGES_H
+#define _BCACHE_NVM_PAGES_H
+
+#include <linux/libnvdimm.h>
+
+#include "nvmpg_format.h"
+
+/*
+ * Bcache NVDIMM in memory data structures
+ */
+
+/*
+ * The following three structures in memory records which page(s) allocated
+ * to which owner. After reboot from power failure, they will be initialized
+ * based on nvm pages superblock in NVDIMM device.
+ */
+struct bch_nvmpg_ns {
+	struct bch_nvmpg_sb *sb;
+	void *base_addr;
+
+	unsigned char uuid[16];
+	int ns_id;
+	unsigned int page_size;
+	unsigned long free;
+	unsigned long pages_offset;
+	unsigned long pages_total;
+	pfn_t start_pfn;
+
+	struct dax_device *dax_dev;
+	struct block_device *bdev;
+	struct bch_nvmpg_set *set;
+
+	struct mutex lock;
+};
+
+/*
+ * A set of namespaces. Currently only one set can be supported.
+ */
+struct bch_nvmpg_set {
+	unsigned char set_uuid[16];
+
+	int heads_size;
+	int heads_used;
+	struct bch_nvmpg_set_header *set_header;
+
+	struct bch_nvmpg_ns *ns_tbl[BCH_NVMPG_NS_MAX];
+	int total_ns;
+	int attached_ns;
+
+	struct mutex lock;
+};
+
+#define BCH_NVMPG_NS_ID_BITS	3
+#define BCH_NVMPG_OFFSET_BITS	61
+#define BCH_NVMPG_NS_ID_MASK	((1UL<<BCH_NVMPG_NS_ID_BITS) - 1)
+#define BCH_NVMPG_OFFSET_MASK	((1UL<<BCH_NVMPG_OFFSET_BITS) - 1)
+
+#define BCH_NVMPG_GET_NS_ID(offset)					\
+	(((offset) >> BCH_NVMPG_OFFSET_BITS) & BCH_NVMPG_NS_ID_MASK)
+
+#define BCH_NVMPG_GET_OFFSET(offset)	((offset) & BCH_NVMPG_OFFSET_MASK)
+
+#define BCH_NVMPG_OFFSET(ns_id, offset)					\
+	((((ns_id) & BCH_NVMPG_NS_ID_MASK) << BCH_NVMPG_OFFSET_BITS) |	\
+	 ((offset) & BCH_NVMPG_OFFSET_MASK))
+
+/* Indicate which field in bch_nvmpg_sb to be updated */
+#define BCH_NVMPG_TOTAL_NS	0	/* total_ns */
+
+void *bch_nvmpg_offset_to_ptr(unsigned long offset);
+unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr);
+
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+
+struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path);
+int bch_nvmpg_init(void);
+void bch_nvmpg_exit(void);
+
+#else
+
+static inline struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path)
+{
+	return NULL;
+}
+
+static inline int bch_nvmpg_init(void)
+{
+	return 0;
+}
+
+static inline void bch_nvmpg_exit(void) { }
+
+#endif /* CONFIG_BCACHE_NVM_PAGES */
+
+#endif /* _BCACHE_NVM_PAGES_H */
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 86b9e355c583..74d51a0b806f 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -14,6 +14,7 @@
 #include "request.h"
 #include "writeback.h"
 #include "features.h"
+#include "nvmpg.h"
 
 #include <linux/blkdev.h>
 #include <linux/pagemap.h>
@@ -2818,6 +2819,7 @@ static void bcache_exit(void)
 {
 	bch_debug_exit();
 	bch_request_exit();
+	bch_nvmpg_exit();
 	if (bcache_kobj)
 		kobject_put(bcache_kobj);
 	if (bcache_wq)
@@ -2916,6 +2918,7 @@ static int __init bcache_init(void)
 
 	bch_debug_init();
 	closure_debug_init();
+	bch_nvmpg_init();
 
 	bcache_is_reboot = false;
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v13 03/12] bcache: initialization of the buddy
  2021-12-12 17:05 [PATCH v13 00/12] bcache for 5.17: enable NVDIMM for bcache journal Coly Li
  2021-12-12 17:05 ` [PATCH v13 01/12] bcache: add initial data structures for nvm pages Coly Li
  2021-12-12 17:05 ` [PATCH v13 02/12] bcache: initialize the nvm pages allocator Coly Li
@ 2021-12-12 17:05 ` Coly Li
  2021-12-12 20:10   ` Jens Axboe
  2021-12-15 16:20   ` Dan Carpenter
  2021-12-12 17:05 ` [PATCH v13 04/12] bcache: bch_nvmpg_alloc_pages() " Coly Li
                   ` (9 subsequent siblings)
  12 siblings, 2 replies; 28+ messages in thread
From: Coly Li @ 2021-12-12 17:05 UTC (permalink / raw)
  To: axboe
  Cc: linux-bcache, linux-block, Jianpeng Ma, kernel test robot,
	Dan Carpenter, Qiaowei Ren, Christoph Hellwig, Dan Williams,
	Hannes Reinecke

From: Jianpeng Ma <jianpeng.ma@intel.com>

This nvm pages allocator implements a simple buddy allocator to manage
the nvm address space. This patch initializes this buddy allocator for a
new namespace.

The unit of alloc/free of the buddy allocator is a page. DAX devices
have their own struct page (in DRAM or PMEM):

	struct {        /* ZONE_DEVICE pages */
		/** @pgmap: Points to the hosting device page map. */
		struct dev_pagemap *pgmap;
		void *zone_device_data;
		/*
		 * ZONE_DEVICE private pages are counted as being
		 * mapped so the next 3 words hold the mapping, index,
		 * and private fields from the source anonymous or
		 * page cache page while the page is migrated to device
		 * private memory.
		 * ZONE_DEVICE MEMORY_DEVICE_FS_DAX pages also
		 * use the mapping, index, and private fields when
		 * pmem backed DAX files are mapped.
		 */
	};

ZONE_DEVICE pages only use pgmap; the other 4 words [16/32 bytes] are
unused. So the second/third words are used as a 'struct list_head' to
link the page into the buddy free lists. The fourth word (normally
struct page::index) stores pgoff, the page offset in the dax device.
The fifth word (normally struct page::private) stores the buddy order,
and page_type is used to store the buddy flags.
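
A short sketch of how those reused fields are set when a free range is
put on a buddy free list (this mirrors bch_nvmpg_init_free_space() in
the diff below; page, ns, pgoff and order are assumed to be in scope):

	page = virt_to_page(bch_nvmpg_pgoff_to_ptr(ns, pgoff));
	set_page_private(page, order);		/* buddy order */
	page->index = pgoff;			/* page offset in the namespace */
	__SetPageBuddy(page);			/* flag kept in page_type */
	list_add((struct list_head *)&page->zone_device_data,
		 &ns->free_area[order]);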

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
---
 drivers/md/bcache/nvmpg.c | 212 +++++++++++++++++++++++++++++++++++++-
 drivers/md/bcache/nvmpg.h |  12 +++
 2 files changed, 221 insertions(+), 3 deletions(-)

diff --git a/drivers/md/bcache/nvmpg.c b/drivers/md/bcache/nvmpg.c
index b654bbbda03e..2b70ee4a6028 100644
--- a/drivers/md/bcache/nvmpg.c
+++ b/drivers/md/bcache/nvmpg.c
@@ -50,6 +50,36 @@ unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr)
 	return BCH_NVMPG_OFFSET(ns_id, offset);
 }
 
+static struct page *bch_nvmpg_va_to_pg(void *addr)
+{
+	return virt_to_page(addr);
+}
+
+static void *bch_nvmpg_pgoff_to_ptr(struct bch_nvmpg_ns *ns, pgoff_t pgoff)
+{
+	return ns->base_addr + (pgoff << PAGE_SHIFT);
+}
+
+static void *bch_nvmpg_rec_to_ptr(struct bch_nvmpg_rec *r)
+{
+	struct bch_nvmpg_ns *ns = global_nvmpg_set->ns_tbl[r->ns_id];
+	pgoff_t pgoff = r->pgoff;
+
+	return bch_nvmpg_pgoff_to_ptr(ns, pgoff);
+}
+
+static inline void reserve_nvmpg_pages(struct bch_nvmpg_ns *ns,
+				       pgoff_t pgoff, u64 nr)
+{
+	while (nr > 0) {
+		unsigned int num = nr > UINT_MAX ? UINT_MAX : nr;
+
+		bitmap_set(ns->pages_bitmap, pgoff, num);
+		nr -= num;
+		pgoff += num;
+	}
+}
+
 static void release_ns_tbl(struct bch_nvmpg_set *set)
 {
 	int i;
@@ -58,6 +88,10 @@ static void release_ns_tbl(struct bch_nvmpg_set *set)
 	for (i = 0; i < BCH_NVMPG_NS_MAX; i++) {
 		ns = set->ns_tbl[i];
 		if (ns) {
+			kvfree(ns->pages_bitmap);
+			if (ns->recs_bitmap)
+				bitmap_free(ns->recs_bitmap);
+
 			fs_put_dax(ns->dax_dev);
 			blkdev_put(ns->bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL);
 			set->ns_tbl[i] = NULL;
@@ -76,10 +110,73 @@ static void release_nvmpg_set(struct bch_nvmpg_set *set)
 	kfree(set);
 }
 
+static int validate_recs(int ns_id,
+			 struct bch_nvmpg_head *head,
+			 struct bch_nvmpg_recs *recs)
+{
+	if (memcmp(recs->magic, bch_nvmpg_recs_magic, 16)) {
+		pr_err("Invalid bch_nvmpg_recs magic\n");
+		return -EINVAL;
+	}
+
+	if (memcmp(recs->uuid, head->uuid, 16)) {
+		pr_err("Invalid bch_nvmpg_recs uuid\n");
+		return -EINVAL;
+	}
+
+	if (recs->head_offset !=
+	    bch_nvmpg_ptr_to_offset(global_nvmpg_set->ns_tbl[ns_id], head)) {
+		pr_err("Invalid recs head_offset\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int reserve_nvmpg_recs(struct bch_nvmpg_recs *recs)
+{
+	int i, used = 0;
+
+	for (i = 0; i < recs->size; i++) {
+		struct bch_nvmpg_rec *r = &recs->recs[i];
+		struct bch_nvmpg_ns *ns;
+		struct page *page;
+		void *addr;
+
+		if (r->pgoff == 0)
+			continue;
+
+		ns = global_nvmpg_set->ns_tbl[r->ns_id];
+		addr = bch_nvmpg_rec_to_ptr(r);
+		if (addr < ns->base_addr) {
+			pr_err("Invalid recorded address\n");
+			return -EINVAL;
+		}
+
+		/* init struct page: index/private */
+		page = bch_nvmpg_va_to_pg(addr);
+		set_page_private(page, r->order);
+		page->index = r->pgoff;
+
+		reserve_nvmpg_pages(ns, r->pgoff, 1L << r->order);
+		used++;
+	}
+
+	if (used != recs->used) {
+		pr_err("used %d doesn't match recs->used %d\n",
+		       used, recs->used);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 /* Namespace 0 contains all meta data of the nvmpg allocation set */
 static int init_nvmpg_set_header(struct bch_nvmpg_ns *ns)
 {
 	struct bch_nvmpg_set_header *set_header;
+	struct bch_nvmpg_recs *sys_recs;
+	int i, j, used = 0, rc = 0;
 
 	if (ns->ns_id != 0) {
 		pr_err("unexpected ns_id %u for first nvmpg namespace.\n",
@@ -93,9 +190,83 @@ static int init_nvmpg_set_header(struct bch_nvmpg_ns *ns)
 	global_nvmpg_set->set_header = set_header;
 	global_nvmpg_set->heads_size = set_header->size;
 	global_nvmpg_set->heads_used = set_header->used;
+
+	/* Reserve the used space from buddy allocator */
+	reserve_nvmpg_pages(ns, 0, div_u64(ns->pages_offset, ns->page_size));
+
+	sys_recs = ns->base_addr + BCH_NVMPG_SYSRECS_OFFSET;
+	for (i = 0; i < set_header->size; i++) {
+		struct bch_nvmpg_head *head;
+
+		head = &set_header->heads[i];
+		if (head->state == BCH_NVMPG_HD_STAT_FREE)
+			continue;
+
+		used++;
+		if (used > global_nvmpg_set->heads_size) {
+			pr_err("used heads %d > heads size %d.\n",
+			       used, global_nvmpg_set->heads_size);
+			goto unlock;
+		}
+
+		for (j = 0; j < BCH_NVMPG_NS_MAX; j++) {
+			struct bch_nvmpg_recs *recs;
+
+			recs = bch_nvmpg_offset_to_ptr(head->recs_offset[j]);
+
+			/* Iterate the recs list */
+			while (recs) {
+				rc = validate_recs(j, head, recs);
+				if (rc < 0)
+					goto unlock;
+
+				rc = reserve_nvmpg_recs(recs);
+				if (rc < 0)
+					goto unlock;
+
+				bitmap_set(ns->recs_bitmap, recs - sys_recs, 1);
+				recs = bch_nvmpg_offset_to_ptr(recs->next_offset);
+			}
+		}
+	}
+unlock:
 	mutex_unlock(&global_nvmpg_set->lock);
+	return rc;
+}
 
-	return 0;
+static void bch_nvmpg_init_free_space(struct bch_nvmpg_ns *ns)
+{
+	unsigned int start, end, pages;
+	int i;
+	struct page *page;
+	pgoff_t pgoff_start;
+
+	bitmap_for_each_clear_region(ns->pages_bitmap,
+				     start, end, 0, ns->pages_total) {
+		pgoff_start = start;
+		pages = end - start;
+
+		while (pages) {
+			void *addr;
+
+			for (i = BCH_MAX_ORDER - 1; i >= 0; i--) {
+				if ((pgoff_start % (1L << i) == 0) &&
+				    (pages >= (1L << i)))
+					break;
+			}
+
+			addr = bch_nvmpg_pgoff_to_ptr(ns, pgoff_start);
+			page = bch_nvmpg_va_to_pg(addr);
+			set_page_private(page, i);
+			page->index = pgoff_start;
+			__SetPageBuddy(page);
+			list_add((struct list_head *)&page->zone_device_data,
+				 &ns->free_area[i]);
+
+			pgoff_start += 1L << i;
+			pages -= 1L << i;
+		}
+	}
 }
 
 static int attach_nvmpg_set(struct bch_nvmpg_ns *ns)
@@ -200,7 +371,7 @@ struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path)
 	char buf[BDEVNAME_SIZE];
 	struct block_device *bdev;
 	pgoff_t pgoff;
-	int id, err;
+	int id, i, err;
 	char *path;
 	long dax_ret = 0;
 
@@ -304,13 +475,48 @@ struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path)
 
 	mutex_init(&ns->lock);
 
+	/*
+	 * The parameters of bitmap_set/clear are unsigned int.
+	 * Given that the current size of nvm is far from exceeding this
+	 * limit, only add a WARN_ON message.
+	 */
+	WARN_ON(BITS_TO_LONGS(ns->pages_total) > UINT_MAX);
+	ns->pages_bitmap = kvcalloc(BITS_TO_LONGS(ns->pages_total),
+				    sizeof(unsigned long), GFP_KERNEL);
+	if (!ns->pages_bitmap) {
+		err = -ENOMEM;
+		goto clear_ns_nr;
+	}
+
+	if (ns->sb->this_ns == 0) {
+		ns->recs_bitmap =
+			bitmap_zalloc(BCH_MAX_PGALLOC_RECS, GFP_KERNEL);
+		if (ns->recs_bitmap == NULL) {
+			err = -ENOMEM;
+			goto free_pages_bitmap;
+		}
+	}
+
+	for (i = 0; i < BCH_MAX_ORDER; i++)
+		INIT_LIST_HEAD(&ns->free_area[i]);
+
 	err = init_nvmpg_set_header(ns);
 	if (err < 0)
-		goto free_ns;
+		goto free_recs_bitmap;
+
+	if (ns->sb->this_ns == 0)
+		/* init buddy allocator */
+		bch_nvmpg_init_free_space(ns);
 
 	kfree(path);
 	return ns;
 
+free_recs_bitmap:
+	bitmap_free(ns->recs_bitmap);
+free_pages_bitmap:
+	kvfree(ns->pages_bitmap);
+clear_ns_nr:
+	global_nvmpg_set->ns_tbl[sb->this_ns] = NULL;
 free_ns:
 	fs_put_dax(ns->dax_dev);
 	kfree(ns);
diff --git a/drivers/md/bcache/nvmpg.h b/drivers/md/bcache/nvmpg.h
index 698c890b2d15..55778d4db7da 100644
--- a/drivers/md/bcache/nvmpg.h
+++ b/drivers/md/bcache/nvmpg.h
@@ -11,6 +11,8 @@
  * Bcache NVDIMM in memory data structures
  */
 
+#define BCH_MAX_ORDER 20
+
 /*
  * The following three structures in memory records which page(s) allocated
  * to which owner. After reboot from power failure, they will be initialized
@@ -28,6 +30,11 @@ struct bch_nvmpg_ns {
 	unsigned long pages_total;
 	pfn_t start_pfn;
 
+	unsigned long *pages_bitmap;
+	struct list_head free_area[BCH_MAX_ORDER];
+
+	unsigned long *recs_bitmap;
+
 	struct dax_device *dax_dev;
 	struct block_device *bdev;
 	struct bch_nvmpg_set *set;
@@ -69,6 +76,11 @@ struct bch_nvmpg_set {
 /* Indicate which field in bch_nvmpg_sb to be updated */
 #define BCH_NVMPG_TOTAL_NS	0	/* total_ns */
 
+#define BCH_MAX_PGALLOC_RECS						\
+	(min_t(unsigned int, 64,					\
+	       (BCH_NVMPG_START - BCH_NVMPG_SYSRECS_OFFSET) /		\
+	       sizeof(struct bch_nvmpg_recs)))
+
 void *bch_nvmpg_offset_to_ptr(unsigned long offset);
 unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr);
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v13 04/12] bcache: bch_nvmpg_alloc_pages() of the buddy
  2021-12-12 17:05 [PATCH v13 00/12] bcache for 5.17: enable NVDIMM for bcache journal Coly Li
                   ` (2 preceding siblings ...)
  2021-12-12 17:05 ` [PATCH v13 03/12] bcache: initialization of the buddy Coly Li
@ 2021-12-12 17:05 ` Coly Li
  2021-12-12 20:14   ` Jens Axboe
  2021-12-12 17:05 ` [PATCH v13 05/12] bcache: bch_nvmpg_free_pages() of the buddy allocator Coly Li
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Coly Li @ 2021-12-12 17:05 UTC (permalink / raw)
  To: axboe
  Cc: linux-bcache, linux-block, Jianpeng Ma, Qiaowei Ren,
	Christoph Hellwig, Dan Williams, Hannes Reinecke

From: Jianpeng Ma <jianpeng.ma@intel.com>

This patch implements bch_nvmpg_alloc_pages() of the nvm pages buddy
allocator. Functionally, this is like the current page buddy allocation,
but the differences are:
a: it needs an owner_uuid parameter which records the owner info, and it
   makes that info persistent.
b: it doesn't need flags like GFP_*; all allocations are equal.
c: it doesn't trigger other operations such as swap/recycle.
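
A sketch of the API difference against the regular page allocator (the
nvmpg call is the one added by this patch; owner_uuid is assumed to be
a 16-byte owner identifier):

	/* regular DRAM buddy: no owner, GFP flags, may trigger reclaim */
	struct page *p = alloc_pages(GFP_KERNEL, order);

	/* nvmpg buddy: owner recorded persistently, no GFP flags */
	unsigned long off = bch_nvmpg_alloc_pages(order, owner_uuid);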

Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
---
 drivers/md/bcache/nvmpg.c | 221 ++++++++++++++++++++++++++++++++++++++
 drivers/md/bcache/nvmpg.h |   9 ++
 2 files changed, 230 insertions(+)

diff --git a/drivers/md/bcache/nvmpg.c b/drivers/md/bcache/nvmpg.c
index 2b70ee4a6028..a920779eb548 100644
--- a/drivers/md/bcache/nvmpg.c
+++ b/drivers/md/bcache/nvmpg.c
@@ -42,6 +42,11 @@ void *bch_nvmpg_offset_to_ptr(unsigned long offset)
 	return NULL;
 }
 
+static unsigned long bch_nvmpg_offset_to_pgoff(unsigned long nvmpg_offset)
+{
+	return BCH_NVMPG_GET_OFFSET(nvmpg_offset) >> PAGE_SHIFT;
+}
+
 unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr)
 {
 	int ns_id = ns->ns_id;
@@ -60,6 +65,15 @@ static void *bch_nvmpg_pgoff_to_ptr(struct bch_nvmpg_ns *ns, pgoff_t pgoff)
 	return ns->base_addr + (pgoff << PAGE_SHIFT);
 }
 
+static unsigned long bch_nvmpg_pgoff_to_offset(struct bch_nvmpg_ns *ns,
+					       pgoff_t pgoff)
+{
+	int ns_id = ns->ns_id;
+	unsigned long offset = pgoff << PAGE_SHIFT;
+
+	return BCH_NVMPG_OFFSET(ns_id, offset);
+}
+
 static void *bch_nvmpg_rec_to_ptr(struct bch_nvmpg_rec *r)
 {
 	struct bch_nvmpg_ns *ns = global_nvmpg_set->ns_tbl[r->ns_id];
@@ -269,6 +283,213 @@ static void bch_nvmpg_init_free_space(struct bch_nvmpg_ns *ns)
 	}
 }
 
+
+/* If not found, it will create if create == true */
+static struct bch_nvmpg_head *find_nvmpg_head(const char *uuid, bool create)
+{
+	struct bch_nvmpg_set_header *set_header = global_nvmpg_set->set_header;
+	struct bch_nvmpg_head *head = NULL;
+	int i;
+
+	if (set_header == NULL)
+		goto out;
+
+	for (i = 0; i < set_header->size; i++) {
+		struct bch_nvmpg_head *h = &set_header->heads[i];
+
+		if (h->state != BCH_NVMPG_HD_STAT_ALLOC)
+			continue;
+
+		if (!memcmp(uuid, h->uuid, 16)) {
+			head = h;
+			break;
+		}
+	}
+
+	if (!head && create) {
+		u32 used = set_header->used;
+
+		if (set_header->size > used) {
+			head = &set_header->heads[used];
+			memset(head, 0, sizeof(struct bch_nvmpg_head));
+			head->state = BCH_NVMPG_HD_STAT_ALLOC;
+			memcpy(head->uuid, uuid, 16);
+			global_nvmpg_set->heads_used++;
+			set_header->used++;
+		} else
+			pr_info("No free bch_nvmpg_head\n");
+	}
+
+out:
+	return head;
+}
+
+static struct bch_nvmpg_recs *find_empty_nvmpg_recs(void)
+{
+	unsigned int start;
+	struct bch_nvmpg_ns *ns = global_nvmpg_set->ns_tbl[0];
+	struct bch_nvmpg_recs *recs;
+
+	start = bitmap_find_next_zero_area(ns->recs_bitmap,
+					   BCH_MAX_PGALLOC_RECS, 0, 1, 0);
+	if (start > BCH_MAX_PGALLOC_RECS) {
+		pr_info("No free struct bch_nvmpg_recs\n");
+		return NULL;
+	}
+
+	bitmap_set(ns->recs_bitmap, start, 1);
+	recs = (struct bch_nvmpg_recs *)
+		bch_nvmpg_offset_to_ptr(BCH_NVMPG_SYSRECS_OFFSET)
+	       + start;
+
+	memset(recs, 0, sizeof(struct bch_nvmpg_recs));
+	return recs;
+}
+
+
+static struct bch_nvmpg_recs *find_nvmpg_recs(struct bch_nvmpg_ns *ns,
+					      struct bch_nvmpg_head *head,
+					      bool create)
+{
+	int ns_id = ns->sb->this_ns;
+	struct bch_nvmpg_recs *prev_recs = NULL, *recs = NULL;
+
+	recs = bch_nvmpg_offset_to_ptr(head->recs_offset[ns_id]);
+
+	/* If create=false, just return the first recs in the list (may be NULL) */
+	if (!create)
+		return recs;
+
+	/*
+	 * If create=true, it means we need an empty struct bch_nvmpg_rec,
+	 * so we should find a struct bch_nvmpg_recs with a free slot or
+	 * allocate a new struct bch_nvmpg_recs, and return it.
+	 */
+	while (recs && (recs->used == recs->size)) {
+		prev_recs = recs;
+		recs = bch_nvmpg_offset_to_ptr(recs->next_offset);
+	}
+
+	/* Found empty struct bch_nvmpg_recs */
+	if (recs)
+		return recs;
+
+	/* Need alloc new struct bch_nvmpg_recs */
+	recs = find_empty_nvmpg_recs();
+	if (recs) {
+		unsigned long offset;
+
+		recs->next_offset = 0;
+		recs->head_offset = bch_nvmpg_ptr_to_offset(ns, head);
+		memcpy(recs->magic, bch_nvmpg_recs_magic, 16);
+		memcpy(recs->uuid, head->uuid, 16);
+		recs->size = BCH_NVMPG_MAX_RECS;
+		recs->used = 0;
+
+		offset = bch_nvmpg_ptr_to_offset(ns, recs);
+		if (prev_recs)
+			prev_recs->next_offset = offset;
+		else
+			head->recs_offset[ns_id] = offset;
+	}
+
+	return recs;
+}
+
+static void add_nvmpg_rec(struct bch_nvmpg_ns *ns,
+			  struct bch_nvmpg_recs *recs,
+			  unsigned long nvmpg_offset,
+			  int order)
+{
+	int i, ns_id;
+	unsigned long pgoff;
+
+	pgoff = bch_nvmpg_offset_to_pgoff(nvmpg_offset);
+	ns_id = ns->sb->this_ns;
+
+	for (i = 0; i < recs->size; i++) {
+		if (recs->recs[i].pgoff == 0) {
+			recs->recs[i].pgoff = pgoff;
+			recs->recs[i].order = order;
+			recs->recs[i].ns_id = ns_id;
+			recs->used++;
+			break;
+		}
+	}
+	BUG_ON(i == recs->size);
+}
+
+
+unsigned long bch_nvmpg_alloc_pages(int order, const char *uuid)
+{
+	unsigned long nvmpg_offset = 0;
+	struct bch_nvmpg_head *head;
+	int n, o;
+
+	mutex_lock(&global_nvmpg_set->lock);
+	head = find_nvmpg_head(uuid, true);
+
+	if (!head) {
+		pr_err("Cannot find bch_nvmpg_recs by uuid.\n");
+		goto unlock;
+	}
+
+	for (n = 0; n < global_nvmpg_set->total_ns; n++) {
+		struct bch_nvmpg_ns *ns = global_nvmpg_set->ns_tbl[n];
+
+		if (!ns || (ns->free < (1L << order)))
+			continue;
+
+		for (o = order; o < BCH_MAX_ORDER; o++) {
+			struct list_head *list;
+			struct page *page, *buddy_page;
+
+			if (list_empty(&ns->free_area[o]))
+				continue;
+
+			list = ns->free_area[o].next;
+			page = container_of((void *)list, struct page,
+					    zone_device_data);
+
+			list_del(list);
+
+			while (o != order) {
+				void *addr;
+				pgoff_t pgoff;
+
+				pgoff = page->index + (1L << (o - 1));
+				addr = bch_nvmpg_pgoff_to_ptr(ns, pgoff);
+				buddy_page = bch_nvmpg_va_to_pg(addr);
+				set_page_private(buddy_page, o - 1);
+				buddy_page->index = pgoff;
+				__SetPageBuddy(buddy_page);
+				list_add((struct list_head *)&buddy_page->zone_device_data,
+					 &ns->free_area[o - 1]);
+				o--;
+			}
+
+			set_page_private(page, order);
+			__ClearPageBuddy(page);
+			ns->free -= 1L << order;
+			nvmpg_offset = bch_nvmpg_pgoff_to_offset(ns, page->index);
+			break;
+		}
+
+		if (o < BCH_MAX_ORDER) {
+			struct bch_nvmpg_recs *recs;
+
+			recs = find_nvmpg_recs(ns, head, true);
+			/* ToDo: handle recs == NULL */
+			add_nvmpg_rec(ns, recs, nvmpg_offset, order);
+			break;
+		}
+	}
+
+unlock:
+	mutex_unlock(&global_nvmpg_set->lock);
+	return nvmpg_offset;
+}
+
 static int attach_nvmpg_set(struct bch_nvmpg_ns *ns)
 {
 	struct bch_nvmpg_sb *sb = ns->sb;
diff --git a/drivers/md/bcache/nvmpg.h b/drivers/md/bcache/nvmpg.h
index 55778d4db7da..d03f3241b45a 100644
--- a/drivers/md/bcache/nvmpg.h
+++ b/drivers/md/bcache/nvmpg.h
@@ -76,6 +76,9 @@ struct bch_nvmpg_set {
 /* Indicate which field in bch_nvmpg_sb to be updated */
 #define BCH_NVMPG_TOTAL_NS	0	/* total_ns */
 
+#define BCH_PGOFF_TO_KVADDR(pgoff)					\
+	((void *)((unsigned long)(pgoff) << PAGE_SHIFT))
+
 #define BCH_MAX_PGALLOC_RECS						\
 	(min_t(unsigned int, 64,					\
 	       (BCH_NVMPG_START - BCH_NVMPG_SYSRECS_OFFSET) /		\
@@ -89,6 +92,7 @@ unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr);
 struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path);
 int bch_nvmpg_init(void);
 void bch_nvmpg_exit(void);
+unsigned long bch_nvmpg_alloc_pages(int order, const char *uuid);
 
 #else
 
@@ -104,6 +108,11 @@ static inline int bch_nvmpg_init(void)
 
 static inline void bch_nvmpg_exit(void) { }
 
+static inline unsigned long bch_nvmpg_alloc_pages(int order, const char *uuid)
+{
+	return 0;
+}
+
 #endif /* CONFIG_BCACHE_NVM_PAGES */
 
 #endif /* _BCACHE_NVM_PAGES_H */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v13 05/12] bcache: bch_nvmpg_free_pages() of the buddy allocator
  2021-12-12 17:05 [PATCH v13 00/12] bcache for 5.17: enable NVDIMM for bcache journal Coly Li
                   ` (3 preceding siblings ...)
  2021-12-12 17:05 ` [PATCH v13 04/12] bcache: bch_nvmpg_alloc_pages() " Coly Li
@ 2021-12-12 17:05 ` Coly Li
  2021-12-12 20:16   ` Jens Axboe
  2022-02-21 12:36   ` yukuai (C)
  2021-12-12 17:05 ` [PATCH v13 06/12] bcache: get recs list head for allocated pages by specific uuid Coly Li
                   ` (7 subsequent siblings)
  12 siblings, 2 replies; 28+ messages in thread
From: Coly Li @ 2021-12-12 17:05 UTC (permalink / raw)
  To: axboe
  Cc: linux-bcache, linux-block, Jianpeng Ma, Qiaowei Ren,
	Christoph Hellwig, Dan Williams, Hannes Reinecke

From: Jianpeng Ma <jianpeng.ma@intel.com>

This patch implements bch_nvmpg_free_pages() of the buddy allocator.

The difference between this and the regular page buddy free path is
that it needs the owner_uuid to free the owner's allocated pages, and
the ownership information must remain persistent after the free.
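
A minimal sketch of the alloc/free pairing (owner_uuid and order are
assumed to be the same values that were used at allocation time):

	unsigned long off = bch_nvmpg_alloc_pages(order, owner_uuid);

	/* ... use the pages via bch_nvmpg_offset_to_ptr(off) ... */

	bch_nvmpg_free_pages(off, order, owner_uuid);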

Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
---
 drivers/md/bcache/nvmpg.c | 164 ++++++++++++++++++++++++++++++++++++--
 drivers/md/bcache/nvmpg.h |   3 +
 2 files changed, 160 insertions(+), 7 deletions(-)

diff --git a/drivers/md/bcache/nvmpg.c b/drivers/md/bcache/nvmpg.c
index a920779eb548..8ce0c4389b42 100644
--- a/drivers/md/bcache/nvmpg.c
+++ b/drivers/md/bcache/nvmpg.c
@@ -248,6 +248,57 @@ static int init_nvmpg_set_header(struct bch_nvmpg_ns *ns)
 	return rc;
 }
 
+static void __free_space(struct bch_nvmpg_ns *ns, unsigned long nvmpg_offset,
+			 int order)
+{
+	unsigned long add_pages = (1L << order);
+	pgoff_t pgoff;
+	struct page *page;
+	void *va;
+
+	if (nvmpg_offset == 0) {
+		pr_err("free pages on offset 0\n");
+		return;
+	}
+
+	page = bch_nvmpg_va_to_pg(bch_nvmpg_offset_to_ptr(nvmpg_offset));
+	WARN_ON((!page) || (page->private != order));
+	pgoff = page->index;
+
+	while (order < BCH_MAX_ORDER - 1) {
+		struct page *buddy_page;
+
+		pgoff_t buddy_pgoff = pgoff ^ (1L << order);
+		pgoff_t parent_pgoff = pgoff & ~(1L << order);
+
+		if ((parent_pgoff + (1L << (order + 1)) > ns->pages_total))
+			break;
+
+		va = bch_nvmpg_pgoff_to_ptr(ns, buddy_pgoff);
+		buddy_page = bch_nvmpg_va_to_pg(va);
+		WARN_ON(!buddy_page);
+
+		if (PageBuddy(buddy_page) && (buddy_page->private == order)) {
+			list_del((struct list_head *)&buddy_page->zone_device_data);
+			__ClearPageBuddy(buddy_page);
+			pgoff = parent_pgoff;
+			order++;
+			continue;
+		}
+		break;
+	}
+
+	va = bch_nvmpg_pgoff_to_ptr(ns, pgoff);
+	page = bch_nvmpg_va_to_pg(va);
+	WARN_ON(!page);
+	list_add((struct list_head *)&page->zone_device_data,
+		 &ns->free_area[order]);
+	page->index = pgoff;
+	set_page_private(page, order);
+	__SetPageBuddy(page);
+	ns->free += add_pages;
+}
+
 static void bch_nvmpg_init_free_space(struct bch_nvmpg_ns *ns)
 {
 	unsigned int start, end, pages;
@@ -261,21 +312,19 @@ static void bch_nvmpg_init_free_space(struct bch_nvmpg_ns *ns)
 		pages = end - start;
 
 		while (pages) {
-			void *addr;
-
 			for (i = BCH_MAX_ORDER - 1; i >= 0; i--) {
 				if ((pgoff_start % (1L << i) == 0) &&
 				    (pages >= (1L << i)))
 					break;
 			}
 
-			addr = bch_nvmpg_pgoff_to_ptr(ns, pgoff_start);
-			page = bch_nvmpg_va_to_pg(addr);
+			page = bch_nvmpg_va_to_pg(
+					bch_nvmpg_pgoff_to_ptr(ns, pgoff_start));
 			set_page_private(page, i);
 			page->index = pgoff_start;
-			__SetPageBuddy(page);
-			list_add((struct list_head *)&page->zone_device_data,
-				 &ns->free_area[i]);
+
+			/* In order to update ns->free */
+			__free_space(ns, pgoff_start, i);
 
 			pgoff_start += 1L << i;
 			pages -= 1L << i;
@@ -490,6 +539,106 @@ unsigned long bch_nvmpg_alloc_pages(int order, const char *uuid)
 	return nvmpg_offset;
 }
 
+static inline void *nvm_end_addr(struct bch_nvmpg_ns *ns)
+{
+	return ns->base_addr + (ns->pages_total << PAGE_SHIFT);
+}
+
+static inline bool in_nvmpg_ns_range(struct bch_nvmpg_ns *ns,
+				     void *start_addr, void *end_addr)
+{
+	return (start_addr >= ns->base_addr) && (end_addr < nvm_end_addr(ns));
+}
+
+static int remove_nvmpg_rec(struct bch_nvmpg_recs *recs, int ns_id,
+			    unsigned long nvmpg_offset, int order)
+{
+	struct bch_nvmpg_head *head;
+	struct bch_nvmpg_recs *prev_recs, *sys_recs;
+	struct bch_nvmpg_ns *ns;
+	unsigned long pgoff;
+	int i;
+
+	ns = global_nvmpg_set->ns_tbl[0];
+	pgoff = bch_nvmpg_offset_to_pgoff(nvmpg_offset);
+
+	head = bch_nvmpg_offset_to_ptr(recs->head_offset);
+	prev_recs = recs;
+	sys_recs = bch_nvmpg_offset_to_ptr(BCH_NVMPG_SYSRECS_OFFSET);
+	while (recs) {
+		for (i = 0; i < recs->size; i++) {
+			struct bch_nvmpg_rec *rec = &(recs->recs[i]);
+
+			if ((rec->pgoff == pgoff) && (rec->ns_id == ns_id)) {
+				WARN_ON(rec->order != order);
+				rec->_v = 0;
+				recs->used--;
+
+				if (recs->used == 0) {
+					int recs_pos = recs - sys_recs;
+
+					if (recs == prev_recs)
+						head->recs_offset[ns_id] =
+							recs->next_offset;
+					else
+						prev_recs->next_offset =
+							recs->next_offset;
+
+					recs->next_offset = 0;
+					recs->head_offset = 0;
+
+					bitmap_clear(ns->recs_bitmap, recs_pos, 1);
+				}
+				goto out;
+			}
+		}
+		prev_recs = recs;
+		recs = bch_nvmpg_offset_to_ptr(recs->next_offset);
+	}
+out:
+	return (recs ? 0 : -ENOENT);
+}
+
+void bch_nvmpg_free_pages(unsigned long nvmpg_offset, int order,
+			  const char *uuid)
+{
+	struct bch_nvmpg_ns *ns;
+	struct bch_nvmpg_head *head;
+	struct bch_nvmpg_recs *recs;
+	int r;
+
+	mutex_lock(&global_nvmpg_set->lock);
+
+	ns = global_nvmpg_set->ns_tbl[BCH_NVMPG_GET_NS_ID(nvmpg_offset)];
+	if (!ns) {
+		pr_err("can't find namespace by given kaddr from namespace\n");
+		goto unlock;
+	}
+
+	head = find_nvmpg_head(uuid, false);
+	if (!head) {
+		pr_err("can't found bch_nvmpg_head by uuid\n");
+		goto unlock;
+	}
+
+	recs = find_nvmpg_recs(ns, head, false);
+	if (!recs) {
+		pr_err("can't find bch_nvmpg_recs by uuid\n");
+		goto unlock;
+	}
+
+	r = remove_nvmpg_rec(recs, ns->sb->this_ns, nvmpg_offset, order);
+	if (r < 0) {
+		pr_err("can't find bch_nvmpg_rec\n");
+		goto unlock;
+	}
+
+	__free_space(ns, nvmpg_offset, order);
+
+unlock:
+	mutex_unlock(&global_nvmpg_set->lock);
+}
+
 static int attach_nvmpg_set(struct bch_nvmpg_ns *ns)
 {
 	struct bch_nvmpg_sb *sb = ns->sb;
@@ -686,6 +835,7 @@ struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path)
 	ns->pages_offset = sb->pages_offset;
 	ns->pages_total = sb->pages_total;
 	ns->sb = sb;
+	/* increase by __free_space() */
 	ns->free = 0;
 	ns->bdev = bdev;
 	ns->set = global_nvmpg_set;
diff --git a/drivers/md/bcache/nvmpg.h b/drivers/md/bcache/nvmpg.h
index d03f3241b45a..e089936e7f13 100644
--- a/drivers/md/bcache/nvmpg.h
+++ b/drivers/md/bcache/nvmpg.h
@@ -93,6 +93,7 @@ struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path);
 int bch_nvmpg_init(void);
 void bch_nvmpg_exit(void);
 unsigned long bch_nvmpg_alloc_pages(int order, const char *uuid);
+void bch_nvmpg_free_pages(unsigned long nvmpg_offset, int order, const char *uuid);
 
 #else
 
@@ -113,6 +114,8 @@ static inline unsigned long bch_nvmpg_alloc_pages(int order, const char *uuid)
 	return 0;
 }
 
+static inline void bch_nvmpg_free_pages(void *addr, int order, const char *uuid) { }
+
 #endif /* CONFIG_BCACHE_NVM_PAGES */
 
 #endif /* _BCACHE_NVM_PAGES_H */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v13 06/12] bcache: get recs list head for allocated pages by specific uuid
  2021-12-12 17:05 [PATCH v13 00/12] bcache for 5.17: enable NVDIMM for bcache journal Coly Li
                   ` (4 preceding siblings ...)
  2021-12-12 17:05 ` [PATCH v13 05/12] bcache: bch_nvmpg_free_pages() of the buddy allocator Coly Li
@ 2021-12-12 17:05 ` Coly Li
  2021-12-12 20:18   ` Jens Axboe
  2021-12-12 17:05 ` [PATCH v13 07/12] bcache: use bucket index to set GC_MARK_METADATA for journal buckets in bch_btree_gc_finish() Coly Li
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Coly Li @ 2021-12-12 17:05 UTC (permalink / raw)
  To: axboe
  Cc: linux-bcache, linux-block, Jianpeng Ma, Qiaowei Ren,
	Hannes Reinecke, Christoph Hellwig, Dan Williams

From: Jianpeng Ma <jianpeng.ma@intel.com>

This patch implements bch_get_nvmpg_head() of the buddy allocator,
which returns the recs list head for the pages allocated under a
specific uuid. The requester (owner) can then find all previously
allocated nvdimm pages by iterating the recs list.
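
A minimal sketch of such an iteration (illustrative only; 'owner_uuid'
and 'ns_id' are placeholders for the requester's uuid and the
namespace id being inspected):

	struct bch_nvmpg_head *head;
	struct bch_nvmpg_recs *recs;
	int i;

	head = bch_get_nvmpg_head(owner_uuid);
	if (!head)
		return 0;	/* nothing recorded for this owner */

	recs = bch_nvmpg_offset_to_ptr(head->recs_offset[ns_id]);
	while (recs) {
		for (i = 0; i < recs->size; i++) {
			struct bch_nvmpg_rec *rec = &recs->recs[i];

			/* pgoff == 0 marks an unused slot */
			if (rec->pgoff == 0)
				continue;
			/*
			 * rec->pgoff, rec->order and rec->ns_id describe
			 * one previously allocated range.
			 */
		}
		recs = bch_nvmpg_offset_to_ptr(recs->next_offset);
	}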

Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
---
 drivers/md/bcache/nvmpg.c | 5 +++++
 drivers/md/bcache/nvmpg.h | 6 ++++++
 2 files changed, 11 insertions(+)

diff --git a/drivers/md/bcache/nvmpg.c b/drivers/md/bcache/nvmpg.c
index 8ce0c4389b42..e26c7b578a62 100644
--- a/drivers/md/bcache/nvmpg.c
+++ b/drivers/md/bcache/nvmpg.c
@@ -539,6 +539,11 @@ unsigned long bch_nvmpg_alloc_pages(int order, const char *uuid)
 	return nvmpg_offset;
 }
 
+struct bch_nvmpg_head *bch_get_nvmpg_head(const char *uuid)
+{
+	return find_nvmpg_head(uuid, false);
+}
+
 static inline void *nvm_end_addr(struct bch_nvmpg_ns *ns)
 {
 	return ns->base_addr + (ns->pages_total << PAGE_SHIFT);
diff --git a/drivers/md/bcache/nvmpg.h b/drivers/md/bcache/nvmpg.h
index e089936e7f13..2361cabf18be 100644
--- a/drivers/md/bcache/nvmpg.h
+++ b/drivers/md/bcache/nvmpg.h
@@ -94,6 +94,7 @@ int bch_nvmpg_init(void);
 void bch_nvmpg_exit(void);
 unsigned long bch_nvmpg_alloc_pages(int order, const char *uuid);
 void bch_nvmpg_free_pages(unsigned long nvmpg_offset, int order, const char *uuid);
+struct bch_nvmpg_head *bch_get_nvmpg_head(const char *uuid);
 
 #else
 
@@ -116,6 +117,11 @@ static inline unsigned long bch_nvmpg_alloc_pages(int order, const char *uuid)
 
 static inline void bch_nvmpg_free_pages(void *addr, int order, const char *uuid) { }
 
+static inline struct bch_nvmpg_head *bch_get_nvmpg_head(const char *uuid)
+{
+	return NULL;
+}
+
 #endif /* CONFIG_BCACHE_NVM_PAGES */
 
 #endif /* _BCACHE_NVM_PAGES_H */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v13 07/12] bcache: use bucket index to set GC_MARK_METADATA for journal buckets in bch_btree_gc_finish()
  2021-12-12 17:05 [PATCH v13 00/12] bcache for 5.17: enable NVDIMM for bcache journal Coly Li
                   ` (5 preceding siblings ...)
  2021-12-12 17:05 ` [PATCH v13 06/12] bcache: get recs list head for allocated pages by specific uuid Coly Li
@ 2021-12-12 17:05 ` Coly Li
  2021-12-12 17:05 ` [PATCH v13 08/12] bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set Coly Li
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Coly Li @ 2021-12-12 17:05 UTC (permalink / raw)
  To: axboe
  Cc: linux-bcache, linux-block, Coly Li, Hannes Reinecke,
	Christoph Hellwig, Dan Williams, Jianpeng Ma, Qiaowei Ren

Currently the meta data bucket locations on the cache device are still
reserved even though the meta data is stored on NVDIMM pages, to keep
the meta data layout consistent for now. So these buckets are still
marked as meta data by SET_GC_MARK() in bch_btree_gc_finish().

When BCH_FEATURE_INCOMPAT_NVDIMM_META is set, sb.d[] stores linear
addresses of NVDIMM pages and no longer bucket indexes. Therefore we
should avoid looking up bucket indexes from sb.d[], and directly use
the bucket indexes from ca->sb.first_bucket to (ca->sb.first_bucket +
ca->sb.njournal_buckets) when setting the gc mark of the journal
buckets.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
---
 drivers/md/bcache/btree.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 88c573eeb598..1a0ff117373f 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1761,8 +1761,10 @@ static void bch_btree_gc_finish(struct cache_set *c)
 	ca = c->cache;
 	ca->invalidate_needs_gc = 0;
 
-	for (k = ca->sb.d; k < ca->sb.d + ca->sb.keys; k++)
-		SET_GC_MARK(ca->buckets + *k, GC_MARK_METADATA);
+	/* Range [first_bucket, first_bucket + keys) is for journal buckets */
+	for (i = ca->sb.first_bucket;
+	     i < ca->sb.first_bucket + ca->sb.njournal_buckets; i++)
+		SET_GC_MARK(ca->buckets + i, GC_MARK_METADATA);
 
 	for (k = ca->prio_buckets;
 	     k < ca->prio_buckets + prio_buckets(ca) * 2; k++)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v13 08/12] bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set
  2021-12-12 17:05 [PATCH v13 00/12] bcache for 5.17: enable NVDIMM for bcache journal Coly Li
                   ` (6 preceding siblings ...)
  2021-12-12 17:05 ` [PATCH v13 07/12] bcache: use bucket index to set GC_MARK_METADATA for journal buckets in bch_btree_gc_finish() Coly Li
@ 2021-12-12 17:05 ` Coly Li
  2021-12-12 17:05 ` [PATCH v13 09/12] bcache: initialize bcache journal for NVDIMM meta device Coly Li
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Coly Li @ 2021-12-12 17:05 UTC (permalink / raw)
  To: axboe
  Cc: linux-bcache, linux-block, Coly Li, Hannes Reinecke,
	Christoph Hellwig, Dan Williams, Jianpeng Ma, Qiaowei Ren

This patch adds BCH_FEATURE_INCOMPAT_NVDIMM_META (value 0x0004) into the
incompat feature set. When this bit is set by bcache-tools, it indicates
bcache meta data should be stored on a specific NVDIMM meta device.

The bcache meta data mainly includes the journal and btree nodes. When
this bit is set in the incompat feature set, bcache will ask the
nvm-pages allocator for NVDIMM space to store the meta data.
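
The bit is consumed through the helper generated by
BCH_FEATURE_INCOMPAT_FUNCS(nvdimm_meta, NVDIMM_META); a sketch of the
dispatch pattern used by the following patches:

	if (bch_has_feature_nvdimm_meta(&ca->sb)) {
		/* journal space comes from the nvm-pages allocator */
	} else {
		/* legacy path: journal buckets on the cache device */
	}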

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
---
 drivers/md/bcache/features.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/md/bcache/features.h b/drivers/md/bcache/features.h
index 09161b89c63e..fab92678be76 100644
--- a/drivers/md/bcache/features.h
+++ b/drivers/md/bcache/features.h
@@ -18,11 +18,19 @@
 #define BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET		0x0001
 /* real bucket size is (1 << bucket_size) */
 #define BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE	0x0002
+/* store bcache meta data on nvdimm */
+#define BCH_FEATURE_INCOMPAT_NVDIMM_META		0x0004
 
 #define BCH_FEATURE_COMPAT_SUPP		0
 #define BCH_FEATURE_RO_COMPAT_SUPP	0
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+#define BCH_FEATURE_INCOMPAT_SUPP	(BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET| \
+					 BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE| \
+					 BCH_FEATURE_INCOMPAT_NVDIMM_META)
+#else
 #define BCH_FEATURE_INCOMPAT_SUPP	(BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET| \
 					 BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE)
+#endif
 
 #define BCH_HAS_COMPAT_FEATURE(sb, mask) \
 		((sb)->feature_compat & (mask))
@@ -90,6 +98,7 @@ static inline void bch_clear_feature_##name(struct cache_sb *sb) \
 
 BCH_FEATURE_INCOMPAT_FUNCS(obso_large_bucket, OBSO_LARGE_BUCKET);
 BCH_FEATURE_INCOMPAT_FUNCS(large_bucket, LOG_LARGE_BUCKET_SIZE);
+BCH_FEATURE_INCOMPAT_FUNCS(nvdimm_meta, NVDIMM_META);
 
 static inline bool bch_has_unknown_compat_features(struct cache_sb *sb)
 {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v13 09/12] bcache: initialize bcache journal for NVDIMM meta device
  2021-12-12 17:05 [PATCH v13 00/12] bcache for 5.17: enable NVDIMM for bcache journal Coly Li
                   ` (7 preceding siblings ...)
  2021-12-12 17:05 ` [PATCH v13 08/12] bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set Coly Li
@ 2021-12-12 17:05 ` Coly Li
  2021-12-12 17:05 ` [PATCH v13 10/12] bcache: support storing bcache journal into " Coly Li
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Coly Li @ 2021-12-12 17:05 UTC (permalink / raw)
  To: axboe
  Cc: linux-bcache, linux-block, Coly Li, Christoph Hellwig,
	Dan Williams, Hannes Reinecke, Jianpeng Ma, Qiaowei Ren

The nvm-pages allocator may store and index the NVDIMM pages allocated
for the bcache journal. This patch adds the initialization to store
bcache journal space on NVDIMM pages if the
BCH_FEATURE_INCOMPAT_NVDIMM_META bit is set by bcache-tools.

If BCH_FEATURE_INCOMPAT_NVDIMM_META is set, get_journal_nvmpg_space()
will return the nvmpg_offset of NVDIMM pages for the bcache journal,
- If there is previously allocated space, find it from the nvm-pages
  owner list and return it to bch_journal_init().
- If there is no previously allocated space, request a new NVDIMM range
  from the nvm-pages allocator, and return it to bch_journal_init().

And in bch_journal_init(), sb.d[i] stores the corresponding nvmpg
offset of the NVDIMM journal space, where 'i' is the bucket index used
to iterate all journal buckets.

Later when the bcache journaling code stores a journal jset, the
target NVDIMM nvmpg offset kept (and updated) via sb.d[i] is used to
calculate the linear address for the memory copy from DRAM pages into
NVDIMM pages.
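
A condensed sketch of the mapping described above (names as in the
diff below; error handling omitted):

	unsigned long jnl_base;
	void *addr;
	int i;

	/* __bch_journal_nvdimm_init(): sb.d[i] records an nvmpg offset */
	jnl_base = get_journal_nvmpg_space(ca);
	for (i = 0; i < ca->sb.keys; i++)
		ca->sb.d[i] = jnl_base + (bucket_bytes(ca) * i);

	/* later, a journal bucket is reached through a plain pointer */
	addr = bch_nvmpg_offset_to_ptr(ca->sb.d[0]);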

Signed-off-by: Coly Li <colyli@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
---
 drivers/md/bcache/journal.c | 113 ++++++++++++++++++++++++++++++++++++
 drivers/md/bcache/journal.h |   2 +-
 drivers/md/bcache/nvmpg.c   |   9 +++
 drivers/md/bcache/nvmpg.h   |   1 +
 drivers/md/bcache/super.c   |  18 +++---
 5 files changed, 132 insertions(+), 11 deletions(-)

diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 61bd79babf7a..d887557c718e 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -9,6 +9,8 @@
 #include "btree.h"
 #include "debug.h"
 #include "extents.h"
+#include "nvmpg.h"
+#include "features.h"
 
 #include <trace/events/bcache.h>
 
@@ -982,3 +984,114 @@ int bch_journal_alloc(struct cache_set *c)
 
 	return 0;
 }
+
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+
+static unsigned long find_journal_nvmpg_base(struct bch_nvmpg_head *nvmpg_head,
+					     struct cache *ca)
+{
+	unsigned long jnl_offset, jnl_pgoff, jnl_ns_id;
+	unsigned long ret_offset = 0;
+	int i;
+
+	jnl_offset = (unsigned long)ca->sb.d[0];
+	jnl_ns_id = BCH_NVMPG_GET_NS_ID(jnl_offset);
+	jnl_pgoff = BCH_NVMPG_GET_OFFSET(jnl_offset) >> PAGE_SHIFT;
+
+	for (i = 0; i < BCH_NVMPG_NS_MAX; i++) {
+		struct bch_nvmpg_recs *recs;
+		struct bch_nvmpg_rec *rec;
+		unsigned long recs_offset = 0;
+		int j;
+
+		recs_offset = nvmpg_head->recs_offset[i];
+		recs = bch_nvmpg_offset_to_ptr(recs_offset);
+		while (recs) {
+			for (j = 0; j < recs->size; j++) {
+				rec = &recs->recs[j];
+				if ((rec->pgoff != jnl_pgoff) ||
+				    (rec->ns_id != jnl_ns_id))
+					continue;
+
+				ret_offset = jnl_offset;
+				goto out;
+			}
+			recs_offset = recs->next_offset;
+			recs = bch_nvmpg_offset_to_ptr(recs_offset);
+		}
+	}
+
+out:
+	return ret_offset;
+}
+
+static unsigned long get_journal_nvmpg_space(struct cache *ca)
+{
+	struct bch_nvmpg_head *head = NULL;
+	unsigned long nvmpg_offset;
+	int order;
+
+	head = bch_get_nvmpg_head(ca->sb.set_uuid);
+	if (head) {
+		nvmpg_offset = find_journal_nvmpg_base(head, ca);
+		if (nvmpg_offset)
+			goto found;
+	}
+
+	order = ilog2((ca->sb.bucket_size *
+		       ca->sb.njournal_buckets) / PAGE_SECTORS);
+	nvmpg_offset = bch_nvmpg_alloc_pages(order, ca->sb.set_uuid);
+	if (nvmpg_offset)
+		memset(bch_nvmpg_offset_to_ptr(nvmpg_offset),
+		       0, (1 << order) * PAGE_SIZE);
+found:
+	return nvmpg_offset;
+}
+
+#endif /* CONFIG_BCACHE_NVM_PAGES */
+
+static int __bch_journal_nvdimm_init(struct cache *ca)
+{
+	int ret = -1;
+
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+	int i;
+	unsigned long jnl_base = 0;
+
+	jnl_base = get_journal_nvmpg_space(ca);
+	if (!jnl_base) {
+		pr_err("Failed to get journal space from nvdimm\n");
+		goto out;
+	}
+
+	/* Initialized and reloaded from the on-disk super block already */
+	if (ca->sb.d[0] != 0)
+		goto out;
+
+	for (i = 0; i < ca->sb.keys; i++)
+		ca->sb.d[i] = jnl_base + (bucket_bytes(ca) * i);
+
+	ret = 0;
+out:
+#endif /* CONFIG_BCACHE_NVM_PAGES */
+
+	return ret;
+}
+
+
+int bch_journal_init(struct cache_set *c)
+{
+	int i, ret = 0;
+	struct cache *ca = c->cache;
+
+	ca->sb.keys = clamp_t(int, ca->sb.nbuckets >> 7,
+			      2, SB_JOURNAL_BUCKETS);
+
+	if (!bch_has_feature_nvdimm_meta(&ca->sb)) {
+		for (i = 0; i < ca->sb.keys; i++)
+			ca->sb.d[i] = ca->sb.first_bucket + i;
+	} else
+		ret = __bch_journal_nvdimm_init(ca);
+
+	return ret;
+}
diff --git a/drivers/md/bcache/journal.h b/drivers/md/bcache/journal.h
index f2ea34d5f431..e3a7fa5a8fda 100644
--- a/drivers/md/bcache/journal.h
+++ b/drivers/md/bcache/journal.h
@@ -179,7 +179,7 @@ void bch_journal_mark(struct cache_set *c, struct list_head *list);
 void bch_journal_meta(struct cache_set *c, struct closure *cl);
 int bch_journal_read(struct cache_set *c, struct list_head *list);
 int bch_journal_replay(struct cache_set *c, struct list_head *list);
-
+int bch_journal_init(struct cache_set *c);
 void bch_journal_free(struct cache_set *c);
 int bch_journal_alloc(struct cache_set *c);
 
diff --git a/drivers/md/bcache/nvmpg.c b/drivers/md/bcache/nvmpg.c
index e26c7b578a62..1a3c6327b091 100644
--- a/drivers/md/bcache/nvmpg.c
+++ b/drivers/md/bcache/nvmpg.c
@@ -24,6 +24,15 @@
 
 struct bch_nvmpg_set *global_nvmpg_set;
 
+struct bch_nvmpg_ns *bch_nvmpg_id_to_ns(int ns_id)
+{
+	if ((ns_id >= 0) && (ns_id < BCH_NVMPG_NS_MAX))
+		return global_nvmpg_set->ns_tbl[ns_id];
+
+	pr_emerg("Invalid ns_id: %d\n", ns_id);
+	return NULL;
+}
+
 void *bch_nvmpg_offset_to_ptr(unsigned long offset)
 {
 	int ns_id = BCH_NVMPG_GET_NS_ID(offset);
diff --git a/drivers/md/bcache/nvmpg.h b/drivers/md/bcache/nvmpg.h
index 2361cabf18be..f7b7177cced3 100644
--- a/drivers/md/bcache/nvmpg.h
+++ b/drivers/md/bcache/nvmpg.h
@@ -95,6 +95,7 @@ void bch_nvmpg_exit(void);
 unsigned long bch_nvmpg_alloc_pages(int order, const char *uuid);
 void bch_nvmpg_free_pages(unsigned long nvmpg_offset, int order, const char *uuid);
 struct bch_nvmpg_head *bch_get_nvmpg_head(const char *uuid);
+struct bch_nvmpg_ns *bch_nvmpg_id_to_ns(int ns_id);
 
 #else
 
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 74d51a0b806f..a27fa65d8832 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -147,9 +147,11 @@ static const char *read_super_common(struct cache_sb *sb,  struct block_device *
 		goto err;
 
 	err = "Journal buckets not sequential";
-	for (i = 0; i < sb->keys; i++)
-		if (sb->d[i] != sb->first_bucket + i)
-			goto err;
+	if (!bch_has_feature_nvdimm_meta(sb)) {
+		for (i = 0; i < sb->keys; i++)
+			if (sb->d[i] != sb->first_bucket + i)
+				goto err;
+	}
 
 	err = "Too many journal buckets";
 	if (sb->first_bucket + sb->keys > sb->nbuckets)
@@ -2068,14 +2070,10 @@ static int run_cache_set(struct cache_set *c)
 		if (bch_journal_replay(c, &journal))
 			goto err;
 	} else {
-		unsigned int j;
-
 		pr_notice("invalidating existing data\n");
-		ca->sb.keys = clamp_t(int, ca->sb.nbuckets >> 7,
-					2, SB_JOURNAL_BUCKETS);
-
-		for (j = 0; j < ca->sb.keys; j++)
-			ca->sb.d[j] = ca->sb.first_bucket + j;
+		err = "error initializing journal";
+		if (bch_journal_init(c))
+			goto err;
 
 		bch_initial_gc_finish(c);
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v13 10/12] bcache: support storing bcache journal into NVDIMM meta device
  2021-12-12 17:05 [PATCH v13 00/12] bcache for 5.17: enable NVDIMM for bcache journal Coly Li
                   ` (8 preceding siblings ...)
  2021-12-12 17:05 ` [PATCH v13 09/12] bcache: initialize bcache journal for NVDIMM meta device Coly Li
@ 2021-12-12 17:05 ` Coly Li
  2021-12-12 17:05 ` [PATCH v13 11/12] bcache: read jset from NVDIMM pages for journal replay Coly Li
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Coly Li @ 2021-12-12 17:05 UTC (permalink / raw)
  To: axboe
  Cc: linux-bcache, linux-block, Coly Li, Hannes Reinecke,
	Christoph Hellwig, Dan Williams, Jianpeng Ma, Qiaowei Ren

This patch implements two methods to store the bcache journal,
1) __journal_write_unlocked() for a block interface device
   The legacy method to compose a bio and issue the jset bio to the
   cache device (e.g. SSD). c->journal.key.ptr[0] indicates the LBA on
   the cache device where the journal jset is stored.
2) __journal_nvdimm_write_unlocked() for memory interface NVDIMM
   Use memory interface to access NVDIMM pages and store the jset by
   memcpy_flushcache(). c->journal.key.ptr[0] indicates the linear
   address from the NVDIMM pages to store the journal jset.

For legacy configuration without NVDIMM meta device, journal I/O is
handled by __journal_write_unlocked() with existing code logic. If the
NVDIMM meta device is used (by bcache-tools), the journal I/O will
be handled by __journal_nvdimm_write_unlocked() and go into the NVDIMM
pages.

And when the NVDIMM meta device is used, sb.d[] stores the linear
addresses of NVDIMM pages (no longer bucket indexes); in
journal_reclaim() the journaling location in c->journal.key.ptr[0]
should also be updated with a linear address of NVDIMM pages (no
longer an LBA combined from sector offset and bucket index).
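
A condensed sketch of the NVDIMM write path (names as in
__journal_nvdimm_write_unlocked() in the diff below):

	sectors = set_blocks(w->data, block_bytes(ca)) * ca->sb.block_size;
	/* the jset goes to NVDIMM by memory copy, not by bio */
	memcpy_flushcache((void *)c->journal.key.ptr[0], w->data, sectors << 9);
	/* advance the write position within the NVDIMM journal range */
	c->journal.key.ptr[0] += sectors << 9;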

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
---
 drivers/md/bcache/journal.c | 120 +++++++++++++++++++++++++-----------
 drivers/md/bcache/super.c   |   3 +-
 2 files changed, 85 insertions(+), 38 deletions(-)

diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index d887557c718e..7d5c5ed18890 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -596,6 +596,8 @@ static void do_journal_discard(struct cache *ca)
 		return;
 	}
 
+	BUG_ON(bch_has_feature_nvdimm_meta(&ca->sb));
+
 	switch (atomic_read(&ja->discard_in_flight)) {
 	case DISCARD_IN_FLIGHT:
 		return;
@@ -661,9 +663,16 @@ static void journal_reclaim(struct cache_set *c)
 		goto out;
 
 	ja->cur_idx = next;
-	k->ptr[0] = MAKE_PTR(0,
-			     bucket_to_sector(c, ca->sb.d[ja->cur_idx]),
-			     ca->sb.nr_this_dev);
+	if (!bch_has_feature_nvdimm_meta(&ca->sb))
+		k->ptr[0] = MAKE_PTR(0,
+			bucket_to_sector(c, ca->sb.d[ja->cur_idx]),
+			ca->sb.nr_this_dev);
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+	else
+		k->ptr[0] = (unsigned long)bch_nvmpg_offset_to_ptr(
+						ca->sb.d[ja->cur_idx]);
+#endif
+
 	atomic_long_inc(&c->reclaimed_journal_buckets);
 
 	bkey_init(k);
@@ -729,46 +738,21 @@ static void journal_write_unlock(struct closure *cl)
 	spin_unlock(&c->journal.lock);
 }
 
-static void journal_write_unlocked(struct closure *cl)
+
+static void __journal_write_unlocked(struct cache_set *c)
 	__releases(c->journal.lock)
 {
-	struct cache_set *c = container_of(cl, struct cache_set, journal.io);
-	struct cache *ca = c->cache;
-	struct journal_write *w = c->journal.cur;
 	struct bkey *k = &c->journal.key;
-	unsigned int i, sectors = set_blocks(w->data, block_bytes(ca)) *
-		ca->sb.block_size;
-
+	struct journal_write *w = c->journal.cur;
+	struct closure *cl = &c->journal.io;
+	struct cache *ca = c->cache;
 	struct bio *bio;
 	struct bio_list list;
+	unsigned int i, sectors = set_blocks(w->data, block_bytes(ca)) *
+		ca->sb.block_size;
 
 	bio_list_init(&list);
 
-	if (!w->need_write) {
-		closure_return_with_destructor(cl, journal_write_unlock);
-		return;
-	} else if (journal_full(&c->journal)) {
-		journal_reclaim(c);
-		spin_unlock(&c->journal.lock);
-
-		btree_flush_write(c);
-		continue_at(cl, journal_write, bch_journal_wq);
-		return;
-	}
-
-	c->journal.blocks_free -= set_blocks(w->data, block_bytes(ca));
-
-	w->data->btree_level = c->root->level;
-
-	bkey_copy(&w->data->btree_root, &c->root->key);
-	bkey_copy(&w->data->uuid_bucket, &c->uuid_bucket);
-
-	w->data->prio_bucket[ca->sb.nr_this_dev] = ca->prio_buckets[0];
-	w->data->magic		= jset_magic(&ca->sb);
-	w->data->version	= BCACHE_JSET_VERSION;
-	w->data->last_seq	= last_seq(&c->journal);
-	w->data->csum		= csum_set(w->data);
-
 	for (i = 0; i < KEY_PTRS(k); i++) {
 		ca = c->cache;
 		bio = &ca->journal.bio;
@@ -793,7 +777,6 @@ static void journal_write_unlocked(struct closure *cl)
 
 		ca->journal.seq[ca->journal.cur_idx] = w->data->seq;
 	}
-
 	/* If KEY_PTRS(k) == 0, this jset gets lost in air */
 	BUG_ON(i == 0);
 
@@ -805,6 +788,71 @@ static void journal_write_unlocked(struct closure *cl)
 
 	while ((bio = bio_list_pop(&list)))
 		closure_bio_submit(c, bio, cl);
+}
+
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+
+static void __journal_nvdimm_write_unlocked(struct cache_set *c)
+	__releases(c->journal.lock)
+{
+	struct journal_write *w = c->journal.cur;
+	struct cache *ca = c->cache;
+	unsigned int sectors;
+
+	sectors = set_blocks(w->data, block_bytes(ca)) * ca->sb.block_size;
+	atomic_long_add(sectors, &ca->meta_sectors_written);
+
+	memcpy_flushcache((void *)c->journal.key.ptr[0], w->data, sectors << 9);
+
+	c->journal.key.ptr[0] += sectors << 9;
+	ca->journal.seq[ca->journal.cur_idx] = w->data->seq;
+
+	atomic_dec_bug(&fifo_back(&c->journal.pin));
+	bch_journal_next(&c->journal);
+	journal_reclaim(c);
+
+	spin_unlock(&c->journal.lock);
+}
+
+#endif /* CONFIG_BCACHE_NVM_PAGES */
+
+static void journal_write_unlocked(struct closure *cl)
+{
+	struct cache_set *c = container_of(cl, struct cache_set, journal.io);
+	struct cache *ca = c->cache;
+	struct journal_write *w = c->journal.cur;
+
+	if (!w->need_write) {
+		closure_return_with_destructor(cl, journal_write_unlock);
+		return;
+	} else if (journal_full(&c->journal)) {
+		journal_reclaim(c);
+		spin_unlock(&c->journal.lock);
+
+		btree_flush_write(c);
+		continue_at(cl, journal_write, bch_journal_wq);
+		return;
+	}
+
+	c->journal.blocks_free -= set_blocks(w->data, block_bytes(ca));
+
+	w->data->btree_level = c->root->level;
+
+	bkey_copy(&w->data->btree_root, &c->root->key);
+	bkey_copy(&w->data->uuid_bucket, &c->uuid_bucket);
+
+	w->data->prio_bucket[ca->sb.nr_this_dev] = ca->prio_buckets[0];
+	w->data->magic		= jset_magic(&ca->sb);
+	w->data->version	= BCACHE_JSET_VERSION;
+	w->data->last_seq	= last_seq(&c->journal);
+	w->data->csum		= csum_set(w->data);
+
+	if (!bch_has_feature_nvdimm_meta(&ca->sb))
+		__journal_write_unlocked(c);
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+	else
+		__journal_nvdimm_write_unlocked(c);
+#endif
 
 	continue_at(cl, journal_write_done, NULL);
 }
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index a27fa65d8832..45b69ddc9cfa 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1679,7 +1679,7 @@ void bch_cache_set_release(struct kobject *kobj)
 static void cache_set_free(struct closure *cl)
 {
 	struct cache_set *c = container_of(cl, struct cache_set, cl);
-	struct cache *ca;
+	struct cache *ca = c->cache;
 
 	debugfs_remove(c->debug);
 
@@ -1691,7 +1691,6 @@ static void cache_set_free(struct closure *cl)
 	bch_bset_sort_state_free(&c->sort);
 	free_pages((unsigned long) c->uuids, ilog2(meta_bucket_pages(&c->cache->sb)));
 
-	ca = c->cache;
 	if (ca) {
 		ca->set = NULL;
 		c->cache = NULL;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v13 11/12] bcache: read jset from NVDIMM pages for journal replay
  2021-12-12 17:05 [PATCH v13 00/12] bcache for 5.17: enable NVDIMM for bcache journal Coly Li
                   ` (9 preceding siblings ...)
  2021-12-12 17:05 ` [PATCH v13 10/12] bcache: support storing bcache journal into " Coly Li
@ 2021-12-12 17:05 ` Coly Li
  2021-12-12 17:05 ` [PATCH v13 12/12] bcache: add sysfs interface register_nvdimm_meta to register NVDIMM meta device Coly Li
  2021-12-12 20:20 ` [PATCH v13 00/12] bcache for 5.17: enable NVDIMM for bcache journal Jens Axboe
  12 siblings, 0 replies; 28+ messages in thread
From: Coly Li @ 2021-12-12 17:05 UTC (permalink / raw)
  To: axboe
  Cc: linux-bcache, linux-block, Coly Li, kernel test robot,
	Dan Carpenter, Christoph Hellwig, Dan Williams, Hannes Reinecke,
	Jianpeng Ma, Qiaowei Ren

This patch implements two methods to read the jset from media for
journal replay,
- __jnl_rd_bkt() for a block device
  This is the legacy method to read the jset via the block device
  interface.
- __jnl_rd_nvm_bkt() for NVDIMM
  This is the method to read the jset through the NVDIMM memory
  interface, i.e. memcpy() from NVDIMM pages to DRAM pages.

If BCH_FEATURE_INCOMPAT_NVDIMM_META is set in the incompat feature
set, journal_read_bucket() will read the journal content from NVDIMM
via __jnl_rd_nvm_bkt() while the cache set is running. The linear
addresses of the NVDIMM pages to read the jset from are stored in
sb.d[SB_JOURNAL_BUCKETS], which were initialized and maintained in
previous runs of the cache set.
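
A condensed sketch of the NVDIMM read path (names as in
__jnl_rd_nvm_bkt() in the diff below; 'offset' and 'len' are in
sectors):

	/* sb.d[bkt_idx] holds an nvmpg offset, not an LBA */
	jset_addr = bch_nvmpg_offset_to_ptr(ca->sb.d[bkt_idx]) + (offset << 9);
	data = ca->set->journal.w[0].data;
	memcpy(data, jset_addr, len << 9);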

One thing to notice is that when bch_journal_read() is called, the
linear addresses of the NVDIMM pages are not loaded and initialized
yet, so it is necessary to call __bch_journal_nvdimm_init() before
reading the jset from NVDIMM pages.

The code comment added in journal_read_bucket() addresses a warning
noticed by the kernel test robot and Dan Carpenter. It explains why it
is safe to only check the !bch_has_feature_nvdimm_meta() condition in
the if() statement when CONFIG_BCACHE_NVM_PAGES is not configured, to
avoid confusion from the bogus warning message of the static checking
tool.

Signed-off-by: Coly Li <colyli@suse.de>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
---
 drivers/md/bcache/journal.c | 88 ++++++++++++++++++++++++++++++-------
 1 file changed, 71 insertions(+), 17 deletions(-)

diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 7d5c5ed18890..902992be9191 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -34,18 +34,60 @@ static void journal_read_endio(struct bio *bio)
 	closure_put(cl);
 }
 
+static struct jset *__jnl_rd_bkt(struct cache *ca, unsigned int bkt_idx,
+				    unsigned int len, unsigned int offset,
+				    struct closure *cl)
+{
+	sector_t bucket = bucket_to_sector(ca->set, ca->sb.d[bkt_idx]);
+	struct bio *bio = &ca->journal.bio;
+	struct jset *data = ca->set->journal.w[0].data;
+
+	bio_reset(bio);
+	bio->bi_iter.bi_sector	= bucket + offset;
+	bio_set_dev(bio, ca->bdev);
+	bio->bi_iter.bi_size	= len << 9;
+
+	bio->bi_end_io	= journal_read_endio;
+	bio->bi_private = cl;
+	bio_set_op_attrs(bio, REQ_OP_READ, 0);
+	bch_bio_map(bio, data);
+
+	closure_bio_submit(ca->set, bio, cl);
+	closure_sync(cl);
+
+	/* Indeed journal.w[0].data */
+	return data;
+}
+
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+
+static struct jset *__jnl_rd_nvm_bkt(struct cache *ca, unsigned int bkt_idx,
+				     unsigned int len, unsigned int offset)
+{
+	void *jset_addr;
+	struct jset *data;
+
+	jset_addr = bch_nvmpg_offset_to_ptr(ca->sb.d[bkt_idx]) + (offset << 9);
+	data = ca->set->journal.w[0].data;
+
+	memcpy(data, jset_addr, len << 9);
+
+	/* Indeed journal.w[0].data */
+	return data;
+}
+
+#endif /* CONFIG_BCACHE_NVM_PAGES */
+
 static int journal_read_bucket(struct cache *ca, struct list_head *list,
 			       unsigned int bucket_index)
 {
 	struct journal_device *ja = &ca->journal;
-	struct bio *bio = &ja->bio;
 
 	struct journal_replay *i;
-	struct jset *j, *data = ca->set->journal.w[0].data;
+	struct jset *j;
 	struct closure cl;
 	unsigned int len, left, offset = 0;
 	int ret = 0;
-	sector_t bucket = bucket_to_sector(ca->set, ca->sb.d[bucket_index]);
 
 	closure_init_stack(&cl);
 
@@ -55,26 +97,27 @@ static int journal_read_bucket(struct cache *ca, struct list_head *list,
 reread:		left = ca->sb.bucket_size - offset;
 		len = min_t(unsigned int, left, PAGE_SECTORS << JSET_BITS);
 
-		bio_reset(bio);
-		bio->bi_iter.bi_sector	= bucket + offset;
-		bio_set_dev(bio, ca->bdev);
-		bio->bi_iter.bi_size	= len << 9;
-
-		bio->bi_end_io	= journal_read_endio;
-		bio->bi_private = &cl;
-		bio_set_op_attrs(bio, REQ_OP_READ, 0);
-		bch_bio_map(bio, data);
-
-		closure_bio_submit(ca->set, bio, &cl);
-		closure_sync(&cl);
+		if (!bch_has_feature_nvdimm_meta(&ca->sb))
+			j = __jnl_rd_bkt(ca, bucket_index, len, offset, &cl);
+		/*
+		 * If CONFIG_BCACHE_NVM_PAGES is not defined, the feature bit
+		 * BCH_FEATURE_INCOMPAT_NVDIMM_META won't be in the supported
+		 * incompat feature set, so a cache device formatted with
+		 * BCH_FEATURE_INCOMPAT_NVDIMM_META will fail much earlier in
+		 * read_super() by bch_has_unknown_incompat_features().
+		 * Therefore when CONFIG_BCACHE_NVM_PAGES is not defined, it is
+		 * safe to ignore the bch_has_feature_nvdimm_meta() condition.
+		 */
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+		else
+			j = __jnl_rd_nvm_bkt(ca, bucket_index, len, offset);
+#endif
 
 		/* This function could be simpler now since we no longer write
 		 * journal entries that overlap bucket boundaries; this means
 		 * the start of a bucket will always have a valid journal entry
 		 * if it has any journal entries at all.
 		 */
-
-		j = data;
 		while (len) {
 			struct list_head *where;
 			size_t blocks, bytes = set_bytes(j);
@@ -170,6 +213,8 @@ reread:		left = ca->sb.bucket_size - offset;
 	return ret;
 }
 
+static int __bch_journal_nvdimm_init(struct cache *ca);
+
 int bch_journal_read(struct cache_set *c, struct list_head *list)
 {
 #define read_bucket(b)							\
@@ -188,6 +233,15 @@ int bch_journal_read(struct cache_set *c, struct list_head *list)
 	unsigned int i, l, r, m;
 	uint64_t seq;
 
+	/*
+	 * Linear addresses of NVDIMM pages for journaling is not
+	 * initialized yet, do it before read jset from NVDIMM pages.
+	 */
+	if (bch_has_feature_nvdimm_meta(&ca->sb)) {
+		if (__bch_journal_nvdimm_init(ca) < 0)
+			return -ENXIO;
+	}
+
 	bitmap_zero(bitmap, SB_JOURNAL_BUCKETS);
 	pr_debug("%u journal buckets\n", ca->sb.njournal_buckets);
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v13 12/12] bcache: add sysfs interface register_nvdimm_meta to register NVDIMM meta device
  2021-12-12 17:05 [PATCH v13 00/12] bcache for 5.17: enable NVDIMM for bcache journal Coly Li
                   ` (10 preceding siblings ...)
  2021-12-12 17:05 ` [PATCH v13 11/12] bcache: read jset from NVDIMM pages for journal replay Coly Li
@ 2021-12-12 17:05 ` Coly Li
  2021-12-12 20:20 ` [PATCH v13 00/12] bcache for 5.17: enable NVDIMM for bcache journal Jens Axboe
  12 siblings, 0 replies; 28+ messages in thread
From: Coly Li @ 2021-12-12 17:05 UTC (permalink / raw)
  To: axboe
  Cc: linux-bcache, linux-block, Coly Li, Hannes Reinecke,
	Christoph Hellwig, Dan Williams, Jianpeng Ma, Qiaowei Ren

This patch adds a sysfs interface register_nvdimm_meta to register an
NVDIMM meta device. The sysfs interface file only shows up when
CONFIG_BCACHE_NVM_PAGES=y. An NVDIMM namespace formatted by
bcache-tools can then be registered into bcache by e.g.,
  echo /dev/pmem0 > /sys/fs/bcache/register_nvdimm_meta

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
---
 drivers/md/bcache/super.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 45b69ddc9cfa..2b9cde44879b 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2405,10 +2405,18 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
 static ssize_t bch_pending_bdevs_cleanup(struct kobject *k,
 					 struct kobj_attribute *attr,
 					 const char *buffer, size_t size);
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+static ssize_t register_nvdimm_meta(struct kobject *k,
+				    struct kobj_attribute *attr,
+				    const char *buffer, size_t size);
+#endif
 
 kobj_attribute_write(register,		register_bcache);
 kobj_attribute_write(register_quiet,	register_bcache);
 kobj_attribute_write(pendings_cleanup,	bch_pending_bdevs_cleanup);
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+kobj_attribute_write(register_nvdimm_meta, register_nvdimm_meta);
+#endif
 
 static bool bch_is_open_backing(dev_t dev)
 {
@@ -2522,6 +2530,24 @@ static void register_device_async(struct async_reg_args *args)
 	queue_delayed_work(system_wq, &args->reg_work, 10);
 }
 
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+static ssize_t register_nvdimm_meta(struct kobject *k, struct kobj_attribute *attr,
+				    const char *buffer, size_t size)
+{
+	ssize_t ret = size;
+
+	struct bch_nvmpg_ns *ns = bch_register_namespace(buffer);
+
+	if (IS_ERR(ns)) {
+		pr_err("register nvdimm namespace %s for meta device failed.\n",
+			buffer);
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+#endif
+
 static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
 			       const char *buffer, size_t size)
 {
@@ -2864,6 +2890,9 @@ static int __init bcache_init(void)
 	static const struct attribute *files[] = {
 		&ksysfs_register.attr,
 		&ksysfs_register_quiet.attr,
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+		&ksysfs_register_nvdimm_meta.attr,
+#endif
 		&ksysfs_pendings_cleanup.attr,
 		NULL
 	};
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v13 02/12] bcache: initialize the nvm pages allocator
  2021-12-12 17:05 ` [PATCH v13 02/12] bcache: initialize the nvm pages allocator Coly Li
@ 2021-12-12 19:34   ` Jens Axboe
  2021-12-28  5:29     ` Coly Li
  0 siblings, 1 reply; 28+ messages in thread
From: Jens Axboe @ 2021-12-12 19:34 UTC (permalink / raw)
  To: Coly Li
  Cc: linux-bcache, linux-block, Jianpeng Ma, Randy Dunlap,
	Qiaowei Ren, Christoph Hellwig, Dan Williams, Hannes Reinecke

On 12/12/21 10:05 AM, Coly Li wrote:
> diff --git a/drivers/md/bcache/nvmpg.c b/drivers/md/bcache/nvmpg.c
> new file mode 100644
> index 000000000000..b654bbbda03e
> --- /dev/null
> +++ b/drivers/md/bcache/nvmpg.c
> @@ -0,0 +1,340 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Nvdimm page-buddy allocator
> + *
> + * Copyright (c) 2021, Intel Corporation.
> + * Copyright (c) 2021, Qiaowei Ren <qiaowei.ren@intel.com>.
> + * Copyright (c) 2021, Jianpeng Ma <jianpeng.ma@intel.com>.
> + */
> +
> +#include "bcache.h"
> +#include "nvmpg.h"
> +
> +#include <linux/slab.h>
> +#include <linux/list.h>
> +#include <linux/mutex.h>
> +#include <linux/dax.h>
> +#include <linux/pfn_t.h>
> +#include <linux/libnvdimm.h>
> +#include <linux/mm_types.h>
> +#include <linux/err.h>
> +#include <linux/pagemap.h>
> +#include <linux/bitmap.h>
> +#include <linux/blkdev.h>
> +
> +struct bch_nvmpg_set *global_nvmpg_set;
> +
> +void *bch_nvmpg_offset_to_ptr(unsigned long offset)
> +{
> +	int ns_id = BCH_NVMPG_GET_NS_ID(offset);
> +	struct bch_nvmpg_ns *ns = global_nvmpg_set->ns_tbl[ns_id];
> +
> +	if (offset == 0)
> +		return NULL;
> +
> +	ns_id = BCH_NVMPG_GET_NS_ID(offset);
> +	ns = global_nvmpg_set->ns_tbl[ns_id];
> +
> +	if (ns)
> +		return (void *)(ns->base_addr + BCH_NVMPG_GET_OFFSET(offset));
> +
> +	pr_err("Invalid ns_id %u\n", ns_id);
> +	return NULL;
> +}
> +
> +unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr)
> +{
> +	int ns_id = ns->ns_id;
> +	unsigned long offset = (unsigned long)(ptr - ns->base_addr);
> +
> +	return BCH_NVMPG_OFFSET(ns_id, offset);
> +}
> +
> +static void release_ns_tbl(struct bch_nvmpg_set *set)
> +{
> +	int i;
> +	struct bch_nvmpg_ns *ns;
> +
> +	for (i = 0; i < BCH_NVMPG_NS_MAX; i++) {
> +		ns = set->ns_tbl[i];
> +		if (ns) {
> +			fs_put_dax(ns->dax_dev);
> +			blkdev_put(ns->bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL);
> +			set->ns_tbl[i] = NULL;
> +			set->attached_ns--;
> +			kfree(ns);
> +		}
> +	}
> +
> +	if (set->attached_ns)
> +		pr_err("unexpected attached_ns: %u\n", set->attached_ns);
> +}
> +
> +static void release_nvmpg_set(struct bch_nvmpg_set *set)
> +{
> +	release_ns_tbl(set);
> +	kfree(set);
> +}
> +
> +/* Namespace 0 contains all meta data of the nvmpg allocation set */
> +static int init_nvmpg_set_header(struct bch_nvmpg_ns *ns)
> +{
> +	struct bch_nvmpg_set_header *set_header;
> +
> +	if (ns->ns_id != 0) {
> +		pr_err("unexpected ns_id %u for first nvmpg namespace.\n",
> +		       ns->ns_id);
> +		return -EINVAL;
> +	}
> +
> +	set_header = bch_nvmpg_offset_to_ptr(ns->sb->set_header_offset);
> +
> +	mutex_lock(&global_nvmpg_set->lock);
> +	global_nvmpg_set->set_header = set_header;
> +	global_nvmpg_set->heads_size = set_header->size;
> +	global_nvmpg_set->heads_used = set_header->used;
> +	mutex_unlock(&global_nvmpg_set->lock);
> +
> +	return 0;
> +}
> +
> +static int attach_nvmpg_set(struct bch_nvmpg_ns *ns)
> +{
> +	struct bch_nvmpg_sb *sb = ns->sb;
> +	int rc = 0;
> +
> +	mutex_lock(&global_nvmpg_set->lock);
> +
> +	if (global_nvmpg_set->ns_tbl[sb->this_ns]) {
> +		pr_err("ns_id %u already attached.\n", ns->ns_id);
> +		rc = -EEXIST;
> +		goto unlock;
> +	}
> +
> +	if (ns->ns_id != 0) {
> +		pr_err("unexpected ns_id %u for first namespace.\n", ns->ns_id);
> +		rc = -EINVAL;
> +		goto unlock;
> +	}
> +
> +	if (global_nvmpg_set->attached_ns > 0) {
> +		pr_err("multiple namespace attaching not supported yet\n");
> +		rc = -EOPNOTSUPP;
> +		goto unlock;
> +	}
> +
> +	if ((global_nvmpg_set->attached_ns + 1) > sb->total_ns) {
> +		pr_err("namespace counters error: attached %u > total %u\n",
> +		       global_nvmpg_set->attached_ns,
> +		       global_nvmpg_set->total_ns);
> +		rc = -EINVAL;
> +		goto unlock;
> +	}
> +
> +	memcpy(global_nvmpg_set->set_uuid, sb->set_uuid, 16);
> +	global_nvmpg_set->ns_tbl[sb->this_ns] = ns;
> +	global_nvmpg_set->attached_ns++;
> +	global_nvmpg_set->total_ns = sb->total_ns;
> +
> +unlock:
> +	mutex_unlock(&global_nvmpg_set->lock);
> +	return rc;
> +}
> +
> +static int read_nvdimm_meta_super(struct block_device *bdev,
> +				  struct bch_nvmpg_ns *ns)
> +{
> +	struct page *page;
> +	struct bch_nvmpg_sb *sb;
> +	uint64_t expected_csum = 0;
> +	int r;
> +
> +	page = read_cache_page_gfp(bdev->bd_inode->i_mapping,
> +				BCH_NVMPG_SB_OFFSET >> PAGE_SHIFT, GFP_KERNEL);
> +
> +	if (IS_ERR(page))
> +		return -EIO;
> +
> +	sb = (struct bch_nvmpg_sb *)
> +	     (page_address(page) + offset_in_page(BCH_NVMPG_SB_OFFSET));
> +
> +	r = -EINVAL;
> +	expected_csum = csum_set(sb);
> +	if (expected_csum != sb->csum) {
> +		pr_info("csum is not match with expected one\n");

"Checksum mismatch"

would be more correct English. Should it print the checksums as well?
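Something like this (untested, just to illustrate):

		pr_err("nvmpg sb csum mismatch: expected 0x%llx, got 0x%llx\n",
		       expected_csum, sb->csum);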

> +		goto put_page;
> +	}
> +
> +	if (memcmp(sb->magic, bch_nvmpg_magic, 16)) {
> +		pr_info("invalid bch_nvmpg_magic\n");
> +		goto put_page;
> +	}
> +
> +	if (sb->sb_offset !=
> +	    BCH_NVMPG_OFFSET(sb->this_ns, BCH_NVMPG_SB_OFFSET)) {
> +		pr_info("invalid superblock offset 0x%llx\n", sb->sb_offset);
> +		goto put_page;
> +	}
> +
> +	r = -EOPNOTSUPP;
> +	if (sb->total_ns != 1) {
> +		pr_info("multiple name space not supported yet.\n");
> +		goto put_page;
> +	}

Please use namespace consistently.

> +struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path)
> +{
> +	struct bch_nvmpg_ns *ns = NULL;
> +	struct bch_nvmpg_sb *sb = NULL;
> +	char buf[BDEVNAME_SIZE];
> +	struct block_device *bdev;
> +	pgoff_t pgoff;
> +	int id, err;
> +	char *path;
> +	long dax_ret = 0;
> +
> +	path = kstrndup(dev_path, 512, GFP_KERNEL);
> +	if (!path) {
> +		pr_err("kstrndup failed\n");
> +		return ERR_PTR(-ENOMEM);
> +	}

Really don't think we need that piece of information. Same for a lot of
other places, you have a ton of pr_err() stuff that looks mostly like
debugging.

> +	ns->page_size = sb->page_size;
> +	ns->pages_offset = sb->pages_offset;
> +	ns->pages_total = sb->pages_total;
> +	ns->sb = sb;
> +	ns->free = 0;
> +	ns->bdev = bdev;
> +	ns->set = global_nvmpg_set;
> +
> +	err = attach_nvmpg_set(ns);
> +	if (err < 0)
> +		goto free_ns;
> +
> +	mutex_init(&ns->lock);
> +
> +	err = init_nvmpg_set_header(ns);
> +	if (err < 0)
> +		goto free_ns;

Does this error path need to un-attach?
> +int __init bch_nvmpg_init(void)
> +{
> +	global_nvmpg_set = kzalloc(sizeof(*global_nvmpg_set), GFP_KERNEL);
> +	if (!global_nvmpg_set)
> +		return -ENOMEM;
> +
> +	global_nvmpg_set->total_ns = 0;
> +	mutex_init(&global_nvmpg_set->lock);
> +
> +	pr_info("bcache nvm init\n");

Another useless pr debug print, just get rid of it (and others).

> +void bch_nvmpg_exit(void)
> +{
> +	release_nvmpg_set(global_nvmpg_set);
> +	pr_info("bcache nvm exit\n");
> +}

Ditto


-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v13 03/12] bcache: initialization of the buddy
  2021-12-12 17:05 ` [PATCH v13 03/12] bcache: initialization of the buddy Coly Li
@ 2021-12-12 20:10   ` Jens Axboe
  2021-12-28  5:29     ` Coly Li
  2021-12-15 16:20   ` Dan Carpenter
  1 sibling, 1 reply; 28+ messages in thread
From: Jens Axboe @ 2021-12-12 20:10 UTC (permalink / raw)
  To: Coly Li
  Cc: linux-bcache, linux-block, Jianpeng Ma, kernel test robot,
	Dan Carpenter, Qiaowei Ren, Christoph Hellwig, Dan Williams,
	Hannes Reinecke

> diff --git a/drivers/md/bcache/nvmpg.c b/drivers/md/bcache/nvmpg.c
> index b654bbbda03e..2b70ee4a6028 100644
> --- a/drivers/md/bcache/nvmpg.c
> +++ b/drivers/md/bcache/nvmpg.c
> @@ -50,6 +50,36 @@ unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr)
>  	return BCH_NVMPG_OFFSET(ns_id, offset);
>  }
>  
> +static struct page *bch_nvmpg_va_to_pg(void *addr)
> +{
> +	return virt_to_page(addr);
> +}

What's the purpose of this helper?

> +static inline void reserve_nvmpg_pages(struct bch_nvmpg_ns *ns,
> +				       pgoff_t pgoff, u64 nr)
> +{
> +	while (nr > 0) {
> +		unsigned int num = nr > UINT_MAX ? UINT_MAX : nr;

Surely UINT_MAX isn't anywhere near a valid limit?

> @@ -76,10 +110,73 @@ static void release_nvmpg_set(struct bch_nvmpg_set *set)
>  	kfree(set);
>  }
>  
> +static int validate_recs(int ns_id,
> +			 struct bch_nvmpg_head *head,
> +			 struct bch_nvmpg_recs *recs)
> +{
> +	if (memcmp(recs->magic, bch_nvmpg_recs_magic, 16)) {
> +		pr_err("Invalid bch_nvmpg_recs magic\n");
> +		return -EINVAL;
> +	}
> +
> +	if (memcmp(recs->uuid, head->uuid, 16)) {
> +		pr_err("Invalid bch_nvmpg_recs uuid\n");
> +		return -EINVAL;
> +	}
> +
> +	if (recs->head_offset !=
> +	    bch_nvmpg_ptr_to_offset(global_nvmpg_set->ns_tbl[ns_id], head)) {
> +		pr_err("Invalid recs head_offset\n");
> +		return -EINVAL;
> +	}

Same comments here on the frivolous error messaging, other places in
this file too. Check all the other patches as well, please.

>  /* Namespace 0 contains all meta data of the nvmpg allocation set */
>  static int init_nvmpg_set_header(struct bch_nvmpg_ns *ns)
>  {
>  	struct bch_nvmpg_set_header *set_header;
> +	struct bch_nvmpg_recs *sys_recs;
> +	int i, j, used = 0, rc = 0;
>  
>  	if (ns->ns_id != 0) {
>  		pr_err("unexpected ns_id %u for first nvmpg namespace.\n",
> @@ -93,9 +190,83 @@ static int init_nvmpg_set_header(struct bch_nvmpg_ns *ns)
>  	global_nvmpg_set->set_header = set_header;
>  	global_nvmpg_set->heads_size = set_header->size;
>  	global_nvmpg_set->heads_used = set_header->used;
> +
> +	/* Reserve the used space from buddy allocator */
> +	reserve_nvmpg_pages(ns, 0, div_u64(ns->pages_offset, ns->page_size));
> +
> +	sys_recs = ns->base_addr + BCH_NVMPG_SYSRECS_OFFSET;
> +	for (i = 0; i < set_header->size; i++) {
> +		struct bch_nvmpg_head *head;
> +
> +		head = &set_header->heads[i];
> +		if (head->state == BCH_NVMPG_HD_STAT_FREE)
> +			continue;
> +
> +		used++;
> +		if (used > global_nvmpg_set->heads_size) {
> +			pr_err("used heads %d > heads size %d.\n",
> +			       used, global_nvmpg_set->heads_size);
> +			goto unlock;
> +		}
> +
> +		for (j = 0; j < BCH_NVMPG_NS_MAX; j++) {
> +			struct bch_nvmpg_recs *recs;
> +
> +			recs = bch_nvmpg_offset_to_ptr(head->recs_offset[j]);
> +
> +			/* Iterate the recs list */
> +			while (recs) {
> +				rc = validate_recs(j, head, recs);
> +				if (rc < 0)
> +					goto unlock;
> +
> +				rc = reserve_nvmpg_recs(recs);
> +				if (rc < 0)
> +					goto unlock;
> +
> +				bitmap_set(ns->recs_bitmap, recs - sys_recs, 1);
> +				recs = bch_nvmpg_offset_to_ptr(recs->next_offset);
> +			}
> +		}
> +	}
> +unlock:
>  	mutex_unlock(&global_nvmpg_set->lock);
> +	return rc;
> +}
>  
> -	return 0;
> +static void bch_nvmpg_init_free_space(struct bch_nvmpg_ns *ns)
> +{
> +	unsigned int start, end, pages;
> +	int i;
> +	struct page *page;
> +	pgoff_t pgoff_start;
> +
> +	bitmap_for_each_clear_region(ns->pages_bitmap,
> +				     start, end, 0, ns->pages_total) {
> +		pgoff_start = start;
> +		pages = end - start;
> +
> +		while (pages) {
> +			void *addr;
> +
> +			for (i = BCH_MAX_ORDER - 1; i >= 0; i--) {
> +				if ((pgoff_start % (1L << i) == 0) &&
> +				    (pages >= (1L << i)))
> +					break;
> +			}
> +
> +			addr = bch_nvmpg_pgoff_to_ptr(ns, pgoff_start);
> +			page = bch_nvmpg_va_to_pg(addr);
> +			set_page_private(page, i);
> +			page->index = pgoff_start;
> +			__SetPageBuddy(page);
> +			list_add((struct list_head *)&page->zone_device_data,
> +				 &ns->free_area[i]);
> +
> +			pgoff_start += 1L << i;
> +			pages -= 1L << i;
> +		}
> +	}
>  }
>  
>  static int attach_nvmpg_set(struct bch_nvmpg_ns *ns)
> @@ -200,7 +371,7 @@ struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path)
>  	char buf[BDEVNAME_SIZE];
>  	struct block_device *bdev;
>  	pgoff_t pgoff;
> -	int id, err;
> +	int id, i, err;
>  	char *path;
>  	long dax_ret = 0;
>  
> @@ -304,13 +475,48 @@ struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path)
>  
>  	mutex_init(&ns->lock);
>  
> +	/*
> +	 * parameters of bitmap_set/clear are unsigned int.
> +	 * Given currently size of nvm is far from exceeding this limit,
> +	 * so only add a WARN_ON message.
> +	 */
> +	WARN_ON(BITS_TO_LONGS(ns->pages_total) > UINT_MAX);

Does this really need to be a WARN_ON()? Looks more like an -EINVAL
condition.
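
Something along these lines (sketch only, reusing the existing 'err'
variable and 'free_ns' label in bch_register_namespace()):

	if (BITS_TO_LONGS(ns->pages_total) > UINT_MAX) {
		err = -EINVAL;
		goto free_ns;
	}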


-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v13 04/12] bcache: bch_nvmpg_alloc_pages() of the buddy
  2021-12-12 17:05 ` [PATCH v13 04/12] bcache: bch_nvmpg_alloc_pages() " Coly Li
@ 2021-12-12 20:14   ` Jens Axboe
  2021-12-28  5:29     ` Coly Li
  0 siblings, 1 reply; 28+ messages in thread
From: Jens Axboe @ 2021-12-12 20:14 UTC (permalink / raw)
  To: Coly Li
  Cc: linux-bcache, linux-block, Jianpeng Ma, Qiaowei Ren,
	Christoph Hellwig, Dan Williams, Hannes Reinecke

On 12/12/21 10:05 AM, Coly Li wrote:
> +/* If not found, it will create if create == true */
> +static struct bch_nvmpg_head *find_nvmpg_head(const char *uuid, bool create)
> +{
> +	struct bch_nvmpg_set_header *set_header = global_nvmpg_set->set_header;
> +	struct bch_nvmpg_head *head = NULL;
> +	int i;
> +
> +	if (set_header == NULL)
> +		goto out;
> +
> +	for (i = 0; i < set_header->size; i++) {
> +		struct bch_nvmpg_head *h = &set_header->heads[i];
> +
> +		if (h->state != BCH_NVMPG_HD_STAT_ALLOC)
> +			continue;
> +
> +		if (!memcmp(uuid, h->uuid, 16)) {
> +			head = h;
> +			break;
> +		}
> +	}
> +
> +	if (!head && create) {
> +		u32 used = set_header->used;
> +
> +		if (set_header->size > used) {
> +			head = &set_header->heads[used];
> +			memset(head, 0, sizeof(struct bch_nvmpg_head));
> +			head->state = BCH_NVMPG_HD_STAT_ALLOC;
> +			memcpy(head->uuid, uuid, 16);
> +			global_nvmpg_set->heads_used++;
> +			set_header->used++;
> +		} else
> +			pr_info("No free bch_nvmpg_head\n");
> +	}

Use {} consistently. Again probably just some printk that should go
away.

> +static struct bch_nvmpg_recs *find_nvmpg_recs(struct bch_nvmpg_ns *ns,
> +					      struct bch_nvmpg_head *head,
> +					      bool create)
> +{
> +	int ns_id = ns->sb->this_ns;
> +	struct bch_nvmpg_recs *prev_recs = NULL, *recs = NULL;
> +
> +	recs = bch_nvmpg_offset_to_ptr(head->recs_offset[ns_id]);
> +
> +	/* If create=false, we return recs[nr] */
> +	if (!create)
> +		return recs;

Would this be cleaner to handle in the caller?

> +static void add_nvmpg_rec(struct bch_nvmpg_ns *ns,
> +			  struct bch_nvmpg_recs *recs,
> +			  unsigned long nvmpg_offset,
> +			  int order)
> +{
> +	int i, ns_id;
> +	unsigned long pgoff;
> +
> +	pgoff = bch_nvmpg_offset_to_pgoff(nvmpg_offset);
> +	ns_id = ns->sb->this_ns;
> +
> +	for (i = 0; i < recs->size; i++) {
> +		if (recs->recs[i].pgoff == 0) {
> +			recs->recs[i].pgoff = pgoff;
> +			recs->recs[i].order = order;
> +			recs->recs[i].ns_id = ns_id;
> +			recs->used++;
> +			break;
> +		}
> +	}
> +	BUG_ON(i == recs->size);

No BUG_ON's, please. It only truly belongs in core code for cases where
error handling isn't possible, does not apply here.
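E.g. let add_nvmpg_rec() return an int and have the caller handle it
(sketch only):

	if (i == recs->size)
		return -ENOSPC;	/* no free slot left in this recs block */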

> diff --git a/drivers/md/bcache/nvmpg.h b/drivers/md/bcache/nvmpg.h
> index 55778d4db7da..d03f3241b45a 100644
> --- a/drivers/md/bcache/nvmpg.h
> +++ b/drivers/md/bcache/nvmpg.h
> @@ -76,6 +76,9 @@ struct bch_nvmpg_set {
>  /* Indicate which field in bch_nvmpg_sb to be updated */
>  #define BCH_NVMPG_TOTAL_NS	0	/* total_ns */
>  
> +#define BCH_PGOFF_TO_KVADDR(pgoff)					\
> +	((void *)((unsigned long)(pgoff) << PAGE_SHIFT))

Pretty sure we have a general kernel helper for this, better to use that
rather than duplicate it.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v13 05/12] bcache: bch_nvmpg_free_pages() of the buddy allocator
  2021-12-12 17:05 ` [PATCH v13 05/12] bcache: bch_nvmpg_free_pages() of the buddy allocator Coly Li
@ 2021-12-12 20:16   ` Jens Axboe
  2021-12-28  5:29     ` Coly Li
  2022-02-21 12:36   ` yukuai (C)
  1 sibling, 1 reply; 28+ messages in thread
From: Jens Axboe @ 2021-12-12 20:16 UTC (permalink / raw)
  To: Coly Li
  Cc: linux-bcache, linux-block, Jianpeng Ma, Qiaowei Ren,
	Christoph Hellwig, Dan Williams, Hannes Reinecke

On 12/12/21 10:05 AM, Coly Li wrote:
> diff --git a/drivers/md/bcache/nvmpg.c b/drivers/md/bcache/nvmpg.c
> index a920779eb548..8ce0c4389b42 100644
> --- a/drivers/md/bcache/nvmpg.c
> +++ b/drivers/md/bcache/nvmpg.c
> @@ -248,6 +248,57 @@ static int init_nvmpg_set_header(struct bch_nvmpg_ns *ns)
>  	return rc;
>  }
>  
> +static void __free_space(struct bch_nvmpg_ns *ns, unsigned long nvmpg_offset,
> +			 int order)
> +{
> +	unsigned long add_pages = (1L << order);
> +	pgoff_t pgoff;
> +	struct page *page;
> +	void *va;
> +
> +	if (nvmpg_offset == 0) {
> +		pr_err("free pages on offset 0\n");
> +		return;
> +	}
> +
> +	page = bch_nvmpg_va_to_pg(bch_nvmpg_offset_to_ptr(nvmpg_offset));
> +	WARN_ON((!page) || (page->private != order));
> +	pgoff = page->index;
> +
> +	while (order < BCH_MAX_ORDER - 1) {
> +		struct page *buddy_page;
> +
> +		pgoff_t buddy_pgoff = pgoff ^ (1L << order);
> +		pgoff_t parent_pgoff = pgoff & ~(1L << order);
> +
> +		if ((parent_pgoff + (1L << (order + 1)) > ns->pages_total))
> +			break;
> +
> +		va = bch_nvmpg_pgoff_to_ptr(ns, buddy_pgoff);
> +		buddy_page = bch_nvmpg_va_to_pg(va);
> +		WARN_ON(!buddy_page);
> +
> +		if (PageBuddy(buddy_page) && (buddy_page->private == order)) {
> +			list_del((struct list_head *)&buddy_page->zone_device_data);
> +			__ClearPageBuddy(buddy_page);
> +			pgoff = parent_pgoff;
> +			order++;
> +			continue;
> +		}
> +		break;
> +	}
> +
> +	va = bch_nvmpg_pgoff_to_ptr(ns, pgoff);
> +	page = bch_nvmpg_va_to_pg(va);
> +	WARN_ON(!page);
> +	list_add((struct list_head *)&page->zone_device_data,
> +		 &ns->free_area[order]);
> +	page->index = pgoff;
> +	set_page_private(page, order);
> +	__SetPageBuddy(page);
> +	ns->free += add_pages;
> +}

There are 3 WARN_ON's in here. If you absolutely must use a WARN_ON,
then make them WARN_ON_ONCE(). Ditto in other spots in this patch.
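
For illustration, a minimal sketch of that change for the first check
in __free_space() (hypothetical, not from a posted revision): warn once
and bail out instead of warning on every call.

	page = bch_nvmpg_va_to_pg(bch_nvmpg_offset_to_ptr(nvmpg_offset));
	if (WARN_ON_ONCE(!page || page->private != order))
		return;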

> +void bch_nvmpg_free_pages(unsigned long nvmpg_offset, int order,
> +			  const char *uuid)
> +{
> +	struct bch_nvmpg_ns *ns;
> +	struct bch_nvmpg_head *head;
> +	struct bch_nvmpg_recs *recs;
> +	int r;
> +
> +	mutex_lock(&global_nvmpg_set->lock);
> +
> +	ns = global_nvmpg_set->ns_tbl[BCH_NVMPG_GET_NS_ID(nvmpg_offset)];
> +	if (!ns) {
> +		pr_err("can't find namespace by given kaddr from namespace\n");
> +		goto unlock;
> +	}
> +
> +	head = find_nvmpg_head(uuid, false);
> +	if (!head) {
> +		pr_err("can't found bch_nvmpg_head by uuid\n");
> +		goto unlock;
> +	}
> +
> +	recs = find_nvmpg_recs(ns, head, false);
> +	if (!recs) {
> +		pr_err("can't find bch_nvmpg_recs by uuid\n");
> +		goto unlock;
> +	}
> +
> +	r = remove_nvmpg_rec(recs, ns->sb->this_ns, nvmpg_offset, order);
> +	if (r < 0) {
> +		pr_err("can't find bch_nvmpg_rec\n");
> +		goto unlock;
> +	}
> +
> +	__free_space(ns, nvmpg_offset, order);
> +
> +unlock:
> +	mutex_unlock(&global_nvmpg_set->lock);
> +}

Again tons of useless error prints. Make them return a proper error
instead of just making things void...
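
A sketch of that direction for bch_nvmpg_free_pages(), built from the
hunk quoted above (hypothetical, not a posted revision): propagate an
error code instead of printing and returning void.

	int bch_nvmpg_free_pages(unsigned long nvmpg_offset, int order,
				 const char *uuid)
	{
		struct bch_nvmpg_ns *ns;
		struct bch_nvmpg_head *head;
		struct bch_nvmpg_recs *recs;
		int ret = -ENOENT;

		mutex_lock(&global_nvmpg_set->lock);

		ns = global_nvmpg_set->ns_tbl[BCH_NVMPG_GET_NS_ID(nvmpg_offset)];
		if (!ns)
			goto unlock;

		head = find_nvmpg_head(uuid, false);
		if (!head)
			goto unlock;

		recs = find_nvmpg_recs(ns, head, false);
		if (!recs)
			goto unlock;

		ret = remove_nvmpg_rec(recs, ns->sb->this_ns, nvmpg_offset, order);
		if (ret < 0)
			goto unlock;

		__free_space(ns, nvmpg_offset, order);
		ret = 0;

	unlock:
		mutex_unlock(&global_nvmpg_set->lock);
		return ret;
	}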

> @@ -686,6 +835,7 @@ struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path)
>  	ns->pages_offset = sb->pages_offset;
>  	ns->pages_total = sb->pages_total;
>  	ns->sb = sb;
> +	/* increase by __free_space() */
>  	ns->free = 0;
>  	ns->bdev = bdev;
>  	ns->set = global_nvmpg_set;

Does that hunk belong in here?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v13 06/12] bcache: get recs list head for allocated pages by specific uuid
  2021-12-12 17:05 ` [PATCH v13 06/12] bcache: get recs list head for allocated pages by specific uuid Coly Li
@ 2021-12-12 20:18   ` Jens Axboe
  0 siblings, 0 replies; 28+ messages in thread
From: Jens Axboe @ 2021-12-12 20:18 UTC (permalink / raw)
  To: Coly Li
  Cc: linux-bcache, linux-block, Jianpeng Ma, Qiaowei Ren,
	Hannes Reinecke, Christoph Hellwig, Dan Williams

On 12/12/21 10:05 AM, Coly Li wrote:
> From: Jianpeng Ma <jianpeng.ma@intel.com>
> 
> This patch implements bch_get_nvmpg_head() of the buddy allocator
> to be used to get recs list head for allocated pages by specific
> uuid. Then the requester (owner) can find all previous allocated
> nvdimm pages by iterating the recs list.

This patch looks like it should just get dropped.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v13 00/12] bcache for 5.17: enable NVDIMM for bcache journal
  2021-12-12 17:05 [PATCH v13 00/12] bcache for 5.17: enable NVDIMM for bcache journal Coly Li
                   ` (11 preceding siblings ...)
  2021-12-12 17:05 ` [PATCH v13 12/12] bcache: add sysfs interface register_nvdimm_meta to register NVDIMM meta device Coly Li
@ 2021-12-12 20:20 ` Jens Axboe
  2021-12-28  5:29   ` Coly Li
  12 siblings, 1 reply; 28+ messages in thread
From: Jens Axboe @ 2021-12-12 20:20 UTC (permalink / raw)
  To: Coly Li
  Cc: linux-bcache, linux-block, Christoph Hellwig, Dan Williams,
	Hannes Reinecke, Jianpeng Ma, Qiaowei Ren, Ying Huang

On 12/12/21 10:05 AM, Coly Li wrote:
> Hi Jens,
> 
> This is the v12 effort the enabling NVDIMM for bcache journal, the code
> is under testing for months and quite stable now. Please consider to
> take them for Linux v5.17 merge window.

As I've mentioned before, this really needs some thorough review from
people not associated with the project. I spent a bit of time on the
first section of the patches, and it doesn't look ready to me.

Do you have any tests for this new code? Any that exercise the error
paths?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v13 03/12] bcache: initialization of the buddy
  2021-12-12 17:05 ` [PATCH v13 03/12] bcache: initialization of the buddy Coly Li
  2021-12-12 20:10   ` Jens Axboe
@ 2021-12-15 16:20   ` Dan Carpenter
  2021-12-28  5:12     ` Coly Li
  1 sibling, 1 reply; 28+ messages in thread
From: Dan Carpenter @ 2021-12-15 16:20 UTC (permalink / raw)
  To: Coly Li
  Cc: axboe, linux-bcache, linux-block, Jianpeng Ma, kernel test robot,
	Qiaowei Ren, Christoph Hellwig, Dan Williams, Hannes Reinecke

On Mon, Dec 13, 2021 at 01:05:43AM +0800, Coly Li wrote:
> +	/*
> +	 * parameters of bitmap_set/clear are unsigned int.
> +	 * Given currently size of nvm is far from exceeding this limit,
> +	 * so only add a WARN_ON message.
> +	 */
> +	WARN_ON(BITS_TO_LONGS(ns->pages_total) > UINT_MAX);
> +	ns->pages_bitmap = kvcalloc(BITS_TO_LONGS(ns->pages_total),
> +				    sizeof(unsigned long), GFP_KERNEL);

BITS_TO_LONGS() has a potential integer overflow if we're talking about
truly giant numbers.  It will return zero if ns->pages_total is more
than U64_MAX - 64.  In that case kvcalloc() will return ZERO_SIZE_PTR.

Btw, kvcalloc() will never let you allocate more than INT_MAX.  It will
trigger a WARN_ONCE().  If people want to allocate more than 2GB of RAM
then they have to plan ahead of time and use vmalloc().
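
For illustration, a sketch of the kind of bound being described
(hypothetical, error handling simplified, not proposed code): rejecting
oversized values up front means BITS_TO_LONGS() cannot wrap and the
kvcalloc() size stays well under its INT_MAX-byte limit.

	if (ns->pages_total == 0 || ns->pages_total > UINT_MAX)
		return -EINVAL;

	ns->pages_bitmap = kvcalloc(BITS_TO_LONGS(ns->pages_total),
				    sizeof(unsigned long), GFP_KERNEL);
	if (!ns->pages_bitmap)
		return -ENOMEM;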

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v13 03/12] bcache: initialization of the buddy
  2021-12-15 16:20   ` Dan Carpenter
@ 2021-12-28  5:12     ` Coly Li
  0 siblings, 0 replies; 28+ messages in thread
From: Coly Li @ 2021-12-28  5:12 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: axboe, linux-bcache, linux-block, Jianpeng Ma, kernel test robot,
	Qiaowei Ren, Christoph Hellwig, Dan Williams, Hannes Reinecke

On 12/16/21 12:20 AM, Dan Carpenter wrote:
> On Mon, Dec 13, 2021 at 01:05:43AM +0800, Coly Li wrote:
>> +	/*
>> +	 * parameters of bitmap_set/clear are unsigned int.
>> +	 * Given currently size of nvm is far from exceeding this limit,
>> +	 * so only add a WARN_ON message.
>> +	 */
>> +	WARN_ON(BITS_TO_LONGS(ns->pages_total) > UINT_MAX);
>> +	ns->pages_bitmap = kvcalloc(BITS_TO_LONGS(ns->pages_total),
>> +				    sizeof(unsigned long), GFP_KERNEL);
> BITS_TO_LONGS() has a potential integer overflow if we're talking about
> truly giant numbers.  It will return zero if ns->pages_total is more
> than U64_MAX - 64.  In that case kvcalloc() will return ZERO_SIZE_PTR.
>
> Btw, kvcalloc() will never let you allocate more than INT_MAX.  It will
> trigger a WARN_ONCE().  If people want to allocate more than 2GB of RAM
> then they have to plan ahead of time and use vmalloc().
>

Hi Dan,

Thanks for the informative hint. I discussed this with Qiaowei and
Jianpeng; we plan to use an extent tree to replace the current bitmap
for recording the free and allocated areas of the NVDIMM namespace,
which should use memory more efficiently and avoid such a size
limitation.
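
Just to illustrate the idea, a hypothetical sketch of such an extent
record (none of these names exist in the posted patches):

	/* Hypothetical: one extent per contiguous range, kept in an
	 * rb-tree keyed by the start page, instead of one bit per page. */
	struct bch_nvmpg_extent {
		struct rb_node node;	/* indexed by start */
		pgoff_t start;		/* first page of the range */
		u64 nr_pages;		/* number of pages in the range */
		bool allocated;		/* allocated or free */
	};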

Sorry for the late reply, I was traveling and then sick for a whole
week. Again, thank you for taking the time to look into this, and
please keep reviewing next time :-)

Coly Li

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v13 00/12] bcache for 5.17: enable NVDIMM for bcache journal
  2021-12-12 20:20 ` [PATCH v13 00/12] bcache for 5.17: enable NVDIMM for bcache journal Jens Axboe
@ 2021-12-28  5:29   ` Coly Li
  0 siblings, 0 replies; 28+ messages in thread
From: Coly Li @ 2021-12-28  5:29 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-bcache, linux-block, Christoph Hellwig, Dan Williams,
	Hannes Reinecke, Jianpeng Ma, Qiaowei Ren, Ying Huang

On 12/13/21 4:20 AM, Jens Axboe wrote:
> On 12/12/21 10:05 AM, Coly Li wrote:
>> Hi Jens,
>>
>> This is the v12 effort the enabling NVDIMM for bcache journal, the code
>> is under testing for months and quite stable now. Please consider to
>> take them for Linux v5.17 merge window.
> As I've mentioned before, this really needs some thorough review from
> people not associated with the project. I spent a bit of time on the
> first section of the patches, and it doesn't look ready to me.

Copied. Since the remaining part of the whole work is still on the
way, I will not submit the journaling part on its own again.

The plan is to finish the whole work and then ask more people to
review the complete patch set. At the same time, we will look for some
real workloads for further testing and evaluation of the results.

It will take several merge windows before the next submission to you.
Thank you for taking the time to look into the patches and provide
constructive suggestions.

Coly Li



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v13 02/12] bcache: initialize the nvm pages allocator
  2021-12-12 19:34   ` Jens Axboe
@ 2021-12-28  5:29     ` Coly Li
  0 siblings, 0 replies; 28+ messages in thread
From: Coly Li @ 2021-12-28  5:29 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-bcache, linux-block, Jianpeng Ma, Randy Dunlap,
	Qiaowei Ren, Christoph Hellwig, Dan Williams, Hannes Reinecke

Hi Jens,

Thank you for reviewing the patches. I was traveling for a week when
you replied to the email, and after the travel I was sick for one more
week. I am only now able to sit in front of my laptop again.

I reply inline below your comments.

On 12/13/21 3:34 AM, Jens Axboe wrote:
> On 12/12/21 10:05 AM, Coly Li wrote:
>> diff --git a/drivers/md/bcache/nvmpg.c b/drivers/md/bcache/nvmpg.c
>> new file mode 100644
>> index 000000000000..b654bbbda03e
>> --- /dev/null
>> +++ b/drivers/md/bcache/nvmpg.c
>> @@ -0,0 +1,340 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Nvdimm page-buddy allocator
>> + *
>> + * Copyright (c) 2021, Intel Corporation.
>> + * Copyright (c) 2021, Qiaowei Ren<qiaowei.ren@intel.com>.
>> + * Copyright (c) 2021, Jianpeng Ma<jianpeng.ma@intel.com>.
>> + */
>> +
>> +#include "bcache.h"
>> +#include "nvmpg.h"
>> +
>> +#include <linux/slab.h>
>> +#include <linux/list.h>
>> +#include <linux/mutex.h>
>> +#include <linux/dax.h>
>> +#include <linux/pfn_t.h>
>> +#include <linux/libnvdimm.h>
>> +#include <linux/mm_types.h>
>> +#include <linux/err.h>
>> +#include <linux/pagemap.h>
>> +#include <linux/bitmap.h>
>> +#include <linux/blkdev.h>
>> +
[snipped]
>> +	expected_csum = csum_set(sb);
>> +	if (expected_csum != sb->csum) {
>> +		pr_info("csum is not match with expected one\n");
> "Checksum mismatch"

Copied, it will be fixed.


> would be more correct english, should it print the checksums as well?

This follows the existing bcache style for reporting a bad checksum,
for example,
super.c:718:                pr_warn("bad csum reading priorities\n");
journal.c:141:              pr_info("%u: bad csum, %zu bytes, offset %u\n",

It might be fine to only report the bad checksum status without
printing the values.

>> +		goto put_page;
>> +	}
>> +
>> +	if (memcmp(sb->magic, bch_nvmpg_magic, 16)) {
>> +		pr_info("invalid bch_nvmpg_magic\n");
>> +		goto put_page;
>> +	}
>> +
>> +	if (sb->sb_offset !=
>> +	    BCH_NVMPG_OFFSET(sb->this_ns, BCH_NVMPG_SB_OFFSET)) {
>> +		pr_info("invalid superblock offset 0x%llx\n", sb->sb_offset);
>> +		goto put_page;
>> +	}
>> +
>> +	r = -EOPNOTSUPP;
>> +	if (sb->total_ns != 1) {
>> +		pr_info("multiple name space not supported yet.\n");
>> +		goto put_page;
>> +	}
> Please use namespace consistently.

Copied, it will be fixed.

>> +struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path)
>> +{
>> +	struct bch_nvmpg_ns *ns = NULL;
>> +	struct bch_nvmpg_sb *sb = NULL;
>> +	char buf[BDEVNAME_SIZE];
>> +	struct block_device *bdev;
>> +	pgoff_t pgoff;
>> +	int id, err;
>> +	char *path;
>> +	long dax_ret = 0;
>> +
>> +	path = kstrndup(dev_path, 512, GFP_KERNEL);
>> +	if (!path) {
>> +		pr_err("kstrndup failed\n");
>> +		return ERR_PTR(-ENOMEM);
>> +	}
> Really don't think we need that piece of information. Same for a lot of
> other places, you have a ton of pr_err() stuff that looks mostly like
> debugging.

Hmm, I think I need your guidance here. When I reviewed the code, my
view was that the error messages are necessary, as long as they are
not printed from hot run time paths. This location is in the NVDIMM
namespace registration code path, therefore I accepted it. This is the
rule I have been trained to follow for a long time....

Could you please give me some hints on how to judge whether a printk
message should be avoided? Then I will apply them in future code
reviews and my own coding work.


>> +	ns->page_size = sb->page_size;
>> +	ns->pages_offset = sb->pages_offset;
>> +	ns->pages_total = sb->pages_total;
>> +	ns->sb = sb;
>> +	ns->free = 0;
>> +	ns->bdev = bdev;
>> +	ns->set = global_nvmpg_set;
>> +
>> +	err = attach_nvmpg_set(ns);
>> +	if (err < 0)
>> +		goto free_ns;
>> +
>> +	mutex_init(&ns->lock);
>> +
>> +	err = init_nvmpg_set_header(ns);
>> +	if (err < 0)
>> +		goto free_ns;
> Does this error path need to un-attach?

Detach (un-attach) is not implemented in this series, because it
depends on how to flush the whole NVDIMM namespace during un-attach.
It is planned to be posted together with the NVDIMM support for bcache
B+tree nodes.

For the current SSD-only bcache code, when a flush request is received
from the upper layer, it is sent synchronously to the backing device
and asynchronously to the cache device via a journal flush. For an
NVDIMM namespace, the equivalent operation is to flush the CPU last
level cache, and we have 3 candidate methods to handle it (see the
sketch after this list),
1) flush the cache for the whole DAX-mapped linear address range (like
nvdimm_flush() in libnvdimm),
2) flush the cache only for the NVDIMM ranges allocated by the nvmpg
allocator to selected requesters (uuids),
3) maintain a list of dirty NVDIMM pages, and only flush the cache for
those linear address ranges.

We are still evaluating the performance and have not made a decision
yet. So the detach code path is not posted in this series, and
currently we un-attach the NVDIMM namespace by unloading the bcache
kernel module.
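
For illustration only, here is a minimal sketch of method 3 above. The
structure and function names are hypothetical (nothing like this is in
the posted patches), and it assumes arch_wb_cache_pmem() and pmem_wmb()
as the cache write-back and persistence barrier primitives:

	/* Hypothetical sketch: track dirty DAX-mapped ranges, write back
	 * only those cache lines when a flush request arrives. */
	struct bch_nvmpg_dirty_range {
		struct list_head list;
		void *kaddr;		/* DAX-mapped linear address */
		size_t len;		/* length of the dirty range */
	};

	static void bch_nvmpg_flush_dirty(struct list_head *dirty_list)
	{
		struct bch_nvmpg_dirty_range *r, *tmp;

		list_for_each_entry_safe(r, tmp, dirty_list, list) {
			arch_wb_cache_pmem(r->kaddr, r->len);
			list_del(&r->list);
			kfree(r);
		}
		/* Make sure the write-backs reach the persistence domain. */
		pmem_wmb();
	}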

>> +int __init bch_nvmpg_init(void)
>> +{
>> +	global_nvmpg_set = kzalloc(sizeof(*global_nvmpg_set), GFP_KERNEL);
>> +	if (!global_nvmpg_set)
>> +		return -ENOMEM;
>> +
>> +	global_nvmpg_set->total_ns = 0;
>> +	mutex_init(&global_nvmpg_set->lock);
>> +
>> +	pr_info("bcache nvm init\n");
> Another useless pr debug print, just get rid of it (and others).

Copied, it will be dropped.

>> +void bch_nvmpg_exit(void)
>> +{
>> +	release_nvmpg_set(global_nvmpg_set);
>> +	pr_info("bcache nvm exit\n");
>> +}
> Ditto

Copied.

Thanks for your comments.

Coly Li

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v13 03/12] bcache: initialization of the buddy
  2021-12-12 20:10   ` Jens Axboe
@ 2021-12-28  5:29     ` Coly Li
  0 siblings, 0 replies; 28+ messages in thread
From: Coly Li @ 2021-12-28  5:29 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-bcache, linux-block, Jianpeng Ma, kernel test robot,
	Dan Carpenter, Qiaowei Ren, Christoph Hellwig, Dan Williams,
	Hannes Reinecke

On 12/13/21 4:10 AM, Jens Axboe wrote:
>> diff --git a/drivers/md/bcache/nvmpg.c b/drivers/md/bcache/nvmpg.c
>> index b654bbbda03e..2b70ee4a6028 100644
>> --- a/drivers/md/bcache/nvmpg.c
>> +++ b/drivers/md/bcache/nvmpg.c
>> @@ -50,6 +50,36 @@ unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr)
>>   	return BCH_NVMPG_OFFSET(ns_id, offset);
>>   }
>>   
>> +static struct page *bch_nvmpg_va_to_pg(void *addr)
>> +{
>> +	return virt_to_page(addr);
>> +}
> What's the purpose of this helper?

This is used for the simplified buddy-like allocator. When releasing a
block of contiguous NVDIMM pages, what we record in the nvmpg metadata
is the offset value (the NVDIMM namespace ID combined with the
in-namespace offset). The offset value can be converted into the
DAX-mapped linear address of the allocated contiguous NVDIMM pages,
and bch_nvmpg_va_to_pg() finds the header page so the contiguous pages
can be released onto the free list of the corresponding order in the
buddy-like page allocator.

A typical usage of bch_nvmpg_va_to_pg() is in bch_nvmpg_alloc_pages():
after the buddy-like allocator chooses a block of contiguous pages,
all we have is the in-namespace offset of the first page, and the
corresponding struct page is found by
     bch_nvmpg_va_to_pg(dax-mapped-base-address +
                        (header-page-index << page-size-bits))
Then we set the buddy-allocator-related information into the header
page.
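
Purely for illustration, a sketch of that conversion chain using the
helper names from the patches (the surrounding context is simplified
and assumed, this is not the actual allocator code):

	/* Hypothetical sketch: from an nvmpg offset to the buddy header page. */
	void *va = bch_nvmpg_offset_to_ptr(nvmpg_offset);	/* DAX-mapped address */
	struct page *header = bch_nvmpg_va_to_pg(va);		/* backing struct page */

	header->index = bch_nvmpg_offset_to_pgoff(nvmpg_offset); /* page offset */
	set_page_private(header, order);	/* record the allocation order */
	__SetPageBuddy(header);			/* mark it as a buddy header page */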

>> +static inline void reserve_nvmpg_pages(struct bch_nvmpg_ns *ns,
>> +				       pgoff_t pgoff, u64 nr)
>> +{
>> +	while (nr > 0) {
>> +		unsigned int num = nr > UINT_MAX ? UINT_MAX : nr;
> Surely UINT_MAX isn't anywhere near a valid limit?

Hmm, do you mean that UINT_MAX is too large, or too small?

I took the while() loop here as paranoid handling of an oversized
value with no real effect in practice, so I was fine with it for the
first version. The ideal approach would be an extent tree recording
all the reserved areas, which would save a lot of system memory
compared with the bitmap.

I will suggest that Qiaowei and Jianpeng use an extent tree to record
the free and allocated areas of the NVDIMM namespace and drop the
bitmap method now.


>> @@ -76,10 +110,73 @@ static void release_nvmpg_set(struct bch_nvmpg_set *set)
>>   	kfree(set);
>>   }
>>   
>> +static int validate_recs(int ns_id,
>> +			 struct bch_nvmpg_head *head,
>> +			 struct bch_nvmpg_recs *recs)
>> +{
>> +	if (memcmp(recs->magic, bch_nvmpg_recs_magic, 16)) {
>> +		pr_err("Invalid bch_nvmpg_recs magic\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (memcmp(recs->uuid, head->uuid, 16)) {
>> +		pr_err("Invalid bch_nvmpg_recs uuid\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (recs->head_offset !=
>> +	    bch_nvmpg_ptr_to_offset(global_nvmpg_set->ns_tbl[ns_id], head)) {
>> +		pr_err("Invalid recs head_offset\n");
>> +		return -EINVAL;
>> +	}
> Same comments here on the frivilous error messaging, other places in
> this file too. Check all the other patches as well, please.

This is the error message style we follow from the existing bcache
code, and IMHO it is necessary here: any of the above error conditions
means the metadata might be corrupted, which is critical.

[snipped]
>>   static int attach_nvmpg_set(struct bch_nvmpg_ns *ns)
>> @@ -200,7 +371,7 @@ struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path)
>>   	char buf[BDEVNAME_SIZE];
>>   	struct block_device *bdev;
>>   	pgoff_t pgoff;
>> -	int id, err;
>> +	int id, i, err;
>>   	char *path;
>>   	long dax_ret = 0;
>>   
>> @@ -304,13 +475,48 @@ struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path)
>>   
>>   	mutex_init(&ns->lock);
>>   
>> +	/*
>> +	 * parameters of bitmap_set/clear are unsigned int.
>> +	 * Given currently size of nvm is far from exceeding this limit,
>> +	 * so only add a WARN_ON message.
>> +	 */
>> +	WARN_ON(BITS_TO_LONGS(ns->pages_total) > UINT_MAX);
> Does this really need to be a WARN_ON()? Looks more like an -EINVAL
> condition.

This is because currently the free and allocated areas are recorded
in a bitmap during buddy system initialization. As I said, after the
bitmap is switched to an extent tree, such a limitation check will
disappear. Once Qiaowei and Jianpeng replace the bitmap with an extent
tree, the limitation will be gone.

Thanks for your review.

Coly Li

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v13 04/12] bcache: bch_nvmpg_alloc_pages() of the buddy
  2021-12-12 20:14   ` Jens Axboe
@ 2021-12-28  5:29     ` Coly Li
  0 siblings, 0 replies; 28+ messages in thread
From: Coly Li @ 2021-12-28  5:29 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-bcache, linux-block, Jianpeng Ma, Qiaowei Ren,
	Christoph Hellwig, Dan Williams, Hannes Reinecke

On 12/13/21 4:14 AM, Jens Axboe wrote:
> On 12/12/21 10:05 AM, Coly Li wrote:
>> +/* If not found, it will create if create == true */
>> +static struct bch_nvmpg_head *find_nvmpg_head(const char *uuid, bool create)
>> +{
>> +	struct bch_nvmpg_set_header *set_header = global_nvmpg_set->set_header;
>> +	struct bch_nvmpg_head *head = NULL;
>> +	int i;
>> +
>> +	if (set_header == NULL)
>> +		goto out;
>> +
>> +	for (i = 0; i < set_header->size; i++) {
>> +		struct bch_nvmpg_head *h = &set_header->heads[i];
>> +
>> +		if (h->state != BCH_NVMPG_HD_STAT_ALLOC)
>> +			continue;
>> +
>> +		if (!memcmp(uuid, h->uuid, 16)) {
>> +			head = h;
>> +			break;
>> +		}
>> +	}
>> +
>> +	if (!head && create) {
>> +		u32 used = set_header->used;
>> +
>> +		if (set_header->size > used) {
>> +			head = &set_header->heads[used];
>> +			memset(head, 0, sizeof(struct bch_nvmpg_head));
>> +			head->state = BCH_NVMPG_HD_STAT_ALLOC;
>> +			memcpy(head->uuid, uuid, 16);
>> +			global_nvmpg_set->heads_used++;
>> +			set_header->used++;
>> +		} else
>> +			pr_info("No free bch_nvmpg_head\n");
>> +	}
> Use {} consistently. Again probably just some printk that should go
> away.

Copied.

>> +static struct bch_nvmpg_recs *find_nvmpg_recs(struct bch_nvmpg_ns *ns,
>> +					      struct bch_nvmpg_head *head,
>> +					      bool create)
>> +{
>> +	int ns_id = ns->sb->this_ns;
>> +	struct bch_nvmpg_recs *prev_recs = NULL, *recs = NULL;
>> +
>> +	recs = bch_nvmpg_offset_to_ptr(head->recs_offset[ns_id]);
>> +
>> +	/* If create=false, we return recs[nr] */
>> +	if (!create)
>> +		return recs;
> Would this be cleaner to handle in the caller?

Sure, I will suggest that Jianpeng and Qiaowei change this.

>> +static void add_nvmpg_rec(struct bch_nvmpg_ns *ns,
>> +			  struct bch_nvmpg_recs *recs,
>> +			  unsigned long nvmpg_offset,
>> +			  int order)
>> +{
>> +	int i, ns_id;
>> +	unsigned long pgoff;
>> +
>> +	pgoff = bch_nvmpg_offset_to_pgoff(nvmpg_offset);
>> +	ns_id = ns->sb->this_ns;
>> +
>> +	for (i = 0; i < recs->size; i++) {
>> +		if (recs->recs[i].pgoff == 0) {
>> +			recs->recs[i].pgoff = pgoff;
>> +			recs->recs[i].order = order;
>> +			recs->recs[i].ns_id = ns_id;
>> +			recs->used++;
>> +			break;
>> +		}
>> +	}
>> +	BUG_ON(i == recs->size);
> No BUG_ON()s, please. They only truly belong in core code, for cases
> where error handling isn't possible, which does not apply here.

This is because currently only a single record is allocated for the
bcache journal, and if i == recs->size happens it means the on-NVDIMM
struct bch_nvmpg_recs is corrupted.

In the ongoing work to store btree nodes on NVDIMM, such BUG_ON()s are
dropped.
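
A minimal sketch of what dropping the BUG_ON() could look like,
returning an error when no free record slot is found instead of
crashing (hypothetical, not the actual follow-up code):

	/* Hypothetical rework: report a full or corrupted recs array to
	 * the caller instead of BUG_ON(). */
	static int add_nvmpg_rec(struct bch_nvmpg_ns *ns,
				 struct bch_nvmpg_recs *recs,
				 unsigned long nvmpg_offset,
				 int order)
	{
		unsigned long pgoff = bch_nvmpg_offset_to_pgoff(nvmpg_offset);
		int ns_id = ns->sb->this_ns;
		int i;

		for (i = 0; i < recs->size; i++) {
			if (recs->recs[i].pgoff == 0) {
				recs->recs[i].pgoff = pgoff;
				recs->recs[i].order = order;
				recs->recs[i].ns_id = ns_id;
				recs->used++;
				return 0;
			}
		}

		/* No free slot: recs is full or the metadata is corrupted. */
		return -ENOSPC;
	}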


>> diff --git a/drivers/md/bcache/nvmpg.h b/drivers/md/bcache/nvmpg.h
>> index 55778d4db7da..d03f3241b45a 100644
>> --- a/drivers/md/bcache/nvmpg.h
>> +++ b/drivers/md/bcache/nvmpg.h
>> @@ -76,6 +76,9 @@ struct bch_nvmpg_set {
>>   /* Indicate which field in bch_nvmpg_sb to be updated */
>>   #define BCH_NVMPG_TOTAL_NS	0	/* total_ns */
>>   
>> +#define BCH_PGOFF_TO_KVADDR(pgoff)					\
>> +	((void *)((unsigned long)(pgoff) << PAGE_SHIFT))
> Pretty sure we have a general kernel helper for this, better to use that
> rather than duplicate it.
>
>
Copied. Thanks for pointing this out.

Coly Li

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v13 05/12] bcache: bch_nvmpg_free_pages() of the buddy allocator
  2021-12-12 20:16   ` Jens Axboe
@ 2021-12-28  5:29     ` Coly Li
  0 siblings, 0 replies; 28+ messages in thread
From: Coly Li @ 2021-12-28  5:29 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-bcache, linux-block, Jianpeng Ma, Qiaowei Ren,
	Christoph Hellwig, Dan Williams, Hannes Reinecke

On 12/13/21 4:16 AM, Jens Axboe wrote:
> On 12/12/21 10:05 AM, Coly Li wrote:
>> diff --git a/drivers/md/bcache/nvmpg.c b/drivers/md/bcache/nvmpg.c
>> index a920779eb548..8ce0c4389b42 100644
>> --- a/drivers/md/bcache/nvmpg.c
>> +++ b/drivers/md/bcache/nvmpg.c
>> @@ -248,6 +248,57 @@ static int init_nvmpg_set_header(struct bch_nvmpg_ns *ns)
>>   	return rc;
>>   }
>>   
>> +static void __free_space(struct bch_nvmpg_ns *ns, unsigned long nvmpg_offset,
>> +			 int order)
>> +{
>> +	unsigned long add_pages = (1L << order);
>> +	pgoff_t pgoff;
>> +	struct page *page;
>> +	void *va;
>> +
>> +	if (nvmpg_offset == 0) {
>> +		pr_err("free pages on offset 0\n");
>> +		return;
>> +	}
>> +
>> +	page = bch_nvmpg_va_to_pg(bch_nvmpg_offset_to_ptr(nvmpg_offset));
>> +	WARN_ON((!page) || (page->private != order));
>> +	pgoff = page->index;
>> +
>> +	while (order < BCH_MAX_ORDER - 1) {
>> +		struct page *buddy_page;
>> +
>> +		pgoff_t buddy_pgoff = pgoff ^ (1L << order);
>> +		pgoff_t parent_pgoff = pgoff & ~(1L << order);
>> +
>> +		if ((parent_pgoff + (1L << (order + 1)) > ns->pages_total))
>> +			break;
>> +
>> +		va = bch_nvmpg_pgoff_to_ptr(ns, buddy_pgoff);
>> +		buddy_page = bch_nvmpg_va_to_pg(va);
>> +		WARN_ON(!buddy_page);
>> +
>> +		if (PageBuddy(buddy_page) && (buddy_page->private == order)) {
>> +			list_del((struct list_head *)&buddy_page->zone_device_data);
>> +			__ClearPageBuddy(buddy_page);
>> +			pgoff = parent_pgoff;
>> +			order++;
>> +			continue;
>> +		}
>> +		break;
>> +	}
>> +
>> +	va = bch_nvmpg_pgoff_to_ptr(ns, pgoff);
>> +	page = bch_nvmpg_va_to_pg(va);
>> +	WARN_ON(!page);
>> +	list_add((struct list_head *)&page->zone_device_data,
>> +		 &ns->free_area[order]);
>> +	page->index = pgoff;
>> +	set_page_private(page, order);
>> +	__SetPageBuddy(page);
>> +	ns->free += add_pages;
>> +}
> There are 3 WARN_ON's in here. If you absolutely must use a WARN_ON,
> then make them WARN_ON_ONCE(). Ditto in other spots in this patch.
>
>> +void bch_nvmpg_free_pages(unsigned long nvmpg_offset, int order,
>> +			  const char *uuid)
>> +{
>> +	struct bch_nvmpg_ns *ns;
>> +	struct bch_nvmpg_head *head;
>> +	struct bch_nvmpg_recs *recs;
>> +	int r;
>> +
>> +	mutex_lock(&global_nvmpg_set->lock);
>> +
>> +	ns = global_nvmpg_set->ns_tbl[BCH_NVMPG_GET_NS_ID(nvmpg_offset)];
>> +	if (!ns) {
>> +		pr_err("can't find namespace by given kaddr from namespace\n");
>> +		goto unlock;
>> +	}
>> +
>> +	head = find_nvmpg_head(uuid, false);
>> +	if (!head) {
>> +		pr_err("can't found bch_nvmpg_head by uuid\n");
>> +		goto unlock;
>> +	}
>> +
>> +	recs = find_nvmpg_recs(ns, head, false);
>> +	if (!recs) {
>> +		pr_err("can't find bch_nvmpg_recs by uuid\n");
>> +		goto unlock;
>> +	}
>> +
>> +	r = remove_nvmpg_rec(recs, ns->sb->this_ns, nvmpg_offset, order);
>> +	if (r < 0) {
>> +		pr_err("can't find bch_nvmpg_rec\n");
>> +		goto unlock;
>> +	}
>> +
>> +	__free_space(ns, nvmpg_offset, order);
>> +
>> +unlock:
>> +	mutex_unlock(&global_nvmpg_set->lock);
>> +}
> Again tons of useless error prints. Make them return a proper error
> instead of just making things void...

Copied. It will be improved as you suggested.


>> @@ -686,6 +835,7 @@ struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path)
>>   	ns->pages_offset = sb->pages_offset;
>>   	ns->pages_total = sb->pages_total;
>>   	ns->sb = sb;
>> +	/* increase by __free_space() */
>>   	ns->free = 0;
>>   	ns->bdev = bdev;
>>   	ns->set = global_nvmpg_set;
> Does that hunk belong in here?
>

Could you please give a more detailed explanation of the above
question? Do you mean the bch_register_namespace() function, or the
code block which initializes the members of ns?

Thank you.

Coly Li


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v13 05/12] bcache: bch_nvmpg_free_pages() of the buddy allocator
  2021-12-12 17:05 ` [PATCH v13 05/12] bcache: bch_nvmpg_free_pages() of the buddy allocator Coly Li
  2021-12-12 20:16   ` Jens Axboe
@ 2022-02-21 12:36   ` yukuai (C)
  2022-02-22  5:03     ` Ma, Jianpeng
  1 sibling, 1 reply; 28+ messages in thread
From: yukuai (C) @ 2022-02-21 12:36 UTC (permalink / raw)
  To: Coly Li, axboe
  Cc: linux-bcache, linux-block, Jianpeng Ma, Qiaowei Ren,
	Christoph Hellwig, Dan Williams, Hannes Reinecke

On 2021/12/13 1:05, Coly Li wrote:
> From: Jianpeng Ma <jianpeng.ma@intel.com>
> 
> This patch implements the bch_nvmpg_free_pages() of the buddy allocator.
> 
> The difference between this and page-buddy-free:
> it need owner_uuid to free owner allocated pages, and must
> persistent after free.
> 
> Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
> Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
> Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Jens Axboe <axboe@kernel.dk>
> ---
>   drivers/md/bcache/nvmpg.c | 164 ++++++++++++++++++++++++++++++++++++--
>   drivers/md/bcache/nvmpg.h |   3 +
>   2 files changed, 160 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/md/bcache/nvmpg.c b/drivers/md/bcache/nvmpg.c
> index a920779eb548..8ce0c4389b42 100644
> --- a/drivers/md/bcache/nvmpg.c
> +++ b/drivers/md/bcache/nvmpg.c
> @@ -248,6 +248,57 @@ static int init_nvmpg_set_header(struct bch_nvmpg_ns *ns)
>   	return rc;
>   }
>   
> +static void __free_space(struct bch_nvmpg_ns *ns, unsigned long nvmpg_offset,
> +			 int order)
> +{
> +	unsigned long add_pages = (1L << order);
> +	pgoff_t pgoff;
> +	struct page *page;
> +	void *va;
> +
> +	if (nvmpg_offset == 0) {
> +		pr_err("free pages on offset 0\n");
> +		return;
> +	}
> +
> +	page = bch_nvmpg_va_to_pg(bch_nvmpg_offset_to_ptr(nvmpg_offset));
> +	WARN_ON((!page) || (page->private != order));
> +	pgoff = page->index;
> +
> +	while (order < BCH_MAX_ORDER - 1) {
> +		struct page *buddy_page;
> +
> +		pgoff_t buddy_pgoff = pgoff ^ (1L << order);
> +		pgoff_t parent_pgoff = pgoff & ~(1L << order);
> +
> +		if ((parent_pgoff + (1L << (order + 1)) > ns->pages_total))
> +			break;
> +
> +		va = bch_nvmpg_pgoff_to_ptr(ns, buddy_pgoff);
> +		buddy_page = bch_nvmpg_va_to_pg(va);
> +		WARN_ON(!buddy_page);
> +
> +		if (PageBuddy(buddy_page) && (buddy_page->private == order)) {
> +			list_del((struct list_head *)&buddy_page->zone_device_data);
> +			__ClearPageBuddy(buddy_page);
> +			pgoff = parent_pgoff;
> +			order++;
> +			continue;
> +		}
> +		break;
> +	}
> +
> +	va = bch_nvmpg_pgoff_to_ptr(ns, pgoff);
> +	page = bch_nvmpg_va_to_pg(va);
> +	WARN_ON(!page);
> +	list_add((struct list_head *)&page->zone_device_data,
> +		 &ns->free_area[order]);
> +	page->index = pgoff;
> +	set_page_private(page, order);
> +	__SetPageBuddy(page);
> +	ns->free += add_pages;
> +}
> +
>   static void bch_nvmpg_init_free_space(struct bch_nvmpg_ns *ns)
>   {
>   	unsigned int start, end, pages;
> @@ -261,21 +312,19 @@ static void bch_nvmpg_init_free_space(struct bch_nvmpg_ns *ns)
>   		pages = end - start;
>   
>   		while (pages) {
> -			void *addr;
> -
>   			for (i = BCH_MAX_ORDER - 1; i >= 0; i--) {
>   				if ((pgoff_start % (1L << i) == 0) &&
>   				    (pages >= (1L << i)))
>   					break;
>   			}
>   
> -			addr = bch_nvmpg_pgoff_to_ptr(ns, pgoff_start);
> -			page = bch_nvmpg_va_to_pg(addr);
> +			page = bch_nvmpg_va_to_pg(
> +					bch_nvmpg_pgoff_to_ptr(ns, pgoff_start));
>   			set_page_private(page, i);
>   			page->index = pgoff_start;
> -			__SetPageBuddy(page);
> -			list_add((struct list_head *)&page->zone_device_data,
> -				 &ns->free_area[i]);
> +
> +			/* In order to update ns->free */
> +			__free_space(ns, pgoff_start, i);

Hi,

While testing this patchset, we found that this is probably wrong:
pgoff_start is a page number here, but __free_space() expects an nvmpg
offset. Maybe this should be:

__free_space(ns, pgoff_start << PAGE_SHIFT, i);

Thanks,
Kuai

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH v13 05/12] bcache: bch_nvmpg_free_pages() of the buddy allocator
  2022-02-21 12:36   ` yukuai (C)
@ 2022-02-22  5:03     ` Ma, Jianpeng
  0 siblings, 0 replies; 28+ messages in thread
From: Ma, Jianpeng @ 2022-02-22  5:03 UTC (permalink / raw)
  To: yukuai (C), Li, Coly, axboe
  Cc: linux-bcache, linux-block, Ren, Qiaowei, Christoph Hellwig,
	Williams, Dan J, Hannes Reinecke

> -----Original Message-----
> From: yukuai (C) <yukuai3@huawei.com>
> Sent: Monday, February 21, 2022 8:36 PM
> To: Li, Coly <colyli@suse.de>; axboe@kernel.dk
> Cc: linux-bcache@vger.kernel.org; linux-block@vger.kernel.org; Ma, Jianpeng
> <jianpeng.ma@intel.com>; Ren, Qiaowei <qiaowei.ren@intel.com>; Christoph
> Hellwig <hch@lst.de>; Williams, Dan J <dan.j.williams@intel.com>; Hannes
> Reinecke <hare@suse.de>
> Subject: Re: [PATCH v13 05/12] bcache: bch_nvmpg_free_pages() of the buddy
> allocator
> 
> On 2021/12/13 1:05, Coly Li wrote:
> > From: Jianpeng Ma <jianpeng.ma@intel.com>
> >
> > This patch implements the bch_nvmpg_free_pages() of the buddy allocator.
> >
> > The difference between this and page-buddy-free:
> > it need owner_uuid to free owner allocated pages, and must persistent
> > after free.
> >
> > Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
> > Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
> > Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Hannes Reinecke <hare@suse.de>
> > Cc: Jens Axboe <axboe@kernel.dk>
> > ---
> >   drivers/md/bcache/nvmpg.c | 164
> ++++++++++++++++++++++++++++++++++++--
> >   drivers/md/bcache/nvmpg.h |   3 +
> >   2 files changed, 160 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/md/bcache/nvmpg.c b/drivers/md/bcache/nvmpg.c
> > index a920779eb548..8ce0c4389b42 100644
> > --- a/drivers/md/bcache/nvmpg.c
> > +++ b/drivers/md/bcache/nvmpg.c
> > @@ -248,6 +248,57 @@ static int init_nvmpg_set_header(struct
> bch_nvmpg_ns *ns)
> >   	return rc;
> >   }
> >
> > +static void __free_space(struct bch_nvmpg_ns *ns, unsigned long
> nvmpg_offset,
> > +			 int order)
> > +{
> > +	unsigned long add_pages = (1L << order);
> > +	pgoff_t pgoff;
> > +	struct page *page;
> > +	void *va;
> > +
> > +	if (nvmpg_offset == 0) {
> > +		pr_err("free pages on offset 0\n");
> > +		return;
> > +	}
> > +
> > +	page = bch_nvmpg_va_to_pg(bch_nvmpg_offset_to_ptr(nvmpg_offset));
> > +	WARN_ON((!page) || (page->private != order));
> > +	pgoff = page->index;
> > +
> > +	while (order < BCH_MAX_ORDER - 1) {
> > +		struct page *buddy_page;
> > +
> > +		pgoff_t buddy_pgoff = pgoff ^ (1L << order);
> > +		pgoff_t parent_pgoff = pgoff & ~(1L << order);
> > +
> > +		if ((parent_pgoff + (1L << (order + 1)) > ns->pages_total))
> > +			break;
> > +
> > +		va = bch_nvmpg_pgoff_to_ptr(ns, buddy_pgoff);
> > +		buddy_page = bch_nvmpg_va_to_pg(va);
> > +		WARN_ON(!buddy_page);
> > +
> > +		if (PageBuddy(buddy_page) && (buddy_page->private == order)) {
> > +			list_del((struct list_head *)&buddy_page->zone_device_data);
> > +			__ClearPageBuddy(buddy_page);
> > +			pgoff = parent_pgoff;
> > +			order++;
> > +			continue;
> > +		}
> > +		break;
> > +	}
> > +
> > +	va = bch_nvmpg_pgoff_to_ptr(ns, pgoff);
> > +	page = bch_nvmpg_va_to_pg(va);
> > +	WARN_ON(!page);
> > +	list_add((struct list_head *)&page->zone_device_data,
> > +		 &ns->free_area[order]);
> > +	page->index = pgoff;
> > +	set_page_private(page, order);
> > +	__SetPageBuddy(page);
> > +	ns->free += add_pages;
> > +}
> > +
> >   static void bch_nvmpg_init_free_space(struct bch_nvmpg_ns *ns)
> >   {
> >   	unsigned int start, end, pages;
> > @@ -261,21 +312,19 @@ static void bch_nvmpg_init_free_space(struct
> bch_nvmpg_ns *ns)
> >   		pages = end - start;
> >
> >   		while (pages) {
> > -			void *addr;
> > -
> >   			for (i = BCH_MAX_ORDER - 1; i >= 0; i--) {
> >   				if ((pgoff_start % (1L << i) == 0) &&
> >   				    (pages >= (1L << i)))
> >   					break;
> >   			}
> >
> > -			addr = bch_nvmpg_pgoff_to_ptr(ns, pgoff_start);
> > -			page = bch_nvmpg_va_to_pg(addr);
> > +			page = bch_nvmpg_va_to_pg(
> > +					bch_nvmpg_pgoff_to_ptr(ns, pgoff_start));
> >   			set_page_private(page, i);
> >   			page->index = pgoff_start;
> > -			__SetPageBuddy(page);
> > -			list_add((struct list_head *)&page->zone_device_data,
> > -				 &ns->free_area[i]);
> > +
> > +			/* In order to update ns->free */
> > +			__free_space(ns, pgoff_start, i);
> 
> Hi,
> 
> While testing this patchset, we found that this is probably wrong: pgoff_start
> is a page number here, but __free_space() expects an nvmpg offset. Maybe this
> should be:
> 
> __free_space(ns, pgoff_start << PAGE_SHIFT, i);
> 
Yes, you are right. I'll fix it in the next version.

Thanks!
Jianpeng.
> Thanks,
> Kuai

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2022-02-22  5:26 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-12 17:05 [PATCH v13 00/12] bcache for 5.17: enable NVDIMM for bcache journal Coly Li
2021-12-12 17:05 ` [PATCH v13 01/12] bcache: add initial data structures for nvm pages Coly Li
2021-12-12 17:05 ` [PATCH v13 02/12] bcache: initialize the nvm pages allocator Coly Li
2021-12-12 19:34   ` Jens Axboe
2021-12-28  5:29     ` Coly Li
2021-12-12 17:05 ` [PATCH v13 03/12] bcache: initialization of the buddy Coly Li
2021-12-12 20:10   ` Jens Axboe
2021-12-28  5:29     ` Coly Li
2021-12-15 16:20   ` Dan Carpenter
2021-12-28  5:12     ` Coly Li
2021-12-12 17:05 ` [PATCH v13 04/12] bcache: bch_nvmpg_alloc_pages() " Coly Li
2021-12-12 20:14   ` Jens Axboe
2021-12-28  5:29     ` Coly Li
2021-12-12 17:05 ` [PATCH v13 05/12] bcache: bch_nvmpg_free_pages() of the buddy allocator Coly Li
2021-12-12 20:16   ` Jens Axboe
2021-12-28  5:29     ` Coly Li
2022-02-21 12:36   ` yukuai (C)
2022-02-22  5:03     ` Ma, Jianpeng
2021-12-12 17:05 ` [PATCH v13 06/12] bcache: get recs list head for allocated pages by specific uuid Coly Li
2021-12-12 20:18   ` Jens Axboe
2021-12-12 17:05 ` [PATCH v13 07/12] bcache: use bucket index to set GC_MARK_METADATA for journal buckets in bch_btree_gc_finish() Coly Li
2021-12-12 17:05 ` [PATCH v13 08/12] bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set Coly Li
2021-12-12 17:05 ` [PATCH v13 09/12] bcache: initialize bcache journal for NVDIMM meta device Coly Li
2021-12-12 17:05 ` [PATCH v13 10/12] bcache: support storing bcache journal into " Coly Li
2021-12-12 17:05 ` [PATCH v13 11/12] bcache: read jset from NVDIMM pages for journal replay Coly Li
2021-12-12 17:05 ` [PATCH v13 12/12] bcache: add sysfs interface register_nvdimm_meta to register NVDIMM meta device Coly Li
2021-12-12 20:20 ` [PATCH v13 00/12] bcache for 5.17: enable NVDIMM for bcache journal Jens Axboe
2021-12-28  5:29   ` Coly Li
