linux-bcache.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [bch-nvm-pages v9 2/6] bcache: initialize the nvm pages allocator
  2021-04-28 21:39 ` [bch-nvm-pages v9 2/6] bcache: initialize the nvm pages allocator Qiaowei Ren
@ 2021-04-28 16:29   ` Randy Dunlap
  2021-04-29  0:13     ` Ma, Jianpeng
  2021-04-28 17:53   ` Randy Dunlap
  2021-05-11  3:53   ` Coly Li
  2 siblings, 1 reply; 17+ messages in thread
From: Randy Dunlap @ 2021-04-28 16:29 UTC (permalink / raw)
  To: Qiaowei Ren, linux-bcache; +Cc: jianpeng.ma, colyli, rdunlap, Colin Ian King

On 4/28/21 2:39 PM, Qiaowei Ren wrote:
> From: Jianpeng Ma <jianpeng.ma@intel.com>
> 
> This patch define the prototype data structures in memory and initializes
> the nvm pages allocator.
> 
> The nvm address space which is managed by this allocatior can consist of
> many nvm namespaces, and some namespaces can compose into one nvm set,
> like cache set. For this initial implementation, only one set can be
> supported.
> 
> The users of this nvm pages allocator need to call regiseter_namespace()
> to register the nvdimm device (like /dev/pmemX) into this allocator as
> the instance of struct nvm_namespace.
> 
> v9:
>   -Fix Kconfig dependance error(Reported-by Randy)
>   -Fix an uninitialized return value(Colin)
> 
> Reported-by: Randy Dunlap <rdunlap@infradead.org>
> Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
> Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
> Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
> Signed-off-by: Coly Li <colyli@suse.de>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
>  drivers/md/bcache/Kconfig     |   8 +
>  drivers/md/bcache/Makefile    |   2 +-
>  drivers/md/bcache/nvm-pages.c | 285 ++++++++++++++++++++++++++++++++++
>  drivers/md/bcache/nvm-pages.h |  74 +++++++++
>  drivers/md/bcache/super.c     |   3 +
>  5 files changed, 371 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/md/bcache/nvm-pages.c
>  create mode 100644 drivers/md/bcache/nvm-pages.h
> 
> diff --git a/drivers/md/bcache/Kconfig b/drivers/md/bcache/Kconfig
> index d1ca4d059c20..3057da4cf8ff 100644
> --- a/drivers/md/bcache/Kconfig
> +++ b/drivers/md/bcache/Kconfig
> @@ -35,3 +35,11 @@ config BCACHE_ASYNC_REGISTRATION
>  	device path into this file will returns immediately and the real
>  	registration work is handled in kernel work queue in asynchronous
>  	way.
> +
> +config BCACHE_NVM_PAGES
> +	bool "NVDIMM support for bcache (EXPERIMENTAL)"
> +	depends on BCACHE
> +	depends on LIBNVDIMM
> +	depends on DAX
> +	help
> +	nvm pages allocator for bcache.

Please follow coding-style for Kconfig files:

(from Documentation/process/coding-style.rst, section 10):

For all of the Kconfig* configuration files throughout the source tree,
the indentation is somewhat different.  Lines under a ``config`` definition
are indented with one tab, while help text is indented an additional two
spaces.


Also, that help text could be better.

-- 
~Randy


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [bch-nvm-pages v9 2/6] bcache: initialize the nvm pages allocator
  2021-04-28 21:39 ` [bch-nvm-pages v9 2/6] bcache: initialize the nvm pages allocator Qiaowei Ren
  2021-04-28 16:29   ` Randy Dunlap
@ 2021-04-28 17:53   ` Randy Dunlap
  2021-05-11  3:53   ` Coly Li
  2 siblings, 0 replies; 17+ messages in thread
From: Randy Dunlap @ 2021-04-28 17:53 UTC (permalink / raw)
  To: Qiaowei Ren, linux-bcache; +Cc: jianpeng.ma, colyli, rdunlap, Colin Ian King

Hi,

On 4/28/21 2:39 PM, Qiaowei Ren wrote:
> From: Jianpeng Ma <jianpeng.ma@intel.com>
> 
> This patch define the prototype data structures in memory and initializes
> the nvm pages allocator.
> 
> The nvm address space which is managed by this allocatior can consist of

                                                 allocator

> many nvm namespaces, and some namespaces can compose into one nvm set,
> like cache set. For this initial implementation, only one set can be
> supported.
> 
> The users of this nvm pages allocator need to call regiseter_namespace()

                                                     register_namespace()

> to register the nvdimm device (like /dev/pmemX) into this allocator as
> the instance of struct nvm_namespace.
> 
> v9:
>   -Fix Kconfig dependance error(Reported-by Randy)

                 dependence

>   -Fix an uninitialized return value(Colin)
> 
> Reported-by: Randy Dunlap <rdunlap@infradead.org>
> Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
> Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
> Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
> Signed-off-by: Coly Li <colyli@suse.de>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
>  drivers/md/bcache/Kconfig     |   8 +
>  drivers/md/bcache/Makefile    |   2 +-
>  drivers/md/bcache/nvm-pages.c | 285 ++++++++++++++++++++++++++++++++++
>  drivers/md/bcache/nvm-pages.h |  74 +++++++++
>  drivers/md/bcache/super.c     |   3 +
>  5 files changed, 371 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/md/bcache/nvm-pages.c
>  create mode 100644 drivers/md/bcache/nvm-pages.h
> 
> diff --git a/drivers/md/bcache/Kconfig b/drivers/md/bcache/Kconfig
> index d1ca4d059c20..3057da4cf8ff 100644
> --- a/drivers/md/bcache/Kconfig
> +++ b/drivers/md/bcache/Kconfig
> @@ -35,3 +35,11 @@ config BCACHE_ASYNC_REGISTRATION
>  	device path into this file will returns immediately and the real
>  	registration work is handled in kernel work queue in asynchronous
>  	way.
> +
> +config BCACHE_NVM_PAGES
> +	bool "NVDIMM support for bcache (EXPERIMENTAL)"
> +	depends on BCACHE
> +	depends on LIBNVDIMM
> +	depends on DAX
> +	help
> +	nvm pages allocator for bcache.
> diff --git a/drivers/md/bcache/Makefile b/drivers/md/bcache/Makefile
> index 5b87e59676b8..948e5ed2ca66 100644
> --- a/drivers/md/bcache/Makefile
> +++ b/drivers/md/bcache/Makefile
> @@ -4,4 +4,4 @@ obj-$(CONFIG_BCACHE)	+= bcache.o
>  
>  bcache-y		:= alloc.o bset.o btree.o closure.o debug.o extents.o\
>  	io.o journal.o movinggc.o request.o stats.o super.o sysfs.o trace.o\
> -	util.o writeback.o features.o
> +	util.o writeback.o features.o nvm-pages.o

This is not the right way to add an optional piece of code (nvm-pages.o)
to the full linked binary.

I.e., it is added unconditionally here and then inside its .c file, the
entire file is bracketed with (see below):

#ifdef CONFIG_BCACHE_NVM_PAGES
...
#endif /* CONFIG_BCACHE_NVM_PAGES */


The right way to do this is in Kconfig and Makefile changes alone,
and then nvm-pages.c does not need that huge #ifdef/#endif bracketing.


Documentation/kbuild/*.rst has some references to this, but it's probably
easier just to look at some examples.

E.g., see drivers/usb/common/: Kconfig and Makefile and how CONFIG_TRACING
adds debug.o to the usb-common-y binary build.
Same for USB_LED_TRIG and led.o.


> diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
> new file mode 100644
> index 000000000000..976ab9002c17
> --- /dev/null
> +++ b/drivers/md/bcache/nvm-pages.c
> @@ -0,0 +1,285 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Nvdimm page-buddy allocator
> + *
> + * Copyright (c) 2021, Intel Corporation.
> + * Copyright (c) 2021, Qiaowei Ren <qiaowei.ren@intel.com>.
> + * Copyright (c) 2021, Jianpeng Ma <jianpeng.ma@intel.com>.
> + */
> +
> +#ifdef CONFIG_BCACHE_NVM_PAGES

[delete many lines]

> +
> +#endif /* CONFIG_BCACHE_NVM_PAGES */


The header (.h) file also probably needs some fixing up.


thanks.
-- 
~Randy


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [bch-nvm-pages v9 0/6] nvm page allocator for bcache
@ 2021-04-28 21:39 Qiaowei Ren
  2021-04-28 21:39 ` [bch-nvm-pages v9 1/6] bcache: add initial data structures for nvm pages Qiaowei Ren
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Qiaowei Ren @ 2021-04-28 21:39 UTC (permalink / raw)
  To: linux-bcache; +Cc: qiaowei.ren, jianpeng.ma, colyli, rdunlap

From: Jianpeng Ma <jianpeng.ma@intel.com>

This series implements nvm pages allocator for bcache. This idea is from
one discussion about nvdimm use case in kernel together with Coly. Coly
sent the following email about this idea to give some introduction on what
we will do before:

https://lore.kernel.org/linux-bcache/bc7e71ec-97eb-b226-d4fc-d8b64c1ef41a@suse.de/

Here this series focus on the first step in above email, that is to say,
this patch set implements a generic framework in bcache to allocate/release
NV-memory pages, and provide allocated pages for each requestor after reboot.
In order to do this, one simple buddy system is implemented to manage NV-memory
pages.

Coly Li (1):
  bcache: add initial data structures for nvm pages

Jianpeng Ma (5):
  bcache: initialize the nvm pages allocator
  bcache: initialization of the buddy
  bcache: bch_nvm_alloc_pages() of the buddy
  bcache: bch_nvm_free_pages() of the buddy
  bcache: get allocated pages from specific owner

 drivers/md/bcache/Kconfig       |   8 +
 drivers/md/bcache/Makefile      |   2 +-
 drivers/md/bcache/nvm-pages.c   | 747 ++++++++++++++++++++++++++++++++
 drivers/md/bcache/nvm-pages.h   |  92 ++++
 drivers/md/bcache/super.c       |   3 +
 include/uapi/linux/bcache-nvm.h | 206 +++++++++
 6 files changed, 1057 insertions(+), 1 deletion(-)
 create mode 100644 drivers/md/bcache/nvm-pages.c
 create mode 100644 drivers/md/bcache/nvm-pages.h
 create mode 100644 include/uapi/linux/bcache-nvm.h

-- 
2.25.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [bch-nvm-pages v9 1/6] bcache: add initial data structures for nvm pages
  2021-04-28 21:39 [bch-nvm-pages v9 0/6] nvm page allocator for bcache Qiaowei Ren
@ 2021-04-28 21:39 ` Qiaowei Ren
  2021-04-28 21:39 ` [bch-nvm-pages v9 2/6] bcache: initialize the nvm pages allocator Qiaowei Ren
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Qiaowei Ren @ 2021-04-28 21:39 UTC (permalink / raw)
  To: linux-bcache; +Cc: qiaowei.ren, jianpeng.ma, colyli, rdunlap

From: Coly Li <colyli@suse.de>

This patch initializes the prototype data structures for nvm pages
allocator,

- struct bch_nvm_pages_sb
This is the super block allocated on each nvdimm namespace. A nvdimm
set may have multiple namespaces, bch_nvm_pages_sb->set_uuid is used
to mark which nvdimm set this name space belongs to. Normally we will
use the bcache's cache set UUID to initialize this uuid, to connect this
nvdimm set to a specified bcache cache set.

- struct bch_owner_list_head
This is a table for all heads of all owner lists. A owner list records
which page(s) allocated to which owner. After reboot from power failure,
the ownwer may find all its requested and allocated pages from the owner
list by a handler which is converted by a UUID.

- struct bch_nvm_pages_owner_head
This is a head of an owner list. Each owner only has one owner list,
and a nvm page only belongs to an specific owner. uuid[] will be set to
owner's uuid, for bcache it is the bcache's cache set uuid. label is not
mandatory, it is a human-readable string for debug purpose. The pointer
*recs references to separated nvm page which hold the table of struct
bch_nvm_pgalloc_rec.

- struct bch_nvm_pgalloc_recs
This struct occupies a whole page, owner_uuid should match the uuid
in struct bch_nvm_pages_owner_head. recs[] is the real table contains all
allocated records.

- struct bch_nvm_pgalloc_rec
Each structure records a range of allocated nvm pages.
  - Bits  0 - 51: is pages offset of the allocated pages.
  - Bits 52 - 57: allocaed size in page_size * order-of-2
  - Bits 58 - 63: reserved.
Since each of the allocated nvm pages are power of 2, using 6 bits to
represent allocated size can have (1<<(1<<64) - 1) * PAGE_SIZE maximum
value. It can be a 76 bits width range size in byte for 4KB page size,
which is large enough currently.

Signed-off-by: Coly Li <colyli@suse.de>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
---
 include/uapi/linux/bcache-nvm.h | 200 ++++++++++++++++++++++++++++++++
 1 file changed, 200 insertions(+)
 create mode 100644 include/uapi/linux/bcache-nvm.h

diff --git a/include/uapi/linux/bcache-nvm.h b/include/uapi/linux/bcache-nvm.h
new file mode 100644
index 000000000000..1d62e9f0d4bf
--- /dev/null
+++ b/include/uapi/linux/bcache-nvm.h
@@ -0,0 +1,200 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef _UAPI_BCACHE_NVM_H
+#define _UAPI_BCACHE_NVM_H
+
+#ifdef CONFIG_64BIT
+/*
+ * Bcache on NVDIMM data structures
+ */
+
+/*
+ * - struct bch_nvm_pages_sb
+ *   This is the super block allocated on each nvdimm namespace. A nvdimm
+ * set may have multiple namespaces, bch_nvm_pages_sb->set_uuid is used to mark
+ * which nvdimm set this name space belongs to. Normally we will use the
+ * bcache's cache set UUID to initialize this uuid, to connect this nvdimm
+ * set to a specified bcache cache set.
+ *
+ * - struct bch_owner_list_head
+ *   This is a table for all heads of all owner lists. A owner list records
+ * which page(s) allocated to which owner. After reboot from power failure,
+ * the ownwer may find all its requested and allocated pages from the owner
+ * list by a handler which is converted by a UUID.
+ *
+ * - struct bch_nvm_pages_owner_head
+ *   This is a head of an owner list. Each owner only has one owner list,
+ * and a nvm page only belongs to an specific owner. uuid[] will be set to
+ * owner's uuid, for bcache it is the bcache's cache set uuid. label is not
+ * mandatory, it is a human-readable string for debug purpose. The pointer
+ * recs references to separated nvm page which hold the table of struct
+ * bch_pgalloc_rec.
+ *
+ *- struct bch_nvm_pgalloc_recs
+ *  This structure occupies a whole page, owner_uuid should match the uuid
+ * in struct bch_nvm_pages_owner_head. recs[] is the real table contains all
+ * allocated records.
+ *
+ * - struct bch_pgalloc_rec
+ *   Each structure records a range of allocated nvm pages. pgoff is offset
+ * in unit of page size of this allocated nvm page range. The adjoint page
+ * ranges of same owner can be merged into a larger one, therefore pages_nr
+ * is NOT always power of 2.
+ *
+ *
+ * Memory layout on nvdimm namespace 0
+ *
+ *    0 +---------------------------------+
+ *      |                                 |
+ *  4KB +---------------------------------+
+ *      |         bch_nvm_pages_sb        |
+ *  8KB +---------------------------------+ <--- bch_nvm_pages_sb.bch_owner_list_head
+ *      |       bch_owner_list_head       |
+ *      |                                 |
+ * 16KB +---------------------------------+ <--- bch_owner_list_head.heads[0].recs[0]
+ *      |       bch_nvm_pgalloc_recs      |
+ *      |  (nvm pages internal usage)     |
+ * 24KB +---------------------------------+
+ *      |                                 |
+ *      |                                 |
+ * 16MB  +---------------------------------+
+ *      |      allocable nvm pages        |
+ *      |      for buddy allocator        |
+ * end  +---------------------------------+
+ *
+ *
+ *
+ * Memory layout on nvdimm namespace N
+ * (doesn't have owner list)
+ *
+ *    0 +---------------------------------+
+ *      |                                 |
+ *  4KB +---------------------------------+
+ *      |         bch_nvm_pages_sb        |
+ *  8KB +---------------------------------+
+ *      |                                 |
+ *      |                                 |
+ *      |                                 |
+ *      |                                 |
+ *      |                                 |
+ *      |                                 |
+ * 16MB  +---------------------------------+
+ *      |      allocable nvm pages        |
+ *      |      for buddy allocator        |
+ * end  +---------------------------------+
+ *
+ */
+
+#include <linux/types.h>
+
+/* In sectors */
+#define BCH_NVM_PAGES_SB_OFFSET			4096
+#define BCH_NVM_PAGES_OFFSET			(16 << 20)
+
+#define BCH_NVM_PAGES_LABEL_SIZE		32
+#define BCH_NVM_PAGES_NAMESPACES_MAX		8
+
+#define BCH_NVM_PAGES_OWNER_LIST_HEAD_OFFSET	(8<<10)
+#define BCH_NVM_PAGES_SYS_RECS_HEAD_OFFSET	(16<<10)
+
+#define BCH_NVM_PAGES_SB_VERSION		0
+#define BCH_NVM_PAGES_SB_VERSION_MAX		0
+
+static const unsigned char bch_nvm_pages_magic[] = {
+	0x17, 0xbd, 0x53, 0x7f, 0x1b, 0x23, 0xd6, 0x83,
+	0x46, 0xa4, 0xf8, 0x28, 0x17, 0xda, 0xec, 0xa9 };
+static const unsigned char bch_nvm_pages_pgalloc_magic[] = {
+	0x39, 0x25, 0x3f, 0xf7, 0x27, 0x17, 0xd0, 0xb9,
+	0x10, 0xe6, 0xd2, 0xda, 0x38, 0x68, 0x26, 0xae };
+
+/* takes 64bit width */
+struct bch_pgalloc_rec {
+	__u64	pgoff:52;
+	__u64	order:6;
+	__u64	reserved:6;
+};
+
+struct bch_nvm_pgalloc_recs {
+union {
+	struct {
+		struct bch_nvm_pages_owner_head	*owner;
+		struct bch_nvm_pgalloc_recs	*next;
+		unsigned char			magic[16];
+		unsigned char			owner_uuid[16];
+		unsigned int			size;
+		unsigned int			used;
+		unsigned long			_pad[4];
+		struct bch_pgalloc_rec		recs[];
+	};
+	unsigned char				pad[8192];
+};
+};
+
+#define BCH_MAX_RECS					\
+	((sizeof(struct bch_nvm_pgalloc_recs) -		\
+	 offsetof(struct bch_nvm_pgalloc_recs, recs)) /	\
+	 sizeof(struct bch_pgalloc_rec))
+
+struct bch_nvm_pages_owner_head {
+	unsigned char			uuid[16];
+	unsigned char			label[BCH_NVM_PAGES_LABEL_SIZE];
+	/* Per-namespace own lists */
+	struct bch_nvm_pgalloc_recs	*recs[BCH_NVM_PAGES_NAMESPACES_MAX];
+};
+
+/* heads[0] is always for nvm_pages internal usage */
+struct bch_owner_list_head {
+union {
+	struct {
+		unsigned int			size;
+		unsigned int			used;
+		unsigned long			_pad[4];
+		struct bch_nvm_pages_owner_head	heads[];
+	};
+	unsigned char				pad[8192];
+};
+};
+#define BCH_MAX_OWNER_LIST				\
+	((sizeof(struct bch_owner_list_head) -		\
+	 offsetof(struct bch_owner_list_head, heads)) /	\
+	 sizeof(struct bch_nvm_pages_owner_head))
+
+/* The on-media bit order is local CPU order */
+struct bch_nvm_pages_sb {
+	unsigned long				csum;
+	unsigned long				ns_start;
+	unsigned long				sb_offset;
+	unsigned long				version;
+	unsigned char				magic[16];
+	unsigned char				uuid[16];
+	unsigned int				page_size;
+	unsigned int				total_namespaces_nr;
+	unsigned int				this_namespace_nr;
+	union {
+		unsigned char			set_uuid[16];
+		unsigned long			set_magic;
+	};
+
+	unsigned long				flags;
+	unsigned long				seq;
+
+	unsigned long				feature_compat;
+	unsigned long				feature_incompat;
+	unsigned long				feature_ro_compat;
+
+	/* For allocable nvm pages from buddy systems */
+	unsigned long				pages_offset;
+	unsigned long				pages_total;
+
+	unsigned long				pad[8];
+
+	/* Only on the first name space */
+	struct bch_owner_list_head		*owner_list_head;
+
+	/* Just for csum_set() */
+	unsigned int				keys;
+	unsigned long				d[0];
+};
+#endif /* CONFIG_64BIT */
+
+#endif /* _UAPI_BCACHE_NVM_H */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [bch-nvm-pages v9 2/6] bcache: initialize the nvm pages allocator
  2021-04-28 21:39 [bch-nvm-pages v9 0/6] nvm page allocator for bcache Qiaowei Ren
  2021-04-28 21:39 ` [bch-nvm-pages v9 1/6] bcache: add initial data structures for nvm pages Qiaowei Ren
@ 2021-04-28 21:39 ` Qiaowei Ren
  2021-04-28 16:29   ` Randy Dunlap
                     ` (2 more replies)
  2021-04-28 21:39 ` [bch-nvm-pages v9 3/6] bcache: initialization of the buddy Qiaowei Ren
                   ` (3 subsequent siblings)
  5 siblings, 3 replies; 17+ messages in thread
From: Qiaowei Ren @ 2021-04-28 21:39 UTC (permalink / raw)
  To: linux-bcache
  Cc: qiaowei.ren, jianpeng.ma, colyli, rdunlap, Randy Dunlap, Colin Ian King

From: Jianpeng Ma <jianpeng.ma@intel.com>

This patch define the prototype data structures in memory and initializes
the nvm pages allocator.

The nvm address space which is managed by this allocatior can consist of
many nvm namespaces, and some namespaces can compose into one nvm set,
like cache set. For this initial implementation, only one set can be
supported.

The users of this nvm pages allocator need to call regiseter_namespace()
to register the nvdimm device (like /dev/pmemX) into this allocator as
the instance of struct nvm_namespace.

v9:
  -Fix Kconfig dependance error(Reported-by Randy)
  -Fix an uninitialized return value(Colin)

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/md/bcache/Kconfig     |   8 +
 drivers/md/bcache/Makefile    |   2 +-
 drivers/md/bcache/nvm-pages.c | 285 ++++++++++++++++++++++++++++++++++
 drivers/md/bcache/nvm-pages.h |  74 +++++++++
 drivers/md/bcache/super.c     |   3 +
 5 files changed, 371 insertions(+), 1 deletion(-)
 create mode 100644 drivers/md/bcache/nvm-pages.c
 create mode 100644 drivers/md/bcache/nvm-pages.h

diff --git a/drivers/md/bcache/Kconfig b/drivers/md/bcache/Kconfig
index d1ca4d059c20..3057da4cf8ff 100644
--- a/drivers/md/bcache/Kconfig
+++ b/drivers/md/bcache/Kconfig
@@ -35,3 +35,11 @@ config BCACHE_ASYNC_REGISTRATION
 	device path into this file will returns immediately and the real
 	registration work is handled in kernel work queue in asynchronous
 	way.
+
+config BCACHE_NVM_PAGES
+	bool "NVDIMM support for bcache (EXPERIMENTAL)"
+	depends on BCACHE
+	depends on LIBNVDIMM
+	depends on DAX
+	help
+	nvm pages allocator for bcache.
diff --git a/drivers/md/bcache/Makefile b/drivers/md/bcache/Makefile
index 5b87e59676b8..948e5ed2ca66 100644
--- a/drivers/md/bcache/Makefile
+++ b/drivers/md/bcache/Makefile
@@ -4,4 +4,4 @@ obj-$(CONFIG_BCACHE)	+= bcache.o
 
 bcache-y		:= alloc.o bset.o btree.o closure.o debug.o extents.o\
 	io.o journal.o movinggc.o request.o stats.o super.o sysfs.o trace.o\
-	util.o writeback.o features.o
+	util.o writeback.o features.o nvm-pages.o
diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
new file mode 100644
index 000000000000..976ab9002c17
--- /dev/null
+++ b/drivers/md/bcache/nvm-pages.c
@@ -0,0 +1,285 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Nvdimm page-buddy allocator
+ *
+ * Copyright (c) 2021, Intel Corporation.
+ * Copyright (c) 2021, Qiaowei Ren <qiaowei.ren@intel.com>.
+ * Copyright (c) 2021, Jianpeng Ma <jianpeng.ma@intel.com>.
+ */
+
+#ifdef CONFIG_BCACHE_NVM_PAGES
+
+#include "bcache.h"
+#include "nvm-pages.h"
+
+#include <linux/slab.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/dax.h>
+#include <linux/pfn_t.h>
+#include <linux/libnvdimm.h>
+#include <linux/mm_types.h>
+#include <linux/err.h>
+#include <linux/pagemap.h>
+#include <linux/bitmap.h>
+#include <linux/blkdev.h>
+
+struct bch_nvm_set *only_set;
+
+static void release_nvm_namespaces(struct bch_nvm_set *nvm_set)
+{
+	int i;
+	struct bch_nvm_namespace *ns;
+
+	for (i = 0; i < nvm_set->total_namespaces_nr; i++) {
+		ns = nvm_set->nss[i];
+		if (ns) {
+			blkdev_put(ns->bdev, FMODE_READ|FMODE_WRITE|FMODE_EXEC);
+			kfree(ns);
+		}
+	}
+
+	kfree(nvm_set->nss);
+}
+
+static void release_nvm_set(struct bch_nvm_set *nvm_set)
+{
+	release_nvm_namespaces(nvm_set);
+	kfree(nvm_set);
+}
+
+static int init_owner_info(struct bch_nvm_namespace *ns)
+{
+	struct bch_owner_list_head *owner_list_head = ns->sb->owner_list_head;
+
+	mutex_lock(&only_set->lock);
+	only_set->owner_list_head = owner_list_head;
+	only_set->owner_list_size = owner_list_head->size;
+	only_set->owner_list_used = owner_list_head->used;
+	mutex_unlock(&only_set->lock);
+
+	return 0;
+}
+
+static bool attach_nvm_set(struct bch_nvm_namespace *ns)
+{
+	bool rc = true;
+
+	mutex_lock(&only_set->lock);
+	if (only_set->nss) {
+		if (memcmp(ns->sb->set_uuid, only_set->set_uuid, 16)) {
+			pr_info("namespace id doesn't match nvm set\n");
+			rc = false;
+			goto unlock;
+		}
+
+		if (only_set->nss[ns->sb->this_namespace_nr]) {
+			pr_info("already has the same position(%d) nvm\n",
+					ns->sb->this_namespace_nr);
+			rc = false;
+			goto unlock;
+		}
+	} else {
+		memcpy(only_set->set_uuid, ns->sb->set_uuid, 16);
+		only_set->total_namespaces_nr = ns->sb->total_namespaces_nr;
+		only_set->nss = kcalloc(only_set->total_namespaces_nr,
+				sizeof(struct bch_nvm_namespace *), GFP_KERNEL);
+		if (!only_set->nss) {
+			rc = false;
+			goto unlock;
+		}
+	}
+
+	only_set->nss[ns->sb->this_namespace_nr] = ns;
+
+unlock:
+	mutex_unlock(&only_set->lock);
+	return rc;
+}
+
+static int read_nvdimm_meta_super(struct block_device *bdev,
+			      struct bch_nvm_namespace *ns)
+{
+	struct page *page;
+	struct bch_nvm_pages_sb *sb;
+
+	page = read_cache_page_gfp(bdev->bd_inode->i_mapping,
+			BCH_NVM_PAGES_SB_OFFSET >> PAGE_SHIFT, GFP_KERNEL);
+
+	if (IS_ERR(page))
+		return -EIO;
+
+	sb = page_address(page) + offset_in_page(BCH_NVM_PAGES_SB_OFFSET);
+
+	/* temporary use for DAX API */
+	ns->page_size = sb->page_size;
+	ns->pages_total = sb->pages_total;
+
+	put_page(page);
+
+	return 0;
+}
+
+struct bch_nvm_namespace *bch_register_namespace(const char *dev_path)
+{
+	struct bch_nvm_namespace *ns;
+	int err;
+	pgoff_t pgoff;
+	char buf[BDEVNAME_SIZE];
+	struct block_device *bdev;
+	uint64_t expected_csum;
+	int id;
+	char *path = NULL;
+
+	path = kstrndup(dev_path, 512, GFP_KERNEL);
+	if (!path) {
+		pr_err("kstrndup failed\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	bdev = blkdev_get_by_path(strim(path),
+				  FMODE_READ|FMODE_WRITE|FMODE_EXEC,
+				  only_set);
+	if (IS_ERR(bdev)) {
+		pr_info("get %s error: %ld\n", dev_path, PTR_ERR(bdev));
+		kfree(path);
+		return ERR_PTR(PTR_ERR(bdev));
+	}
+
+	err = -ENOMEM;
+	ns = kzalloc(sizeof(struct bch_nvm_namespace), GFP_KERNEL);
+	if (!ns)
+		goto bdput;
+
+	err = -EIO;
+	if (read_nvdimm_meta_super(bdev, ns)) {
+		pr_info("%s read nvdimm meta super block failed.\n",
+			bdevname(bdev, buf));
+		goto free_ns;
+	}
+
+	err = -EOPNOTSUPP;
+	if (!bdev_dax_supported(bdev, ns->page_size)) {
+		pr_info("%s don't support DAX\n", bdevname(bdev, buf));
+		goto free_ns;
+	}
+
+	err = -EINVAL;
+	if (bdev_dax_pgoff(bdev, 0, ns->page_size, &pgoff)) {
+		pr_info("invalid offset of %s\n", bdevname(bdev, buf));
+		goto free_ns;
+	}
+
+	err = -ENOMEM;
+	ns->dax_dev = fs_dax_get_by_bdev(bdev);
+	if (!ns->dax_dev) {
+		pr_info("can't by dax device by %s\n", bdevname(bdev, buf));
+		goto free_ns;
+	}
+
+	err = -EINVAL;
+	id = dax_read_lock();
+	if (dax_direct_access(ns->dax_dev, pgoff, ns->pages_total,
+			      &ns->kaddr, &ns->start_pfn) <= 0) {
+		pr_info("dax_direct_access error\n");
+		dax_read_unlock(id);
+		goto free_ns;
+	}
+	dax_read_unlock(id);
+
+	ns->sb = ns->kaddr + BCH_NVM_PAGES_SB_OFFSET;
+
+	if (memcmp(ns->sb->magic, bch_nvm_pages_magic, 16)) {
+		pr_info("invalid bch_nvm_pages_magic\n");
+		goto free_ns;
+	}
+
+	if (ns->sb->sb_offset != BCH_NVM_PAGES_SB_OFFSET) {
+		pr_info("invalid superblock offset\n");
+		goto free_ns;
+	}
+
+	if (ns->sb->total_namespaces_nr != 1) {
+		pr_info("only one nvm device\n");
+		goto free_ns;
+	}
+
+	expected_csum = csum_set(ns->sb);
+	if (expected_csum != ns->sb->csum) {
+		pr_info("csum is not match with expected one\n");
+		goto free_ns;
+	}
+
+	err = -EEXIST;
+	if (!attach_nvm_set(ns))
+		goto free_ns;
+
+	/* Firstly attach */
+	if ((unsigned long)ns->sb->owner_list_head == BCH_NVM_PAGES_OWNER_LIST_HEAD_OFFSET) {
+		struct bch_nvm_pages_owner_head *sys_owner_head;
+		struct bch_nvm_pgalloc_recs *sys_pgalloc_recs;
+
+		ns->sb->owner_list_head = ns->kaddr + BCH_NVM_PAGES_OWNER_LIST_HEAD_OFFSET;
+		sys_pgalloc_recs = ns->kaddr + BCH_NVM_PAGES_SYS_RECS_HEAD_OFFSET;
+
+		sys_owner_head = &(ns->sb->owner_list_head->heads[0]);
+		sys_owner_head->recs[0] = sys_pgalloc_recs;
+		ns->sb->csum = csum_set(ns->sb);
+
+		sys_pgalloc_recs->owner = sys_owner_head;
+	} else
+		BUG_ON(ns->sb->owner_list_head !=
+			(ns->kaddr + BCH_NVM_PAGES_OWNER_LIST_HEAD_OFFSET));
+
+	ns->page_size = ns->sb->page_size;
+	ns->pages_offset = ns->sb->pages_offset;
+	ns->pages_total = ns->sb->pages_total;
+	ns->free = 0;
+	ns->bdev = bdev;
+	ns->nvm_set = only_set;
+	mutex_init(&ns->lock);
+
+	if (ns->sb->this_namespace_nr == 0) {
+		pr_info("only first namespace contain owner info\n");
+		err = init_owner_info(ns);
+		if (err < 0) {
+			pr_info("init_owner_info met error %d\n", err);
+			only_set->nss[ns->sb->this_namespace_nr] = NULL;
+			goto free_ns;
+		}
+	}
+
+	kfree(path);
+	return ns;
+free_ns:
+	kfree(ns);
+bdput:
+	blkdev_put(bdev, FMODE_READ|FMODE_WRITE|FMODE_EXEC);
+	kfree(path);
+	return ERR_PTR(err);
+}
+EXPORT_SYMBOL_GPL(bch_register_namespace);
+
+int __init bch_nvm_init(void)
+{
+	only_set = kzalloc(sizeof(*only_set), GFP_KERNEL);
+	if (!only_set)
+		return -ENOMEM;
+
+	only_set->total_namespaces_nr = 0;
+	only_set->owner_list_head = NULL;
+	only_set->nss = NULL;
+
+	mutex_init(&only_set->lock);
+
+	pr_info("bcache nvm init\n");
+	return 0;
+}
+
+void bch_nvm_exit(void)
+{
+	release_nvm_set(only_set);
+	pr_info("bcache nvm exit\n");
+}
+
+#endif /* CONFIG_BCACHE_NVM_PAGES */
diff --git a/drivers/md/bcache/nvm-pages.h b/drivers/md/bcache/nvm-pages.h
new file mode 100644
index 000000000000..87a0d2c46788
--- /dev/null
+++ b/drivers/md/bcache/nvm-pages.h
@@ -0,0 +1,74 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _BCACHE_NVM_PAGES_H
+#define _BCACHE_NVM_PAGES_H
+
+#ifdef CONFIG_BCACHE_NVM_PAGES
+#include <linux/bcache-nvm.h>
+#endif /* CONFIG_BCACHE_NVM_PAGES */
+
+/*
+ * Bcache NVDIMM in memory data structures
+ */
+
+/*
+ * The following three structures in memory records which page(s) allocated
+ * to which owner. After reboot from power failure, they will be initialized
+ * based on nvm pages superblock in NVDIMM device.
+ */
+struct bch_nvm_namespace {
+	struct bch_nvm_pages_sb *sb;
+	void *kaddr;
+
+	u8 uuid[16];
+	u64 free;
+	u32 page_size;
+	u64 pages_offset;
+	u64 pages_total;
+	pfn_t start_pfn;
+
+	struct dax_device *dax_dev;
+	struct block_device *bdev;
+	struct bch_nvm_set *nvm_set;
+
+	struct mutex lock;
+};
+
+/*
+ * A set of namespaces. Currently only one set can be supported.
+ */
+struct bch_nvm_set {
+	u8 set_uuid[16];
+	u32 total_namespaces_nr;
+
+	u32 owner_list_size;
+	u32 owner_list_used;
+	struct bch_owner_list_head *owner_list_head;
+
+	struct bch_nvm_namespace **nss;
+
+	struct mutex lock;
+};
+extern struct bch_nvm_set *only_set;
+
+#ifdef CONFIG_BCACHE_NVM_PAGES
+
+struct bch_nvm_namespace *bch_register_namespace(const char *dev_path);
+int bch_nvm_init(void);
+void bch_nvm_exit(void);
+
+#else
+
+static inline struct bch_nvm_namespace *bch_register_namespace(const char *dev_path)
+{
+	return NULL;
+}
+static inline int bch_nvm_init(void)
+{
+	return 0;
+}
+static inline void bch_nvm_exit(void) { }
+
+#endif /* CONFIG_BCACHE_NVM_PAGES */
+
+#endif /* _BCACHE_NVM_PAGES_H */
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 03e1fe4de53d..0674a76d9454 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -14,6 +14,7 @@
 #include "request.h"
 #include "writeback.h"
 #include "features.h"
+#include "nvm-pages.h"
 
 #include <linux/blkdev.h>
 #include <linux/debugfs.h>
@@ -2816,6 +2817,7 @@ static void bcache_exit(void)
 {
 	bch_debug_exit();
 	bch_request_exit();
+	bch_nvm_exit();
 	if (bcache_kobj)
 		kobject_put(bcache_kobj);
 	if (bcache_wq)
@@ -2914,6 +2916,7 @@ static int __init bcache_init(void)
 
 	bch_debug_init();
 	closure_debug_init();
+	bch_nvm_init();
 
 	bcache_is_reboot = false;
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [bch-nvm-pages v9 3/6] bcache: initialization of the buddy
  2021-04-28 21:39 [bch-nvm-pages v9 0/6] nvm page allocator for bcache Qiaowei Ren
  2021-04-28 21:39 ` [bch-nvm-pages v9 1/6] bcache: add initial data structures for nvm pages Qiaowei Ren
  2021-04-28 21:39 ` [bch-nvm-pages v9 2/6] bcache: initialize the nvm pages allocator Qiaowei Ren
@ 2021-04-28 21:39 ` Qiaowei Ren
  2021-05-11  5:35   ` Coly Li
  2021-04-28 21:39 ` [bch-nvm-pages v9 4/6] bcache: bch_nvm_alloc_pages() " Qiaowei Ren
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Qiaowei Ren @ 2021-04-28 21:39 UTC (permalink / raw)
  To: linux-bcache
  Cc: qiaowei.ren, jianpeng.ma, colyli, rdunlap, kernel test robot,
	Dan Carpenter

From: Jianpeng Ma <jianpeng.ma@intel.com>

This nvm pages allocator will implement the simple buddy to manage the
nvm address space. This patch initializes this buddy for new namespace.

the unit of alloc/free of the buddy is page. DAX device has their
struct page(in dram or PMEM).

        struct {        /* ZONE_DEVICE pages */
                /** @pgmap: Points to the hosting device page map. */
                struct dev_pagemap *pgmap;
                void *zone_device_data;
                /*
                 * ZONE_DEVICE private pages are counted as being
                 * mapped so the next 3 words hold the mapping, index,
                 * and private fields from the source anonymous or
                 * page cache page while the page is migrated to device
                 * private memory.
                 * ZONE_DEVICE MEMORY_DEVICE_FS_DAX pages also
                 * use the mapping, index, and private fields when
                 * pmem backed DAX files are mapped.
                 */
        };

ZONE_DEVICE pages only use pgmap. Other 4 words[16/32 bytes] don't use.
So the second/third word will be used as 'struct list_head ' which list
in buddy. The fourth word(that is normal struct page::index) store pgoff
which the page-offset in the dax device. And the fifth word (that is
normal struct page::private) store order of buddy. page_type will be used
to store buddy flags.

V9:
 -Kernel test robot report the build-in u64/u32 in init_owner_info()
  don't work for m68k arch.explicit div_u64() should be used.(Coly fix)
 -GCC report warning: Unsigned variable 'i' can't be negative so it
  is unnecessary to test it. [unsignedPositive] (Reproted-by Dan)

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/nvm-pages.c   | 142 +++++++++++++++++++++++++++++++-
 drivers/md/bcache/nvm-pages.h   |   6 ++
 include/uapi/linux/bcache-nvm.h |  12 ++-
 3 files changed, 153 insertions(+), 7 deletions(-)

diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
index 976ab9002c17..810f65cf756a 100644
--- a/drivers/md/bcache/nvm-pages.c
+++ b/drivers/md/bcache/nvm-pages.c
@@ -34,6 +34,10 @@ static void release_nvm_namespaces(struct bch_nvm_set *nvm_set)
 	for (i = 0; i < nvm_set->total_namespaces_nr; i++) {
 		ns = nvm_set->nss[i];
 		if (ns) {
+			kvfree(ns->pages_bitmap);
+			if (ns->pgalloc_recs_bitmap)
+				bitmap_free(ns->pgalloc_recs_bitmap);
+
 			blkdev_put(ns->bdev, FMODE_READ|FMODE_WRITE|FMODE_EXEC);
 			kfree(ns);
 		}
@@ -48,17 +52,122 @@ static void release_nvm_set(struct bch_nvm_set *nvm_set)
 	kfree(nvm_set);
 }
 
+static struct page *nvm_vaddr_to_page(struct bch_nvm_namespace *ns, void *addr)
+{
+	return virt_to_page(addr);
+}
+
+static void *nvm_pgoff_to_vaddr(struct bch_nvm_namespace *ns, pgoff_t pgoff)
+{
+	return ns->kaddr + (pgoff << PAGE_SHIFT);
+}
+
+static inline void remove_owner_space(struct bch_nvm_namespace *ns,
+					pgoff_t pgoff, u32 nr)
+{
+	bitmap_set(ns->pages_bitmap, pgoff, nr);
+}
+
 static int init_owner_info(struct bch_nvm_namespace *ns)
 {
 	struct bch_owner_list_head *owner_list_head = ns->sb->owner_list_head;
+	struct bch_nvm_pgalloc_recs *sys_recs;
+	int i, j, k, rc = 0;
 
 	mutex_lock(&only_set->lock);
 	only_set->owner_list_head = owner_list_head;
 	only_set->owner_list_size = owner_list_head->size;
 	only_set->owner_list_used = owner_list_head->used;
+
+	/* remove used space */
+	remove_owner_space(ns, 0, div_u64(ns->pages_offset, ns->page_size));
+
+	sys_recs = ns->kaddr + BCH_NVM_PAGES_SYS_RECS_HEAD_OFFSET;
+	/* suppose no hole in array */
+	for (i = 0; i < owner_list_head->used; i++) {
+		struct bch_nvm_pages_owner_head *head = &owner_list_head->heads[i];
+
+		for (j = 0; j < BCH_NVM_PAGES_NAMESPACES_MAX; j++) {
+			struct bch_nvm_pgalloc_recs *pgalloc_recs = head->recs[j];
+			unsigned long offset = (unsigned long)ns->kaddr >> PAGE_SHIFT;
+			struct page *page;
+
+			while (pgalloc_recs) {
+				u32 pgalloc_recs_pos = (unsigned long)(pgalloc_recs - sys_recs);
+
+				if (memcmp(pgalloc_recs->magic, bch_nvm_pages_pgalloc_magic, 16)) {
+					pr_info("invalid bch_nvm_pages_pgalloc_magic\n");
+					rc = -EINVAL;
+					goto unlock;
+				}
+				if (memcmp(pgalloc_recs->owner_uuid, head->uuid, 16)) {
+					pr_info("invalid owner_uuid in bch_nvm_pgalloc_recs\n");
+					rc = -EINVAL;
+					goto unlock;
+				}
+				if (pgalloc_recs->owner != head) {
+					pr_info("invalid owner in bch_nvm_pgalloc_recs\n");
+					rc = -EINVAL;
+					goto unlock;
+				}
+
+				/* recs array can has hole */
+				for (k = 0; k < pgalloc_recs->size; k++) {
+					struct bch_pgalloc_rec *rec = &pgalloc_recs->recs[k];
+
+					if (rec->pgoff) {
+						BUG_ON(rec->pgoff <= offset);
+
+						/* init struct page: index/private */
+						page = nvm_vaddr_to_page(ns,
+							BCH_PGOFF_TO_KVADDR(rec->pgoff));
+
+						set_page_private(page, rec->order);
+						page->index = rec->pgoff - offset;
+
+						remove_owner_space(ns,
+							rec->pgoff - offset,
+							1 << rec->order);
+					}
+				}
+				bitmap_set(ns->pgalloc_recs_bitmap, pgalloc_recs_pos, 1);
+				pgalloc_recs = pgalloc_recs->next;
+			}
+		}
+	}
+unlock:
 	mutex_unlock(&only_set->lock);
 
-	return 0;
+	return rc;
+}
+
+static void init_nvm_free_space(struct bch_nvm_namespace *ns)
+{
+	unsigned int start, end, pages;
+	int i;
+	struct page *page;
+	pgoff_t pgoff_start;
+
+	bitmap_for_each_clear_region(ns->pages_bitmap, start, end, 0, ns->pages_total) {
+		pgoff_start = start;
+		pages = end - start;
+
+		while (pages) {
+			for (i = BCH_MAX_ORDER - 1; i >= 0 ; i--) {
+				if ((pgoff_start % (1 << i) == 0) && (pages >= (1 << i)))
+					break;
+			}
+
+			page = nvm_vaddr_to_page(ns, nvm_pgoff_to_vaddr(ns, pgoff_start));
+			page->index = pgoff_start;
+			set_page_private(page, i);
+			__SetPageBuddy(page);
+			list_add((struct list_head *)&page->zone_device_data, &ns->free_area[i]);
+
+			pgoff_start += 1 << i;
+			pages -= 1 << i;
+		}
+	}
 }
 
 static bool attach_nvm_set(struct bch_nvm_namespace *ns)
@@ -123,7 +232,7 @@ static int read_nvdimm_meta_super(struct block_device *bdev,
 struct bch_nvm_namespace *bch_register_namespace(const char *dev_path)
 {
 	struct bch_nvm_namespace *ns;
-	int err;
+	int i, err;
 	pgoff_t pgoff;
 	char buf[BDEVNAME_SIZE];
 	struct block_device *bdev;
@@ -239,18 +348,43 @@ struct bch_nvm_namespace *bch_register_namespace(const char *dev_path)
 	ns->nvm_set = only_set;
 	mutex_init(&ns->lock);
 
+	ns->pages_bitmap = kvcalloc(BITS_TO_LONGS(ns->pages_total),
+					sizeof(unsigned long), GFP_KERNEL);
+	if (!ns->pages_bitmap) {
+		err = -ENOMEM;
+		goto clear_ns_nr;
+	}
+
+	if (ns->sb->this_namespace_nr == 0) {
+		ns->pgalloc_recs_bitmap = bitmap_zalloc(BCH_MAX_PGALLOC_RECS, GFP_KERNEL);
+		if (ns->pgalloc_recs_bitmap == NULL) {
+			err = -ENOMEM;
+			goto free_pages_bitmap;
+		}
+	}
+
+	for (i = 0; i < BCH_MAX_ORDER; i++)
+		INIT_LIST_HEAD(&ns->free_area[i]);
+
 	if (ns->sb->this_namespace_nr == 0) {
 		pr_info("only first namespace contain owner info\n");
 		err = init_owner_info(ns);
 		if (err < 0) {
 			pr_info("init_owner_info met error %d\n", err);
-			only_set->nss[ns->sb->this_namespace_nr] = NULL;
-			goto free_ns;
+			goto free_recs_bitmap;
 		}
+		/* init buddy allocator */
+		init_nvm_free_space(ns);
 	}
 
 	kfree(path);
 	return ns;
+free_recs_bitmap:
+	bitmap_free(ns->pgalloc_recs_bitmap);
+free_pages_bitmap:
+	kvfree(ns->pages_bitmap);
+clear_ns_nr:
+	only_set->nss[ns->sb->this_namespace_nr] = NULL;
 free_ns:
 	kfree(ns);
 bdput:
diff --git a/drivers/md/bcache/nvm-pages.h b/drivers/md/bcache/nvm-pages.h
index 87a0d2c46788..e08864af96a0 100644
--- a/drivers/md/bcache/nvm-pages.h
+++ b/drivers/md/bcache/nvm-pages.h
@@ -16,6 +16,7 @@
  * to which owner. After reboot from power failure, they will be initialized
  * based on nvm pages superblock in NVDIMM device.
  */
+#define BCH_MAX_ORDER 20
 struct bch_nvm_namespace {
 	struct bch_nvm_pages_sb *sb;
 	void *kaddr;
@@ -27,6 +28,11 @@ struct bch_nvm_namespace {
 	u64 pages_total;
 	pfn_t start_pfn;
 
+	unsigned long *pages_bitmap;
+	struct list_head free_area[BCH_MAX_ORDER];
+
+	unsigned long *pgalloc_recs_bitmap;
+
 	struct dax_device *dax_dev;
 	struct block_device *bdev;
 	struct bch_nvm_set *nvm_set;
diff --git a/include/uapi/linux/bcache-nvm.h b/include/uapi/linux/bcache-nvm.h
index 1d62e9f0d4bf..7df0934d8756 100644
--- a/include/uapi/linux/bcache-nvm.h
+++ b/include/uapi/linux/bcache-nvm.h
@@ -114,6 +114,8 @@ struct bch_pgalloc_rec {
 	__u64	reserved:6;
 };
 
+#define BCH_PGOFF_TO_KVADDR(pgoff) ((void *)((unsigned long)pgoff << PAGE_SHIFT))
+
 struct bch_nvm_pgalloc_recs {
 union {
 	struct {
@@ -130,11 +132,15 @@ union {
 };
 };
 
-#define BCH_MAX_RECS					\
-	((sizeof(struct bch_nvm_pgalloc_recs) -		\
-	 offsetof(struct bch_nvm_pgalloc_recs, recs)) /	\
+#define BCH_MAX_RECS							\
+	((sizeof(struct bch_nvm_pgalloc_recs) -				\
+	 offsetof(struct bch_nvm_pgalloc_recs, recs)) /			\
 	 sizeof(struct bch_pgalloc_rec))
 
+#define BCH_MAX_PGALLOC_RECS						\
+	((BCH_NVM_PAGES_OFFSET - BCH_NVM_PAGES_SYS_RECS_HEAD_OFFSET) /	\
+	 sizeof(struct bch_nvm_pgalloc_recs))
+
 struct bch_nvm_pages_owner_head {
 	unsigned char			uuid[16];
 	unsigned char			label[BCH_NVM_PAGES_LABEL_SIZE];
-- 
2.25.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [bch-nvm-pages v9 4/6] bcache: bch_nvm_alloc_pages() of the buddy
  2021-04-28 21:39 [bch-nvm-pages v9 0/6] nvm page allocator for bcache Qiaowei Ren
                   ` (2 preceding siblings ...)
  2021-04-28 21:39 ` [bch-nvm-pages v9 3/6] bcache: initialization of the buddy Qiaowei Ren
@ 2021-04-28 21:39 ` Qiaowei Ren
  2021-05-11 12:49   ` Coly Li
  2021-04-28 21:39 ` [bch-nvm-pages v9 5/6] bcache: bch_nvm_free_pages() " Qiaowei Ren
  2021-04-28 21:39 ` [bch-nvm-pages v9 6/6] bcache: get allocated pages from specific owner Qiaowei Ren
  5 siblings, 1 reply; 17+ messages in thread
From: Qiaowei Ren @ 2021-04-28 21:39 UTC (permalink / raw)
  To: linux-bcache; +Cc: qiaowei.ren, jianpeng.ma, colyli, rdunlap

From: Jianpeng Ma <jianpeng.ma@intel.com>

This patch implements the bch_nvm_alloc_pages() of the buddy.
In terms of function, this func is like current-page-buddy-alloc.
But the differences are:
a: it need owner_uuid as parameter which record owner info. And it
make those info persistence.
b: it don't need flags like GFP_*. All allocs are the equal.
c: it don't trigger other ops etc swap/recycle.

Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
[colyli: fix typo in commit log]
Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/nvm-pages.c | 172 ++++++++++++++++++++++++++++++++++
 drivers/md/bcache/nvm-pages.h |   6 ++
 2 files changed, 178 insertions(+)

diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
index 810f65cf756a..2647ff997fab 100644
--- a/drivers/md/bcache/nvm-pages.c
+++ b/drivers/md/bcache/nvm-pages.c
@@ -68,6 +68,178 @@ static inline void remove_owner_space(struct bch_nvm_namespace *ns,
 	bitmap_set(ns->pages_bitmap, pgoff, nr);
 }
 
+/* If not found, it will create if create == true */
+static struct bch_nvm_pages_owner_head *find_owner_head(const char *owner_uuid, bool create)
+{
+	struct bch_owner_list_head *owner_list_head = only_set->owner_list_head;
+	struct bch_nvm_pages_owner_head *owner_head = NULL;
+	int i;
+
+	if (owner_list_head == NULL)
+		goto out;
+
+	for (i = 0; i < only_set->owner_list_used; i++) {
+		if (!memcmp(owner_uuid, owner_list_head->heads[i].uuid, 16)) {
+			owner_head = &(owner_list_head->heads[i]);
+			break;
+		}
+	}
+
+	if (!owner_head && create) {
+		int used = only_set->owner_list_used;
+
+		BUG_ON((used > 0) && (only_set->owner_list_size == used));
+		memcpy_flushcache(owner_list_head->heads[used].uuid, owner_uuid, 16);
+		only_set->owner_list_used++;
+
+		owner_list_head->used++;
+		owner_head = &(owner_list_head->heads[used]);
+	}
+
+out:
+	return owner_head;
+}
+
+static struct bch_nvm_pgalloc_recs *find_empty_pgalloc_recs(void)
+{
+	unsigned int start;
+	struct bch_nvm_namespace *ns = only_set->nss[0];
+	struct bch_nvm_pgalloc_recs *recs;
+
+	start = bitmap_find_next_zero_area(ns->pgalloc_recs_bitmap, BCH_MAX_PGALLOC_RECS, 0, 1, 0);
+	if (start > BCH_MAX_PGALLOC_RECS) {
+		pr_info("no free struct bch_nvm_pgalloc_recs\n");
+		return NULL;
+	}
+
+	bitmap_set(ns->pgalloc_recs_bitmap, start, 1);
+	recs = (struct bch_nvm_pgalloc_recs *)(ns->kaddr + BCH_NVM_PAGES_SYS_RECS_HEAD_OFFSET)
+		+ start;
+	return recs;
+}
+
+static struct bch_nvm_pgalloc_recs *find_nvm_pgalloc_recs(struct bch_nvm_namespace *ns,
+		struct bch_nvm_pages_owner_head *owner_head, bool create)
+{
+	int ns_nr = ns->sb->this_namespace_nr;
+	struct bch_nvm_pgalloc_recs *prev_recs = NULL, *recs = owner_head->recs[ns_nr];
+
+	/* If create=false, we return recs[nr] */
+	if (!create)
+		return recs;
+
+	/*
+	 * If create=true, it mean we need a empty struct bch_pgalloc_rec
+	 * So we should find non-empty struct bch_nvm_pgalloc_recs or alloc
+	 * new struct bch_nvm_pgalloc_recs. And return this bch_nvm_pgalloc_recs
+	 */
+	while (recs && (recs->used == recs->size)) {
+		prev_recs = recs;
+		recs = recs->next;
+	}
+
+	/* Found empty struct bch_nvm_pgalloc_recs */
+	if (recs)
+		return recs;
+	/* Need alloc new struct bch_nvm_galloc_recs */
+	recs = find_empty_pgalloc_recs();
+	if (recs) {
+		recs->next = NULL;
+		recs->owner = owner_head;
+		strncpy(recs->magic, bch_nvm_pages_pgalloc_magic, 16);
+		strncpy(recs->owner_uuid, owner_head->uuid, 16);
+		recs->size = BCH_MAX_RECS;
+		recs->used = 0;
+
+		if (prev_recs)
+			prev_recs->next = recs;
+		else
+			owner_head->recs[ns_nr] = recs;
+	}
+
+	return recs;
+}
+
+static void add_pgalloc_rec(struct bch_nvm_pgalloc_recs *recs, void *kaddr, int order)
+{
+	int i;
+
+	for (i = 0; i < recs->size; i++) {
+		if (recs->recs[i].pgoff == 0) {
+			recs->recs[i].pgoff = (unsigned long)kaddr >> PAGE_SHIFT;
+			recs->recs[i].order = order;
+			recs->used++;
+			break;
+		}
+	}
+	BUG_ON(i == recs->size);
+}
+
+void *bch_nvm_alloc_pages(int order, const char *owner_uuid)
+{
+	void *kaddr = NULL;
+	struct bch_nvm_pgalloc_recs *pgalloc_recs;
+	struct bch_nvm_pages_owner_head *owner_head;
+	int i, j;
+
+	mutex_lock(&only_set->lock);
+	owner_head = find_owner_head(owner_uuid, true);
+
+	if (!owner_head) {
+		pr_err("can't find bch_nvm_pgalloc_recs by(uuid=%s)\n", owner_uuid);
+		goto unlock;
+	}
+
+	for (j = 0; j < only_set->total_namespaces_nr; j++) {
+		struct bch_nvm_namespace *ns = only_set->nss[j];
+
+		if (!ns || (ns->free < (1 << order)))
+			continue;
+
+		for (i = order; i < BCH_MAX_ORDER; i++) {
+			struct list_head *list;
+			struct page *page, *buddy_page;
+
+			if (list_empty(&ns->free_area[i]))
+				continue;
+
+			list = ns->free_area[i].next;
+			page = container_of((void *)list, struct page, zone_device_data);
+
+			list_del(list);
+
+			while (i != order) {
+				buddy_page = nvm_vaddr_to_page(ns,
+					nvm_pgoff_to_vaddr(ns, page->index + (1 << (i - 1))));
+				set_page_private(buddy_page, i - 1);
+				buddy_page->index = page->index + (1 << (i - 1));
+				__SetPageBuddy(buddy_page);
+				list_add((struct list_head *)&buddy_page->zone_device_data,
+					&ns->free_area[i - 1]);
+				i--;
+			}
+
+			set_page_private(page, order);
+			__ClearPageBuddy(page);
+			ns->free -= 1 << order;
+			kaddr = nvm_pgoff_to_vaddr(ns, page->index);
+			break;
+		}
+
+		if (i != BCH_MAX_ORDER) {
+			pgalloc_recs = find_nvm_pgalloc_recs(ns, owner_head, true);
+			/* ToDo: handle pgalloc_recs==NULL */
+			add_pgalloc_rec(pgalloc_recs, kaddr, order);
+			break;
+		}
+	}
+
+unlock:
+	mutex_unlock(&only_set->lock);
+	return kaddr;
+}
+EXPORT_SYMBOL_GPL(bch_nvm_alloc_pages);
+
 static int init_owner_info(struct bch_nvm_namespace *ns)
 {
 	struct bch_owner_list_head *owner_list_head = ns->sb->owner_list_head;
diff --git a/drivers/md/bcache/nvm-pages.h b/drivers/md/bcache/nvm-pages.h
index e08864af96a0..4fd5205146a2 100644
--- a/drivers/md/bcache/nvm-pages.h
+++ b/drivers/md/bcache/nvm-pages.h
@@ -62,6 +62,7 @@ extern struct bch_nvm_set *only_set;
 struct bch_nvm_namespace *bch_register_namespace(const char *dev_path);
 int bch_nvm_init(void);
 void bch_nvm_exit(void);
+void *bch_nvm_alloc_pages(int order, const char *owner_uuid);
 
 #else
 
@@ -74,6 +75,11 @@ static inline int bch_nvm_init(void)
 	return 0;
 }
 static inline void bch_nvm_exit(void) { }
+static inline void *bch_nvm_alloc_pages(int order, const char *owner_uuid)
+{
+	return NULL;
+}
+
 
 #endif /* CONFIG_BCACHE_NVM_PAGES */
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [bch-nvm-pages v9 5/6] bcache: bch_nvm_free_pages() of the buddy
  2021-04-28 21:39 [bch-nvm-pages v9 0/6] nvm page allocator for bcache Qiaowei Ren
                   ` (3 preceding siblings ...)
  2021-04-28 21:39 ` [bch-nvm-pages v9 4/6] bcache: bch_nvm_alloc_pages() " Qiaowei Ren
@ 2021-04-28 21:39 ` Qiaowei Ren
  2021-05-11 13:41   ` Coly Li
  2021-04-28 21:39 ` [bch-nvm-pages v9 6/6] bcache: get allocated pages from specific owner Qiaowei Ren
  5 siblings, 1 reply; 17+ messages in thread
From: Qiaowei Ren @ 2021-04-28 21:39 UTC (permalink / raw)
  To: linux-bcache; +Cc: qiaowei.ren, jianpeng.ma, colyli, rdunlap

From: Jianpeng Ma <jianpeng.ma@intel.com>

This patch implements the bch_nvm_free_pages() of the buddy.

The difference between this and page-buddy-free:
it need owner_uuid to free owner allocated pages.And must
persistent after free.

Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
[colyli: fix typo in commit log]
Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/nvm-pages.c | 164 ++++++++++++++++++++++++++++++++--
 drivers/md/bcache/nvm-pages.h |   3 +-
 2 files changed, 159 insertions(+), 8 deletions(-)

diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
index 2647ff997fab..39807046ecce 100644
--- a/drivers/md/bcache/nvm-pages.c
+++ b/drivers/md/bcache/nvm-pages.c
@@ -52,7 +52,7 @@ static void release_nvm_set(struct bch_nvm_set *nvm_set)
 	kfree(nvm_set);
 }
 
-static struct page *nvm_vaddr_to_page(struct bch_nvm_namespace *ns, void *addr)
+static struct page *nvm_vaddr_to_page(void *addr)
 {
 	return virt_to_page(addr);
 }
@@ -175,6 +175,155 @@ static void add_pgalloc_rec(struct bch_nvm_pgalloc_recs *recs, void *kaddr, int
 	BUG_ON(i == recs->size);
 }
 
+static inline void *nvm_end_addr(struct bch_nvm_namespace *ns)
+{
+	return ns->kaddr + (ns->pages_total << PAGE_SHIFT);
+}
+
+static inline bool in_nvm_range(struct bch_nvm_namespace *ns,
+		void *start_addr, void *end_addr)
+{
+	return (start_addr >= ns->kaddr) && (end_addr <= nvm_end_addr(ns));
+}
+
+static struct bch_nvm_namespace *find_nvm_by_addr(void *addr, int order)
+{
+	int i;
+	struct bch_nvm_namespace *ns;
+
+	for (i = 0; i < only_set->total_namespaces_nr; i++) {
+		ns = only_set->nss[i];
+		if (ns && in_nvm_range(ns, addr, addr + (1 << order)))
+			return ns;
+	}
+	return NULL;
+}
+
+static int remove_pgalloc_rec(struct bch_nvm_pgalloc_recs *pgalloc_recs, int ns_nr,
+				void *kaddr, int order)
+{
+	struct bch_nvm_pages_owner_head *owner_head = pgalloc_recs->owner;
+	struct bch_nvm_pgalloc_recs *prev_recs, *sys_recs;
+	u64 pgoff = (unsigned long)kaddr >> PAGE_SHIFT;
+	struct bch_nvm_namespace *ns = only_set->nss[0];
+	int i;
+
+	prev_recs = pgalloc_recs;
+	sys_recs = ns->kaddr + BCH_NVM_PAGES_SYS_RECS_HEAD_OFFSET;
+	while (pgalloc_recs) {
+		for (i = 0; i < pgalloc_recs->size; i++) {
+			struct bch_pgalloc_rec *rec = &(pgalloc_recs->recs[i]);
+
+			if (rec->pgoff == pgoff) {
+				WARN_ON(rec->order != order);
+				rec->pgoff = 0;
+				rec->order = 0;
+				pgalloc_recs->used--;
+
+				if (pgalloc_recs->used == 0) {
+					int recs_pos = pgalloc_recs - sys_recs;
+
+					if (pgalloc_recs == prev_recs)
+						owner_head->recs[ns_nr] = pgalloc_recs->next;
+					else
+						prev_recs->next = pgalloc_recs->next;
+
+					pgalloc_recs->next = NULL;
+					pgalloc_recs->owner = NULL;
+
+					bitmap_clear(ns->pgalloc_recs_bitmap, recs_pos, 1);
+				}
+				goto exit;
+			}
+		}
+		prev_recs = pgalloc_recs;
+		pgalloc_recs = pgalloc_recs->next;
+	}
+exit:
+	return pgalloc_recs ? 0 : -ENOENT;
+}
+
+static void __free_space(struct bch_nvm_namespace *ns, void *addr, int order)
+{
+	unsigned int add_pages = (1 << order);
+	pgoff_t pgoff;
+	struct page *page;
+
+	page = nvm_vaddr_to_page(addr);
+	WARN_ON((!page) || (page->private != order));
+	pgoff = page->index;
+
+	while (order < BCH_MAX_ORDER - 1) {
+		struct page *buddy_page;
+
+		pgoff_t buddy_pgoff = pgoff ^ (1 << order);
+		pgoff_t parent_pgoff = pgoff & ~(1 << order);
+
+		if ((parent_pgoff + (1 << (order + 1)) > ns->pages_total))
+			break;
+
+		buddy_page = nvm_vaddr_to_page(nvm_pgoff_to_vaddr(ns, buddy_pgoff));
+		WARN_ON(!buddy_page);
+
+		if (PageBuddy(buddy_page) && (buddy_page->private == order)) {
+			list_del((struct list_head *)&buddy_page->zone_device_data);
+			__ClearPageBuddy(buddy_page);
+			pgoff = parent_pgoff;
+			order++;
+			continue;
+		}
+		break;
+	}
+
+	page = nvm_vaddr_to_page(nvm_pgoff_to_vaddr(ns, pgoff));
+	WARN_ON(!page);
+	list_add((struct list_head *)&page->zone_device_data, &ns->free_area[order]);
+	page->index = pgoff;
+	set_page_private(page, order);
+	__SetPageBuddy(page);
+	ns->free += add_pages;
+}
+
+void bch_nvm_free_pages(void *addr, int order, const char *owner_uuid)
+{
+	struct bch_nvm_namespace *ns;
+	struct bch_nvm_pages_owner_head *owner_head;
+	struct bch_nvm_pgalloc_recs *pgalloc_recs;
+	int r;
+
+	mutex_lock(&only_set->lock);
+
+	ns = find_nvm_by_addr(addr, order);
+	if (!ns) {
+		pr_info("can't find nvm_dev by kaddr %p\n", addr);
+		goto unlock;
+	}
+
+	owner_head = find_owner_head(owner_uuid, false);
+	if (!owner_head) {
+		pr_info("can't found bch_nvm_pages_owner_head by(uuid=%s)\n", owner_uuid);
+		goto unlock;
+	}
+
+	pgalloc_recs = find_nvm_pgalloc_recs(ns, owner_head, false);
+	if (!pgalloc_recs) {
+		pr_info("can't find bch_nvm_pgalloc_recs by(uuid=%s)\n", owner_uuid);
+		goto unlock;
+	}
+
+	r = remove_pgalloc_rec(pgalloc_recs, ns->sb->this_namespace_nr, addr, order);
+	if (r < 0) {
+		pr_info("can't find bch_pgalloc_rec\n");
+		goto unlock;
+	}
+
+	__free_space(ns, addr, order);
+
+unlock:
+	mutex_unlock(&only_set->lock);
+}
+EXPORT_SYMBOL_GPL(bch_nvm_free_pages);
+
 void *bch_nvm_alloc_pages(int order, const char *owner_uuid)
 {
 	void *kaddr = NULL;
@@ -209,7 +358,7 @@ void *bch_nvm_alloc_pages(int order, const char *owner_uuid)
 			list_del(list);
 
 			while (i != order) {
-				buddy_page = nvm_vaddr_to_page(ns,
+				buddy_page = nvm_vaddr_to_page(
 					nvm_pgoff_to_vaddr(ns, page->index + (1 << (i - 1))));
 				set_page_private(buddy_page, i - 1);
 				buddy_page->index = page->index + (1 << (i - 1));
@@ -291,7 +440,7 @@ static int init_owner_info(struct bch_nvm_namespace *ns)
 						BUG_ON(rec->pgoff <= offset);
 
 						/* init struct page: index/private */
-						page = nvm_vaddr_to_page(ns,
+						page = nvm_vaddr_to_page(
 							BCH_PGOFF_TO_KVADDR(rec->pgoff));
 
 						set_page_private(page, rec->order);
@@ -330,11 +479,12 @@ static void init_nvm_free_space(struct bch_nvm_namespace *ns)
 					break;
 			}
 
-			page = nvm_vaddr_to_page(ns, nvm_pgoff_to_vaddr(ns, pgoff_start));
+			page = nvm_vaddr_to_page(nvm_pgoff_to_vaddr(ns, pgoff_start));
 			page->index = pgoff_start;
 			set_page_private(page, i);
-			__SetPageBuddy(page);
-			list_add((struct list_head *)&page->zone_device_data, &ns->free_area[i]);
+
+			/* in order to update ns->free */
+			__free_space(ns, nvm_pgoff_to_vaddr(ns, pgoff_start), i);
 
 			pgoff_start += 1 << i;
 			pages -= 1 << i;
@@ -515,7 +665,7 @@ struct bch_nvm_namespace *bch_register_namespace(const char *dev_path)
 	ns->page_size = ns->sb->page_size;
 	ns->pages_offset = ns->sb->pages_offset;
 	ns->pages_total = ns->sb->pages_total;
-	ns->free = 0;
+	ns->free = 0; /* increase by __free_space() */
 	ns->bdev = bdev;
 	ns->nvm_set = only_set;
 	mutex_init(&ns->lock);
diff --git a/drivers/md/bcache/nvm-pages.h b/drivers/md/bcache/nvm-pages.h
index 4fd5205146a2..918aee6a9afc 100644
--- a/drivers/md/bcache/nvm-pages.h
+++ b/drivers/md/bcache/nvm-pages.h
@@ -63,6 +63,7 @@ struct bch_nvm_namespace *bch_register_namespace(const char *dev_path);
 int bch_nvm_init(void);
 void bch_nvm_exit(void);
 void *bch_nvm_alloc_pages(int order, const char *owner_uuid);
+void bch_nvm_free_pages(void *addr, int order, const char *owner_uuid);
 
 #else
 
@@ -79,7 +80,7 @@ static inline void *bch_nvm_alloc_pages(int order, const char *owner_uuid)
 {
 	return NULL;
 }
-
+static inline void bch_nvm_free_pages(void *addr, int order, const char *owner_uuid) { }
 
 #endif /* CONFIG_BCACHE_NVM_PAGES */
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [bch-nvm-pages v9 6/6] bcache: get allocated pages from specific owner
  2021-04-28 21:39 [bch-nvm-pages v9 0/6] nvm page allocator for bcache Qiaowei Ren
                   ` (4 preceding siblings ...)
  2021-04-28 21:39 ` [bch-nvm-pages v9 5/6] bcache: bch_nvm_free_pages() " Qiaowei Ren
@ 2021-04-28 21:39 ` Qiaowei Ren
  2021-05-11 13:45   ` Coly Li
  5 siblings, 1 reply; 17+ messages in thread
From: Qiaowei Ren @ 2021-04-28 21:39 UTC (permalink / raw)
  To: linux-bcache; +Cc: qiaowei.ren, jianpeng.ma, colyli, rdunlap

From: Jianpeng Ma <jianpeng.ma@intel.com>

This patch implements bch_get_allocated_pages() of the buddy to be used to
get allocated pages from specific owner.

Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/nvm-pages.c | 6 ++++++
 drivers/md/bcache/nvm-pages.h | 5 +++++
 2 files changed, 11 insertions(+)

diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
index 39807046ecce..0be89a03255c 100644
--- a/drivers/md/bcache/nvm-pages.c
+++ b/drivers/md/bcache/nvm-pages.c
@@ -389,6 +389,12 @@ void *bch_nvm_alloc_pages(int order, const char *owner_uuid)
 }
 EXPORT_SYMBOL_GPL(bch_nvm_alloc_pages);
 
+struct bch_nvm_pages_owner_head *bch_get_allocated_pages(const char *owner_uuid)
+{
+	return find_owner_head(owner_uuid, false);
+}
+EXPORT_SYMBOL_GPL(bch_get_allocated_pages);
+
 static int init_owner_info(struct bch_nvm_namespace *ns)
 {
 	struct bch_owner_list_head *owner_list_head = ns->sb->owner_list_head;
diff --git a/drivers/md/bcache/nvm-pages.h b/drivers/md/bcache/nvm-pages.h
index 918aee6a9afc..cfb3e8ef01ee 100644
--- a/drivers/md/bcache/nvm-pages.h
+++ b/drivers/md/bcache/nvm-pages.h
@@ -64,6 +64,7 @@ int bch_nvm_init(void);
 void bch_nvm_exit(void);
 void *bch_nvm_alloc_pages(int order, const char *owner_uuid);
 void bch_nvm_free_pages(void *addr, int order, const char *owner_uuid);
+struct bch_nvm_pages_owner_head *bch_get_allocated_pages(const char *owner_uuid);
 
 #else
 
@@ -81,6 +82,10 @@ static inline void *bch_nvm_alloc_pages(int order, const char *owner_uuid)
 	return NULL;
 }
 static inline void bch_nvm_free_pages(void *addr, int order, const char *owner_uuid) { }
+static inline struct bch_nvm_pages_owner_head *bch_get_allocated_pages(const char *owner_uuid)
+{
+	return NULL;
+}
 
 #endif /* CONFIG_BCACHE_NVM_PAGES */
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [bch-nvm-pages v9 2/6] bcache: initialize the nvm pages allocator
  2021-04-28 16:29   ` Randy Dunlap
@ 2021-04-29  0:13     ` Ma, Jianpeng
  0 siblings, 0 replies; 17+ messages in thread
From: Ma, Jianpeng @ 2021-04-29  0:13 UTC (permalink / raw)
  To: Randy Dunlap, Ren, Qiaowei, linux-bcache; +Cc: colyli, rdunlap, Colin Ian King

> -----Original Message-----
> From: Randy Dunlap <rdunlap@infradead.org>
> Sent: Thursday, April 29, 2021 12:30 AM
> To: Ren, Qiaowei <qiaowei.ren@intel.com>; linux-bcache@vger.kernel.org
> Cc: Ma, Jianpeng <jianpeng.ma@intel.com>; colyli@suse.de;
> rdunlap@infradead.oom; Colin Ian King <colin.king@canonical.com>
> Subject: Re: [bch-nvm-pages v9 2/6] bcache: initialize the nvm pages allocator
> 
> On 4/28/21 2:39 PM, Qiaowei Ren wrote:
> > From: Jianpeng Ma <jianpeng.ma@intel.com>
> >
> > This patch define the prototype data structures in memory and
> > initializes the nvm pages allocator.
> >
> > The nvm address space which is managed by this allocatior can consist
> > of many nvm namespaces, and some namespaces can compose into one
> nvm
> > set, like cache set. For this initial implementation, only one set can
> > be supported.
> >
> > The users of this nvm pages allocator need to call
> > regiseter_namespace() to register the nvdimm device (like /dev/pmemX)
> > into this allocator as the instance of struct nvm_namespace.
> >
> > v9:
> >   -Fix Kconfig dependance error(Reported-by Randy)
> >   -Fix an uninitialized return value(Colin)
> >
> > Reported-by: Randy Dunlap <rdunlap@infradead.org>
> > Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
> > Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
> > Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
> > Signed-off-by: Coly Li <colyli@suse.de>
> > Signed-off-by: Colin Ian King <colin.king@canonical.com>
> > ---
> >  drivers/md/bcache/Kconfig     |   8 +
> >  drivers/md/bcache/Makefile    |   2 +-
> >  drivers/md/bcache/nvm-pages.c | 285
> > ++++++++++++++++++++++++++++++++++
> >  drivers/md/bcache/nvm-pages.h |  74 +++++++++
> >  drivers/md/bcache/super.c     |   3 +
> >  5 files changed, 371 insertions(+), 1 deletion(-)  create mode 100644
> > drivers/md/bcache/nvm-pages.c  create mode 100644
> > drivers/md/bcache/nvm-pages.h
> >
> > diff --git a/drivers/md/bcache/Kconfig b/drivers/md/bcache/Kconfig
> > index d1ca4d059c20..3057da4cf8ff 100644
> > --- a/drivers/md/bcache/Kconfig
> > +++ b/drivers/md/bcache/Kconfig
> > @@ -35,3 +35,11 @@ config BCACHE_ASYNC_REGISTRATION
> >  	device path into this file will returns immediately and the real
> >  	registration work is handled in kernel work queue in asynchronous
> >  	way.
> > +
> > +config BCACHE_NVM_PAGES
> > +	bool "NVDIMM support for bcache (EXPERIMENTAL)"
> > +	depends on BCACHE
> > +	depends on LIBNVDIMM
> > +	depends on DAX
> > +	help
> > +	nvm pages allocator for bcache.
> 
> Please follow coding-style for Kconfig files:
> 
> (from Documentation/process/coding-style.rst, section 10):
> 
> For all of the Kconfig* configuration files throughout the source tree, the
> indentation is somewhat different.  Lines under a ``config`` definition are
> indented with one tab, while help text is indented an additional two spaces.
> 
> 
> Also, that help text could be better.

Hi Randy:
     Thanks very much! I'll change in next patch set.

Thanks!
Jianpeng
> --
> ~Randy


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [bch-nvm-pages v9 2/6] bcache: initialize the nvm pages allocator
  2021-04-28 21:39 ` [bch-nvm-pages v9 2/6] bcache: initialize the nvm pages allocator Qiaowei Ren
  2021-04-28 16:29   ` Randy Dunlap
  2021-04-28 17:53   ` Randy Dunlap
@ 2021-05-11  3:53   ` Coly Li
  2 siblings, 0 replies; 17+ messages in thread
From: Coly Li @ 2021-05-11  3:53 UTC (permalink / raw)
  To: Qiaowei Ren, jianpeng.ma; +Cc: linux-bcache

On 4/29/21 5:39 AM, Qiaowei Ren wrote:
> From: Jianpeng Ma <jianpeng.ma@intel.com>
> 
> This patch define the prototype data structures in memory and initializes
> the nvm pages allocator.
> 
> The nvm address space which is managed by this allocatior can consist of
> many nvm namespaces, and some namespaces can compose into one nvm set,
> like cache set. For this initial implementation, only one set can be
> supported.
> 
> The users of this nvm pages allocator need to call regiseter_namespace()
> to register the nvdimm device (like /dev/pmemX) into this allocator as
> the instance of struct nvm_namespace.
> 
> v9:
>   -Fix Kconfig dependance error(Reported-by Randy)
>   -Fix an uninitialized return value(Colin)
> 
> Reported-by: Randy Dunlap <rdunlap@infradead.org>
> Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
> Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
> Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
> Signed-off-by: Coly Li <colyli@suse.de>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
>  drivers/md/bcache/Kconfig     |   8 +
>  drivers/md/bcache/Makefile    |   2 +-
>  drivers/md/bcache/nvm-pages.c | 285 ++++++++++++++++++++++++++++++++++
>  drivers/md/bcache/nvm-pages.h |  74 +++++++++
>  drivers/md/bcache/super.c     |   3 +
>  5 files changed, 371 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/md/bcache/nvm-pages.c
>  create mode 100644 drivers/md/bcache/nvm-pages.h
> 
> diff --git a/drivers/md/bcache/Kconfig b/drivers/md/bcache/Kconfig
> index d1ca4d059c20..3057da4cf8ff 100644
> --- a/drivers/md/bcache/Kconfig
> +++ b/drivers/md/bcache/Kconfig
> @@ -35,3 +35,11 @@ config BCACHE_ASYNC_REGISTRATION
>  	device path into this file will returns immediately and the real
>  	registration work is handled in kernel work queue in asynchronous
>  	way.
> +
> +config BCACHE_NVM_PAGES
> +	bool "NVDIMM support for bcache (EXPERIMENTAL)"
> +	depends on BCACHE
> +	depends on LIBNVDIMM
> +	depends on DAX
> +	help
> +	nvm pages allocator for bcache.
> diff --git a/drivers/md/bcache/Makefile b/drivers/md/bcache/Makefile
> index 5b87e59676b8..948e5ed2ca66 100644
> --- a/drivers/md/bcache/Makefile
> +++ b/drivers/md/bcache/Makefile
> @@ -4,4 +4,4 @@ obj-$(CONFIG_BCACHE)	+= bcache.o
>  
>  bcache-y		:= alloc.o bset.o btree.o closure.o debug.o extents.o\
>  	io.o journal.o movinggc.o request.o stats.o super.o sysfs.o trace.o\
> -	util.o writeback.o features.o
> +	util.o writeback.o features.o nvm-pages.o
> diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
> new file mode 100644
> index 000000000000..976ab9002c17
> --- /dev/null
> +++ b/drivers/md/bcache/nvm-pages.c
> @@ -0,0 +1,285 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Nvdimm page-buddy allocator
> + *
> + * Copyright (c) 2021, Intel Corporation.
> + * Copyright (c) 2021, Qiaowei Ren <qiaowei.ren@intel.com>.
> + * Copyright (c) 2021, Jianpeng Ma <jianpeng.ma@intel.com>.
> + */
> +
> +#ifdef CONFIG_BCACHE_NVM_PAGES
> +
> +#include "bcache.h"
> +#include "nvm-pages.h"
> +
> +#include <linux/slab.h>
> +#include <linux/list.h>
> +#include <linux/mutex.h>
> +#include <linux/dax.h>
> +#include <linux/pfn_t.h>
> +#include <linux/libnvdimm.h>
> +#include <linux/mm_types.h>
> +#include <linux/err.h>
> +#include <linux/pagemap.h>
> +#include <linux/bitmap.h>
> +#include <linux/blkdev.h>
> +
> +struct bch_nvm_set *only_set;
> +
> +static void release_nvm_namespaces(struct bch_nvm_set *nvm_set)
> +{
> +	int i;
> +	struct bch_nvm_namespace *ns;
> +
> +	for (i = 0; i < nvm_set->total_namespaces_nr; i++) {
> +		ns = nvm_set->nss[i];
> +		if (ns) {
> +			blkdev_put(ns->bdev, FMODE_READ|FMODE_WRITE|FMODE_EXEC);
> +			kfree(ns);
> +		}
> +	}
> +
> +	kfree(nvm_set->nss);
> +}
> +
> +static void release_nvm_set(struct bch_nvm_set *nvm_set)
> +{
> +	release_nvm_namespaces(nvm_set);
> +	kfree(nvm_set);
> +}
> +
> +static int init_owner_info(struct bch_nvm_namespace *ns)
> +{
> +	struct bch_owner_list_head *owner_list_head = ns->sb->owner_list_head;
> +
> +	mutex_lock(&only_set->lock);
> +	only_set->owner_list_head = owner_list_head;
> +	only_set->owner_list_size = owner_list_head->size;
> +	only_set->owner_list_used = owner_list_head->used;
> +	mutex_unlock(&only_set->lock);
> +
> +	return 0;
> +}
> +
> +static bool attach_nvm_set(struct bch_nvm_namespace *ns)
> +{
> +	bool rc = true;
> +
> +	mutex_lock(&only_set->lock);
> +	if (only_set->nss) {
> +		if (memcmp(ns->sb->set_uuid, only_set->set_uuid, 16)) {
> +			pr_info("namespace id doesn't match nvm set\n");> +			rc = false;
> +			goto unlock;
> +		}
> +
> +		if (only_set->nss[ns->sb->this_namespace_nr]) {
> +			pr_info("already has the same position(%d) nvm\n",
> +					ns->sb->this_namespace_nr);
> +			rc = false;
> +			goto unlock;
> +		}
> +	} else {
> +		memcpy(only_set->set_uuid, ns->sb->set_uuid, 16);
> +		only_set->total_namespaces_nr = ns->sb->total_namespaces_nr;
> +		only_set->nss = kcalloc(only_set->total_namespaces_nr,
> +				sizeof(struct bch_nvm_namespace *), GFP_KERNEL);
> +		if (!only_set->nss) {
> +			rc = false;
> +			goto unlock;
> +		}
> +	}
> +
> +	only_set->nss[ns->sb->this_namespace_nr] = ns;
> +
> +unlock:
> +	mutex_unlock(&only_set->lock);
> +	return rc;
> +}
> +
> +static int read_nvdimm_meta_super(struct block_device *bdev,
> +			      struct bch_nvm_namespace *ns)
> +{
> +	struct page *page;
> +	struct bch_nvm_pages_sb *sb;
> +
> +	page = read_cache_page_gfp(bdev->bd_inode->i_mapping,
> +			BCH_NVM_PAGES_SB_OFFSET >> PAGE_SHIFT, GFP_KERNEL);
> +
> +	if (IS_ERR(page))
> +		return -EIO;
> +
> +	sb = page_address(page) + offset_in_page(BCH_NVM_PAGES_SB_OFFSET);

Maybe a type casting is needed like,
	sb = (struct bch_nvm_pages_sb *)(page_address(page) +
offset_in_page(BCH_NVM_PAGES_SB_OFFSET));


> +
> +	/* temporary use for DAX API */
> +	ns->page_size = sb->page_size;
> +	ns->pages_total = sb->pages_total;
> +

I see you will use ns->pages_total for DAX mapping, so the magic (maybe
checksum too) should be checked here, to make sure the values page_size
and pages_total are valid. Then we can avoid to send invalid value for
DAX mapping.


> +	put_page(page);
> +
> +	return 0;
> +}
> +
> +struct bch_nvm_namespace *bch_register_namespace(const char *dev_path)
> +{
> +	struct bch_nvm_namespace *ns;
> +	int err;
> +	pgoff_t pgoff;
> +	char buf[BDEVNAME_SIZE];
> +	struct block_device *bdev;
> +	uint64_t expected_csum;
> +	int id;
> +	char *path = NULL;
> +
> +	path = kstrndup(dev_path, 512, GFP_KERNEL);
> +	if (!path) {
> +		pr_err("kstrndup failed\n");
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	bdev = blkdev_get_by_path(strim(path),
> +				  FMODE_READ|FMODE_WRITE|FMODE_EXEC,
> +				  only_set);
> +	if (IS_ERR(bdev)) {
> +		pr_info("get %s error: %ld\n", dev_path, PTR_ERR(bdev));
> +		kfree(path);
> +		return ERR_PTR(PTR_ERR(bdev));
> +	}
> +
> +	err = -ENOMEM;
> +	ns = kzalloc(sizeof(struct bch_nvm_namespace), GFP_KERNEL);
> +	if (!ns)
> +		goto bdput;
> +
> +	err = -EIO;
> +	if (read_nvdimm_meta_super(bdev, ns)) {
> +		pr_info("%s read nvdimm meta super block failed.\n",
> +			bdevname(bdev, buf));
> +		goto free_ns;
> +	}
> +
> +	err = -EOPNOTSUPP;
> +	if (!bdev_dax_supported(bdev, ns->page_size)) {
> +		pr_info("%s don't support DAX\n", bdevname(bdev, buf));
> +		goto free_ns;
> +	}
> +
> +	err = -EINVAL;
> +	if (bdev_dax_pgoff(bdev, 0, ns->page_size, &pgoff)) {
> +		pr_info("invalid offset of %s\n", bdevname(bdev, buf));
> +		goto free_ns;
> +	}
> +
> +	err = -ENOMEM;
> +	ns->dax_dev = fs_dax_get_by_bdev(bdev);
> +	if (!ns->dax_dev) {
> +		pr_info("can't by dax device by %s\n", bdevname(bdev, buf));
> +		goto free_ns;
> +	}
> +
> +	err = -EINVAL;
> +	id = dax_read_lock();
> +	if (dax_direct_access(ns->dax_dev, pgoff, ns->pages_total,
> +			      &ns->kaddr, &ns->start_pfn) <= 0) {
> +		pr_info("dax_direct_access error\n");
> +		dax_read_unlock(id);
> +		goto free_ns;
> +	}
> +	dax_read_unlock(id);
> +
> +	ns->sb = ns->kaddr + BCH_NVM_PAGES_SB_OFFSET;
> +
> +	if (memcmp(ns->sb->magic, bch_nvm_pages_magic, 16)) {
> +		pr_info("invalid bch_nvm_pages_magic\n");
> +		goto free_ns;
> +	}

When you checking the magic and csum in read_nvdimm_meta_super(), such
checks are still necessary here, to make sure the DAX mapping is
correct. Just FYI :-)


> +
> +	if (ns->sb->sb_offset != BCH_NVM_PAGES_SB_OFFSET) {
> +		pr_info("invalid superblock offset\n");
> +		goto free_ns;
> +	}
> +
> +	if (ns->sb->total_namespaces_nr != 1) {
> +		pr_info("only one nvm device\n");
> +		goto free_ns;
> +	}
> +
> +	expected_csum = csum_set(ns->sb);
> +	if (expected_csum != ns->sb->csum) {
> +		pr_info("csum is not match with expected one\n");
> +		goto free_ns;
> +	}
> +
> +	err = -EEXIST;
> +	if (!attach_nvm_set(ns))
> +		goto free_ns;
> +
> +	/* Firstly attach */
> +	if ((unsigned long)ns->sb->owner_list_head == BCH_NVM_PAGES_OWNER_LIST_HEAD_OFFSET) {
> +		struct bch_nvm_pages_owner_head *sys_owner_head;
> +		struct bch_nvm_pgalloc_recs *sys_pgalloc_recs;
> +
> +		ns->sb->owner_list_head = ns->kaddr + BCH_NVM_PAGES_OWNER_LIST_HEAD_OFFSET;
> +		sys_pgalloc_recs = ns->kaddr + BCH_NVM_PAGES_SYS_RECS_HEAD_OFFSET;
> +
> +		sys_owner_head = &(ns->sb->owner_list_head->heads[0]);
> +		sys_owner_head->recs[0] = sys_pgalloc_recs;
> +		ns->sb->csum = csum_set(ns->sb);
> +
> +		sys_pgalloc_recs->owner = sys_owner_head;
> +	} else
> +		BUG_ON(ns->sb->owner_list_head !=
> +			(ns->kaddr + BCH_NVM_PAGES_OWNER_LIST_HEAD_OFFSET));

Can the above code chunk be moved into attach_nvm_set() ?

> +
> +	ns->page_size = ns->sb->page_size;
> +	ns->pages_offset = ns->sb->pages_offset;
> +	ns->pages_total = ns->sb->pages_total;
> +	ns->free = 0;
> +	ns->bdev = bdev;
> +	ns->nvm_set = only_set;
> +	mutex_init(&ns->lock);
> +
> +	if (ns->sb->this_namespace_nr == 0) {
> +		pr_info("only first namespace contain owner info\n");
> +		err = init_owner_info(ns);
> +		if (err < 0) {
> +			pr_info("init_owner_info met error %d\n", err);
> +			only_set->nss[ns->sb->this_namespace_nr] = NULL;
> +			goto free_ns;
> +		}
> +	}
> +
> +	kfree(path);
> +	return ns;
> +free_ns:
> +	kfree(ns);
> +bdput:
> +	blkdev_put(bdev, FMODE_READ|FMODE_WRITE|FMODE_EXEC);
> +	kfree(path);
> +	return ERR_PTR(err);
> +}
> +EXPORT_SYMBOL_GPL(bch_register_namespace);
> +
> +int __init bch_nvm_init(void)
> +{
> +	only_set = kzalloc(sizeof(*only_set), GFP_KERNEL);
> +	if (!only_set)
> +		return -ENOMEM;
> +
> +	only_set->total_namespaces_nr = 0;
> +	only_set->owner_list_head = NULL;
> +	only_set->nss = NULL;
> +
> +	mutex_init(&only_set->lock);
> +
> +	pr_info("bcache nvm init\n");
> +	return 0;
> +}
> +
> +void bch_nvm_exit(void)
> +{
> +	release_nvm_set(only_set);
> +	pr_info("bcache nvm exit\n");
> +}
> +
> +#endif /* CONFIG_BCACHE_NVM_PAGES */
> diff --git a/drivers/md/bcache/nvm-pages.h b/drivers/md/bcache/nvm-pages.h
> new file mode 100644
> index 000000000000..87a0d2c46788
> --- /dev/null
> +++ b/drivers/md/bcache/nvm-pages.h
> @@ -0,0 +1,74 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _BCACHE_NVM_PAGES_H
> +#define _BCACHE_NVM_PAGES_H
> +
> +#ifdef CONFIG_BCACHE_NVM_PAGES
> +#include <linux/bcache-nvm.h>
> +#endif /* CONFIG_BCACHE_NVM_PAGES */
> +
> +/*
> + * Bcache NVDIMM in memory data structures
> + */
> +
> +/*
> + * The following three structures in memory records which page(s) allocated
> + * to which owner. After reboot from power failure, they will be initialized
> + * based on nvm pages superblock in NVDIMM device.
> + */
> +struct bch_nvm_namespace {
> +	struct bch_nvm_pages_sb *sb;
> +	void *kaddr;
> +
> +	u8 uuid[16];
> +	u64 free;
> +	u32 page_size;
> +	u64 pages_offset;
> +	u64 pages_total;
> +	pfn_t start_pfn;
> +
> +	struct dax_device *dax_dev;
> +	struct block_device *bdev;
> +	struct bch_nvm_set *nvm_set;
> +
> +	struct mutex lock;
> +};
> +
> +/*
> + * A set of namespaces. Currently only one set can be supported.
> + */
> +struct bch_nvm_set {
> +	u8 set_uuid[16];
> +	u32 total_namespaces_nr;
> +
> +	u32 owner_list_size;
> +	u32 owner_list_used;
> +	struct bch_owner_list_head *owner_list_head;
> +
> +	struct bch_nvm_namespace **nss;
> +
> +	struct mutex lock;
> +};
> +extern struct bch_nvm_set *only_set;
> +
> +#ifdef CONFIG_BCACHE_NVM_PAGES
> +
> +struct bch_nvm_namespace *bch_register_namespace(const char *dev_path);
> +int bch_nvm_init(void);
> +void bch_nvm_exit(void);
> +
> +#else
> +
> +static inline struct bch_nvm_namespace *bch_register_namespace(const char *dev_path)
> +{
> +	return NULL;
> +}
> +static inline int bch_nvm_init(void)
> +{
> +	return 0;
> +}
> +static inline void bch_nvm_exit(void) { }
> +
> +#endif /* CONFIG_BCACHE_NVM_PAGES */
> +
> +#endif /* _BCACHE_NVM_PAGES_H */
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 03e1fe4de53d..0674a76d9454 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -14,6 +14,7 @@
>  #include "request.h"
>  #include "writeback.h"
>  #include "features.h"
> +#include "nvm-pages.h"
>  
>  #include <linux/blkdev.h>
>  #include <linux/debugfs.h>
> @@ -2816,6 +2817,7 @@ static void bcache_exit(void)
>  {
>  	bch_debug_exit();
>  	bch_request_exit();
> +	bch_nvm_exit();
>  	if (bcache_kobj)
>  		kobject_put(bcache_kobj);
>  	if (bcache_wq)
> @@ -2914,6 +2916,7 @@ static int __init bcache_init(void)
>  
>  	bch_debug_init();
>  	closure_debug_init();
> +	bch_nvm_init();
>  
>  	bcache_is_reboot = false;
>  
> 

Thanks for great job :-)

Coly Li


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [bch-nvm-pages v9 3/6] bcache: initialization of the buddy
  2021-04-28 21:39 ` [bch-nvm-pages v9 3/6] bcache: initialization of the buddy Qiaowei Ren
@ 2021-05-11  5:35   ` Coly Li
  0 siblings, 0 replies; 17+ messages in thread
From: Coly Li @ 2021-05-11  5:35 UTC (permalink / raw)
  To: Qiaowei Ren, jianpeng.ma; +Cc: linux-bcache

On 4/29/21 5:39 AM, Qiaowei Ren wrote:
> From: Jianpeng Ma <jianpeng.ma@intel.com>
> 
> This nvm pages allocator will implement the simple buddy to manage the
> nvm address space. This patch initializes this buddy for new namespace.
> 
> the unit of alloc/free of the buddy is page. DAX device has their
> struct page(in dram or PMEM).
> 
>         struct {        /* ZONE_DEVICE pages */
>                 /** @pgmap: Points to the hosting device page map. */
>                 struct dev_pagemap *pgmap;
>                 void *zone_device_data;
>                 /*
>                  * ZONE_DEVICE private pages are counted as being
>                  * mapped so the next 3 words hold the mapping, index,
>                  * and private fields from the source anonymous or
>                  * page cache page while the page is migrated to device
>                  * private memory.
>                  * ZONE_DEVICE MEMORY_DEVICE_FS_DAX pages also
>                  * use the mapping, index, and private fields when
>                  * pmem backed DAX files are mapped.
>                  */
>         };
> 
> ZONE_DEVICE pages only use pgmap. Other 4 words[16/32 bytes] don't use.
> So the second/third word will be used as 'struct list_head ' which list
> in buddy. The fourth word(that is normal struct page::index) store pgoff
> which the page-offset in the dax device. And the fifth word (that is
> normal struct page::private) store order of buddy. page_type will be used
> to store buddy flags.
> 
> V9:
>  -Kernel test robot report the build-in u64/u32 in init_owner_info()
>   don't work for m68k arch.explicit div_u64() should be used.(Coly fix)
>  -GCC report warning: Unsigned variable 'i' can't be negative so it
>   is unnecessary to test it. [unsignedPositive] (Reproted-by Dan)
> 
> Reported-by: kernel test robot <lkp@intel.com>
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
> Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
> Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
> Signed-off-by: Coly Li <colyli@suse.de>
> ---
>  drivers/md/bcache/nvm-pages.c   | 142 +++++++++++++++++++++++++++++++-
>  drivers/md/bcache/nvm-pages.h   |   6 ++
>  include/uapi/linux/bcache-nvm.h |  12 ++-
>  3 files changed, 153 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
> index 976ab9002c17..810f65cf756a 100644
> --- a/drivers/md/bcache/nvm-pages.c
> +++ b/drivers/md/bcache/nvm-pages.c
> @@ -34,6 +34,10 @@ static void release_nvm_namespaces(struct bch_nvm_set *nvm_set)
>  	for (i = 0; i < nvm_set->total_namespaces_nr; i++) {
>  		ns = nvm_set->nss[i];
>  		if (ns) {
> +			kvfree(ns->pages_bitmap);
> +			if (ns->pgalloc_recs_bitmap)
> +				bitmap_free(ns->pgalloc_recs_bitmap);
> +
>  			blkdev_put(ns->bdev, FMODE_READ|FMODE_WRITE|FMODE_EXEC);
>  			kfree(ns);
>  		}
> @@ -48,17 +52,122 @@ static void release_nvm_set(struct bch_nvm_set *nvm_set)
>  	kfree(nvm_set);
>  }
>  
> +static struct page *nvm_vaddr_to_page(struct bch_nvm_namespace *ns, void *addr)
> +{
> +	return virt_to_page(addr);
> +}
> +
> +static void *nvm_pgoff_to_vaddr(struct bch_nvm_namespace *ns, pgoff_t pgoff)
> +{
> +	return ns->kaddr + (pgoff << PAGE_SHIFT);
> +}
> +
> +static inline void remove_owner_space(struct bch_nvm_namespace *ns,
> +					pgoff_t pgoff, u32 nr)
> +{

Parameter nr should be 64bit value here. rec->order is 6 bits, the
maximum value can be (1<<64)-1, 1<<rec->order may overflow 32bit width.

Myabe (correct me if I am wrong) you may change "u32 nr" to "unsigned
long order", then do something like this,

/* value range of order is [0, 1<<64) */
unsigned long total = 1 << order,
while (total > 0) {
	unsigned int nr = UINT_MAX;
	if (nr > total)
		nr = total;
	bitmap_set(ns->pages_bitmap, pgoff, nr);
	pg_off += nr;
	total -= nr;
};

Similar problem also exists for pg_off. The second parameter of
bitmap_set() is in unsigned int type, pgoff_t pgoff may also overflow on
it. But this is not critical, it seems the overflow may happen on order
0 with single name space is larger than 128TB for 4KB kernel page size.
Currently we don't need to worry about this.

Just my suggestion for your reference.

> +	bitmap_set(ns->pages_bitmap, pgoff, nr);
> +}
> +
>  static int init_owner_info(struct bch_nvm_namespace *ns)
>  {
>  	struct bch_owner_list_head *owner_list_head = ns->sb->owner_list_head;
> +	struct bch_nvm_pgalloc_recs *sys_recs;
> +	int i, j, k, rc = 0;
>  
>  	mutex_lock(&only_set->lock);
>  	only_set->owner_list_head = owner_list_head;
>  	only_set->owner_list_size = owner_list_head->size;
>  	only_set->owner_list_used = owner_list_head->used;
> +
> +	/* remove used space */
> +	remove_owner_space(ns, 0, div_u64(ns->pages_offset, ns->page_size));
> +
> +	sys_recs = ns->kaddr + BCH_NVM_PAGES_SYS_RECS_HEAD_OFFSET;
> +	/* suppose no hole in array */
> +	for (i = 0; i < owner_list_head->used; i++) {
> +		struct bch_nvm_pages_owner_head *head = &owner_list_head->heads[i];
> +
> +		for (j = 0; j < BCH_NVM_PAGES_NAMESPACES_MAX; j++) {
> +			struct bch_nvm_pgalloc_recs *pgalloc_recs = head->recs[j];
> +			unsigned long offset = (unsigned long)ns->kaddr >> PAGE_SHIFT;
> +			struct page *page;
> +
> +			while (pgalloc_recs) {
> +				u32 pgalloc_recs_pos = (unsigned long)(pgalloc_recs - sys_recs);
> +
> +				if (memcmp(pgalloc_recs->magic, bch_nvm_pages_pgalloc_magic, 16)) {
> +					pr_info("invalid bch_nvm_pages_pgalloc_magic\n");
> +					rc = -EINVAL;
> +					goto unlock;
> +				}
> +				if (memcmp(pgalloc_recs->owner_uuid, head->uuid, 16)) {
> +					pr_info("invalid owner_uuid in bch_nvm_pgalloc_recs\n");
> +					rc = -EINVAL;
> +					goto unlock;
> +				}
> +				if (pgalloc_recs->owner != head) {
> +					pr_info("invalid owner in bch_nvm_pgalloc_recs\n");
> +					rc = -EINVAL;
> +					goto unlock;
> +				}
> +
> +				/* recs array can has hole */
> +				for (k = 0; k < pgalloc_recs->size; k++) {
> +					struct bch_pgalloc_rec *rec = &pgalloc_recs->recs[k];
> +
> +					if (rec->pgoff) {
> +						BUG_ON(rec->pgoff <= offset);
> +
> +						/* init struct page: index/private */
> +						page = nvm_vaddr_to_page(ns,
> +							BCH_PGOFF_TO_KVADDR(rec->pgoff));
> +
> +						set_page_private(page, rec->order);
> +						page->index = rec->pgoff - offset;
> +
> +						remove_owner_space(ns,
> +							rec->pgoff - offset,
> +							1 << rec->order);

Here is potential overflow. rec->order is 6 bit value, the maximum value
can be (1<<6)-1 = 63. But nr in remove_owner_space() is u32 ...

> +					}
> +				}
> +				bitmap_set(ns->pgalloc_recs_bitmap, pgalloc_recs_pos, 1);
> +				pgalloc_recs = pgalloc_recs->next;
> +			}
> +		}
> +	}
> +unlock:
>  	mutex_unlock(&only_set->lock);
>  
> -	return 0;
> +	return rc;
> +}
> +
> +static void init_nvm_free_space(struct bch_nvm_namespace *ns)
> +{
> +	unsigned int start, end, pages;
> +	int i;
> +	struct page *page;
> +	pgoff_t pgoff_start;
> +
> +	bitmap_for_each_clear_region(ns->pages_bitmap, start, end, 0, ns->pages_total) {
> +		pgoff_start = start;
> +		pages = end - start;
> +
> +		while (pages) {
> +			for (i = BCH_MAX_ORDER - 1; i >= 0 ; i--) {
> +				if ((pgoff_start % (1 << i) == 0) && (pages >= (1 << i)))
> +					break;
> +			}
> +
> +			page = nvm_vaddr_to_page(ns, nvm_pgoff_to_vaddr(ns, pgoff_start));
> +			page->index = pgoff_start;
> +			set_page_private(page, i);
> +			__SetPageBuddy(page);
> +			list_add((struct list_head *)&page->zone_device_data, &ns->free_area[i]);
> +
> +			pgoff_start += 1 << i;
> +			pages -= 1 << i;
> +		}
> +	}
>  }
>  
>  static bool attach_nvm_set(struct bch_nvm_namespace *ns)
> @@ -123,7 +232,7 @@ static int read_nvdimm_meta_super(struct block_device *bdev,
>  struct bch_nvm_namespace *bch_register_namespace(const char *dev_path)
>  {
>  	struct bch_nvm_namespace *ns;
> -	int err;
> +	int i, err;
>  	pgoff_t pgoff;
>  	char buf[BDEVNAME_SIZE];
>  	struct block_device *bdev;
> @@ -239,18 +348,43 @@ struct bch_nvm_namespace *bch_register_namespace(const char *dev_path)
>  	ns->nvm_set = only_set;
>  	mutex_init(&ns->lock);
>  
> +	ns->pages_bitmap = kvcalloc(BITS_TO_LONGS(ns->pages_total),
> +					sizeof(unsigned long), GFP_KERNEL);
> +	if (!ns->pages_bitmap) {
> +		err = -ENOMEM;
> +		goto clear_ns_nr;
> +	}
> +
> +	if (ns->sb->this_namespace_nr == 0) {
> +		ns->pgalloc_recs_bitmap = bitmap_zalloc(BCH_MAX_PGALLOC_RECS, GFP_KERNEL);
> +		if (ns->pgalloc_recs_bitmap == NULL) {
> +			err = -ENOMEM;
> +			goto free_pages_bitmap;
> +		}
> +	}
> +
> +	for (i = 0; i < BCH_MAX_ORDER; i++)
> +		INIT_LIST_HEAD(&ns->free_area[i]);
> +
>  	if (ns->sb->this_namespace_nr == 0) {
>  		pr_info("only first namespace contain owner info\n");
>  		err = init_owner_info(ns);
>  		if (err < 0) {
>  			pr_info("init_owner_info met error %d\n", err);
> -			only_set->nss[ns->sb->this_namespace_nr] = NULL;
> -			goto free_ns;
> +			goto free_recs_bitmap;
>  		}
> +		/* init buddy allocator */
> +		init_nvm_free_space(ns);
>  	}
>  
>  	kfree(path);
>  	return ns;
> +free_recs_bitmap:
> +	bitmap_free(ns->pgalloc_recs_bitmap);
> +free_pages_bitmap:
> +	kvfree(ns->pages_bitmap);
> +clear_ns_nr:
> +	only_set->nss[ns->sb->this_namespace_nr] = NULL;
>  free_ns:
>  	kfree(ns);
>  bdput:
> diff --git a/drivers/md/bcache/nvm-pages.h b/drivers/md/bcache/nvm-pages.h
> index 87a0d2c46788..e08864af96a0 100644
> --- a/drivers/md/bcache/nvm-pages.h
> +++ b/drivers/md/bcache/nvm-pages.h
> @@ -16,6 +16,7 @@
>   * to which owner. After reboot from power failure, they will be initialized
>   * based on nvm pages superblock in NVDIMM device.
>   */
> +#define BCH_MAX_ORDER 20
>  struct bch_nvm_namespace {
>  	struct bch_nvm_pages_sb *sb;
>  	void *kaddr;
> @@ -27,6 +28,11 @@ struct bch_nvm_namespace {
>  	u64 pages_total;
>  	pfn_t start_pfn;
>  
> +	unsigned long *pages_bitmap;
> +	struct list_head free_area[BCH_MAX_ORDER];
> +
> +	unsigned long *pgalloc_recs_bitmap;
> +
>  	struct dax_device *dax_dev;
>  	struct block_device *bdev;
>  	struct bch_nvm_set *nvm_set;
> diff --git a/include/uapi/linux/bcache-nvm.h b/include/uapi/linux/bcache-nvm.h
> index 1d62e9f0d4bf..7df0934d8756 100644
> --- a/include/uapi/linux/bcache-nvm.h
> +++ b/include/uapi/linux/bcache-nvm.h
> @@ -114,6 +114,8 @@ struct bch_pgalloc_rec {
>  	__u64	reserved:6;
>  };
>  
> +#define BCH_PGOFF_TO_KVADDR(pgoff) ((void *)((unsigned long)pgoff << PAGE_SHIFT))


BCH_PGOFF_TO_KVADDR is not used inside this header, and no on-media
format involved in. It is used in nvm-pages.c, maybe it can be moved
into drivers/md/bcache/nvm-pages.h



> +
>  struct bch_nvm_pgalloc_recs {
>  union {
>  	struct {
> @@ -130,11 +132,15 @@ union {
>  };
>  };
>  
> -#define BCH_MAX_RECS					\
> -	((sizeof(struct bch_nvm_pgalloc_recs) -		\
> -	 offsetof(struct bch_nvm_pgalloc_recs, recs)) /	\
> +#define BCH_MAX_RECS							\
> +	((sizeof(struct bch_nvm_pgalloc_recs) -				\
> +	 offsetof(struct bch_nvm_pgalloc_recs, recs)) /			\
>  	 sizeof(struct bch_pgalloc_rec))
>  
> +#define BCH_MAX_PGALLOC_RECS						\
> +	((BCH_NVM_PAGES_OFFSET - BCH_NVM_PAGES_SYS_RECS_HEAD_OFFSET) /	\
> +	 sizeof(struct bch_nvm_pgalloc_recs))
> +
>  struct bch_nvm_pages_owner_head {
>  	unsigned char			uuid[16];
>  	unsigned char			label[BCH_NVM_PAGES_LABEL_SIZE];
> 

The rested looks fine to me. Thanks for the great work :-)

Coly Li

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [bch-nvm-pages v9 4/6] bcache: bch_nvm_alloc_pages() of the buddy
  2021-04-28 21:39 ` [bch-nvm-pages v9 4/6] bcache: bch_nvm_alloc_pages() " Qiaowei Ren
@ 2021-05-11 12:49   ` Coly Li
  2021-05-18  2:27     ` Ma, Jianpeng
  0 siblings, 1 reply; 17+ messages in thread
From: Coly Li @ 2021-05-11 12:49 UTC (permalink / raw)
  To: Qiaowei Ren, jianpeng.ma; +Cc: linux-bcache

On 4/29/21 5:39 AM, Qiaowei Ren wrote:
> From: Jianpeng Ma <jianpeng.ma@intel.com>
> 
> This patch implements the bch_nvm_alloc_pages() of the buddy.
> In terms of function, this func is like current-page-buddy-alloc.
> But the differences are:
> a: it need owner_uuid as parameter which record owner info. And it
> make those info persistence.
> b: it don't need flags like GFP_*. All allocs are the equal.
> c: it don't trigger other ops etc swap/recycle.
> 
> Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
> Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
> Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
> [colyli: fix typo in commit log]

Once you fixed the typo in this version, you may remove the above line
from this patch. Such comment is used when I submit patches to mainline
kernel.

> Signed-off-by: Coly Li <colyli@suse.de>

And this is maintainer's SOB not author's, you may remove it from all
rested patches except for the first one of this series.


> ---
>  drivers/md/bcache/nvm-pages.c | 172 ++++++++++++++++++++++++++++++++++
>  drivers/md/bcache/nvm-pages.h |   6 ++
>  2 files changed, 178 insertions(+)
> 
> diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
> index 810f65cf756a..2647ff997fab 100644
> --- a/drivers/md/bcache/nvm-pages.c
> +++ b/drivers/md/bcache/nvm-pages.c
> @@ -68,6 +68,178 @@ static inline void remove_owner_space(struct bch_nvm_namespace *ns,
>  	bitmap_set(ns->pages_bitmap, pgoff, nr);
>  }
>  
> +/* If not found, it will create if create == true */
> +static struct bch_nvm_pages_owner_head *find_owner_head(const char *owner_uuid, bool create)
> +{
> +	struct bch_owner_list_head *owner_list_head = only_set->owner_list_head;
> +	struct bch_nvm_pages_owner_head *owner_head = NULL;
> +	int i;
> +
> +	if (owner_list_head == NULL)
> +		goto out;
> +
> +	for (i = 0; i < only_set->owner_list_used; i++) {
> +		if (!memcmp(owner_uuid, owner_list_head->heads[i].uuid, 16)) {
> +			owner_head = &(owner_list_head->heads[i]);
> +			break;
> +		}
> +	}
> +
> +	if (!owner_head && create) {
> +		int used = only_set->owner_list_used;
> +
> +		BUG_ON((used > 0) && (only_set->owner_list_size == used));


It seems the above condition may happen when owner_list is full? Maybe
return NULL for a full-occupied owner header list can be better?


> +		memcpy_flushcache(owner_list_head->heads[used].uuid, owner_uuid, 16);
> +		only_set->owner_list_used++;
> +
> +		owner_list_head->used++;
> +		owner_head = &(owner_list_head->heads[used]);
> +	}
> +
> +out:
> +	return owner_head;
> +}
> +
> +static struct bch_nvm_pgalloc_recs *find_empty_pgalloc_recs(void)
> +{
> +	unsigned int start;
> +	struct bch_nvm_namespace *ns = only_set->nss[0];
> +	struct bch_nvm_pgalloc_recs *recs;
> +
> +	start = bitmap_find_next_zero_area(ns->pgalloc_recs_bitmap, BCH_MAX_PGALLOC_RECS, 0, 1, 0);

We cannot assume all space in [BCH_NVM_PAGES_SYS_RECS_HEAD_OFFSET,
BCH_NVM_PAGES_OFFSET) are dedicated to recs structures.

Right now each bch_nvm_pgalloc_recs is 8KB, we can assume the
BCH_MAX_PGALLOC_RECS to be 64, that means 512KB space starts at
BCH_NVM_PAGES_SYS_RECS_HEAD_OFFSET. we won't exceed the limition in next
12 months at least.


> +	if (start > BCH_MAX_PGALLOC_RECS) {
> +		pr_info("no free struct bch_nvm_pgalloc_recs\n");
> +		return NULL;
> +	}
> +
> +	bitmap_set(ns->pgalloc_recs_bitmap, start, 1);
> +	recs = (struct bch_nvm_pgalloc_recs *)(ns->kaddr + BCH_NVM_PAGES_SYS_RECS_HEAD_OFFSET)
> +		+ start;
> +	return recs;
> +}
> +
> +static struct bch_nvm_pgalloc_recs *find_nvm_pgalloc_recs(struct bch_nvm_namespace *ns,
> +		struct bch_nvm_pages_owner_head *owner_head, bool create)
> +{
> +	int ns_nr = ns->sb->this_namespace_nr;
> +	struct bch_nvm_pgalloc_recs *prev_recs = NULL, *recs = owner_head->recs[ns_nr];
> +
> +	/* If create=false, we return recs[nr] */
> +	if (!create)
> +		return recs;
> +

I assume when create == false, at least the first non-full recs
structure should be returned before iterated all the list nodes. How
about return recs by the way in my next comment.



> +	/*
> +	 * If create=true, it mean we need a empty struct bch_pgalloc_rec
> +	 * So we should find non-empty struct bch_nvm_pgalloc_recs or alloc
> +	 * new struct bch_nvm_pgalloc_recs. And return this bch_nvm_pgalloc_recs
> +	 */
> +	while (recs && (recs->used == recs->size)) {
> +		prev_recs = recs;
> +		recs = recs->next;
> +	}
> +
> +	/* Found empty struct bch_nvm_pgalloc_recs */
> +	if (recs)
> +		return recs;

If recs is NULL here, and create == false, we may return the NULL as
well here,

	if (recs || !create)
		return recs;

Just for your reference.

> +	/* Need alloc new struct bch_nvm_galloc_recs */
> +	recs = find_empty_pgalloc_recs();
> +	if (recs) {
> +		recs->next = NULL;
> +		recs->owner = owner_head;
> +		strncpy(recs->magic, bch_nvm_pages_pgalloc_magic, 16);
> +		strncpy(recs->owner_uuid, owner_head->uuid, 16);

Maybe memcpy_flushcache() is better here.

> +		recs->size = BCH_MAX_RECS;
> +		recs->used = 0;
> +
> +		if (prev_recs)
> +			prev_recs->next = recs;
> +		else
> +			owner_head->recs[ns_nr] = recs;
> +	}
> +
> +	return recs;
> +}
> +
> +static void add_pgalloc_rec(struct bch_nvm_pgalloc_recs *recs, void *kaddr, int order)
> +{
> +	int i;
> +
> +	for (i = 0; i < recs->size; i++) {
> +		if (recs->recs[i].pgoff == 0) {
> +			recs->recs[i].pgoff = (unsigned long)kaddr >> PAGE_SHIFT;
> +			recs->recs[i].order = order;
> +			recs->used++;
> +			break;
> +		}
> +	}
> +	BUG_ON(i == recs->size);> +}
> +
> +void *bch_nvm_alloc_pages(int order, const char *owner_uuid)
> +{
> +	void *kaddr = NULL;
> +	struct bch_nvm_pgalloc_recs *pgalloc_recs;
> +	struct bch_nvm_pages_owner_head *owner_head;
> +	int i, j;
> +
> +	mutex_lock(&only_set->lock);
> +	owner_head = find_owner_head(owner_uuid, true);
> +
> +	if (!owner_head) {
> +		pr_err("can't find bch_nvm_pgalloc_recs by(uuid=%s)\n", owner_uuid);
> +		goto unlock;
> +	}
> +
> +	for (j = 0; j < only_set->total_namespaces_nr; j++) {
> +		struct bch_nvm_namespace *ns = only_set->nss[j];
> +
> +		if (!ns || (ns->free < (1 << order)))
> +			continue;
> +
> +		for (i = order; i < BCH_MAX_ORDER; i++) {
> +			struct list_head *list;
> +			struct page *page, *buddy_page;
> +
> +			if (list_empty(&ns->free_area[i]))
> +				continue;
> +
> +			list = ns->free_area[i].next;
> +			page = container_of((void *)list, struct page, zone_device_data);
> +
> +			list_del(list);
> +
> +			while (i != order) {
> +				buddy_page = nvm_vaddr_to_page(ns,
> +					nvm_pgoff_to_vaddr(ns, page->index + (1 << (i - 1))));
> +				set_page_private(buddy_page, i - 1);
> +				buddy_page->index = page->index + (1 << (i - 1));
> +				__SetPageBuddy(buddy_page);
> +				list_add((struct list_head *)&buddy_page->zone_device_data,
> +					&ns->free_area[i - 1]);
> +				i--;
> +			}
> +
> +			set_page_private(page, order);
> +			__ClearPageBuddy(page);
> +			ns->free -= 1 << order;
> +			kaddr = nvm_pgoff_to_vaddr(ns, page->index);
> +			break;
> +		}
> +
> +		if (i != BCH_MAX_ORDER) {

In case to avoid an invalid order is sent in, maybe it is better to be
		if (i < BCH_MAX_ORDER) {


> +			pgalloc_recs = find_nvm_pgalloc_recs(ns, owner_head, true);
> +			/* ToDo: handle pgalloc_recs==NULL */
> +			add_pgalloc_rec(pgalloc_recs, kaddr, order);
> +			break;
> +		}
> +	}
> +
> +unlock:
> +	mutex_unlock(&only_set->lock);
> +	return kaddr;
> +}
> +EXPORT_SYMBOL_GPL(bch_nvm_alloc_pages);
> +
>  static int init_owner_info(struct bch_nvm_namespace *ns)
>  {
>  	struct bch_owner_list_head *owner_list_head = ns->sb->owner_list_head;
> diff --git a/drivers/md/bcache/nvm-pages.h b/drivers/md/bcache/nvm-pages.h
> index e08864af96a0..4fd5205146a2 100644
> --- a/drivers/md/bcache/nvm-pages.h
> +++ b/drivers/md/bcache/nvm-pages.h
> @@ -62,6 +62,7 @@ extern struct bch_nvm_set *only_set;
>  struct bch_nvm_namespace *bch_register_namespace(const char *dev_path);
>  int bch_nvm_init(void);
>  void bch_nvm_exit(void);
> +void *bch_nvm_alloc_pages(int order, const char *owner_uuid);
>  
>  #else
>  
> @@ -74,6 +75,11 @@ static inline int bch_nvm_init(void)
>  	return 0;
>  }
>  static inline void bch_nvm_exit(void) { }
> +static inline void *bch_nvm_alloc_pages(int order, const char *owner_uuid)
> +{
> +	return NULL;
> +}
> +
>  
>  #endif /* CONFIG_BCACHE_NVM_PAGES */
>  
> 

The rested are fine to me. Thanks for the great work :-)

Coly Li


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [bch-nvm-pages v9 5/6] bcache: bch_nvm_free_pages() of the buddy
  2021-04-28 21:39 ` [bch-nvm-pages v9 5/6] bcache: bch_nvm_free_pages() " Qiaowei Ren
@ 2021-05-11 13:41   ` Coly Li
  0 siblings, 0 replies; 17+ messages in thread
From: Coly Li @ 2021-05-11 13:41 UTC (permalink / raw)
  To: Qiaowei Ren, jianpeng.ma; +Cc: linux-bcache

On 4/29/21 5:39 AM, Qiaowei Ren wrote:
> From: Jianpeng Ma <jianpeng.ma@intel.com>
> 
> This patch implements the bch_nvm_free_pages() of the buddy.
> 
> The difference between this and page-buddy-free:
> it need owner_uuid to free owner allocated pages.And must
> persistent after free.
> 
> Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
> Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
> Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
> [colyli: fix typo in commit log]
> Signed-off-by: Coly Li <colyli@suse.de>
> ---
>  drivers/md/bcache/nvm-pages.c | 164 ++++++++++++++++++++++++++++++++--
>  drivers/md/bcache/nvm-pages.h |   3 +-
>  2 files changed, 159 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
> index 2647ff997fab..39807046ecce 100644
> --- a/drivers/md/bcache/nvm-pages.c
> +++ b/drivers/md/bcache/nvm-pages.c
> @@ -52,7 +52,7 @@ static void release_nvm_set(struct bch_nvm_set *nvm_set)
>  	kfree(nvm_set);
>  }
>  
> -static struct page *nvm_vaddr_to_page(struct bch_nvm_namespace *ns, void *addr)
> +static struct page *nvm_vaddr_to_page(void *addr)
>  {
>  	return virt_to_page(addr);
>  }
> @@ -175,6 +175,155 @@ static void add_pgalloc_rec(struct bch_nvm_pgalloc_recs *recs, void *kaddr, int
>  	BUG_ON(i == recs->size);
>  }
>  
> +static inline void *nvm_end_addr(struct bch_nvm_namespace *ns)
> +{
> +	return ns->kaddr + (ns->pages_total << PAGE_SHIFT);
> +}
> +
> +static inline bool in_nvm_range(struct bch_nvm_namespace *ns,
> +		void *start_addr, void *end_addr)
> +{
> +	return (start_addr >= ns->kaddr) && (end_addr <= nvm_end_addr(ns));


Maybe it should be
return (start_addr >= ns->kaddr) && (end_addr < nvm_end_addr(ns)) ?


> +}
> +
> +static struct bch_nvm_namespace *find_nvm_by_addr(void *addr, int order)
> +{
> +	int i;
> +	struct bch_nvm_namespace *ns;
> +
> +	for (i = 0; i < only_set->total_namespaces_nr; i++) {
> +		ns = only_set->nss[i];
> +		if (ns && in_nvm_range(ns, addr, addr + (1 << order)))
> +			return ns;
> +	}
> +	return NULL;
> +}
> +
> +static int remove_pgalloc_rec(struct bch_nvm_pgalloc_recs *pgalloc_recs, int ns_nr,
> +				void *kaddr, int order)
> +{
> +	struct bch_nvm_pages_owner_head *owner_head = pgalloc_recs->owner;
> +	struct bch_nvm_pgalloc_recs *prev_recs, *sys_recs;
> +	u64 pgoff = (unsigned long)kaddr >> PAGE_SHIFT;
> +	struct bch_nvm_namespace *ns = only_set->nss[0];
> +	int i;
> +
> +	prev_recs = pgalloc_recs;
> +	sys_recs = ns->kaddr + BCH_NVM_PAGES_SYS_RECS_HEAD_OFFSET;
> +	while (pgalloc_recs) {
> +		for (i = 0; i < pgalloc_recs->size; i++) {
> +			struct bch_pgalloc_rec *rec = &(pgalloc_recs->recs[i]);
> +
> +			if (rec->pgoff == pgoff) {
> +				WARN_ON(rec->order != order);
> +				rec->pgoff = 0;
> +				rec->order = 0;
> +				pgalloc_recs->used--;
> +
> +				if (pgalloc_recs->used == 0) {
> +					int recs_pos = pgalloc_recs - sys_recs;
> +
> +					if (pgalloc_recs == prev_recs)
> +						owner_head->recs[ns_nr] = pgalloc_recs->next;
> +					else
> +						prev_recs->next = pgalloc_recs->next;
> +
> +					pgalloc_recs->next = NULL;
> +					pgalloc_recs->owner = NULL;
> +
> +					bitmap_clear(ns->pgalloc_recs_bitmap, recs_pos, 1);
> +				}
> +				goto exit;
> +			}
> +		}
> +		prev_recs = pgalloc_recs;
> +		pgalloc_recs = pgalloc_recs->next;
> +	}
> +exit:
> +	return pgalloc_recs ? 0 : -ENOENT;
> +}
> +
> +static void __free_space(struct bch_nvm_namespace *ns, void *addr, int order)
> +{
> +	unsigned int add_pages = (1 << order);
> +	pgoff_t pgoff;
> +	struct page *page;
> +
> +	page = nvm_vaddr_to_page(addr);
> +	WARN_ON((!page) || (page->private != order));
> +	pgoff = page->index;
> +
> +	while (order < BCH_MAX_ORDER - 1) {
> +		struct page *buddy_page;
> +
> +		pgoff_t buddy_pgoff = pgoff ^ (1 << order);
> +		pgoff_t parent_pgoff = pgoff & ~(1 << order);
> +
> +		if ((parent_pgoff + (1 << (order + 1)) > ns->pages_total))
> +			break;
> +
> +		buddy_page = nvm_vaddr_to_page(nvm_pgoff_to_vaddr(ns, buddy_pgoff));
> +		WARN_ON(!buddy_page);
> +
> +		if (PageBuddy(buddy_page) && (buddy_page->private == order)) {
> +			list_del((struct list_head *)&buddy_page->zone_device_data);
> +			__ClearPageBuddy(buddy_page);
> +			pgoff = parent_pgoff;
> +			order++;
> +			continue;
> +		}
> +		break;
> +	}
> +
> +	page = nvm_vaddr_to_page(nvm_pgoff_to_vaddr(ns, pgoff));
> +	WARN_ON(!page);
> +	list_add((struct list_head *)&page->zone_device_data, &ns->free_area[order]);
> +	page->index = pgoff;
> +	set_page_private(page, order);
> +	__SetPageBuddy(page);
> +	ns->free += add_pages;
> +}
> +
> +void bch_nvm_free_pages(void *addr, int order, const char *owner_uuid)
> +{
> +	struct bch_nvm_namespace *ns;
> +	struct bch_nvm_pages_owner_head *owner_head;
> +	struct bch_nvm_pgalloc_recs *pgalloc_recs;
> +	int r;
> +
> +	mutex_lock(&only_set->lock);
> +
> +	ns = find_nvm_by_addr(addr, order);
> +	if (!ns) {
> +		pr_info("can't find nvm_dev by kaddr %p\n", addr);

Let's use pr_err() for this error condition.

> +		goto unlock;
> +	}
> +
> +	owner_head = find_owner_head(owner_uuid, false);
> +	if (!owner_head) {
> +		pr_info("can't found bch_nvm_pages_owner_head by(uuid=%s)\n", owner_uuid);

Let's use pr_err() for this error condition.

> +		goto unlock;
> +	}
> +
> +	pgalloc_recs = find_nvm_pgalloc_recs(ns, owner_head, false);
> +	if (!pgalloc_recs) {
> +		pr_info("can't find bch_nvm_pgalloc_recs by(uuid=%s)\n", owner_uuid);

Let's use pr_err() for this error condition.

> +		goto unlock;
> +	}
> +
> +	r = remove_pgalloc_rec(pgalloc_recs, ns->sb->this_namespace_nr, addr, order);
> +	if (r < 0) {
> +		pr_info("can't find bch_pgalloc_rec\n");

Let's use pr_err() for this error condition.

> +		goto unlock;
> +	}
> +
> +	__free_space(ns, addr, order);
> +
> +unlock:
> +	mutex_unlock(&only_set->lock);
> +}
> +EXPORT_SYMBOL_GPL(bch_nvm_free_pages);
> +
>  void *bch_nvm_alloc_pages(int order, const char *owner_uuid)
>  {
>  	void *kaddr = NULL;
> @@ -209,7 +358,7 @@ void *bch_nvm_alloc_pages(int order, const char *owner_uuid)
>  			list_del(list);
>  
>  			while (i != order) {
> -				buddy_page = nvm_vaddr_to_page(ns,
> +				buddy_page = nvm_vaddr_to_page(
>  					nvm_pgoff_to_vaddr(ns, page->index + (1 << (i - 1))));
>  				set_page_private(buddy_page, i - 1);
>  				buddy_page->index = page->index + (1 << (i - 1));
> @@ -291,7 +440,7 @@ static int init_owner_info(struct bch_nvm_namespace *ns)
>  						BUG_ON(rec->pgoff <= offset);
>  
>  						/* init struct page: index/private */
> -						page = nvm_vaddr_to_page(ns,
> +						page = nvm_vaddr_to_page(
>  							BCH_PGOFF_TO_KVADDR(rec->pgoff));
>  
>  						set_page_private(page, rec->order);
> @@ -330,11 +479,12 @@ static void init_nvm_free_space(struct bch_nvm_namespace *ns)
>  					break;
>  			}
>  
> -			page = nvm_vaddr_to_page(ns, nvm_pgoff_to_vaddr(ns, pgoff_start));
> +			page = nvm_vaddr_to_page(nvm_pgoff_to_vaddr(ns, pgoff_start));
>  			page->index = pgoff_start;
>  			set_page_private(page, i);
> -			__SetPageBuddy(page);
> -			list_add((struct list_head *)&page->zone_device_data, &ns->free_area[i]);
> +
> +			/* in order to update ns->free */
> +			__free_space(ns, nvm_pgoff_to_vaddr(ns, pgoff_start), i);
>  
>  			pgoff_start += 1 << i;
>  			pages -= 1 << i;
> @@ -515,7 +665,7 @@ struct bch_nvm_namespace *bch_register_namespace(const char *dev_path)
>  	ns->page_size = ns->sb->page_size;
>  	ns->pages_offset = ns->sb->pages_offset;
>  	ns->pages_total = ns->sb->pages_total;
> -	ns->free = 0;
> +	ns->free = 0; /* increase by __free_space() */
>  	ns->bdev = bdev;
>  	ns->nvm_set = only_set;
>  	mutex_init(&ns->lock);
> diff --git a/drivers/md/bcache/nvm-pages.h b/drivers/md/bcache/nvm-pages.h
> index 4fd5205146a2..918aee6a9afc 100644
> --- a/drivers/md/bcache/nvm-pages.h
> +++ b/drivers/md/bcache/nvm-pages.h
> @@ -63,6 +63,7 @@ struct bch_nvm_namespace *bch_register_namespace(const char *dev_path);
>  int bch_nvm_init(void);
>  void bch_nvm_exit(void);
>  void *bch_nvm_alloc_pages(int order, const char *owner_uuid);
> +void bch_nvm_free_pages(void *addr, int order, const char *owner_uuid);
>  
>  #else
>  
> @@ -79,7 +80,7 @@ static inline void *bch_nvm_alloc_pages(int order, const char *owner_uuid)
>  {
>  	return NULL;
>  }
> -
> +static inline void bch_nvm_free_pages(void *addr, int order, const char *owner_uuid) { }
>  
>  #endif /* CONFIG_BCACHE_NVM_PAGES */
>  
> 

The rested are fine to me. Thanks for the great work :-)

Coly Li

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [bch-nvm-pages v9 6/6] bcache: get allocated pages from specific owner
  2021-04-28 21:39 ` [bch-nvm-pages v9 6/6] bcache: get allocated pages from specific owner Qiaowei Ren
@ 2021-05-11 13:45   ` Coly Li
  0 siblings, 0 replies; 17+ messages in thread
From: Coly Li @ 2021-05-11 13:45 UTC (permalink / raw)
  To: Qiaowei Ren, jianpeng.ma; +Cc: linux-bcache

On 4/29/21 5:39 AM, Qiaowei Ren wrote:
> From: Jianpeng Ma <jianpeng.ma@intel.com>
> 
> This patch implements bch_get_allocated_pages() of the buddy to be used to
> get allocated pages from specific owner.
> 
> Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
> Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
> Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
> Signed-off-by: Coly Li <colyli@suse.de>
> ---
>  drivers/md/bcache/nvm-pages.c | 6 ++++++
>  drivers/md/bcache/nvm-pages.h | 5 +++++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
> index 39807046ecce..0be89a03255c 100644
> --- a/drivers/md/bcache/nvm-pages.c
> +++ b/drivers/md/bcache/nvm-pages.c
> @@ -389,6 +389,12 @@ void *bch_nvm_alloc_pages(int order, const char *owner_uuid)
>  }
>  EXPORT_SYMBOL_GPL(bch_nvm_alloc_pages);
>  
> +struct bch_nvm_pages_owner_head *bch_get_allocated_pages(const char *owner_uuid)
> +{
> +	return find_owner_head(owner_uuid, false);
> +}
> +EXPORT_SYMBOL_GPL(bch_get_allocated_pages);
> +
>  static int init_owner_info(struct bch_nvm_namespace *ns)
>  {
>  	struct bch_owner_list_head *owner_list_head = ns->sb->owner_list_head;
> diff --git a/drivers/md/bcache/nvm-pages.h b/drivers/md/bcache/nvm-pages.h
> index 918aee6a9afc..cfb3e8ef01ee 100644
> --- a/drivers/md/bcache/nvm-pages.h
> +++ b/drivers/md/bcache/nvm-pages.h
> @@ -64,6 +64,7 @@ int bch_nvm_init(void);
>  void bch_nvm_exit(void);
>  void *bch_nvm_alloc_pages(int order, const char *owner_uuid);
>  void bch_nvm_free_pages(void *addr, int order, const char *owner_uuid);
> +struct bch_nvm_pages_owner_head *bch_get_allocated_pages(const char *owner_uuid);
>  
>  #else
>  
> @@ -81,6 +82,10 @@ static inline void *bch_nvm_alloc_pages(int order, const char *owner_uuid)
>  	return NULL;
>  }
>  static inline void bch_nvm_free_pages(void *addr, int order, const char *owner_uuid) { }
> +static inline struct bch_nvm_pages_owner_head *bch_get_allocated_pages(const char *owner_uuid)
> +{
> +	return NULL;
> +}
>  
>  #endif /* CONFIG_BCACHE_NVM_PAGES */
>  
> 

It looks fine to me. Thanks for the great work :-)

Coly Li


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [bch-nvm-pages v9 4/6] bcache: bch_nvm_alloc_pages() of the buddy
  2021-05-11 12:49   ` Coly Li
@ 2021-05-18  2:27     ` Ma, Jianpeng
  2021-05-18  2:45       ` Coly Li
  0 siblings, 1 reply; 17+ messages in thread
From: Ma, Jianpeng @ 2021-05-18  2:27 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-bcache, Ren, Qiaowei

> -----Original Message-----
> From: Coly Li <colyli@suse.de>
> Sent: Tuesday, May 11, 2021 8:49 PM
> To: Ren, Qiaowei <qiaowei.ren@intel.com>; Ma, Jianpeng
> <jianpeng.ma@intel.com>
> Cc: linux-bcache@vger.kernel.org
> Subject: Re: [bch-nvm-pages v9 4/6] bcache: bch_nvm_alloc_pages() of the
> buddy
> 
> On 4/29/21 5:39 AM, Qiaowei Ren wrote:
> > From: Jianpeng Ma <jianpeng.ma@intel.com>
> >
> > This patch implements the bch_nvm_alloc_pages() of the buddy.
> > In terms of function, this func is like current-page-buddy-alloc.
> > But the differences are:
> > a: it need owner_uuid as parameter which record owner info. And it
> > make those info persistence.
> > b: it don't need flags like GFP_*. All allocs are the equal.
> > c: it don't trigger other ops etc swap/recycle.
> >
> > Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
> > Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
> > Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
> > [colyli: fix typo in commit log]
> 
> Once you fixed the typo in this version, you may remove the above line from
> this patch. Such comment is used when I submit patches to mainline kernel.
> 
> > Signed-off-by: Coly Li <colyli@suse.de>
> 
> And this is maintainer's SOB not author's, you may remove it from all rested
> patches except for the first one of this series.
> 
> 
> > ---
> >  drivers/md/bcache/nvm-pages.c | 172
> ++++++++++++++++++++++++++++++++++
> >  drivers/md/bcache/nvm-pages.h |   6 ++
> >  2 files changed, 178 insertions(+)
> >
> > diff --git a/drivers/md/bcache/nvm-pages.c
> > b/drivers/md/bcache/nvm-pages.c index 810f65cf756a..2647ff997fab
> > 100644
> > --- a/drivers/md/bcache/nvm-pages.c
> > +++ b/drivers/md/bcache/nvm-pages.c
> > @@ -68,6 +68,178 @@ static inline void remove_owner_space(struct
> bch_nvm_namespace *ns,
> >  	bitmap_set(ns->pages_bitmap, pgoff, nr);  }
> >
> > +/* If not found, it will create if create == true */ static struct
> > +bch_nvm_pages_owner_head *find_owner_head(const char
> *owner_uuid,
> > +bool create) {
> > +	struct bch_owner_list_head *owner_list_head = only_set-
> >owner_list_head;
> > +	struct bch_nvm_pages_owner_head *owner_head = NULL;
> > +	int i;
> > +
> > +	if (owner_list_head == NULL)
> > +		goto out;
> > +
> > +	for (i = 0; i < only_set->owner_list_used; i++) {
> > +		if (!memcmp(owner_uuid, owner_list_head->heads[i].uuid, 16))
> {
> > +			owner_head = &(owner_list_head->heads[i]);
> > +			break;
> > +		}
> > +	}
> > +
> > +	if (!owner_head && create) {
> > +		int used = only_set->owner_list_used;
> > +
> > +		BUG_ON((used > 0) && (only_set->owner_list_size == used));
> 
> 
> It seems the above condition may happen when owner_list is full? Maybe
> return NULL for a full-occupied owner header list can be better?
> 
> 
> > +		memcpy_flushcache(owner_list_head->heads[used].uuid,
> owner_uuid, 16);
> > +		only_set->owner_list_used++;
> > +
> > +		owner_list_head->used++;
> > +		owner_head = &(owner_list_head->heads[used]);
> > +	}
> > +
> > +out:
> > +	return owner_head;
> > +}
> > +
> > +static struct bch_nvm_pgalloc_recs *find_empty_pgalloc_recs(void) {
> > +	unsigned int start;
> > +	struct bch_nvm_namespace *ns = only_set->nss[0];
> > +	struct bch_nvm_pgalloc_recs *recs;
> > +
> > +	start = bitmap_find_next_zero_area(ns->pgalloc_recs_bitmap,
> > +BCH_MAX_PGALLOC_RECS, 0, 1, 0);
> 
> We cannot assume all space in [BCH_NVM_PAGES_SYS_RECS_HEAD_OFFSET,
> BCH_NVM_PAGES_OFFSET) are dedicated to recs structures.
> 
> Right now each bch_nvm_pgalloc_recs is 8KB, we can assume the
> BCH_MAX_PGALLOC_RECS to be 64, that means 512KB space starts at
> BCH_NVM_PAGES_SYS_RECS_HEAD_OFFSET. we won't exceed the limition in
> next
> 12 months at least.
> 
> 
> > +	if (start > BCH_MAX_PGALLOC_RECS) {
> > +		pr_info("no free struct bch_nvm_pgalloc_recs\n");
> > +		return NULL;
> > +	}
> > +
> > +	bitmap_set(ns->pgalloc_recs_bitmap, start, 1);
> > +	recs = (struct bch_nvm_pgalloc_recs *)(ns->kaddr +
> BCH_NVM_PAGES_SYS_RECS_HEAD_OFFSET)
> > +		+ start;
> > +	return recs;
> > +}
> > +
> > +static struct bch_nvm_pgalloc_recs *find_nvm_pgalloc_recs(struct
> bch_nvm_namespace *ns,
> > +		struct bch_nvm_pages_owner_head *owner_head, bool create)
> {
> > +	int ns_nr = ns->sb->this_namespace_nr;
> > +	struct bch_nvm_pgalloc_recs *prev_recs = NULL, *recs =
> > +owner_head->recs[ns_nr];
> > +
> > +	/* If create=false, we return recs[nr] */
> > +	if (!create)
> > +		return recs;
> > +
> 
> I assume when create == false, at least the first non-full recs structure should
> be returned before iterated all the list nodes. How about return recs by the
> way in my next comment.

Hi Coly:
   For create == false,  it mean find the recs and find the rec(when free page or list all rec).
  So we return the firstly recs(iterated based on address do in other function, you can see patch5).

Thanks!
Jianpeng.
> 
> 
> > +	/*
> > +	 * If create=true, it mean we need a empty struct bch_pgalloc_rec
> > +	 * So we should find non-empty struct bch_nvm_pgalloc_recs or alloc
> > +	 * new struct bch_nvm_pgalloc_recs. And return this
> bch_nvm_pgalloc_recs
> > +	 */
> > +	while (recs && (recs->used == recs->size)) {
> > +		prev_recs = recs;
> > +		recs = recs->next;
> > +	}
> > +
> > +	/* Found empty struct bch_nvm_pgalloc_recs */
> > +	if (recs)
> > +		return recs;
> 
> If recs is NULL here, and create == false, we may return the NULL as well here,
> 
> 	if (recs || !create)
> 		return recs;
> 
> Just for your reference.
> 
> > +	/* Need alloc new struct bch_nvm_galloc_recs */
> > +	recs = find_empty_pgalloc_recs();
> > +	if (recs) {
> > +		recs->next = NULL;
> > +		recs->owner = owner_head;
> > +		strncpy(recs->magic, bch_nvm_pages_pgalloc_magic, 16);
> > +		strncpy(recs->owner_uuid, owner_head->uuid, 16);
> 
> Maybe memcpy_flushcache() is better here.
> 
> > +		recs->size = BCH_MAX_RECS;
> > +		recs->used = 0;
> > +
> > +		if (prev_recs)
> > +			prev_recs->next = recs;
> > +		else
> > +			owner_head->recs[ns_nr] = recs;
> > +	}
> > +
> > +	return recs;
> > +}
> > +
> > +static void add_pgalloc_rec(struct bch_nvm_pgalloc_recs *recs, void
> > +*kaddr, int order) {
> > +	int i;
> > +
> > +	for (i = 0; i < recs->size; i++) {
> > +		if (recs->recs[i].pgoff == 0) {
> > +			recs->recs[i].pgoff = (unsigned long)kaddr >> PAGE_SHIFT;
> > +			recs->recs[i].order = order;
> > +			recs->used++;
> > +			break;
> > +		}
> > +	}
> > +	BUG_ON(i == recs->size);> +}
> > +
> > +void *bch_nvm_alloc_pages(int order, const char *owner_uuid) {
> > +	void *kaddr = NULL;
> > +	struct bch_nvm_pgalloc_recs *pgalloc_recs;
> > +	struct bch_nvm_pages_owner_head *owner_head;
> > +	int i, j;
> > +
> > +	mutex_lock(&only_set->lock);
> > +	owner_head = find_owner_head(owner_uuid, true);
> > +
> > +	if (!owner_head) {
> > +		pr_err("can't find bch_nvm_pgalloc_recs by(uuid=%s)\n",
> owner_uuid);
> > +		goto unlock;
> > +	}
> > +
> > +	for (j = 0; j < only_set->total_namespaces_nr; j++) {
> > +		struct bch_nvm_namespace *ns = only_set->nss[j];
> > +
> > +		if (!ns || (ns->free < (1 << order)))
> > +			continue;
> > +
> > +		for (i = order; i < BCH_MAX_ORDER; i++) {
> > +			struct list_head *list;
> > +			struct page *page, *buddy_page;
> > +
> > +			if (list_empty(&ns->free_area[i]))
> > +				continue;
> > +
> > +			list = ns->free_area[i].next;
> > +			page = container_of((void *)list, struct page,
> zone_device_data);
> > +
> > +			list_del(list);
> > +
> > +			while (i != order) {
> > +				buddy_page = nvm_vaddr_to_page(ns,
> > +					nvm_pgoff_to_vaddr(ns, page->index + (1 <<
> (i - 1))));
> > +				set_page_private(buddy_page, i - 1);
> > +				buddy_page->index = page->index + (1 << (i - 1));
> > +				__SetPageBuddy(buddy_page);
> > +				list_add((struct list_head *)&buddy_page-
> >zone_device_data,
> > +					&ns->free_area[i - 1]);
> > +				i--;
> > +			}
> > +
> > +			set_page_private(page, order);
> > +			__ClearPageBuddy(page);
> > +			ns->free -= 1 << order;
> > +			kaddr = nvm_pgoff_to_vaddr(ns, page->index);
> > +			break;
> > +		}
> > +
> > +		if (i != BCH_MAX_ORDER) {
> 
> In case to avoid an invalid order is sent in, maybe it is better to be
> 		if (i < BCH_MAX_ORDER) {
> 
> 
> > +			pgalloc_recs = find_nvm_pgalloc_recs(ns, owner_head,
> true);
> > +			/* ToDo: handle pgalloc_recs==NULL */
> > +			add_pgalloc_rec(pgalloc_recs, kaddr, order);
> > +			break;
> > +		}
> > +	}
> > +
> > +unlock:
> > +	mutex_unlock(&only_set->lock);
> > +	return kaddr;
> > +}
> > +EXPORT_SYMBOL_GPL(bch_nvm_alloc_pages);
> > +
> >  static int init_owner_info(struct bch_nvm_namespace *ns)  {
> >  	struct bch_owner_list_head *owner_list_head =
> > ns->sb->owner_list_head; diff --git a/drivers/md/bcache/nvm-pages.h
> > b/drivers/md/bcache/nvm-pages.h index e08864af96a0..4fd5205146a2
> > 100644
> > --- a/drivers/md/bcache/nvm-pages.h
> > +++ b/drivers/md/bcache/nvm-pages.h
> > @@ -62,6 +62,7 @@ extern struct bch_nvm_set *only_set;  struct
> > bch_nvm_namespace *bch_register_namespace(const char *dev_path);
> int
> > bch_nvm_init(void);  void bch_nvm_exit(void);
> > +void *bch_nvm_alloc_pages(int order, const char *owner_uuid);
> >
> >  #else
> >
> > @@ -74,6 +75,11 @@ static inline int bch_nvm_init(void)
> >  	return 0;
> >  }
> >  static inline void bch_nvm_exit(void) { }
> > +static inline void *bch_nvm_alloc_pages(int order, const char
> > +*owner_uuid) {
> > +	return NULL;
> > +}
> > +
> >
> >  #endif /* CONFIG_BCACHE_NVM_PAGES */
> >
> >
> 
> The rested are fine to me. Thanks for the great work :-)
> 
> Coly Li


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [bch-nvm-pages v9 4/6] bcache: bch_nvm_alloc_pages() of the buddy
  2021-05-18  2:27     ` Ma, Jianpeng
@ 2021-05-18  2:45       ` Coly Li
  0 siblings, 0 replies; 17+ messages in thread
From: Coly Li @ 2021-05-18  2:45 UTC (permalink / raw)
  To: Ma, Jianpeng; +Cc: linux-bcache, Ren, Qiaowei

On 5/18/21 10:27 AM, Ma, Jianpeng wrote:
>> -----Original Message-----
>> From: Coly Li <colyli@suse.de>
>> Sent: Tuesday, May 11, 2021 8:49 PM
>> To: Ren, Qiaowei <qiaowei.ren@intel.com>; Ma, Jianpeng
>> <jianpeng.ma@intel.com>
>> Cc: linux-bcache@vger.kernel.org
>> Subject: Re: [bch-nvm-pages v9 4/6] bcache: bch_nvm_alloc_pages() of the
>> buddy
>>
>> On 4/29/21 5:39 AM, Qiaowei Ren wrote:
>>> From: Jianpeng Ma <jianpeng.ma@intel.com>
>>>
>>> This patch implements the bch_nvm_alloc_pages() of the buddy.
>>> In terms of function, this func is like current-page-buddy-alloc.
>>> But the differences are:
>>> a: it need owner_uuid as parameter which record owner info. And it
>>> make those info persistence.
>>> b: it don't need flags like GFP_*. All allocs are the equal.
>>> c: it don't trigger other ops etc swap/recycle.
>>>
>>> Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
>>> Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
>>> Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
>>> [colyli: fix typo in commit log]
>>
>> Once you fixed the typo in this version, you may remove the above line from
>> this patch. Such comment is used when I submit patches to mainline kernel.
>>
>>> Signed-off-by: Coly Li <colyli@suse.de>
>>
>> And this is maintainer's SOB not author's, you may remove it from all rested
>> patches except for the first one of this series.
>>
>>
>>> ---
>>>  drivers/md/bcache/nvm-pages.c | 172
>> ++++++++++++++++++++++++++++++++++
>>>  drivers/md/bcache/nvm-pages.h |   6 ++
>>>  2 files changed, 178 insertions(+)
>>>
>>> diff --git a/drivers/md/bcache/nvm-pages.c
>>> b/drivers/md/bcache/nvm-pages.c index 810f65cf756a..2647ff997fab
>>> 100644
>>> --- a/drivers/md/bcache/nvm-pages.c
>>> +++ b/drivers/md/bcache/nvm-pages.c
>>> @@ -68,6 +68,178 @@ static inline void remove_owner_space(struct
>> bch_nvm_namespace *ns,
>>>  	bitmap_set(ns->pages_bitmap, pgoff, nr);  }
>>>
>>> +/* If not found, it will create if create == true */ static struct
>>> +bch_nvm_pages_owner_head *find_owner_head(const char
>> *owner_uuid,
>>> +bool create) {
>>> +	struct bch_owner_list_head *owner_list_head = only_set-
>>> owner_list_head;
>>> +	struct bch_nvm_pages_owner_head *owner_head = NULL;
>>> +	int i;
>>> +
>>> +	if (owner_list_head == NULL)
>>> +		goto out;
>>> +
>>> +	for (i = 0; i < only_set->owner_list_used; i++) {
>>> +		if (!memcmp(owner_uuid, owner_list_head->heads[i].uuid, 16))
>> {
>>> +			owner_head = &(owner_list_head->heads[i]);
>>> +			break;
>>> +		}
>>> +	}
>>> +
>>> +	if (!owner_head && create) {
>>> +		int used = only_set->owner_list_used;
>>> +
>>> +		BUG_ON((used > 0) && (only_set->owner_list_size == used));
>>
>>
>> It seems the above condition may happen when owner_list is full? Maybe
>> return NULL for a full-occupied owner header list can be better?
>>
>>
>>> +		memcpy_flushcache(owner_list_head->heads[used].uuid,
>> owner_uuid, 16);
>>> +		only_set->owner_list_used++;
>>> +
>>> +		owner_list_head->used++;
>>> +		owner_head = &(owner_list_head->heads[used]);
>>> +	}
>>> +
>>> +out:
>>> +	return owner_head;
>>> +}
>>> +
>>> +static struct bch_nvm_pgalloc_recs *find_empty_pgalloc_recs(void) {
>>> +	unsigned int start;
>>> +	struct bch_nvm_namespace *ns = only_set->nss[0];
>>> +	struct bch_nvm_pgalloc_recs *recs;
>>> +
>>> +	start = bitmap_find_next_zero_area(ns->pgalloc_recs_bitmap,
>>> +BCH_MAX_PGALLOC_RECS, 0, 1, 0);
>>
>> We cannot assume all space in [BCH_NVM_PAGES_SYS_RECS_HEAD_OFFSET,
>> BCH_NVM_PAGES_OFFSET) are dedicated to recs structures.
>>
>> Right now each bch_nvm_pgalloc_recs is 8KB, we can assume the
>> BCH_MAX_PGALLOC_RECS to be 64, that means 512KB space starts at
>> BCH_NVM_PAGES_SYS_RECS_HEAD_OFFSET. we won't exceed the limition in
>> next
>> 12 months at least.
>>
>>
>>> +	if (start > BCH_MAX_PGALLOC_RECS) {
>>> +		pr_info("no free struct bch_nvm_pgalloc_recs\n");
>>> +		return NULL;
>>> +	}
>>> +
>>> +	bitmap_set(ns->pgalloc_recs_bitmap, start, 1);
>>> +	recs = (struct bch_nvm_pgalloc_recs *)(ns->kaddr +
>> BCH_NVM_PAGES_SYS_RECS_HEAD_OFFSET)
>>> +		+ start;
>>> +	return recs;
>>> +}
>>> +
>>> +static struct bch_nvm_pgalloc_recs *find_nvm_pgalloc_recs(struct
>> bch_nvm_namespace *ns,
>>> +		struct bch_nvm_pages_owner_head *owner_head, bool create)
>> {
>>> +	int ns_nr = ns->sb->this_namespace_nr;
>>> +	struct bch_nvm_pgalloc_recs *prev_recs = NULL, *recs =
>>> +owner_head->recs[ns_nr];
>>> +
>>> +	/* If create=false, we return recs[nr] */
>>> +	if (!create)
>>> +		return recs;
>>> +
>>
>> I assume when create == false, at least the first non-full recs structure should
>> be returned before iterated all the list nodes. How about return recs by the
>> way in my next comment.
> 
> Hi Coly:
>    For create == false,  it mean find the recs and find the rec(when free page or list all rec).
>   So we return the firstly recs(iterated based on address do in other function, you can see patch5).
> 

Aha, yes you are right :-) Please ignore my noise here.

Thanks.

Coly Li

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-05-18  2:45 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-28 21:39 [bch-nvm-pages v9 0/6] nvm page allocator for bcache Qiaowei Ren
2021-04-28 21:39 ` [bch-nvm-pages v9 1/6] bcache: add initial data structures for nvm pages Qiaowei Ren
2021-04-28 21:39 ` [bch-nvm-pages v9 2/6] bcache: initialize the nvm pages allocator Qiaowei Ren
2021-04-28 16:29   ` Randy Dunlap
2021-04-29  0:13     ` Ma, Jianpeng
2021-04-28 17:53   ` Randy Dunlap
2021-05-11  3:53   ` Coly Li
2021-04-28 21:39 ` [bch-nvm-pages v9 3/6] bcache: initialization of the buddy Qiaowei Ren
2021-05-11  5:35   ` Coly Li
2021-04-28 21:39 ` [bch-nvm-pages v9 4/6] bcache: bch_nvm_alloc_pages() " Qiaowei Ren
2021-05-11 12:49   ` Coly Li
2021-05-18  2:27     ` Ma, Jianpeng
2021-05-18  2:45       ` Coly Li
2021-04-28 21:39 ` [bch-nvm-pages v9 5/6] bcache: bch_nvm_free_pages() " Qiaowei Ren
2021-05-11 13:41   ` Coly Li
2021-04-28 21:39 ` [bch-nvm-pages v9 6/6] bcache: get allocated pages from specific owner Qiaowei Ren
2021-05-11 13:45   ` Coly Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).