linux-bcache.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Coly Li <colyli@suse.de>
To: linux-bcache@vger.kernel.org
Cc: linux-block@vger.kernel.org, linux-nvdimm@lists.linux.dev,
	axboe@kernel.dk, hare@suse.com, jack@suse.cz,
	dan.j.williams@intel.com, hch@lst.de, ying.huang@intel.com,
	Coly Li <colyli@suse.de>
Subject: [PATCH v12 00/12] bcache: support NVDIMM for journaling
Date: Thu, 12 Aug 2021 01:02:12 +0800	[thread overview]
Message-ID: <20210811170224.42837-1-colyli@suse.de> (raw)

This is the v12 effort for supporting NVDIMM for bcache journal (some
versions may not posted with version numbers).

The major change of this version is the full pointer of on-media data
structure is replaced by per-namespace offset. Now a pointer address is
calculated by namespace base mapping address + per-namespace offset.
The code logic is same as previous version, all changes are only related
to the base+offset style pointer replacement.

The nvm-pages allocator is a buddy-like allocator, which allocates size
in power-of-2 pages from the NVDIMM namespace. User space tool 'bcache'
has a new added '-M' option to format a NVDIMM namespace and register it
via sysfs interface as a bcache meta device. The nvm-pages kernel code
does a DAX mapping to map the whole namespace into system's memory
address range, and allocating the pages to requestion like typical buddy
allocator does. The major difference is nvm-pages allocator maintains
the pages allocated to each requester by an allocation list which stored
on NVDIMM too. Allocation list of different requester is tracked by a
pre-defined UUID, all the pages tracked in all allocation lists are
treated as allocated busy pages and won't be initialized into buddy
system after the system reboot.

The bcache journal code may request a block of power-of-2 size pages
from the nvm-pages allocator, normally it is a range of 256MB or 512MB
continuous pages range. During meta data journaling, the in-memory jsets
go into the calculated nvdimm pages location by kernel memcpy routine.
So the journaling I/Os won't go into block device (e.g. SSD) anymore, 
the write and read for journal jsets happen on NVDIMM.

Intel developers Jianpeng Ma and Qiaowei Ren compose the initial code of
nvm-pages, the related patches are,
- bcache: initialize the nvm pages allocator
- bcache: initialization of the buddy
- bcache: bch_nvm_alloc_pages() of the buddy
- bcache: bch_nvm_free_pages() of the buddy
- bcache: get recs list head for allocated pages by specific uuid
All the code depends on Linux libnvdimm and dax drivers, the bcache nvm-
pages allocator can be treated as user of these two drivers.

I modify the bcache code to recognize the nvm meta device feature,
initialize journal on NVDIMM, and do journal I/Os on NVDIMM in the
following patches,
- bcache: add initial data structures for nvm pages
- bcache: use bucket index to set GC_MARK_METADATA for journal buckets
  in bch_btree_gc_finish()
- bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set
- bcache: initialize bcache journal for NVDIMM meta device
- bcache: support storing bcache journal into NVDIMM meta device
- bcache: read jset from NVDIMM pages for journal replay
- bcache: add sysfs interface register_nvdimm_meta to register NVDIMM
  meta device

In this series, all previously addressed issue via code reviews are all
fixed. And all known issue during testing are fixed. The code survives
from 24+ hours smoking and I/O pressure testing among many reboots, it
works well as expected.

All the code is EXPERIMENTAL, they won't be enabled by default until we
feel the NVDIMM support is completed and stable.

Although there are some experts helped to review the code logic, but we
do appreciate if more people may help to review the code. It is quite
common that bcache patches don't have enough code reviewer, but this
time I do need help for more review or comments on this series.

Thanks in advance.

Coly Li
---

Coly Li (7):
  bcache: add initial data structures for nvm pages
  bcache: use bucket index to set GC_MARK_METADATA for journal buckets
    in bch_btree_gc_finish()
  bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set
  bcache: initialize bcache journal for NVDIMM meta device
  bcache: support storing bcache journal into NVDIMM meta device
  bcache: read jset from NVDIMM pages for journal replay
  bcache: add sysfs interface register_nvdimm_meta to register NVDIMM
    meta device

Jianpeng Ma (5):
  bcache: initialize the nvm pages allocator
  bcache: initialization of the buddy
  bcache: bch_nvmpg_alloc_pages() of the buddy
  bcache: bch_nvmpg_free_pages() of the buddy allocator
  bcache: get recs list head for allocated pages by specific uuid

 drivers/md/bcache/Kconfig       |  10 +
 drivers/md/bcache/Makefile      |   1 +
 drivers/md/bcache/btree.c       |   6 +-
 drivers/md/bcache/features.h    |   9 +
 drivers/md/bcache/journal.c     | 325 +++++++++--
 drivers/md/bcache/journal.h     |   2 +-
 drivers/md/bcache/nvm-pages.c   | 931 ++++++++++++++++++++++++++++++++
 drivers/md/bcache/nvm-pages.h   | 127 +++++
 drivers/md/bcache/super.c       |  53 +-
 include/uapi/linux/bcache-nvm.h | 253 +++++++++
 10 files changed, 1649 insertions(+), 68 deletions(-)
 create mode 100644 drivers/md/bcache/nvm-pages.c
 create mode 100644 drivers/md/bcache/nvm-pages.h
 create mode 100644 include/uapi/linux/bcache-nvm.h

-- 
2.26.2


             reply	other threads:[~2021-08-11 17:03 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-11 17:02 Coly Li [this message]
2021-08-11 17:02 ` [PATCH v12 01/12] bcache: add initial data structures for nvm pages Coly Li
2021-08-11 17:02 ` [PATCH v12 02/12] bcache: initialize the nvm pages allocator Coly Li
2021-08-12  5:43   ` Dan Williams
2021-08-12  8:26     ` Coly Li
2021-08-11 17:02 ` [PATCH v12 03/12] bcache: initialization of the buddy Coly Li
2021-08-11 17:02 ` [PATCH v12 04/12] bcache: bch_nvmpg_alloc_pages() " Coly Li
2021-08-11 17:02 ` [PATCH v12 05/12] bcache: bch_nvmpg_free_pages() of the buddy allocator Coly Li
2021-08-11 17:02 ` [PATCH v12 06/12] bcache: get recs list head for allocated pages by specific uuid Coly Li
2021-08-11 17:02 ` [PATCH v12 07/12] bcache: use bucket index to set GC_MARK_METADATA for journal buckets in bch_btree_gc_finish() Coly Li
2021-08-11 17:02 ` [PATCH v12 08/12] bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set Coly Li
2021-08-11 17:02 ` [PATCH v12 09/12] bcache: initialize bcache journal for NVDIMM meta device Coly Li
2021-08-11 17:02 ` [PATCH v12 10/12] bcache: support storing bcache journal into " Coly Li
2021-08-11 17:02 ` [PATCH v12 11/12] bcache: read jset from NVDIMM pages for journal replay Coly Li
2021-08-11 17:02 ` [PATCH v12 12/12] bcache: add sysfs interface register_nvdimm_meta to register NVDIMM meta device Coly Li
2021-08-15 16:21 ` [PATCH v12 00/12] bcache: support NVDIMM for journaling Coly Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210811170224.42837-1-colyli@suse.de \
    --to=colyli@suse.de \
    --cc=axboe@kernel.dk \
    --cc=dan.j.williams@intel.com \
    --cc=hare@suse.com \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=linux-bcache@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-nvdimm@lists.linux.dev \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).